rbind

rbind(x1,x2,...)
The rbind() function combines vectors, matrices, or data frames by rows.

x1, x2: These can be vectors, matrices, or data frames.

Example

It only makes sense to combine two datasets by rows if their columns are the same.
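Since rbind() accepts vectors and matrices as well as data frames, here is a minimal sketch of those two cases. The names v1, v2, m and m2 are made up for illustration.

```r
# Two numeric vectors of equal length become the rows of a matrix.
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
m <- rbind(v1, v2)
m          # a 2 x 3 matrix whose row names are "v1" and "v2"

# Passing a matrix and a vector appends the vector as one more row.
m2 <- rbind(m, c(7, 8, 9))
dim(m2)    # 3 3
```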

First, let us create two data frames so that we can run the rbind() function on them.


 #Preparing a data frame
 firstName <- c("Ethan", "Captian", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 stu_df1 <- data.frame(firstName, lastName, sex, score)
 stu_df1
 #preparing another data frame
 firstName <- c("Robert", "Bruce")
 lastName <- c("Langdon", "Wayne")
 sex <- c("MALE", "MALE")
 score <- c(98, 79)
 stu_df2 <- data.frame(firstName, lastName, sex, score)
 stu_df2

The code above prepares two data frames with 4 columns each. The columns are the same in both data frames, although the number of rows may differ.

The first data frame (stu_df1) has 4 rows and the second (stu_df2) has 2 rows.

For a better understanding of data frames, see the data frame page.

Now let us combine the rows in these two data frames using rbind(). It is a simple one-line operation. We just pass the two data frames to the rbind() function.


 firstName <- c("Ethan", "Captian", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 stu_df1 <- data.frame(firstName, lastName, sex, score)
 firstName <- c("Robert", "Bruce")
 lastName <- c("Langdon", "Wayne")
 sex <- c("MALE", "MALE")
 score <- c(98, 79)
 stu_df2 <- data.frame(firstName, lastName, sex, score)
 #combining the rows using rbind()
 rbind(stu_df1,stu_df2)

You can change the order in which they are combined by changing the order in which you pass the data frames to rbind(). If you instead call rbind(stu_df2, stu_df1), the 2 rows from stu_df2 will appear first.
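The effect of argument order can be checked with a small sketch. It uses cut-down, two-column versions of the data frames (an assumption made to keep the example short), and sets stringsAsFactors = FALSE so the names stay character vectors on older R versions too.

```r
# Cut-down versions of the two data frames, for illustration only.
stu_df1 <- data.frame(firstName = c("Ethan", "Selina"),
                      score = c(97, 92),
                      stringsAsFactors = FALSE)
stu_df2 <- data.frame(firstName = c("Robert", "Bruce"),
                      score = c(98, 79),
                      stringsAsFactors = FALSE)

# Passing stu_df2 first puts its rows at the top of the result.
reversed <- rbind(stu_df2, stu_df1)
reversed$firstName   # "Robert" "Bruce" "Ethan" "Selina"
nrow(reversed)       # 4
```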

What do you think will happen if both data frames have the same number of columns but different column names? (They may represent the same data. For example, one data frame has a column 'lastName' and the other has 'surName'.) Let's find out. In the following example, we will change the column name from 'lastName' to 'surName' for the second data frame.

 firstName <- c("Ethan", "Captian", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 stu_df1 <- data.frame(firstName, lastName, sex, score)
 firstName <- c("Robert", "Bruce")
 surName <- c("Langdon", "Wayne")
 sex <- c("MALE", "MALE")
 score <- c(98, 79)
 stu_df2 <- data.frame(firstName, surName, sex, score)
 rbind(stu_df1,stu_df2)

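The above code throws an error (names do not match previous names) because the column names differ. For data frames, rbind() matches columns by name rather than by position, so the column names in both data frames must be the same if you want to use rbind().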
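One way around the mismatch is to rename the offending column before combining. The sketch below uses cut-down, two-column data frames (an assumption for brevity) and also shows that columns are matched by name, so their order may differ.

```r
# Cut-down data frames whose second column is named differently.
stu_df1 <- data.frame(firstName = c("Ethan", "Selina"),
                      lastName  = c("Hunt", "Kyle"),
                      stringsAsFactors = FALSE)
stu_df2 <- data.frame(firstName = c("Robert", "Bruce"),
                      surName   = c("Langdon", "Wayne"),
                      stringsAsFactors = FALSE)

# Rename 'surName' to 'lastName' so the column names match.
names(stu_df2)[names(stu_df2) == "surName"] <- "lastName"
combined <- rbind(stu_df1, stu_df2)
combined$lastName    # "Hunt" "Kyle" "Langdon" "Wayne"

# Columns are matched by name, so a different column order also works.
swapped <- stu_df2[, c("lastName", "firstName")]
combined2 <- rbind(stu_df1, swapped)
names(combined2)     # "firstName" "lastName"
```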

Let us see what happens if one of the data frames has one less column.

We will now remove the score column from the second data frame.


 firstName <- c("Ethan", "Captian", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 stu_df1 <- data.frame(firstName, lastName, sex, score)
 # preparing another data frame without the score column
 firstName <- c("Robert", "Bruce")
 lastName <- c("Langdon", "Wayne")
 sex <- c("MALE", "MALE")
 stu_df2 <- data.frame(firstName, lastName, sex)
 rbind(stu_df1,stu_df2)

You will get the following error:

 Error in rbind(deparse.level, ...) :
   numbers of columns of arguments do not match

The error is expected, isn't it?

To handle this error, you can do one of two things:

a) Drop the extra variable from the other data frame too (in this case stu_df1$score) and get the result without that column.

To do so, simply assign NULL to the column.

 firstName <- c("Ethan", "Captian", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 stu_df1 <- data.frame(firstName, lastName, sex, score)
 firstName <- c("Robert", "Bruce")
 lastName <- c("Langdon", "Wayne")
 sex <- c("MALE", "MALE")
 stu_df2 <- data.frame(firstName, lastName, sex)
 # dropping the extra column from the first data frame. Simply assign NULL to it.
 stu_df1$score <- NULL
 rbind(stu_df1,stu_df2)

b) The other way is to create a variable filled with missing values in the incomplete dataset (in this case stu_df2).

 firstName <- c("Ethan", "Captian", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 stu_df1 <- data.frame(firstName, lastName, sex, score)
 firstName <- c("Robert", "Bruce")
 lastName <- c("Langdon", "Wayne")
 sex <- c("MALE", "MALE")
 stu_df2 <- data.frame(firstName, lastName, sex)
 # Creating a missing variable in the second data frame.
 stu_df2$score <- NA
 rbind(stu_df1,stu_df2)
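When more than one column is missing, adding them one at a time gets tedious. A minimal sketch of a more general approach, using setdiff() to find the missing columns, might look like this. The data frames are cut-down versions and missing_cols is a name made up for illustration.

```r
# Cut-down data frames; stu_df2 lacks the score column.
stu_df1 <- data.frame(firstName = c("Ethan", "Selina"),
                      score = c(97, 92),
                      stringsAsFactors = FALSE)
stu_df2 <- data.frame(firstName = c("Robert", "Bruce"),
                      stringsAsFactors = FALSE)

# Columns present in stu_df1 but absent from stu_df2.
missing_cols <- setdiff(names(stu_df1), names(stu_df2))

# Add every missing column to stu_df2, filled with NA.
stu_df2[missing_cols] <- NA

combined <- rbind(stu_df1, stu_df2)
combined$score   # 97 92 NA NA
```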

If you are familiar with SQL, the rbind() function is analogous to the UNION ALL operation: it appends rows without removing duplicates.
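One detail of the SQL comparison is worth checking: rbind() keeps duplicate rows, which is the behaviour of UNION ALL, whereas a plain UNION removes them. A minimal sketch, using two made-up data frames a and b, shows how unique() can drop the duplicates afterwards.

```r
# Two small data frames that share one row (id 2).
a <- data.frame(id = c(1, 2))
b <- data.frame(id = c(2, 3))

# rbind() keeps every row, including the duplicate.
all_rows <- rbind(a, b)
nrow(all_rows)        # 4

# unique() removes duplicate rows, giving a set-style union.
distinct_rows <- unique(all_rows)
nrow(distinct_rows)   # 3
```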