Data frame

A data frame stores tabular data. It is a list of vectors of equal length: each column is a vector holding the values of one variable, and each row holds the values recorded for one instance. Data frames differ from matrices in that their columns may be of different types, whereas all values in a matrix must be of the same type.
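
The contrast with a matrix can be illustrated with a minimal sketch. The names `m` and `d` are hypothetical and are not used elsewhere in this section.

```r
# A matrix holds a single type, so mixing numbers and text
# coerces every value to character.
m <- cbind(c(1, 2), c("a", "b"))
class(m[1, 1])   # "character"

# A data frame keeps a separate type for each column.
d <- data.frame(id = c(1, 2), label = c("a", "b"))
class(d$id)      # "numeric"

# Internally a data frame is a list with one element per column.
is.list(d)       # TRUE
length(d)        # 2
```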

Example

Let us create a basic data frame and then gradually build a more complex one that uses the various parameters that the data.frame() function accepts.

We start with a simple data frame that is empty.


 # creating an empty data frame and assigning it to the variable df
 df <- data.frame()
 df
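
As a quick check, the dimensions of the empty data frame can be inspected with dim(), nrow() and ncol(). This is only a sketch to confirm what the code above produces.

```r
df <- data.frame()
dim(df)    # 0 0
nrow(df)   # 0
ncol(df)   # 0
```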

Now that we have created a basic data frame with 0 rows and 0 columns, let us look at how we can create more sophisticated data frames.

Let us first create some vectors that we will subsequently use to build the data frame (after all, a data frame is a list of vectors of equal length).

 firstName <- c("Ethan", "Captain", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)

The code above creates four vectors. The vector score is of the numeric type and the other three are of the character type. Notice that they all have the same number of elements (4). Let us now combine them into one data frame using the data.frame() function.


 firstName <- c("Ethan", "Captain", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 #creating a data frame named stu_df
 stu_df <- data.frame(firstName, lastName, sex, score)
 stu_df
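
The equal-length requirement can be seen in a short sketch. The column names `x`, `y` and `group` are hypothetical. Note that data.frame() recycles a shorter vector when its length divides the number of rows evenly, and raises an error otherwise.

```r
# Lengths 4 and 3 cannot be recycled evenly, so data.frame() raises an error.
result <- tryCatch(
  data.frame(x = 1:4, y = 1:3),
  error = function(e) "error: differing number of rows"
)
result

# A length-one value is recycled to fill every row.
recycled <- data.frame(x = 1:4, group = "A")
nrow(recycled)   # 4
```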

Now that we have created the data frame, let us get some more information about it. We can do this with the str() function, which displays the number of rows and columns along with the type and first few values of each column.


 firstName <- c("Ethan", "Captain", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 #creating a data frame named stu_df
 stu_df <- data.frame(firstName, lastName, sex, score)
 stu_df
 #using str() to get more information about the data frame.
 str(stu_df)
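
The types that str() reports for text columns depend on the R version: before R 4.0.0, data.frame() converted character vectors to factors by default, while later versions keep them as character. The sketch below sets the stringsAsFactors argument explicitly to show both outcomes. The names `as_chr` and `as_fct` are hypothetical.

```r
firstName <- c("Ethan", "Captain", "John", "Selina")
score <- c(97, 88, 85, 92)

# Keep text columns as character vectors, whatever the version's default is.
as_chr <- data.frame(firstName, score, stringsAsFactors = FALSE)
# Convert text columns to factors instead.
as_fct <- data.frame(firstName, score, stringsAsFactors = TRUE)

sapply(as_chr, class)   # firstName is "character", score is "numeric"
sapply(as_fct, class)   # firstName is "factor", score is "numeric"
```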

For a quick look at the data in a data frame, you can run the head() and tail() functions on it. By default they return the first six rows and the last six rows of the data frame respectively. Since stu_df has only four rows, both calls below return the entire data frame.


 firstName <- c("Ethan", "Captain", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 stu_df <- data.frame(firstName, lastName, sex, score)
 head(stu_df)
 tail(stu_df)
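
Both functions accept an n argument that controls how many rows are returned. The sketch below uses a hypothetical ten-row data frame named `scores` so that the effect is visible.

```r
scores <- data.frame(id = 1:10, score = seq(50, 95, by = 5))

head(scores)           # first 6 rows (the default)
head(scores, n = 3)    # first 3 rows
tail(scores, n = 2)    # last 2 rows
head(scores, n = -8)   # everything except the last 8 rows
```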

The data.frame() function uses the names of the vectors you supplied as the column names. You can retrieve these names using the names() function.

 firstName <- c("Ethan", "Captain", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 stu_df <- data.frame(firstName, lastName, sex, score)
 names(stu_df)
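
The names() function can also appear on the left of an assignment to rename columns. The following is a minimal sketch on a hypothetical two-column data frame named `small_df`; the new column names are illustrative only.

```r
small_df <- data.frame(firstName = c("Ethan", "Selina"), score = c(97, 92))
names(small_df)      # "firstName" "score"
colnames(small_df)   # same result for a data frame

# Rename all columns at once, then a single column by position.
names(small_df) <- c("first_name", "points")
names(small_df)[2] <- "score"
names(small_df)      # "first_name" "score"
```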
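
You can name the rows by passing a vector to the row.names parameter of data.frame(). To access only part of a data frame, use square brackets with row and column indices, in the form [rows, columns].

 firstName <- c("Ethan", "Captain", "John", "Selina")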
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 rowNames <- c("first Row","Second row","Third Row","fourth Row")
 stu_df <- data.frame(firstName, lastName, sex, score, row.names = rowNames)
 # access only rows 1 through 3 and columns 3 through 4
 stu_df <- stu_df[1:3, 3:4]
 stu_df
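
Row names can be used for subsetting in the same way as row numbers. The sketch below also shows that selecting a single column returns a plain vector unless drop = FALSE is given. The names `demo_df`, `by_label`, `as_vector` and `as_frame` are hypothetical.

```r
firstName <- c("Ethan", "Captain", "John", "Selina")
score <- c(97, 88, 85, 92)
rowNames <- c("first Row", "Second row", "Third Row", "fourth Row")
demo_df <- data.frame(firstName, score, row.names = rowNames)

# Select one row by its row name.
by_label <- demo_df["Third Row", ]
by_label$score              # 85

# A single column comes back as a vector by default.
as_vector <- demo_df[1:3, "score"]
# drop = FALSE keeps the result as a data frame.
as_frame <- demo_df[1:3, "score", drop = FALSE]
is.data.frame(as_vector)    # FALSE
is.data.frame(as_frame)     # TRUE
```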

Instead of numbers, we can also use the names of the columns to subset a data frame. See the example below:

 firstName <- c("Ethan", "Captain", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 rowNames <- c("first Row","Second row","Third Row","fourth Row")
 stu_df <- data.frame(firstName, lastName, sex, score, row.names = rowNames)
 stu_df <- stu_df[1:3, c("sex", "score")]
 stu_df
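
Rows can also be selected with a logical condition rather than with positions. The sketch below keeps only the rows whose score exceeds 90; the threshold and the names `demo_df`, `high` and `same` are assumptions made for illustration.

```r
demo_df <- data.frame(
  firstName = c("Ethan", "Captain", "John", "Selina"),
  sex = c("MALE", "MALE", "MALE", "FEMALE"),
  score = c(97, 88, 85, 92),
  stringsAsFactors = FALSE
)

# Logical vector for the rows, column names for the columns.
high <- demo_df[demo_df$score > 90, c("firstName", "score")]
high$firstName   # "Ethan" "Selina"

# subset() expresses the same selection.
same <- subset(demo_df, score > 90, select = c(firstName, score))
```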

To add a new column to an existing data frame, use the $ operator together with the assignment operator <-.

 firstName <- c("Ethan", "Captain", "John", "Selina")
 lastName <- c("Hunt", "Jack", "Whick", "Kyle")
 sex <- c("MALE", "MALE", "MALE", "FEMALE")
 score <- c(97, 88, 85, 92)
 rowNames <- c("first Row","Second row","Third Row","fourth Row")
 stu_df <- data.frame(firstName, lastName, sex, score, row.names = rowNames)
 stu_df
 # adding a new column to the existing data frame
 stu_df$age <- c(38, 50, 45, 24)
 stu_df
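
A few related operations follow the same pattern. The sketch below derives a column from an existing one, adds a column with [[ ]], removes a column by assigning NULL, and shows that a vector of the wrong length is rejected. The column `passed`, the threshold of 90 and the names `demo_df` and `msg` are hypothetical.

```r
demo_df <- data.frame(
  firstName = c("Ethan", "Captain", "John", "Selina"),
  score = c(97, 88, 85, 92)
)

# A new column can be computed from an existing one.
demo_df$passed <- demo_df$score >= 90

# [[ ]] with a column name works like $.
demo_df[["age"]] <- c(38, 50, 45, 24)
names(demo_df)   # "firstName" "score" "passed" "age"

# Assigning NULL removes a column.
demo_df$passed <- NULL
ncol(demo_df)    # 3

# Three values cannot fill four rows, so the assignment raises an error.
msg <- tryCatch(
  {
    demo_df$bad <- c(1, 2, 3)
    "no error"
  },
  error = function(e) "error"
)
msg              # "error"
```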