read.csv()

The read.csv() function reads a file whose data fields are stored as comma-separated values into a data frame. It can also read delimited files that use a separator other than a comma if the sep argument is set accordingly.

Example

CSV stands for comma-separated values. As the name suggests, read.csv is used for reading files whose data fields are separated by commas. Such files usually have a .csv extension.

read.csv offers some control over how the CSV file is read. Let us look at a few examples. For these examples, we will use a .csv file stored at "https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv".

First, let us simply read the file into a data frame.

 moody_df <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
 moody_df
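The same call works on a local file path. Here is a minimal sketch that does not need network access. The data frame sample_df and its values are made up for illustration and are not part of the Moody data set.

```r
# Write a small, made-up data frame to a temporary CSV file, then read it back.
sample_df <- data.frame(id = 1:3, score = c(90.5, 72.0, 88.25))
tmp_path <- tempfile(fileext = ".csv")
write.csv(sample_df, tmp_path, row.names = FALSE)

local_df <- read.csv(tmp_path)   # same call as above, with a file path instead of a URL
head(local_df)                   # show the first rows instead of printing everything
unlink(tmp_path)                 # remove the temporary file
```

For large files, head() is usually more convenient than printing the whole data frame.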

We can check the structure of the data frame we just created by running str(moody_df).

 moody_df <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
 str(moody_df)

Notice that each column has a name. The names come from the header row in the file.

By default, read.csv uses header = TRUE, which means the first line of the file is assumed to be a header containing the column names. If the file has no header, set header = FALSE. Our CSV file has a header. Let's now try reading it with header = FALSE and see what happens.

 moody_df <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", header = FALSE)
 str(moody_df)

Notice how the names have changed to V1, V2 and so on. R assigns these default names when no header is read. Because our file does have a header, its header row is now treated as the first row of data.
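The effect of the header argument can also be seen on a tiny inline example. This sketch uses the text argument, which read.csv passes on to read.table, and a made-up two-row table (csv_text is a hypothetical name, not part of the Moody data).

```r
# A made-up CSV with a header line and two data rows.
csv_text <- "name,score\nalice,90\nbob,72"

with_header <- read.csv(text = csv_text)                     # header = TRUE by default
without_header <- read.csv(text = csv_text, header = FALSE)

names(with_header)      # "name" "score"
names(without_header)   # "V1" "V2"
nrow(with_header)       # 2
nrow(without_header)    # 3, because the header line is now a data row
str(without_header)     # V2 is no longer numeric: it contains the text "score"
```

Reading a file that has a header with header = FALSE therefore changes the column types as well as the names.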

There are some other parameters that are less commonly used with read.csv. The 'sep' parameter specifies the character that separates the data fields. For read.csv the default is a comma, so we do not need to specify it for CSV files. The 'quote' parameter specifies which character is used for quoting, and similarly 'dec' specifies the character that represents the decimal point. 'fill', a logical parameter, is used when the rows in the CSV file are of unequal length. If fill is set to TRUE, blank fields are added to the short rows so that every row has the same number of fields.

This is how they are used in the function.

 moody_df <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", header = TRUE, sep = ",", dec = ".", fill = TRUE)
 str(moody_df)
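The values passed above are the same as the read.csv defaults, so the result does not change. To see each parameter make a difference, here is a small sketch on made-up inline data. The names semi_text, quoted_text and ragged_text are hypothetical and only serve the illustration.

```r
# sep and dec: semicolon-separated fields with a comma as the decimal mark.
semi_text <- "item;price\napple;1,50\npear;2,25"
semi_df <- read.csv(text = semi_text, sep = ";", dec = ",")
semi_df$price            # numeric values 1.50 and 2.25

# quote: fields wrapped in single quotes may contain the separator.
quoted_text <- "name,note\nann,'hello, world'"
quoted_df <- read.csv(text = quoted_text, quote = "'")
ncol(quoted_df)          # 2, the comma inside the quotes does not split the field

# fill: the last row is one field short and gets padded.
ragged_text <- "a,b,c\n1,2,3\n4,5"
ragged_df <- read.csv(text = ragged_text, fill = TRUE)
ragged_df$c              # 3 NA, the added blank field is read as NA in a numeric column
```

For files in the semicolon and decimal-comma style, base R also provides read.csv2(), which uses sep = ";" and dec = "," as its defaults.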