So as we said, we use barplots to graphically represent the distribution of a categorical variable. Notice the type of column here: categorical. We do not use a numerical column (unlike boxplots, where we usually do).
The first thing we need to do to make a bar plot is to make a table of the categorical column that we want to plot. Let us plot the column GRADE in the student_performance data frame.
By the way, you can always check what the data looks like by running the head() function on the data frame you are using. For example, head(student_performance) returns all the column names along with the first six rows of data. Try it!
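If you want to try head() without downloading the CSV, here is a minimal sketch on a small made-up data frame. The column names GRADE and ON_SMARTPHONE mirror the ones used in this tutorial, but the values are invented purely for illustration.

```r
# A small made-up data frame (hypothetical values, not the real dataset)
demo <- data.frame(
  GRADE         = c("A", "B", "B", "C", "A", "B", "F", "D", "C", "B"),
  ON_SMARTPHONE = c("never", "always", "rarely", "never", "never",
                    "always", "always", "rarely", "never", "never")
)

# head() shows the first six rows by default
head(demo)

# The n argument controls how many rows are shown
head(demo, n = 3)

# names() lists just the column names
names(demo)
```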
Now, let's return to barplots. Let's create a table of the GRADE column.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table
table(student_performance$GRADE)
The result of the above code tells you how many students got an A, how many got a B, and so on. Now we will use this table to draw a bar plot of GRADE. We just need to pass the table to the barplot() function.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#Generating the Barplot
barplot(table(student_performance$GRADE))
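To see what table() produces and what barplot() does with it without downloading the CSV, here is a minimal sketch on a hypothetical vector of grades (the values are made up for illustration).

```r
# Hypothetical grades (made up for illustration, not from the dataset)
grades <- c("A", "B", "B", "C", "A", "B", "F", "D", "C", "B")

# table() counts each distinct value; the categories are sorted alphabetically
gradeCounts <- table(grades)
print(gradeCounts)

# A single count can be looked up by its category name
gradeCounts[["B"]]

# Each count becomes the height of one bar
barplot(gradeCounts)
```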
The above method gives the result we need, but it might look complicated to those who are not familiar with programming. In particular, barplot(table(student_performance$GRADE)) might look a little confusing, because that one line contains two function calls. First, table(student_performance$GRADE) creates a table of grades. Then that table is passed straight to barplot().
There is a simpler way to write it: split it into two steps. We first store the table in a variable and then pass that variable to the barplot() function, instead of calling two functions on the same line.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot using gradeTable
barplot(gradeTable)
Now, wouldn't it be great if the bars were arranged in decreasing order of their heights? (Unless, of course, the categorical variable has an inherent order you want to keep.) We can do this by reordering the table before passing it to barplot(). The expression order(gradeTable, decreasing= T) returns the positions of the table's entries sorted by their counts: the first argument, gradeTable, holds the values to sort by, and the second argument, decreasing= T, sorts them from largest to smallest. Indexing the table with these positions, gradeTable[order(gradeTable, decreasing= T)], gives a reordered table, which barplot() then draws.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot. We use the order function to order the bars.
barplot(gradeTable[order(gradeTable, decreasing= T)])
Try removing the decreasing= T parameter from the above code and see what happens. By default, order() sorts in increasing order, so without that parameter the bars are plotted from shortest to tallest.
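To see what order() actually returns, here is a minimal sketch on a hypothetical table with made-up counts. It also shows that sort() on the table gives the same reordering in a single call.

```r
# Hypothetical grade counts (made up): A=3, B=5, C=2, D=1, F=4
gradeTable <- table(c(rep("A", 3), rep("B", 5), rep("C", 2), "D", rep("F", 4)))

# order() returns POSITIONS in sorted order, not the sorted counts
idx <- order(gradeTable, decreasing = TRUE)
print(idx)

# Indexing with those positions reorders the table; the names move with the counts
sortedDesc <- gradeTable[idx]
print(sortedDesc)

# sort() performs the same reordering in one step
sortedDesc2 <- sort(gradeTable, decreasing = TRUE)

# Without decreasing = TRUE the order is smallest to largest
sortedAsc <- gradeTable[order(gradeTable)]

barplot(sortedDesc)
```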
Let us now make the bars horizontal. We just need to make one modification to the above code: we tell the barplot() function to draw the bars horizontally by adding the horiz=T parameter.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot. Horizontal
barplot(gradeTable[order(gradeTable)], horiz= T)
The barplot is coming along nicely now! Let's make it colorful. To do this, we pass a vector of colour names to the col parameter of the barplot() function. The colours are applied to the bars in order, one colour per bar.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot with colors
barplot(gradeTable, horiz= T, col=c("Green","Blue","Grey","yellow","red"))
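The example above lists five colours by hand, which assumes there are exactly five grades. A minimal sketch of an alternative: generate one colour per bar with the base R rainbow() function, so the colour vector always matches the number of categories. The table here is hypothetical. If you pass fewer colours than bars, the colours are recycled.

```r
# Hypothetical grade counts (made up for illustration)
gradeTable <- table(c(rep("A", 3), rep("B", 5), rep("C", 2), "D", rep("F", 4)))

# One colour per bar, however many categories the table has
barColours <- rainbow(length(gradeTable))

barplot(gradeTable, horiz = TRUE, col = barColours)
```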
Doesn't it look more informative now?
Let us give the plot a name/title now. This can be done using the 'main' parameter.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot. With title.
barplot(gradeTable, horiz= T, main="Bar plot for grades")
Let us now give the X-axis a name using the xlab parameter.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot. With axis name.
barplot(gradeTable, horiz= T, main="Bar plot for grades", xlab="Number of Students")
If we want to change the range of values on the X-axis, we can do so using the xlim parameter. We pass the lower limit and the upper limit in a vector.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot. With custom value range for X axis.
barplot(gradeTable, horiz= T, xlim=c(0, 800))
Notice how the range of the X-axis changed from 0-600 to 0-800.
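Hard-coding 800 works for this dataset, but the right upper limit depends on the counts. Here is a minimal sketch that derives the limit from the data instead, leaving about 20% headroom above the longest bar. The 1.2 factor and the hypothetical table are arbitrary choices for illustration.

```r
# Hypothetical grade counts (made up for illustration)
gradeTable <- table(c(rep("A", 3), rep("B", 5), rep("C", 2), "D", rep("F", 4)))

# Upper limit: about 20% above the largest count, rounded up to a whole number
upper <- ceiling(max(gradeTable) * 1.2)
axisRange <- c(0, upper)

barplot(gradeTable, horiz = TRUE, xlim = axisRange)
```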
Let us now remove the borders from around the bars. We can do this by simply using border=NA.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot. Removing the borders from around the bars.
barplot(gradeTable, horiz= T, border=NA)
Notice the subtle difference? The bars don't have a border anymore!
What if we only want to plot a subset of the values of the categorical variable? Say we only want to plot the grades of students who never used their cell phones in class. We can filter the values while we create the table, so that the table contains only those students.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE[student_performance$ON_SMARTPHONE=="never"])
#Generating the Barplot. With only those students who never used the cell phone in class.
barplot(gradeTable, horiz= T, border=NA)
Do you see how this can be used to do some interesting things with data? Try to plot bars for students who are always late for class.
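To see how the filter works step by step, here is a minimal sketch on a small hypothetical data frame (made-up values). The comparison builds a TRUE/FALSE vector with one entry per student, and indexing GRADE with it keeps only the rows marked TRUE.

```r
# A small hypothetical data frame (illustration only, not the real dataset)
students <- data.frame(
  GRADE         = c("A", "B", "C", "A", "F", "B"),
  ON_SMARTPHONE = c("never", "always", "never", "never", "always", "rarely")
)

# The comparison produces one TRUE/FALSE value per row
neverMask <- students$ON_SMARTPHONE == "never"
print(neverMask)

# Indexing GRADE with the logical vector keeps only the TRUE rows
neverTable <- table(students$GRADE[neverMask])
print(neverTable)
```

If the real column contains missing values, the comparison returns NA for those rows; wrapping the condition in which() keeps only the rows where it is TRUE.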
If you don't want any space between your bars and would like them placed right next to each other, you can use the space=F parameter. The space parameter sets the gap before each bar as a fraction of the average bar width; F (FALSE) is treated as 0, so space=F and space=0 both remove the gaps.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot. With no space between bars.
barplot(gradeTable, horiz= T, border=NA, space=F)
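barplot() invisibly returns the positions of the bar midpoints, which gives a simple way to check what space does. A minimal sketch on a hypothetical table: with the default gap of 0.2 bar widths the bar centres are 1.2 units apart, and with space = 0 they are exactly one bar width apart.

```r
# Hypothetical grade counts (made up for illustration)
gradeTable <- table(c(rep("A", 3), rep("B", 5), rep("C", 2), "D", rep("F", 4)))

# Default spacing: 0.2 of a bar width before each bar
midsDefault <- barplot(gradeTable, horiz = TRUE, border = NA)

# No spacing: space = 0 behaves the same as space = F
midsTight <- barplot(gradeTable, horiz = TRUE, border = NA, space = 0)

# Distance between neighbouring bar centres in each case
diff(as.vector(midsDefault))
diff(as.vector(midsTight))
```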
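Now suppose you want to give the bars custom names, other than the category names that are displayed automatically along the axis. For example, you might want A=Very Good, B=Good, and so on. To do this we can use the names.arg parameter. The labels are matched to the bars by position, so they must be given in the same order as the bars (here, the alphabetical order in which table() lists the grades).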
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot with custom bar names.
barplot(gradeTable, main="Bar plot for grades", ylab="Number of Students",names.arg=c("Very Good","Good","Satisfactory","Bad","Failed"))
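Because names.arg labels bars by position, the labels in the example above only line up while the table is in its original order; if you sort the bars first, the labels no longer match. Here is a minimal sketch of a safer approach, assuming the grade codes are A, B, C, D and F (the table is hypothetical): keep the labels in a named vector and look them up using the table's own names.

```r
# Hypothetical grade counts (made up for illustration)
gradeTable <- table(c(rep("A", 3), rep("B", 5), rep("C", 2), "D", rep("F", 4)))

# Lookup from grade code to label (assumes the codes A, B, C, D, F)
gradeLabels <- c(A = "Very Good", B = "Good", C = "Satisfactory",
                 D = "Bad", F = "Failed")

# Sorting changes the bar order...
sortedTable <- sort(gradeTable, decreasing = TRUE)

# ...so pick the labels by name rather than relying on position
barLabels <- unname(gradeLabels[names(sortedTable)])

barplot(sortedTable, main = "Bar plot for grades",
        ylab = "Number of Students", names.arg = barLabels)
```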
Can you guess what will happen if we use another parameter border="red"?
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot with border.
barplot(gradeTable, main="Bar plot for grades", ylab="Number of Students",border="red")
Let's make our plot cooler! Now I am going to introduce a new parameter called density. Instead of filling the bars with a solid colour, it fills them with shading lines. We provide a vector of numbers, one per bar, giving the density of the shading lines in lines per inch.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot with density values.
barplot(gradeTable, main="Bar plot for grades", ylab="Number of Students",border="red", density=c(20,40,60,80,100))
Try playing with the density values a bit. See what happens at 0 (no shading lines are drawn) and at very high values. Put in values such as 1, 2, 3 and 4 and observe the result carefully. Because density is measured in lines per inch, the larger the value, the more closely packed the shading lines are.
If you want to change the angle of the shading lines, use the angle parameter, which gives the slope of the lines in degrees, measured counter-clockwise. The default is 45 degrees.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot. With angle for the shade lines (the default is 45, so we use 135 to see a change)
barplot(gradeTable, main="Bar plot for grades", ylab="Number of Students", density=c(20,40,60,80,100), angle=135)
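Both density and angle accept one value per bar, and the shading lines take their colour from col. A minimal sketch on a hypothetical table that gives each bar its own line density and line angle:

```r
# Hypothetical grade counts (made up for illustration)
gradeTable <- table(c(rep("A", 3), rep("B", 5), rep("C", 2), "D", rep("F", 4)))

# One density (lines per inch) and one angle (degrees) per bar
lineDensity <- c(10, 15, 20, 25, 30)
lineAngle   <- c(0, 45, 90, 135, 160)

# col sets the colour of the shading lines when density is used
barplot(gradeTable, main = "Bar plot for grades", ylab = "Number of Students",
        density = lineDensity, angle = lineAngle, col = "darkblue")
```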
You can change the size of the text using the cex family of parameters: cex.main sets the size of the title, cex.names the size of the bar labels, and cex.axis the size of the numeric axis labels. The values are multipliers, so cex.main=2 doubles the size of the title and cex.main=0.5 halves it. The same goes for cex.names and the bar labels.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot
barplot(gradeTable, main="Bar plot for grades", ylab="Number of Students",names.arg=c("Very Good","Good","Satisfactory","Bad","Failed"), cex.main=2,cex.names=0.5 )
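Here is a minimal sketch on a hypothetical table that sets each text size separately. cex.lab, which scales the axis titles, is a general graphical parameter rather than one listed above; it is included here as an extra.

```r
# Hypothetical grade counts (made up for illustration)
gradeTable <- table(c(rep("A", 3), rep("B", 5), rep("C", 2), "D", rep("F", 4)))

# barplot() invisibly returns the bar midpoints
mids <- barplot(gradeTable,
                main = "Bar plot for grades", ylab = "Number of Students",
                cex.main  = 1.5,  # title 50% larger
                cex.lab   = 1.2,  # axis title ("Number of Students") 20% larger
                cex.axis  = 0.8,  # numeric axis labels 20% smaller
                cex.names = 0.9)  # bar labels 10% smaller
```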
If you want some bars to be shifted up relative to the others on the Y-axis, you can do that with the offset parameter, which takes a vector of shifts, one per bar.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generating the Barplot. With offset values.
barplot(gradeTable, main="Bar plot for grades", ylab="Number of Students",offset=c(20,50,100,0,300) )
What happens if you provide only one value in the vector? What happens when you provide only two values? (R recycles the vector, so the two values are applied alternately.)
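The alternating pattern comes from R's general recycling rule: a shorter vector is repeated to match the longer one. A minimal sketch that makes the recycling explicit with rep_len(), so each bar gets a clearly stated offset (the table and the offset values are hypothetical):

```r
# Hypothetical grade counts (made up for illustration)
gradeTable <- table(c(rep("A", 3), rep("B", 5), rep("C", 2), "D", rep("F", 4)))

# Repeat the two offsets until there is one per bar
barOffsets <- rep_len(c(2, 5), length(gradeTable))
print(barOffsets)

barplot(gradeTable, main = "Bar plot for grades",
        ylab = "Number of Students", offset = barOffsets)
```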
Sometimes a bar plot has a large number of bars. In such cases, we may need to adjust the bar labels so that they fit, and then export the plot as a PDF file from RStudio. In the code below, las=2 turns the bar labels perpendicular to the axis, and cex.names=0.45 shrinks them.
#Reading the CSV file into a data frame using the assignment operator '<-'
student_performance <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
#creating a table and storing the result in a variable gradeTable
gradeTable <- table(student_performance$GRADE)
#Generate a barplot with rotated, smaller bar labels and one colour per bar
barplot(gradeTable, las = 2, main = "Bar plot for grades", cex.names = 0.45, xlab = "Grades", ylab = "Number of Students", col = rainbow(length(gradeTable)))
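Exporting from RStudio's Plots pane works well for one-off plots. If you prefer to write the PDF from code, base R's pdf() graphics device does the same job. A minimal sketch, assuming a hypothetical output file in the session's temporary directory and a hypothetical table; par(mar = ...) widens the bottom margin so the rotated labels are not cut off.

```r
# Hypothetical grade counts (made up for illustration)
gradeTable <- table(c(rep("A", 3), rep("B", 5), rep("C", 2), "D", rep("F", 4)))

# Hypothetical output path in R's temporary directory
outFile <- file.path(tempdir(), "grades_barplot.pdf")

# Open a PDF device; width and height are in inches
pdf(outFile, width = 11, height = 8.5)

# Margins in lines (bottom, left, top, right); extra room at the bottom
par(mar = c(7, 4, 4, 2))

barplot(gradeTable, las = 2, main = "Bar plot for grades", cex.names = 0.45,
        ylab = "Number of Students", col = rainbow(length(gradeTable)))

# Close the device so the file is written to disk
dev.off()
```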