To plot two categorical variables against each other, use a mosaic plot.
Below is an example of a basic mosaic plot.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
mosaicplot(moody$GRADE~moody$ON_SMARTPHONE)
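A mosaic plot is a picture of a contingency table: the area of each block is proportional to the count in the matching cell. To make that concrete, here is a minimal sketch on a small made-up data frame. The name toy and all of its values are assumptions for illustration only; they are not taken from the Moody file.

```r
# Hypothetical toy data standing in for the Moody data set
toy <- data.frame(
  GRADE = c("A", "A", "B", "F", "F", "F"),
  ON_SMARTPHONE = c("never", "never", "always", "always", "always", "never")
)

# Count how many rows fall into each GRADE / ON_SMARTPHONE combination
counts <- table(toy$GRADE, toy$ON_SMARTPHONE)
print(counts)

# Draw the same counts as blocks
mosaicplot(counts, main = "Toy example")
```

Comparing the printed table with the plot shows which block belongs to which pair of categories.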
Let's change the orientation of the axis labels so that they take up less space and are easier to read.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
mosaicplot(moody$GRADE~moody$ON_SMARTPHONE,las=1)
# "las" sets the orientation of the axis labels; las=1 draws all of them horizontally.
You can also interchange the X-axis and Y-axis.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
mosaicplot(t(table(moody$GRADE,moody$ON_SMARTPHONE)))
# t() transposes the table, which swaps the X-axis and Y-axis. table() is needed first because t() works on a matrix-like object, not on a formula.
# Swapping the X-axis and Y-axis does not make the plot easier to read in this case.
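As a quick check of what t() does, the sketch below transposes a small made-up table. The data are an assumption for illustration and are not taken from the Moody file.

```r
# Hypothetical toy data: 3 grades by 2 smartphone habits
counts <- table(grade = c("A", "B", "F", "F"),
                phone = c("never", "always", "always", "never"))

# Rows become columns and columns become rows
flipped <- t(counts)

dim(counts)    # 3 rows (grades), 2 columns (phone)
dim(flipped)   # 2 rows (phone), 3 columns (grades)

mosaicplot(flipped, main = "Toy example, axes swapped")
```

The counts themselves do not change; only the variable that is drawn along each axis does.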
Let's add some colour to tell the blocks apart.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
mosaicplot(moody$GRADE~moody$ON_SMARTPHONE,color=c("blue","red"))
Sometimes there are too many blocks to name a colour for each one, so we can pass a range of colour numbers instead.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
mosaicplot(moody$ASKS_QUESTIONS~moody$GRADE,color=1:5)
#1:5 fills the blocks with colours No.1 to No.5 of the current palette. There is no need to assign the result to a variable, because mosaicplot() only draws the plot.
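To see which colours the numbers 1 to 5 stand for, you can print the current palette. The sketch below does that and then uses the same numbers on a small made-up table; the counts are an assumption for illustration, not values from the Moody file.

```r
# palette() returns the colours that the numbers 1, 2, 3, ... refer to
pal <- palette()
print(pal[1:5])

# Hypothetical toy counts: 2 answers (rows) by 5 grades (columns)
counts <- as.table(matrix(
  c(3, 1, 2, 2, 4, 1, 1, 3, 2, 5),
  nrow = 2,
  dimnames = list(asks = c("always", "never"),
                  grade = c("A", "B", "C", "D", "F"))
))

mosaicplot(counts, color = 1:5, main = "Toy example")
```

The exact colours depend on the R version and on any earlier call to palette(), so the printed names may differ from one machine to another.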
We can also add a title and a subtitle to it.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
mosaicplot(moody$GRADE~moody$ON_SMARTPHONE,color=c("blue","red"),main="Grade vs Smartphone",sub="5 categories")
# "main" sets the title and "sub" sets the subtitle.
Let's change the labels of the X-axis and Y-axis.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
mosaicplot(moody$GRADE~moody$ON_SMARTPHONE,xlab="Grade",ylab="Frequency")
What if we only want to observe the behavior of students who got an "A" or an "F"?
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
grade1 <- subset(moody,moody$GRADE=="A" | moody$GRADE=="F")
# Keep only the rows where GRADE is "A" or "F".
mosaicplot(grade1$GRADE~grade1$ON_SMARTPHONE)
# Look at the plot carefully. If GRADE was read in as a factor (the read.csv default before R 4.0), the levels "B", "C" and "D" are still there with a count of 0, which makes the plot confusing.
So we need to drop the unused levels.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
grade1 <- subset(moody,moody$GRADE=="A"|moody$GRADE=="F")
grade1$GRADE <- factor(grade1$GRADE)
# Rebuild the factor so that only the levels still present in grade1 ("A" and "F") remain.
mosaicplot(grade1$GRADE~grade1$ON_SMARTPHONE)
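The effect of factor() on unused levels is easy to check on a small made-up vector. The sketch below is only an illustration (the vector grades is hypothetical); it also shows droplevels(), a base R function that does the same job.

```r
# Hypothetical toy factor with five grade levels
grades <- factor(c("A", "B", "C", "D", "F", "A"))

# Subsetting keeps every original level, even the ones with no rows left
kept <- grades[grades %in% c("A", "F")]
levels(kept)

# Two equivalent ways to discard the empty levels
levels(factor(kept))
levels(droplevels(kept))
```

droplevels() also accepts a whole data frame, in which case it cleans up every factor column at once.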
Let's do a more complex task: in addition to keeping only grades A and F, keep only the students whose ASKS_QUESTIONS value is "never" or "always".
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
grade1 <- subset(moody,moody$GRADE=="A"|moody$GRADE=="F")
grade2 <- subset(grade1,grade1$ASKS_QUESTIONS=="never"|grade1$ASKS_QUESTIONS=="always")
question <- factor(grade2$ASKS_QUESTIONS)
grade <- factor(grade2$GRADE)
mosaicplot(table(grade,question))
# table() builds a contingency table (a matrix of counts) from the grade and question vectors, and mosaicplot() draws it.
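The two filtering steps can also be written as a single condition. The sketch below is one possible way to do it, shown on a made-up data frame; toy and its values (including "sometimes") are assumptions for illustration and are not taken from the Moody file.

```r
# Hypothetical toy data; stringsAsFactors = TRUE mimics factor columns
toy <- data.frame(
  GRADE = c("A", "B", "F", "A", "F"),
  ASKS_QUESTIONS = c("always", "never", "never", "sometimes", "always"),
  stringsAsFactors = TRUE
)

# Keep rows that satisfy both conditions, then drop the now-empty levels
keep <- subset(toy, GRADE %in% c("A", "F") &
                    ASKS_QUESTIONS %in% c("never", "always"))
keep <- droplevels(keep)

# Named arguments to table() become the axis labels of the plot
counts <- table(grade = keep$GRADE, question = keep$ASKS_QUESTIONS)
print(counts)
mosaicplot(counts, main = "Toy example")
```

%in% replaces the chain of == and | comparisons, which keeps the condition short when more than two values are allowed.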