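sapply is very similar to lapply: both apply a function to every element of a list or every column of a data frame. The difference is that lapply always returns a list, while sapply tries to simplify the result into a vector or matrix. Just like lapply, you can use sapply to calculate the average of each column in a dataset.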
For example, we want to calculate the average of the SCORE and FINALEXAM columns in the MOODY dataset.
Step 1: Create a data frame that holds the columns we need.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
score_exam <- data.frame(moody$SCORE,moody$FINALEXAM)
#First we create a data frame to collect the data; we can put as many numeric columns as we want into it.
score_exam[1:20,]
#This shows the first 20 rows of the data frame.
Step 2: Calculate the average of the score and final exam columns.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
score_exam <- data.frame(moody$SCORE,moody$FINALEXAM)
average_score_exam <- sapply(score_exam,mean)
average_score_exam
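The practical difference from lapply is the shape of the result: lapply returns a list, while sapply simplifies to a named vector when every result has length one. A minimal sketch on a small made-up data frame (the `toy` values are invented for illustration and are not taken from the Moody data):

```r
# Hypothetical toy data, not taken from the Moody dataset
toy <- data.frame(score = c(70, 80, 90), exam = c(60, 75, 90))

as_list   <- lapply(toy, mean)  # a list with one element per column
as_vector <- sapply(toy, mean)  # simplified to a named numeric vector

str(as_list)
as_vector
```

The names of the returned vector come from the column names of the data frame, which is why average_score_exam above is labelled by column.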
How can we apply quantile to each column?
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
score_exam <- data.frame(moody$SCORE,moody$FINALEXAM)
score_quantile <- sapply(score_exam,quantile)
score_quantile
#By default, quantile returns five cut points (the 0%, 25%, 50%, 75% and 100% quantiles), which split the sorted values into four groups of roughly equal size. sapply collects them into a matrix with one column per variable.
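To make the default output concrete, the sketch below runs quantile on the integers 1 to 9, a made-up vector chosen so the cut points are easy to check by hand. The data frame `toy` is also invented for illustration.

```r
x <- 1:9            # made-up values for illustration
q <- quantile(x)    # default probs are 0, 0.25, 0.5, 0.75, 1
q                   # values 1, 3, 5, 7, 9
names(q)            # "0%" "25%" "50%" "75%" "100%"

# With sapply, each column yields five values, so the result is a matrix
toy <- data.frame(a = 1:9, b = 11:19)
m <- sapply(toy, quantile)
dim(m)              # 5 rows (quantiles) by 2 columns (variables)
```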
What if we want to split the data into more groups and choose which quantiles to report?
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
score_exam <- data.frame(moody$SCORE,moody$FINALEXAM)
score_quantile <- sapply(score_exam,quantile,probs = 2:6/8)
#Here, probs = 2:6/8 evaluates to 2/8, 3/8, ..., 6/8 (that is, 0.25, 0.375, 0.5, 0.625, 0.75). Think of the sorted data as split into 8 equal groups; we ask for the cut points from 2/8 up to 6/8.
score_quantile
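Because the colon operator binds more tightly than division in R, `2:6/8` is evaluated as `(2:6)/8`. A quick check, again using a made-up vector for the quantile call:

```r
probs <- 2:6/8
probs                       # 0.250 0.375 0.500 0.625 0.750
identical(probs, (2:6)/8)   # TRUE

x <- 1:9                    # made-up values for illustration
q <- quantile(x, probs = probs)
q                           # one value per requested probability
```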
Now we can compare a plot of the raw data with a plot of the quantiles.
1. Before using quantile
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
plot(moody$SCORE,moody$FINALEXAM)
#With every raw point drawn, the plot is a little bit messy.
2. After using quantile
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
score_exam <- data.frame(moody$SCORE,moody$FINALEXAM)
score_quantile <- sapply(score_exam,quantile,probs = 2:6/8)
plot(score_quantile)
Sometimes existing functions such as mean and quantile don't do what we need, and we may want to write our own function for a specific purpose.
For example, suppose we want the integer part of each score and exam value.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
score_exam <- data.frame(moody$SCORE,moody$FINALEXAM)
score_exam_integer <- sapply(score_exam,function(x) x%/%1)
#The x in function(x) refers to each column of score_exam in turn. x %/% 1 is integer division by 1, which rounds each value down to a whole number and so drops the decimal part.
score_exam_integer[1:10, ]
#A sample of the first 10 rows. sapply returns a matrix here, so the comma is needed to select rows; score_exam_integer[1:10] would show only the first 10 score values and no exam values.
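One detail worth knowing about `%/%`: it rounds toward negative infinity, so it matches floor and only agrees with trunc for non-negative values. Scores and exam marks are presumably non-negative, so this should not matter here, but a short sketch with made-up numbers shows the difference:

```r
x <- c(2.7, -2.7)      # made-up values for illustration

by_division <- x %/% 1 #  2 -3  (rounds down)
by_floor    <- floor(x) #  2 -3  (same result)
by_trunc    <- trunc(x) #  2 -2  (rounds toward zero)

by_division
by_floor
by_trunc
```

If the goal is simply to drop the decimals regardless of sign, passing trunc to sapply in place of the anonymous function would be one alternative.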