Advanced categories: levels

Group several levels of a factor into a more general level, or split a general level into several more specific ones.

Example

Sometimes we want to collect categorical values that share a common feature into one larger group. For example, the grades "A", "B" and "C" all mean "pass", while the grade "F" means "fail". How can we relabel the values in a given dataset?

Here is the easiest way to do so:

First, you can create a new column.

 moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
 moody[,8] <- ""
 #moody[row number, column number]. The MOODY dataset has only 7 columns, so this creates an 8th column. Every cell in it starts out as an empty string.
 moody[1:10,1:8]
 #a small part of the data

Second, let's fill the cells with new words: "pass" "fail".

1. Fill in the word "fail"

 moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
 moody[,8] <- ""
 moody[moody$GRADE=="F",8] <- "fail"
 #"==" tests for equality: in every row where the grade equals "F", we write "fail" into column 8.
 moody[1:20,1:8]
 #some of the cells are filled with words "fail", right?

2. Fill in the word "pass"

 moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
 moody[,8] <- ""
 moody[moody$GRADE!="F",8] <- "pass"
 #"!=" means "not equal to", which means that if the grade is not "F", then the person has passed.
 moody[1:20,1:8]

3. Now combine steps 1 and 2, and rename the column if necessary

 moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
 moody[,8] <- ""
 moody[moody$GRADE!="F",8] <- "pass"
 moody[moody$GRADE=="F",8] <- "fail"
 colnames(moody)[8] <- "passed or not"
 # The format is "colnames(dataset)[the column number that you want to change the name of]".
 moody[1:20,1:8]
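
The two assignments can also be written as a single vectorized call to `ifelse()`. The sketch below is only an illustration: it uses a small made-up data frame (`toy`, with invented scores and grades, and a hypothetical column name `PASSED`) instead of the Moody data, so it runs without a network connection.

```r
# Hypothetical stand-in for the Moody data: invented values, same column names.
toy <- data.frame(
  SCORE = c(95, 82, 67, 40, 88),
  GRADE = c("A", "B", "C", "F", "A"),
  stringsAsFactors = TRUE
)

# ifelse() tests every row at once: "fail" where the grade is F, "pass" elsewhere.
toy$PASSED <- ifelse(toy$GRADE == "F", "fail", "pass")

toy
```

Because the new column is created and filled in one step, there is no need to create an empty column first.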

Another way is to change the values in the dataset directly, but be careful when you want to use this method.

Here is a WRONG way to do that:

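 #Since R 4.0, read.csv() reads text columns as plain character. stringsAsFactors = TRUE stores GRADE as a factor, which is what this part of the section assumes.
 moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", stringsAsFactors = TRUE)
 moody$GRADE[moody$GRADE=="F"] <- "fail"
 moody[1:10,]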

You can see that the code above does not change "F" into "fail". Instead, R gives an "invalid factor level" warning and fills those cells with NA values.

This happens because the column "GRADE" is a factor with only four levels: A, B, C and F. If you assign any value other than those four levels, it is turned into NA.
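
The same behaviour can be reproduced on a tiny factor. This is a minimal sketch with made-up grades (the vector `grades` is hypothetical, not taken from the Moody data):

```r
# Hypothetical grades, stored as a factor with the four levels A, B, C, F.
grades <- factor(c("A", "F", "B", "F", "C"))

levels(grades)   # the only values this factor is allowed to hold

# "fail" is not one of the levels, so R warns and stores NA instead.
# suppressWarnings() only hides the "invalid factor level" warning here.
suppressWarnings(grades[grades == "F"] <- "fail")

grades           # the two former F entries are now NA
is.na(grades)
```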

Then how do we add new levels?

 moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", stringsAsFactors = TRUE)
 levels(moody$GRADE)
 #so far we have four levels
 levels(moody$GRADE) <- c(levels(moody$GRADE),"pass","fail")
 #in addition to the existing levels, we manually add the levels "pass" and "fail".
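
To check that the new levels were really added, you can count the values per level. The following sketch again uses a small invented factor (`grades` is a hypothetical name) rather than the real dataset:

```r
# Hypothetical grades, stored as a factor.
grades <- factor(c("A", "F", "B", "F", "C"))

# Append two new levels; the existing values are untouched.
levels(grades) <- c(levels(grades), "pass", "fail")

# table() lists every level, so the new ones appear with a count of 0.
table(grades)
```

A count of 0 means the level exists but no row uses it yet.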

Now we can relabel the values!

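 moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", stringsAsFactors = TRUE)
 levels(moody$GRADE) <- c(levels(moody$GRADE),"pass","fail")
 moody$GRADE[moody$GRADE!="F"] <- "pass"
 moody$GRADE[moody$GRADE=="F"] <- "fail"
 #Order matters: relabel the passing grades first. If "F" were changed to "fail" first, the test !="F" would also match "fail" and turn every row into "pass".
 moody[1:10,]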
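
Adding levels and then overwriting values leaves the old levels A, B, C and F attached to the factor even though no row uses them any more. Base R also lets you assign a named list to `levels()`, which merges old levels into new ones in a single step. Here is a minimal sketch on an invented factor (`grades` is a hypothetical name):

```r
# Hypothetical grades, stored as a factor.
grades <- factor(c("A", "F", "B", "F", "C"))

# Each list element names a new level and lists the old levels it absorbs.
levels(grades) <- list(pass = c("A", "B", "C"), fail = "F")

grades
levels(grades)   # only "pass" and "fail" remain
```

With this form the order of operations no longer matters, because every old level is mapped at the same time.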

You may want to change the value of one cell based on the value in another column. For example, we want to label every person whose score is 90 or above as "excellent".

 moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", stringsAsFactors = TRUE)
 levels(moody$GRADE) <- c(levels(moody$GRADE),"excellent")
 moody[moody$SCORE>=90,"GRADE"] <- "excellent"
 #We decide whether a grade is excellent by checking whether the score is 90 or more. If so, we write "excellent" into the GRADE column. Selecting the column by name is safer than by number.
 summary(moody$GRADE)

You may also want to set multiple conditions. For example, you might label people who scored below 90 but still got an A as "lucky".

We now have two conditions:

1. score < 90

2. grade == "A"

 moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", stringsAsFactors = TRUE)
 levels(moody$GRADE) <- c(levels(moody$GRADE),"lucky")
 moody[moody$SCORE<90&moody$GRADE=="A","GRADE"] <- "lucky"
 #"&" means "and": only a person whose score is below 90 and whose grade is "A" is labeled "lucky".
 summary(moody$GRADE)
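
To see how the two conditions combine, it can help to build each one separately and inspect it. The sketch below does this on a small made-up data frame; `toy`, `below_90`, `got_a` and `is_lucky` are hypothetical names, and the scores are invented.

```r
# Hypothetical stand-in for the Moody data: invented values, same column names.
toy <- data.frame(
  SCORE = c(95, 85, 91, 70, 88),
  GRADE = factor(c("A", "A", "B", "C", "A"))
)

# Build each condition separately, then combine them element by element.
below_90 <- toy$SCORE < 90
got_a    <- toy$GRADE == "A"
is_lucky <- below_90 & got_a     # TRUE only where both conditions hold

# Add the new level first, then relabel the matching rows.
levels(toy$GRADE) <- c(levels(toy$GRADE), "lucky")
toy$GRADE[is_lucky] <- "lucky"

toy
```

Only the rows where both conditions are TRUE are relabeled; an A with a score of 90 or more keeps its original grade.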