Sometimes we want to collapse categorical values that share a common trait into a broader group. For example, the grades "A", "B" and "C" all mean "pass", while the grade "F" means "fail". How can we recode the values in a given dataset?
Here is the easiest way to do so:
First, create a new column.
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
moody[,8] <- ""
#The indexing format is moody[row number, column number]. The MOODY dataset has only 7 columns, so this creates an 8th column. For now every cell in it holds an empty string.
moody[1:10,1:8]
#show a small part of the data
Second, fill the cells with the new labels "pass" and "fail".
1. Fill in the label "fail"
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
moody[,8] <- ""
moody[moody$GRADE=="F",8] <- "fail"
#"==" is the equality comparison operator: in every row where the grade equals "F", we write "fail" into column 8.
moody[1:20,1:8]
#some of the cells now contain the label "fail"
2. Fill in the label "pass"
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
moody[,8] <- ""
moody[moody$GRADE!="F",8] <- "pass"
#"!=" means "not equal to": if the grade is not "F", the person has passed.
moody[1:20,1:8]
3. Now combine steps 1 and 2, and rename the column if necessary
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv")
moody[,8] <- ""
moody[moody$GRADE!="F",8] <- "pass"
moody[moody$GRADE=="F",8] <- "fail"
colnames(moody)[8] <- "passed or not"
# The format is colnames(dataset)[number of the column you want to rename] <- "new name".
moody[1:20,1:8]
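The two fill steps can also be written as a single call to ifelse(). The sketch below is only an illustration: it uses a small made-up data frame named grades instead of the MOODY dataset, and the column name RESULT is a hypothetical choice.

```r
# Toy data standing in for the MOODY dataset (hypothetical values).
grades <- data.frame(
  SCORE = c(95, 82, 71, 40, 55),
  GRADE = c("A", "B", "C", "F", "F"),
  stringsAsFactors = FALSE
)

# ifelse() checks the condition row by row and picks the first label
# where it is TRUE and the second label where it is FALSE.
grades$RESULT <- ifelse(grades$GRADE == "F", "fail", "pass")

print(grades)
```

Creating the column by name also means there is no need to know how many columns the dataset already has.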
Another way is to change the values in the GRADE column directly, but be careful with this method.
Here is a WRONG way to do that:
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", stringsAsFactors = TRUE)
moody$GRADE[moody$GRADE=="F"] <- "fail"
moody[1:10,]
With the code above, the letter "F" is not replaced by "fail". Those cells become NA instead, and R prints a warning.
This happens when the column "GRADE" is a factor (here we ask for that with stringsAsFactors = TRUE, which was the default before R 4.0). The factor only has the levels A, B, C and F, so any value other than those four is treated as NA.
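The NA behaviour is easy to reproduce on a tiny factor. This is a minimal sketch with made-up values; the vector g is hypothetical and only stands in for the GRADE column.

```r
# Toy factor standing in for the GRADE column (hypothetical values).
g <- factor(c("A", "B", "F", "C", "F"))
levels(g)

# "fail" is not one of the existing levels, so R warns about an
# invalid factor level and stores NA in those positions instead.
g[g == "F"] <- "fail"
g

# The positions that used to hold "F" are now missing.
is.na(g)
```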
Then how do we add new levels?
#stringsAsFactors = TRUE makes GRADE a factor (since R 4.0 it is otherwise read as plain text)
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", stringsAsFactors = TRUE)
levels(moody$GRADE)
#so far we have four levels
levels(moody$GRADE) <- c(levels(moody$GRADE),"pass","fail")
#in addition to the existing levels, we manually add the levels "pass" and "fail".
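Now we can recode the values!
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", stringsAsFactors = TRUE)
levels(moody$GRADE) <- c(levels(moody$GRADE),"pass","fail")
#Recode the non-"F" grades first. If "F" were changed to "fail" first, the test GRADE!="F" would also match the new "fail" cells and overwrite them with "pass".
moody$GRADE[moody$GRADE!="F"] <- "pass"
moody$GRADE[moody$GRADE=="F"] <- "fail"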
moody[1:10,]
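When the column is a factor, the same grouping can also be sketched by renaming the levels themselves instead of overwriting cells one condition at a time. The example uses a made-up factor g rather than the MOODY data, and it assumes the grades to merge are exactly A, B, C and F.

```r
# Toy factor standing in for the GRADE column (hypothetical values).
g <- factor(c("A", "B", "F", "C", "F"))

# A named list on the right-hand side maps old levels to new ones:
# A, B and C are merged into "pass", and F becomes "fail".
levels(g) <- list(pass = c("A", "B", "C"), fail = "F")

g
table(g)
```

Because the old levels are replaced rather than extended, no unused levels such as "A" or "F" are left behind in the factor.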
You may want to change the value of one cell based on a value in another column. For example, we want to label every person whose score is 90 or higher as "excellent".
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", stringsAsFactors = TRUE)
levels(moody$GRADE) <- c(levels(moody$GRADE),"excellent")
moody[moody$SCORE>=90,3] <- "excellent"
#We decide whether the grade is "excellent" by checking whether the score is at least 90. If so, we write "excellent" into the 3rd column (GRADE).
summary(moody$GRADE)
You may also want to set multiple conditions. For example, you consider people whose score is below 90 but who still got an A to be "lucky".
We now have two conditions:
1. score < 90
2. grade == "A"
moody <- read.csv("https://raw.githubusercontent.com/kunal0895/RDatasets/master/Moody2018.csv", stringsAsFactors = TRUE)
levels(moody$GRADE) <- c(levels(moody$GRADE),"lucky")
moody[moody$SCORE<90 & moody$GRADE=="A",3] <- "lucky"
#"&" means "and": only a person whose score is below 90 and whose grade is "A" is labeled "lucky".
summary(moody$GRADE)
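To see how such conditions pick out rows, it can help to store the condition in a variable and look at it. The sketch below uses a made-up data frame named grades, and it writes the labels into a separate, hypothetical column called LABEL so that the original grades are preserved.

```r
# Toy data standing in for the MOODY dataset (hypothetical values).
grades <- data.frame(
  SCORE = c(95, 88, 91, 70, 85),
  GRADE = c("A", "A", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Each comparison gives one TRUE/FALSE per row; "&" keeps only the
# rows where both comparisons are TRUE.
is_lucky <- grades$SCORE < 90 & grades$GRADE == "A"
is_lucky

# Start with an empty label, then fill it in condition by condition.
grades$LABEL <- ""
grades$LABEL[grades$SCORE >= 90] <- "excellent"
grades$LABEL[is_lucky] <- "lucky"

print(grades)
```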