Data 101 Portal

The input format for the test is crucial here. The function takes 6 parameters, in this order:

1) The dataset to use. In this case, it is the traffic dataset.

2) The CATEGORICAL VARIABLE (column) that holds the 2 categories or groups you are running the test for. In this case, it is the TUNNEL column, whose values are Lincoln and Holland. Remember, if no such categorical column is present, you need to create one. It must identify the 2 groups of data that you are comparing.

For the Athlete case, you had to test whether athletes have a higher GPA than non-athletes. You did not have a categorical variable (Athletes and Non-Athletes) there. You created one based on the Athletic Ability. You cannot pass Athletic Ability directly as a parameter here. Create a column based on whatever condition you want, and since it is now a categorical column, pass that column (say, Category).

3) The NUMERICAL column that you want to compare across those 2 groups. For the tunnels, it is traffic. For Athletes, it is the GPA. For countries, it is the happiness index.

4) The number of permutations that you want. At least 2 are required; otherwise the function gives you an error.

5) The lower category. By lower, I mean the group that has the lower average value of the numerical column (traffic, GPA, or happiness index respectively, as mentioned in (3)).

How do you know which average is lower? Check it by using tapply on the numerical column, grouped by the categorical column. You get the average of the numerical column for each of the 2 categories, and so you can see which one is lower.

6) The higher category, that is, the group with the higher average value of the numerical column.
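
As a rough illustration of items 2, 5 and 6, the sketch below builds a categorical column from a numeric one and then uses tapply to see which group has the lower mean. The data frame students, its columns AthleticAbility and GPA, the cutoff of 5 and the group labels are all made up for this example; substitute your own data and condition.

```r
# Hypothetical data: the column names, values and cutoff are assumptions.
students <- data.frame(
  AthleticAbility = c(2, 8, 5, 9, 1, 7),
  GPA             = c(3.6, 3.1, 3.4, 2.9, 3.8, 3.0)
)

# Item 2: create a categorical column with exactly two groups.
students$Category <- ifelse(students$AthleticAbility > 5, "Athlete", "NonAthlete")

# Items 5 and 6: mean of the numerical column for each group.
group_means <- tapply(students$GPA, students$Category, mean)
print(group_means)

# The group with the smaller mean is passed fifth, the larger one sixth.
lower_group  <- names(which.min(group_means))
higher_group <- names(which.max(group_means))
print(c(lower = lower_group, higher = higher_group))
```

With this made-up data, "Category" would be the second argument, "GPA" the third, and the two group names would follow the number of permutations in the order lower, then higher.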

Remember, this order is important. The function is designed to compute the mean of the group named in the last parameter minus the mean of the group named in the second-to-last parameter. If you interchange them, you get a negative Z-score. This might give you a P-value of 0 and, as a result, make you incorrectly believe that, because the P-value is so low, you can accept the alternative hypothesis, which in fact might not be true if the Z-score is negative.

Hence, the result of the permutation test is not valid when the Z-score is negative.
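
To see why the order matters, here is a minimal base-R sketch run on simulated data. It is not the code inside the PermutationTestSecond package. The function name perm_sketch, its arguments, the simulated values, and the choice to compute the Z-score as the observed difference standardized by the shuffled differences are all assumptions made for illustration. It takes the mean of the higher group minus the mean of the lower group, so swapping the two names flips the sign.

```r
# Hypothetical helper, not the package's implementation.
perm_sketch <- function(df, group_col, value_col, n_perm, lower, higher) {
  keep   <- df[[group_col]] %in% c(lower, higher)
  groups <- df[[group_col]][keep]
  values <- df[[value_col]][keep]

  # Observed statistic: mean(higher group) - mean(lower group).
  observed <- mean(values[groups == higher]) - mean(values[groups == lower])

  # Shuffle the group labels n_perm times and recompute the statistic.
  shuffled <- replicate(n_perm, {
    g <- sample(groups)
    mean(values[g == higher]) - mean(values[g == lower])
  })

  # Assumed Z-score: observed difference standardized by the shuffled ones.
  z <- (observed - mean(shuffled)) / sd(shuffled)
  list(observed = observed, z = z)
}

set.seed(1)  # simulated data, for illustration only
toy <- data.frame(
  TUNNEL = rep(c("Holland", "Lincoln"), each = 30),
  VOLUME_PER_MINUTE = c(rnorm(30, mean = 50, sd = 5),
                        rnorm(30, mean = 60, sd = 5))
)

right_order <- perm_sketch(toy, "TUNNEL", "VOLUME_PER_MINUTE", 1000,
                           "Holland", "Lincoln")
swapped     <- perm_sketch(toy, "TUNNEL", "VOLUME_PER_MINUTE", 1000,
                           "Lincoln", "Holland")

print(c(right_order$observed, right_order$z))  # both positive
print(c(swapped$observed, swapped$z))          # both negative
```

In this sketch the data and the number of permutations are identical in both calls; only the order of the last two arguments changes the sign of the result.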

 #ATTENTION: It will ONLY run in your RStudio. We cannot install our own packages on this site, since the R service we use, R-Fiddle, prohibits it. Thus it will NOT run here. Copy it and run it in your RStudio.
 
 #Enter the following commands into your R console to install the required packages:
 install.packages("devtools")
 devtools::install_github("devanshagr/PermutationTestSecond")
 
 #Make sure that you import your dataset using read.csv and not read_csv. If you are importing it via the dialog box (Import Dataset), make sure you change read_csv to read.csv.
 
 #Then, run the permutation function as follows:
 
 #example
 
 PermutationTestSecond::Permutation(TRAFFIC, "TUNNEL", "VOLUME_PER_MINUTE", 1000, "Holland", "Lincoln")
 
 #Note that everything is in quotes except the name of the data frame and the number of permutations. Also note that you write the column name directly in the function without using $ (i.e. "TUNNEL" is right; "TRAFFIC$TUNNEL" will give an error).
 
 #To run the permutation test once and see what happens after only one permutation, install and run the following:
 
 devtools::install_github("devanshagr/PermutationTestManual")
 
 PermutationTestManual::permute_test(read.csv('https://raw.githubusercontent.com/kunal0895/RDatasets/master/TRAFFIC.csv'), "TUNNEL", "VOLUME_PER_MINUTE", "Holland", "Lincoln")
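
The idea of a single permutation can also be sketched by hand in base R. The snippet below is only an illustration on made-up numbers, not what permute_test does internally; the toy data frame and the helper mean_diff are assumptions. It shuffles the tunnel labels once, keeps the volumes fixed, and compares the shuffled difference in means with the observed one.

```r
# Made-up stand-in for the traffic data; the values are assumptions.
toy <- data.frame(
  TUNNEL = rep(c("Holland", "Lincoln"), each = 5),
  VOLUME_PER_MINUTE = c(48, 52, 50, 47, 53, 61, 58, 60, 63, 59)
)

# Hypothetical helper: mean of Lincoln minus mean of Holland.
mean_diff <- function(values, groups) {
  mean(values[groups == "Lincoln"]) - mean(values[groups == "Holland"])
}

observed <- mean_diff(toy$VOLUME_PER_MINUTE, toy$TUNNEL)

# One permutation: shuffle the labels once, leave the volumes as they are.
set.seed(42)
shuffled_labels <- sample(toy$TUNNEL)
permuted <- mean_diff(toy$VOLUME_PER_MINUTE, shuffled_labels)

print(observed)
print(permuted)
```

Repeating the shuffle many times and collecting the permuted differences gives the distribution that the observed difference is compared against.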