Developed by Prana Ugiana Gio & Rezzy Eko Caraka
Statistical Calculator (STATCAL): Easy Statistical Application Program to Educate

# Case: Pearson and Spearman Correlation Using STATCAL, R, SPSS and Jamovi (Part 1)

### 1. Data

Suppose we are given the data shown in Figure 1.

Figure 1 Data

The data above present the height and weight of 10 people. Based on these data, we will:

=> Make a scatter plot to see the trend or direction of the data distribution.

=> Perform a normality test. If the normality assumption is satisfied, Pearson correlation will be used; if not, Spearman correlation will be used.

=> Test whether or not there is a significant correlation between height and weight.

### 2. Scatter Plot

The following are scatter plots based on STATCAL (R), SPSS and Jamovi.

Figure 2 Scatter Plot Based on STATCAL (R)

Figure 3 Scatter Plot Based on SPSS

Figure 4 Scatter Plot Based on Jamovi

Based on the scatter plots above, we can see that the data distribution has an increasing (positive) trend: the greater the height, the greater the weight. We can say that there is a positive relationship between height and weight.

Scatter Plot Based On STATCAL

Scatter Plot Based On SPSS

Scatter Plot Based On Jamovi

# This is R code to create a scatter plot

height = c(167.32,163.18,153.17,189.42,174.32,157.36,185.32,167.43,157.42,174.23)

weight = c(69.43,59.17,50.74,90.43,77.43,53.35,80.23,63.35,58.42,67.22)

library(ggplot2)
ggplot(data = NULL, aes(x = height, y = weight)) +
  geom_point(size = 3) +
  geom_smooth(method = lm)


Scatter Plot Based On R

# Pearson and Spearman Correlation with R, SPSS & Minitab

### 1. Brief Overview of Pearson and Spearman Correlation

Pearson correlation is a value that measures the linear relationship between two variables. The Pearson correlation value also indicates the direction of the relationship, and it lies between -1 and 1. The Pearson linear correlation is also called the Pearson product-moment correlation. The notation for the sample Pearson correlation is "r", while for the population it is $$\rho$$ (read "rho"). If the Pearson correlation value is close to -1 or +1, the linear relationship between the two variables is very strong.

To use Pearson correlation, the data must be on a continuous (numeric) scale. Michael J. Crawley (2015:108) writes in his book "Statistics, An Introduction Using R, Second Edition":

"With two continuous variables, x and y, the question naturally arises as to whether their values are correlated with each other (remembering, of course, that correlation does not imply causation). Correlation is defined in terms of the variance of x, the variance of y, and the covariance of x and y (the way the two vary together; the way they co-vary) on the assumption that both variables are normally distributed. We have symbols already for the two variances, $$s_x^2$$ and $$s_y^2$$. We denote the covariance of x and y by cov(x, y), after which the correlation coefficient r is defined as $$r = \frac{cov(x,y)}{\sqrt{s_x^2 s_y^2}}$$"
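To make this definition concrete, the formula can be evaluated directly in R and compared with the built-in `cor()` function. A minimal sketch using the height and weight data from Part 1:

```r
# Pearson r computed from its definition: r = cov(x, y) / sqrt(s_x^2 * s_y^2)
height = c(167.32,163.18,153.17,189.42,174.32,157.36,185.32,167.43,157.42,174.23)
weight = c(69.43,59.17,50.74,90.43,77.43,53.35,80.23,63.35,58.42,67.22)

r_manual  = cov(height, weight) / sqrt(var(height) * var(weight))
r_builtin = cor(height, weight)

r_manual
r_builtin   # identical to r_manual
```

Both values agree, confirming that `cor()` implements exactly the definition quoted above.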

Spearman correlation is an alternative to Pearson correlation when the normality assumption is not satisfied. Peter Dalgaard (2008:123) writes in his book "Introductory Statistics with R, Second Edition":

"As with the one-and two-sample problems, you may be interested in nonparametric variants. These have the advantage of not depending on the normal distribution and, indeed, being invariant to monotone transformations of the coordinates. The main disadvantage is that its interpretation is not quite clear. A popular and simple choice is Spearman's rank correlation coefficient $$\rho$$. This is obtained quite simply by replacing the observations by their rank and computing the correlation. Under the null hypothesis of independence between the two variables, the exact distribution of $$\rho$$ can be calculated."

### 2. Dataset

The following is R code. You can copy this code and run it in R.


Performance = c(75,65,72,84,74,59,83,55,65,73)

Motivation = c(71,60,78,79,69,55,80,45,78,77)

dframe = data.frame(Performance,Motivation)

dframe

Based on the data above, there are two variables, namely performance and motivation.

### 3. Scatter Plot

The following R code displays a scatter plot of performance against motivation.


Performance = c(75,65,72,84,74,59,83,55,65,73)

Motivation = c(71,60,78,79,69,55,80,45,78,77)

dframe = data.frame(Performance,Motivation)

library(ggplot2)
ggplot(data = dframe, aes(x = Motivation, y = Performance)) +
  geom_point(size = 3) +
  geom_smooth(method = lm)

Based on the scatter plot above, we can see that the data distribution has an increasing (positive) trend: the higher the motivation, the higher the performance. We can say that there is a positive relationship.

### 4. Pearson Correlation

Now we will calculate the Pearson correlation with R, using the following code.



Performance = c(75,65,72,84,74,59,83,55,65,73)

Motivation = c(71,60,78,79,69,55,80,45,78,77)

cor.test(Motivation, Performance)

Based on the result above, we obtain a Pearson correlation of 0.8239. Note that the value is positive, meaning there is a positive relationship between motivation and performance: the higher the motivation, the higher the performance. Besides that, the p-value 0.003377 is less than the significance level 0.05, so we can conclude that there is a significant relationship between motivation and performance.

The following are the SPSS and Minitab results for Pearson correlation, which agree with the R result.

Table 1 SPSS Result for Pearson Correlation

Table 2 Minitab Result for Pearson Correlation

Based on the SPSS result in Table 1, the Pearson correlation is 0.824 with p-value 0.003. The Minitab result in Table 2 likewise gives a Pearson correlation of 0.824 with p-value 0.003, matching the R result.

### 5. Size of an Effect

Andy Field (2009:170) in his book "Discovering Statistics Using SPSS, Third Edition":

"We also saw in section 2.6.4 that because the correlation coefficient is a standardized measure of an observed effect, it is a commonly used measure of the size of an effect and that values of $$\pm{0.1}$$ represent a small effect, $$\pm{0.3}$$ is a medium effect and $$\pm{0.5}$$ is a large effect (although I re-emphasize my caveat that these canned effect sizes are no substitute for interpreting the effect size within the context of the research literature)."

Based on the explanation above:

=> A Pearson correlation of $$\pm{0.1}$$ represents a small effect.

=> A Pearson correlation of $$\pm{0.3}$$ represents a medium effect.

=> A Pearson correlation of $$\pm{0.5}$$ represents a large effect.

Based on the preceding explanation, the correlation between motivation and performance is 0.824, which is greater than 0.5. It means that motivation and performance are strongly correlated.
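As an illustration only, these benchmarks can be wrapped in a small helper function. The name `effect_size` and the structure below are hypothetical (not from any package); the cut-offs follow Field's quoted values:

```r
# Hypothetical helper classifying |r| by Field's (2009) benchmarks:
# 0.1 = small, 0.3 = medium, 0.5 = large
effect_size = function(r) {
  a = abs(r)
  if (a >= 0.5) "large"
  else if (a >= 0.3) "medium"
  else if (a >= 0.1) "small"
  else "negligible"
}

effect_size(0.824)   # the correlation found above -> "large"
```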

### 6. Normality Assumption

One of the assumptions of Pearson correlation is normality, namely that both samples are assumed to come from normally distributed populations. Andy Field (2009:178) writes in his book "Discovering Statistics Using SPSS, Third Edition":

"However, if you want to establish whether the correlation coefficient is significant, then more assumptions are required: for the test statistic to be valid the sampling distribution has to be normally distributed and as we saw in Chapter 5 we assume that it is if our sample data are normally distributed (or if we have a large sample). Although typically, to assume that the sampling distribution is normal, we would want both variables to be normally distributed. "

Based on the explanation above, when should the normality assumption be tested?

The answer: when we perform the significance test of the Pearson correlation.

Why?

It is because whether or not the normality assumption is fulfilled affects the sampling distribution of the Pearson correlation statistic. When the normality assumption is satisfied, the test statistic follows a t distribution, so the rules of the t distribution can be used. When the normality assumption is not satisfied, the sampling distribution of the statistic can be far from the t distribution, so the conclusion may become inaccurate (misleading).
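To see this concretely, the significance test can be reproduced by hand: under normality, the statistic $$t = r\sqrt{n-2}/\sqrt{1-r^2}$$ follows a t distribution with n - 2 degrees of freedom. A minimal sketch with the motivation and performance data:

```r
Performance = c(75,65,72,84,74,59,83,55,65,73)
Motivation  = c(71,60,78,79,69,55,80,45,78,77)

n = length(Motivation)
r = cor(Motivation, Performance)

# t statistic and two-sided p-value under the t distribution with n - 2 df
t_stat  = r * sqrt(n - 2) / sqrt(1 - r^2)
p_value = 2 * pt(-abs(t_stat), df = n - 2)

t_stat
p_value   # matches the p-value reported by cor.test(Motivation, Performance)
```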

The following is R code to test the normality assumption using the Kolmogorov-Smirnov test.


Performance = c(75,65,72,84,74,59,83,55,65,73)

Motivation = c(71,60,78,79,69,55,80,45,78,77)

ks.test(Motivation, "pnorm", mean(Motivation), sd(Motivation), exact = FALSE)

ks.test(Performance, "pnorm", mean(Performance), sd(Performance), exact = FALSE)

Based on the Kolmogorov-Smirnov test above, we obtain a p-value for motivation of 0.604 > significance level 0.05, so we conclude that the normality assumption (population) for the motivation data is satisfied.

We also obtain a p-value for performance of 0.9539 > significance level 0.05, so we conclude that the normality assumption (population) for the performance data is satisfied.

The following is the Kolmogorov-Smirnov result for the normality test based on SPSS.

Table 3 SPSS Result for Kolmogorov-Smirnov Test

Based on the result of the Kolmogorov-Smirnov test above, we obtain a p-value for performance of 0.954 and a p-value for motivation of 0.604. Both p-values are greater than the significance level 0.05, which means the normality assumption (population) for the motivation and performance data is satisfied.

### 7. Spearman Correlation

Spearman correlation is an alternative to Pearson correlation when the normality assumption is not satisfied. The following is R code for performing Spearman correlation.


Performance = c(75,65,72,84,74,59,83,55,65,73)

Motivation = c(71,60,78,79,69,55,80,45,78,77)

cor.test(Motivation, Performance, method = "spearman")

Based on the result above, we obtain a Spearman correlation of 0.7408537. Note that the value is positive, meaning there is a positive relationship between motivation and performance: the higher the motivation, the higher the performance. Besides that, the p-value 0.01423 is less than the significance level 0.05, so we can conclude that there is a significant relationship between motivation and performance.
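Dalgaard's remark quoted earlier, that Spearman's coefficient is "obtained quite simply by replacing the observations by their rank and computing the correlation", can be verified directly:

```r
Performance = c(75,65,72,84,74,59,83,55,65,73)
Motivation  = c(71,60,78,79,69,55,80,45,78,77)

# Spearman's rho is just the Pearson correlation applied to the ranks
rho_ranks   = cor(rank(Motivation), rank(Performance))
rho_builtin = cor(Motivation, Performance, method = "spearman")

rho_ranks
rho_builtin   # both equal the 0.7408537 reported above
```

Note that `rank()` assigns average ranks to ties by default, which is the same convention R's Spearman implementation uses, so the two values agree exactly.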

The following is Spearman correlation result based on SPSS.

Table 4 SPSS Result for Spearman Test

Based on the SPSS result for Spearman correlation above, we obtain a Spearman correlation of 0.741 with p-value 0.014, which agrees with the R result.

# Paired-Samples T Test and Wilcoxon Test with R, SPSS and Minitab

### 1. Brief Overview of Paired-Samples t Test

The paired-samples t test (dependent t test) and the Wilcoxon test can be used to test whether or not there is a (statistically) significant difference between 2 related samples (2 paired samples).

### 2. Data

The following is R code that shows our data.


#COPY THIS R CODE AND RUN IN R TO SEE THE DATA

weight_before = c(85,79,83,77,85,78)

weight_after = c(84,74,80,76,83,77)

dframe = data.frame(weight_before, weight_after)
colnames(dframe) = c("weight before consuming diet medicine","weight after consuming diet medicine")
dframe

print("Average of weight before consuming diet medicine:")
mean(weight_before) #calculate mean
print("Average of weight after consuming diet medicine:")
mean(weight_after) #calculate mean



Figure 1 Our Data

The data above present the weight of 6 persons before and after consuming diet medicine XYZ for a week. The average weight before consuming diet medicine XYZ is 81.16667, while the average after is 79. On average, there is a reduction in weight after consuming diet medicine XYZ for a week.

In this case, we will use the paired-samples t test to test whether or not there is a significant difference in weight before and after consuming the diet medicine for a week.

### 3. Paired-Samples t Test Using R Language

The following is R code to perform paired-samples t test.


#COPY THIS R CODE AND RUN IN R TO PERFORM PAIRED-SAMPLES T TEST

weight_before = c(85,79,83,77,85,78)

weight_after = c(84,74,80,76,83,77)

t.test(weight_before,weight_after,paired=TRUE)



Figure 2 Result of Paired-Samples t Test Using R

Based on the result of the paired-samples t test above, we obtain p-value 0.02118 < significance level 0.05, so we can conclude that there is a significant difference in weight before and after consuming the diet medicine for a week.

The following are the results of the paired-samples t test using SPSS and Minitab, which agree with the R result.

Figure 3 Paired-Samples t Test Using SPSS

Figure 4 Paired-Samples t Test Using Minitab

Based on the SPSS result in Figure 3, the p-value is 0.021177, while the p-value in Minitab (Figure 4) is 0.021. Both agree with the p-value from R.
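A useful way to understand what the paired test does: it is equivalent to a one-sample t test on the differences between the paired observations, which can be checked directly in R:

```r
weight_before = c(85,79,83,77,85,78)
weight_after  = c(84,74,80,76,83,77)

# A paired t test is a one-sample t test on the within-pair differences
paired  = t.test(weight_before, weight_after, paired = TRUE)
one_smp = t.test(weight_before - weight_after, mu = 0)

paired$p.value
one_smp$p.value   # identical to paired$p.value
```

This equivalence is also why the normality assumption (discussed next) concerns the differences rather than the raw scores.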

### 4. Normality Assumption

One of the assumptions of the paired-samples t test is normality, namely that the population of differences of the paired data is assumed to be normally distributed.

Andy Field (2009:329) in his book "Discovering Statistics Using SPSS, 3rd Edition":

"9.4.3. The dependent t-test and the assumption of normality
We talked about the assumption of normality in Chapter 5 and discovered that parametric tests (like the dependent t-test) assume that the sampling distribution is normal. This should be true in large samples, but in small samples people often check the normality of their data because if the data themselves are normal then the sampling distribution is likely to be also. With the dependent t-test we analyse the differences between scores because we’re interested in the sampling distribution of these differences (not the raw data). Therefore, if you want to test for normality before a dependent t-test then what you should do is compute the differences between scores, and then check if this new variable is normally distributed (or use a big sample and not worry about normality!). It is possible to have two measures that are highly non-normal that produce beautifully distributed differences!"

The following is R code to test the normality assumption. I use the Kolmogorov-Smirnov test.


#THIS IS R CODE TO TEST NORMALITY ASSUMPTION USING DIFFERENCE OF PAIRED-DATA

weight_before = c(85,79,83,77,85,78)

weight_after = c(84,74,80,76,83,77)

difference = weight_after - weight_before

dframe = data.frame(weight_before, weight_after, difference)
colnames(dframe) = c("weight before consuming diet medicine","weight after consuming diet medicine","difference")

dframe

ks.test(difference,"pnorm", mean(difference), sd(difference), exact = FALSE  )



Figure 5 Testing of Normality Assumption Using Kolmogorov-Smirnov Test Based on R Language

Based on the result above (Figure 5), we obtain p-value 0.7876 > significance level 0.05, so we can conclude that the normality assumption for the differences of the paired data is satisfied.

The following is SPSS result to perform normality test using Kolmogorov-Smirnov test.

Figure 6 Testing of Normality Assumption Using Kolmogorov-Smirnov Test Based on SPSS

Based on the SPSS result above, we obtain p-value (Asymp. Sig. (2-tailed)) 0.787 > significance level 0.05, so we can conclude that the normality assumption for the differences of the paired data is satisfied.

### 5. Normality Assumption Can Be Ignored

In the paired-samples t test, the normality assumption can be ignored when the sample is large. Andy Field (2009:329) writes in his book "Discovering Statistics Using SPSS, 3rd Edition":

"9.4.3. The dependent t-test and the assumption of normality
We talked about the assumption of normality in Chapter 5 and discovered that parametric tests (like the dependent t-test) assume that the sampling distribution is normal. This should be true in large samples..."

Based on the explanation above, if the sample size is large, the normality assumption can be ignored, because the sampling distribution is then approximately normal.

When is the sample size considered to be large?

Murray R. Spiegel and Larry J. Stephens (2008:275-276) write in their book "Statistics, 4th Edition":

"In previous chapters we often made use of the fact that for samples of size N > 30, called large samples, the sampling distributions of many statistics are approximately normal, the approximation becoming better with increasing N."

Based on the explanation above, a sample is considered to be large if N > 30.

### 6. Wilcoxon Test

The Wilcoxon test is an alternative to the paired-samples t test. When the sample size is < 30 and the normality assumption is not satisfied, the Wilcoxon test can be used instead. The following is R code to perform the Wilcoxon test.


#THIS IS R CODE TO PERFORM WILCOXON TEST

weight_before = c(85,79,83,77,85,78)

weight_after = c(84,74,80,76,83,77)

wilcox.test(weight_after, weight_before, paired = TRUE, correct = FALSE)



Figure 7 Result of Wilcoxon Test Using R

Based on the result of the Wilcoxon test above, we obtain p-value 0.02601 < significance level 0.05, so we can conclude that there is a significant difference in weight before and after consuming the diet medicine for a week. The following is the result of the Wilcoxon test using SPSS, which gives the same result as R.

Figure 8 Result of Wilcoxon Test Using SPSS

Based on the result of the Wilcoxon test above, we obtain a p-value of 0.02601 < significance level 0.05, so we can conclude that there is a significant difference in weight before and after consuming the diet medicine for a week.
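For intuition, the statistic V reported by `wilcox.test` can be computed by hand: rank the absolute differences, then sum the ranks that belong to positive differences. A minimal sketch with the data above:

```r
weight_before = c(85,79,83,77,85,78)
weight_after  = c(84,74,80,76,83,77)

# Wilcoxon signed-rank statistic V: rank |differences| (average ranks for ties),
# then sum the ranks belonging to positive differences
d = weight_after - weight_before
V = sum(rank(abs(d))[d > 0])

V   # every difference here is negative, so V = 0
```

A statistic at the extreme of its range (here 0) is what drives the small p-value reported above.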

## Repeated Measures ANOVA and Friedman Test with STATCAL and SPSS

Repeated-measures ANOVA and Friedman Test using SPSS

Repeated-measures ANOVA and Friedman Test using STATCAL

# 6 Steps to Run Your Shiny Apps with a Shortcut

In this article, I will explain, step by step, how to make your Shiny app run with a shortcut (without running RStudio). I have formulated 6 steps to do it. Here, I assume you already have a Shiny app.

### 1. Create a Folder for Working

The first step is creating a working folder. Here, I make a folder named "MYAPPS", located at C:\.

Figure 1 Create a Folder for Working at C:\

### 2. Save Your UI and SERVER File in MYAPPS Folder

In the second step, I save the ui.r and server.r files in the MYAPPS folder. Figures 3 and 4 show the ui.r and server.r files, respectively.

Figure 2 Save ui.r and server.r File at MYAPPS Folder

Figure 3 ui.r File

Figure 4 server.r File

### 3. Create run.r File

The third step is creating the run.r file (Figure 5).
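The contents of run.r are shown only in Figure 5; as a hedged sketch (assuming the app lives at C:/MYAPPS and should open in the default browser), the file might look like:

```r
# run.r - launches the Shiny app in the MYAPPS folder
# (the path and browser option below are assumptions, not the file from Figure 5)
library(shiny)
runApp("C:/MYAPPS", launch.browser = TRUE)
```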

Figure 5 Create run.r File

### 4. Get Location of R.exe

The fourth step is getting the location of R.exe (Figure 6).

Figure 6 Get Location of R.exe

Based on Figure 6, R.exe is located at C:\Program Files\R\R-3.4.3\bin.

### 5. Create MYAPPS.bat File

The fifth step is creating the MYAPPS.bat file (Figure 7).
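The contents of MYAPPS.bat are shown only in Figure 7; a plausible sketch, assuming the bin folder found in step 4 and the run.r file from step 3 (adjust the paths to your machine):

```shell
:: MYAPPS.bat - a sketch, not the file from Figure 7:
:: launches run.r with Rscript.exe, which sits in the same bin folder as R.exe
"C:\Program Files\R\R-3.4.3\bin\Rscript.exe" "C:\MYAPPS\run.r"
```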

Figure 7 Create MYAPPS.bat File

### 6. Create Shortcut and Run Your Shiny Application

The last step is creating a shortcut and running the Shiny application with it (Figure 8).

Figure 8 Create Shortcut and Run Your Shiny Application

You can watch the video below; it explains, step by step, how to make a Shiny app launch with a shortcut.