I was wondering if you knew of a way to analyze correlational data with treatments in R? I’ve written the code for without treatments but I’m unsure of how to make R analyze it via different treatments. The treatments (3) are in the column “Salt”
library(readxl)
library(ggplot2)
library(psych)
licor <- read_excel("Licor Data for R.xlsx")
Subset data for variables of interest and include treatment variable
d <- licor[,c("A","Ca","Ci","gsw","Salt")]
summary(d) # check for missings
## A Ca Ci gsw
## Min. :-5.105 Min. :419.0 Min. :178.5 Min. :0.02251
## 1st Qu.: 1.416 1st Qu.:420.0 1st Qu.:288.4 1st Qu.:0.08258
## Median : 5.487 Median :420.0 Median :331.2 Median :0.11984
## Mean : 6.224 Mean :420.0 Mean :337.6 Mean :0.13571
## 3rd Qu.:10.461 3rd Qu.:420.0 3rd Qu.:386.0 3rd Qu.:0.15878
## Max. :30.418 Max. :421.5 Max. :531.9 Max. :0.69817
## Salt
## Length:593
## Class :character
## Mode :character
##
##
##
table(d$Salt) # sample sizes within treatment groups
##
## control high low
## 191 200 202
Get all combinations of two variables
var_pairs <- combn(x = c("A","Ca","Ci","gsw"), m = 2)
var_pairs
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "A" "A" "A" "Ca" "Ca" "Ci"
## [2,] "Ca" "Ci" "gsw" "Ci" "gsw" "gsw"
Write a function to help us make the plots for all three levels of Salt.
f <- function(df, x, y){
ggplot(df) +
aes(x = .data[[x]], y = .data[[y]]) +
geom_point() +
geom_smooth() +
facet_wrap(~Salt) +
ggtitle(paste(y, "vs", x))
}
Apply the function to the variable pairs.
apply(var_pairs, 2, function(x)f(df = d, x = x[1], y = x[2]))
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
It looks like Ci vs A and gsw vs A are the only two pairs that exhibit any correlation, though it’s not clear there are any differences between the three levels of Salt.
The code in this section and the next come from the help page for the
r.test()
function in the {psych} package.
Perform a single correlation test for gsw vs A where Salt = control
and Salt = high. When calculating correlations, I set
method = "spearman"
since the plots showed some
non-linearity.
# correlation between A and gsw given Salt == "control"
r1 <- cor.test(~ A + gsw, data = d, subset = Salt == "control",
method = "spearman")
# correlation between A and gsw given Salt == "high"
r2 <- cor.test(~ A + gsw, data = d, subset = Salt == "high",
method = "spearman")
# view the correlations
cbind("Salt = control" = r1$estimate, "Salt = high" = r2$estimate)
## Salt = control Salt = high
## rho 0.5547844 0.5133893
Now run test to see if the difference in correlations is plausible
using the r.test()
function.
r.test(n = sum(d$Salt == "control"), r12 = r1$estimate, r34 = r2$estimate,
n2 = sum(d$Salt == "high"))
## Correlation tests
## Call:r.test(n = sum(d$Salt == "control"), r12 = r1$estimate, r34 = r2$estimate,
## n2 = sum(d$Salt == "high"))
## Test of difference between two independent correlations
## z value 0.57 with probability 0.57
The probability of seeing a difference this big (or bigger) between the correlations, assuming there is no real difference between the two, is pretty high (p = 0.57). We fail to conclude there is any important difference between the two correlations.
Compare all pairs of variables where Salt = control and Salt = high. This is probably a little more efficient than doing two at a time.
First we create the two correlation matrices and view the
correlations. To do this I use the lowerUpper()
function
from the {psych} package. The R1 correlations are displayed below the
diagonal, while the R2 correlations are displayed above the
diagonal.
# correlation matrix given Salt == "control"
R1 <- cor(d[d$Salt == "control", -5])
# correlation matrix given Salt == "high"
R2 <- cor(d[d$Salt == "high", -5])
# show the two correlations
round(lowerUpper(lower = R1, upper = R2), 2)
## A Ca Ci gsw
## A NA -0.16 -0.74 0.62
## Ca -0.16 NA 0.11 -0.17
## Ci -0.63 0.00 NA -0.08
## gsw 0.64 -0.20 0.03 NA
For example, we see the correlation between Ci vs A is -0.63 when salt = control (lower corner) and -0.74 when salt = high (upper corner)
Now run the test to compare all correlations at once.
# compare the correlation matrices
test <- r.test(n=sum(d$Salt == "control"), r12 = R1, r34 = R2,
n2 = sum(d$Salt == "high"))
# view the p-values, rounded to 3 decimal places
round(test$p, 3)
## A Ca Ci gsw
## A NaN 0.974 0.030 0.722
## Ca 0.974 NaN 0.274 0.784
## Ci 0.030 0.274 NaN 0.278
## gsw 0.722 0.784 0.278 NaN
The difference in correlations between Ci vs A may be of interest (p = 0.03). However, given that we ran six different hypothesis tests we may want to adjust the p-values to protect against false positives.
Below we extract the six p-values and use p.adjust()
to
adjust the p-values using the default Holm method. We then insert the
adjusted p-values into the upper corner of the p-value matrix, but leave
the original p-values in the lower corner.
# show the p values of the difference between the two matrices
adjusted <- p.adjust(test$p[upper.tri(test$p)])
both <- test$p
both[upper.tri(both)] <- adjusted
# The lower off diagonal are the raw p-values,
# the upper off diagonal are the adjusted p-values
round(both,digits=2)
## A Ca Ci gsw
## A NaN 1.00 0.18 1
## Ca 0.97 NaN 1.00 1
## Ci 0.03 0.27 NaN 1
## gsw 0.72 0.78 0.28 NaN
After adjustment, the p-value comparing the correlations for Ci vs A is now 0.18. The code above can be re-run to compare Salt = “control” to Salt = “low”, or Salt = “low” to Salt = “high”. Judging from the plots, it seems unlikely any important differences between correlations will be detected.