Correlational Analysis in R

Email from Jemima

I was wondering if you knew of a way to analyze correlational data with treatments in R? I’ve written the code for without treatments but I’m unsure of how to make R analyze it via different treatments. The treatments (3) are in the column “Salt”

Read in data

library(readxl)
library(ggplot2)
library(psych)

licor <- read_excel("Licor Data for R.xlsx")

Subset data for variables of interest and include treatment variable

d <- licor[,c("A","Ca","Ci","gsw","Salt")]
summary(d) # check for missings

##        A                Ca              Ci             gsw         
##  Min.   :-5.105   Min.   :419.0   Min.   :178.5   Min.   :0.02251  
##  1st Qu.: 1.416   1st Qu.:420.0   1st Qu.:288.4   1st Qu.:0.08258  
##  Median : 5.487   Median :420.0   Median :331.2   Median :0.11984  
##  Mean   : 6.224   Mean   :420.0   Mean   :337.6   Mean   :0.13571  
##  3rd Qu.:10.461   3rd Qu.:420.0   3rd Qu.:386.0   3rd Qu.:0.15878  
##  Max.   :30.418   Max.   :421.5   Max.   :531.9   Max.   :0.69817  
##      Salt          
##  Length:593        
##  Class :character  
##  Mode  :character  
##                    
##                    
##

table(d$Salt) # sample sizes within treatment groups

## 
## control    high     low 
##     191     200     202

Visualize with scatterplots and trend lines

Get all combinations of two variables

var_pairs <- combn(x = c("A","Ca","Ci","gsw"), m = 2)
var_pairs

##      [,1] [,2] [,3]  [,4] [,5]  [,6] 
## [1,] "A"  "A"  "A"   "Ca" "Ca"  "Ci" 
## [2,] "Ca" "Ci" "gsw" "Ci" "gsw" "gsw"

Write a function to help us make the plots for all three levels of Salt.

f <- function(df, x, y){
  ggplot(df) +
    aes(x = .data[[x]], y = .data[[y]]) +
    geom_point() +
    geom_smooth() +
    facet_wrap(~Salt) +
    ggtitle(paste(y, "vs", x))
}

Apply the function to the variable pairs.

apply(var_pairs, 2, function(x)f(df = d, x = x[1], y = x[2]))

## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

It looks like Ci vs A and gsw vs A are the only two pairs that exhibit any correlation, though it’s not clear there are any differences between the three levels of Salt.

Single correlation test

The code in this section and the next come from the help page for the r.test() function in the {psych} package.

Perform a single correlation test for gsw vs A where Salt = control and Salt = high. When calculating correlations, I set method = "spearman" since the plots showed some non-linearity.

# correlation between A and gsw given Salt == "control"
r1 <- cor.test(~ A + gsw, data = d, subset = Salt == "control", 
               method = "spearman")

# correlation between A and gsw given Salt == "high"
r2 <- cor.test(~ A + gsw, data = d, subset = Salt == "high", 
         method = "spearman")

# view the correlations
cbind("Salt = control" = r1$estimate, "Salt = high" = r2$estimate)

##     Salt = control Salt = high
## rho      0.5547844   0.5133893

Now run test to see if the difference in correlations is plausible using the r.test() function.

r.test(n = sum(d$Salt == "control"), r12 = r1$estimate, r34 = r2$estimate, 
       n2 = sum(d$Salt == "high"))

## Correlation tests 
## Call:r.test(n = sum(d$Salt == "control"), r12 = r1$estimate, r34 = r2$estimate, 
##     n2 = sum(d$Salt == "high"))
## Test of difference between two independent correlations 
##  z value 0.57    with probability  0.57

The probability of seeing a difference this big (or bigger) between the correlations, assuming there is no real difference between the two, is pretty high (p = 0.57). We fail to conclude there is any important difference between the two correlations.

Compare two correlation matrices

Compare all pairs of variables where Salt = control and Salt = high. This is probably a little more efficient than doing two at a time.

First we create the two correlation matrices and view the correlations. To do this I use the lowerUpper() function from the {psych} package. The R1 correlations are displayed below the diagonal, while the R2 correlations are displayed above the diagonal.

# correlation matrix given Salt == "control"
R1 <- cor(d[d$Salt == "control", -5])

# correlation matrix given Salt == "high"
R2 <- cor(d[d$Salt == "high", -5])

# show the two correlations 
round(lowerUpper(lower = R1, upper = R2), 2)

##         A    Ca    Ci   gsw
## A      NA -0.16 -0.74  0.62
## Ca  -0.16    NA  0.11 -0.17
## Ci  -0.63  0.00    NA -0.08
## gsw  0.64 -0.20  0.03    NA

For example, we see the correlation between Ci vs A is -0.63 when salt = control (lower corner) and -0.74 when salt = high (upper corner)

Now run the test to compare all correlations at once.

# compare the correlation matrices
test <- r.test(n=sum(d$Salt == "control"), r12 = R1, r34 = R2, 
       n2 = sum(d$Salt == "high"))

# view the p-values, rounded to 3 decimal places
round(test$p, 3)

##         A    Ca    Ci   gsw
## A     NaN 0.974 0.030 0.722
## Ca  0.974   NaN 0.274 0.784
## Ci  0.030 0.274   NaN 0.278
## gsw 0.722 0.784 0.278   NaN

The difference in correlations between Ci vs A may be of interest (p = 0.03). However, given that we ran six different hypothesis tests we may want to adjust the p-values to protect against false positives.

Below we extract the six p-values and use p.adjust() to adjust the p-values using the default Holm method. We then insert the adjusted p-values into the upper corner of the p-value matrix, but leave the original p-values in the lower corner.

# show the p values of the difference between the two matrices
adjusted <- p.adjust(test$p[upper.tri(test$p)])
both <- test$p
both[upper.tri(both)] <- adjusted

# The lower off diagonal are the raw p-values, 
# the upper off diagonal are the adjusted p-values
round(both,digits=2)

##        A   Ca   Ci gsw
## A    NaN 1.00 0.18   1
## Ca  0.97  NaN 1.00   1
## Ci  0.03 0.27  NaN   1
## gsw 0.72 0.78 0.28 NaN

After adjustment, the p-value comparing the correlations for Ci vs A is now 0.18. The code above can be re-run to compare Salt = “control” to Salt = “low”, or Salt = “low” to Salt = “high”. Judging from the plots, it seems unlikely any important differences between correlations will be detected.

References

R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 2.4.12, https://CRAN.R-project.org/package=psych.
Wickham, H. (2016 ) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.