ANCOVA stands for Analysis of Covariance. It’s simply linear regression with one numeric predictor and one categorical predictor. It can help us assess how the linear association between the numeric variable and our dependent variable changes between the groups.
Read and prepare Prism data. I believe this data corresponds to Fig 2D.
library(pzfx)
library(tidyverse)
H2SO4_d1 <- read_pzfx("21.09.21 Six Month H2SO4 Study.pzfx")
# reshape data to "long format"
d <- H2SO4_d1 %>%
pivot_longer(cols = -ROWTITLE, names_to = "g", values_to = "hue") %>%
rename(month = ROWTITLE) %>%
mutate(g = factor(str_remove(g, "_[0-9]$")),
month = as.numeric(str_remove(month, "Month ")))
head(d)
## # A tibble: 6 x 3
## month g hue
## <dbl> <fct> <dbl>
## 1 1 (-) BPB H2O 59.2
## 2 1 (-) BPB H2O 67.0
## 3 1 (-) BPB H2O 44.0
## 4 1 (-) BPB H2O 52.4
## 5 1 (-) BPB H2O 50.3
## 6 1 (-) BPB H2O 61.5
Raw data.
ggplot(d) +
aes(x = month, y = hue, color = g) +
geom_point(position = position_dodge(width = 0.4)) +
scale_x_continuous(breaks = 1:6)
Raw data with smooth trend lines.
ggplot(d) +
aes(x = month, y = hue, color = g) +
geom_point(position = position_dodge(width = 0.4)) +
geom_smooth() +
scale_x_continuous(breaks = 1:6)
Means over time +/- 1 SD.
ggplot(d) +
aes(x = month, y = hue, color = g, group = g) +
stat_summary(fun.data = "mean_sdl",
position = position_dodge(width = 0.4)) +
scale_x_continuous(breaks = 1:6)
Often want to show +/- 1 standard error (SE) (ie, \(\frac{SD}{\sqrt n}\)).
ggplot(d) +
aes(x = month, y = hue, color = g, group = g) +
stat_summary(fun.data = "mean_se",
position = position_dodge(width = 0.4)) +
scale_x_continuous(breaks = 1:6)
We use the lm()
function to fit a “linear model”. Since month is numeric and g is categorical, this is an ANCOVA. The formula hue ~ g * month
means model hue as a function of g, month, and their interaction.
m <- lm(hue ~ g * month, data = d)
anova(m)
## Analysis of Variance Table
##
## Response: hue
## Df Sum Sq Mean Sq F value Pr(>F)
## g 3 582704 194235 3418.9677 < 2.2e-16 ***
## month 1 39 39 0.6857 0.4090848
## g:month 3 1117 372 6.5513 0.0003596 ***
## Residuals 136 7726 57
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The interaction appears to be significant. This says the effect of month depends on the group and vice versa. How so? Let’s follow up with some plots and comparisons.
The ggeffects package allows us to quickly visualize our ANCOVA model.
library(ggeffects)
plot(ggpredict(m, terms = c("month", "g")))
This shows the fitted trajectory of the effect of month on hue for the different groups. We can see the significant interaction appears to be due to the different slopes for (+) BPB H20
(green) and (-) BPB H20
(red).
Are these fitted slopes a “good” fit for the data? We can add data to the plot to assess.
plot(ggpredict(m, terms = c("month", "g")), add.data = TRUE)
The slope doesn’t seem to fit month 3 very well. There’s also much more variability around the fitted line for (-) BPB H20
(red). One of the assumptions of ANCOVA is that the variance around the fitted line is the same for each group.
The emtrends
function in the emmeans package can help us quickly assess this.
In the first section (“emtrends”) we get the slope for each group in the “month.trend” column as well as 95% confidence intervals for each slope. We see that (-) BPB H2O
and (+) BPB H2O
are the two groups with confidence intervals that do not contain 0. Therefore we’re confident these slopes are different from 0. (Are these estimated slopes practically significant?) The slopes for the acid groups are small and indistinguishable from 0.
In the second section (“contrasts”) we get Tukey pairwise comparisons between the slopes. The slope for (-) BPB H2O)
appears to be different from all other groups.
library(emmeans)
emtrends(m, pairwise ~ g, var = "month")
## $emtrends
## g month.trend SE df lower.CL upper.CL
## (-) BPB H2O -2.881 0.736 136 -4.336 -1.43
## (-) BPB H2SO4 0.188 0.736 136 -1.267 1.64
## (+) BPB H2O 1.627 0.736 136 0.172 3.08
## (+) BPB H2SO4 -0.152 0.736 136 -1.606 1.30
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## ((-) BPB H2O) - ((-) BPB H2SO4) -3.069 1.04 136 -2.950 0.0193
## ((-) BPB H2O) - ((+) BPB H2O) -4.508 1.04 136 -4.333 0.0002
## ((-) BPB H2O) - ((+) BPB H2SO4) -2.730 1.04 136 -2.624 0.0471
## ((-) BPB H2SO4) - ((+) BPB H2O) -1.439 1.04 136 -1.383 0.5122
## ((-) BPB H2SO4) - ((+) BPB H2SO4) 0.339 1.04 136 0.326 0.9879
## ((+) BPB H2O) - ((+) BPB H2SO4) 1.778 1.04 136 1.709 0.3229
##
## P value adjustment: tukey method for comparing a family of 4 estimates
We can plot each of these results if we like.
em <- emtrends(m, pairwise ~ g, var = "month")
plot(em$emtrends)
plot(em$contrasts)