Let’s generate data that would require a mixed-effect model. Don’t worry too much about the R code here. What’s important to note is the structure of the data. We have multiple observations per mouse for each region. You’ll also notice that when we generate “angle”, we set mutant mice to have have angles about 20 units higher on average.
# ensure all random data is reproducible
set.seed(7)
# number of mice
n <- 10
# id number of mouse
id <- 1:n
# region
region <- paste0("R", 1:12)
# generate all combinations of id and region
d <- expand.grid(id = id, region = region)
d$id <- factor(d$id)
d$region <- factor(d$region)
# number of samples per id per region
n_i <- sample(1:12, size = n * 12, replace = TRUE)
# expand data frame according to n_i
library(tidyr)
d <- uncount(d, n_i)
# add trt variable. Mice 1-5 are control, mice 6 - 10 are mutant
d$trt <- ifelse(d$id %in% 1:5, "control", "mutant")
# generate angle
e <- rnorm(nrow(d), mean = 0, sd = 5) # within mouse random variation
z <- rnorm(n, mean = 0, sd = 10) # between mouse random variation
d$angle <- (30 + z[d$id]) + (d$trt == "mutant")*20 +
(d$region == "R2")*45 +
(d$region == "R3")*60 +
(d$region == "R4")*1 +
(d$region == "R5")*45 +
(d$region == "R6")*90 +
(d$region == "R7")*120 +
(d$region == "R8")*45 +
(d$region == "R9")*120 +
(d$region == "R10")*45 +
(d$region == "R11")*90 +
(d$region == "R12")*60 +
e
# order data by id and reset row numbers
d <- d[order(d$id),]
rownames(d) <- NULL
# first 12 rows of data
head(d, n = 12)
## id region trt angle
## 1 1 R1 control 12.14341
## 2 1 R1 control 13.80058
## 3 1 R1 control 16.27658
## 4 1 R1 control 28.70206
## 5 1 R1 control 18.30152
## 6 1 R1 control 16.60983
## 7 1 R1 control 16.73469
## 8 1 R1 control 14.62147
## 9 1 R1 control 12.62507
## 10 1 R1 control 15.03008
## 11 1 R2 control 61.23030
## 12 1 R2 control 57.14395
Compare control and mutant angle by region.
library(ggplot2)
ggplot(d) +
aes(x = trt, y = angle, color = trt) +
geom_jitter(height = 0, width = 0.1) +
facet_wrap(~region)
Compare regions by treatment. Notice the variability within mice as well as between mice. There are too many mice to be using colors here, but I think it helps show the variability within each mouse.
ggplot(d) +
aes(x = region, y = angle, color = id) +
geom_jitter(height = 0, width = 0.1) +
facet_wrap(~trt)
Calculate mean angles by region and trt. Notice mutant is always higher, which is how we simulated the data. In fact, eyeballing the values shows the means between the two treatment groups to differ by about 20.
tapply(d$angle, list(d$region, d$trt), mean)
## control mutant
## R1 27.14998 50.84019
## R2 75.81143 91.84096
## R3 88.17262 109.74498
## R4 32.08946 45.57595
## R5 73.49191 93.93902
## R6 117.46275 137.47093
## R7 148.15314 168.42542
## R8 77.39310 95.14166
## R9 148.64733 169.25768
## R10 76.08697 95.99950
## R11 118.14566 138.87903
## R12 89.22379 111.00149
Calculate standard deviation by region and trt. This is fairly constant, just as we simulated the data.
tapply(d$angle, list(d$region, d$trt), sd)
## control mutant
## R1 10.340450 10.91496
## R2 12.160859 13.88580
## R3 8.377962 13.54453
## R4 9.712599 11.89154
## R5 11.557692 12.15290
## R6 10.491266 10.79479
## R7 11.865344 10.30607
## R8 9.159816 11.99433
## R9 11.275020 11.91152
## R10 11.659532 16.14214
## R11 10.853121 11.77691
## R12 10.812055 11.90849
Count of observations by mouse and region.
xtabs(~ region + id, data = d)
## id
## region 1 2 3 4 5 6 7 8 9 10
## R1 10 3 12 7 2 10 6 8 8 3
## R2 8 8 6 8 11 10 12 3 4 8
## R3 2 6 6 11 5 4 11 6 10 7
## R4 2 4 6 8 11 8 12 1 11 4
## R5 9 3 5 8 9 5 6 8 9 2
## R6 12 5 8 10 8 10 10 8 11 2
## R7 7 5 6 3 5 9 7 8 12 2
## R8 2 8 6 10 8 7 9 9 8 6
## R9 12 6 4 1 11 12 7 12 7 2
## R10 1 1 1 12 11 2 9 10 1 8
## R11 9 11 9 6 4 11 8 4 12 6
## R12 8 8 9 2 3 2 2 5 11 7
Treat observations as independent. Notice the Residuals degrees of freedom is way too high.
library(car)
m0 <- lm(angle ~ trt + region, data = d)
Anova(m0)
## Anova Table (Type II tests)
##
## Response: angle
## Sum Sq Df F value Pr(>F)
## trt 79879 1 596.71 < 2.2e-16 ***
## region 1174406 11 797.54 < 2.2e-16 ***
## Residuals 109905 821
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice how precise the estimate of trt is with a standard error of 0.8039. That’s too good to be true.
summary(m0)
##
## Call:
## lm(formula = angle ~ trt + region, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.493 -8.740 1.538 8.859 27.456
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.2059 1.4513 20.123 <2e-16 ***
## trtmutant 19.6370 0.8039 24.428 <2e-16 ***
## regionR2 44.8942 1.9123 23.476 <2e-16 ***
## regionR3 60.0482 1.9775 30.366 <2e-16 ***
## regionR4 -0.4212 1.9846 -0.212 0.832
## regionR5 44.6657 2.0082 22.242 <2e-16 ***
## regionR6 88.4380 1.8799 47.044 <2e-16 ***
## regionR7 119.3244 2.0091 59.391 <2e-16 ***
## regionR8 47.1783 1.9428 24.284 <2e-16 ***
## regionR9 119.9675 1.9365 61.952 <2e-16 ***
## regionR10 47.0286 2.0811 22.598 <2e-16 ***
## regionR11 89.5016 1.9009 47.083 <2e-16 ***
## regionR12 61.0319 2.0711 29.469 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.57 on 821 degrees of freedom
## Multiple R-squared: 0.9203, Adjusted R-squared: 0.9191
## F-statistic: 789.7 on 12 and 821 DF, p-value: < 2.2e-16
Aggregate angle by mouse and region and then run a 2-way ANOVA. Again, don’t worry too much about the R code. Just notice the final data has 120 rows, one observation per mouse per region.
d$angle_mean <- ave(d$angle, d$id, d$region, FUN = mean)
d2 <- d[,c("id", "region", "trt", "angle_mean")]
d2 <- d2[!duplicated(d2),]
head(d2)
## id region trt angle_mean
## 1 1 R1 control 16.48453
## 11 1 R2 control 60.71537
## 19 1 R3 control 76.37301
## 21 1 R4 control 19.01390
## 23 1 R5 control 61.38754
## 32 1 R6 control 107.71647
nrow(d2)
## [1] 120
Now do the 2-way ANOVA with the aggregated data. The degrees of freedom looks good.
m1 <- lm(angle_mean ~ trt + region, data = d2)
Anova(m1)
## Anova Table (Type II tests)
##
## Response: angle_mean
## Sum Sq Df F value Pr(>F)
## trt 12955 1 103.54 < 2.2e-16 ***
## region 167492 11 121.70 < 2.2e-16 ***
## Residuals 13388 107
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimate for trt is good and has a more realistic standard error, but the standard errors for all region coefficients are the same. That’s not realistic.
summary(m1)
##
## Call:
## lm(formula = angle_mean ~ trt + region, data = d2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.794 -8.883 2.343 8.129 16.455
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.293 3.682 8.228 4.96e-13 ***
## trtmutant 20.781 2.042 10.176 < 2e-16 ***
## regionR2 44.149 5.002 8.826 2.29e-14 ***
## regionR3 58.998 5.002 11.794 < 2e-16 ***
## regionR4 -1.213 5.002 -0.242 0.809
## regionR5 44.643 5.002 8.924 1.38e-14 ***
## regionR6 88.881 5.002 17.768 < 2e-16 ***
## regionR7 118.307 5.002 23.650 < 2e-16 ***
## regionR8 45.637 5.002 9.123 4.91e-15 ***
## regionR9 118.707 5.002 23.730 < 2e-16 ***
## regionR10 45.806 5.002 9.157 4.12e-15 ***
## regionR11 88.774 5.002 17.747 < 2e-16 ***
## regionR12 57.578 5.002 11.510 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.19 on 107 degrees of freedom
## Multiple R-squared: 0.9309, Adjusted R-squared: 0.9232
## F-statistic: 120.2 on 12 and 107 DF, p-value: < 2.2e-16
Use all the data and specify a random intercept for each
mouse. That’s what the (1|id)
syntax does. We’re
essentially fitting a separate model to each mouse. Notice the
test of trt effect is much more modest now, but still “significant”.
library(lmerTest)
m2 <- lmer(angle ~ trt + region + (1|id), data = d)
Anova(m2, test.statistic = "F")
## Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)
##
## Response: angle
## F Df Df.res Pr(>F)
## trt 8.1422 1 8.00 0.02137 *
## region 4291.0832 11 813.06 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice the coefficient for trt is about 21 (close to the true value) and has a larger standard error, about 8. Also notice all the region coefficients have different standard errors instead of unrealistically estimating them to be the same. In the section titled “Random effects”, we see two estimates of variability:
id
: between mice, estimated to be about 11.7Residual
: within mice, estimated to be about 5Compare these values to what we used to simulate the variation above.
summary(m2, corr = F)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: angle ~ trt + region + (1 | id)
## Data: d
##
## REML criterion at convergence: 5052.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.03149 -0.67567 -0.01745 0.66591 2.86289
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 137.19 11.713
## Residual 24.25 4.924
## Number of obs: 834, groups: id, 10
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 30.1668 5.2750 8.1913 5.719 0.000408 ***
## trtmutant 21.1611 7.4160 7.9996 2.853 0.021367 *
## regionR2 44.4334 0.8224 813.0788 54.027 < 2e-16 ***
## regionR3 59.5754 0.8501 813.0756 70.084 < 2e-16 ***
## regionR4 -0.4560 0.8552 813.0868 -0.533 0.593985
## regionR5 44.3914 0.8593 813.0386 51.661 < 2e-16 ***
## regionR6 89.3182 0.8026 813.0236 111.281 < 2e-16 ***
## regionR7 118.6060 0.8579 813.0244 138.250 < 2e-16 ***
## regionR8 45.2716 0.8344 813.0693 54.257 < 2e-16 ***
## regionR9 118.5484 0.8315 813.0640 142.566 < 2e-16 ***
## regionR10 44.8701 0.9073 813.1777 49.453 < 2e-16 ***
## regionR11 88.8288 0.8134 813.0434 109.209 < 2e-16 ***
## regionR12 57.7760 0.8889 813.0651 65.000 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Above I said we fit a separate model to each mouse. We can see those
models by calling coef()
on the model object. Notice that
each mouse gets their own intercept but all other coefficients
are the same. Hence the reason we call this a random-intercept
model. This is the correct model, because that is precisely how I
simulated the data above! You’ll notice the between mouse variance was
added to the intercept to create the “random” intercepts.
coef(m2)
## $id
## (Intercept) trtmutant regionR2 regionR3 regionR4 regionR5 regionR6
## 1 17.07787 21.16109 44.43342 59.57542 -0.4560428 44.39145 89.31816
## 2 35.49528 21.16109 44.43342 59.57542 -0.4560428 44.39145 89.31816
## 3 36.06902 21.16109 44.43342 59.57542 -0.4560428 44.39145 89.31816
## 4 21.06847 21.16109 44.43342 59.57542 -0.4560428 44.39145 89.31816
## 5 41.12350 21.16109 44.43342 59.57542 -0.4560428 44.39145 89.31816
## 6 28.96830 21.16109 44.43342 59.57542 -0.4560428 44.39145 89.31816
## 7 10.41599 21.16109 44.43342 59.57542 -0.4560428 44.39145 89.31816
## 8 38.14600 21.16109 44.43342 59.57542 -0.4560428 44.39145 89.31816
## 9 28.95802 21.16109 44.43342 59.57542 -0.4560428 44.39145 89.31816
## 10 44.34583 21.16109 44.43342 59.57542 -0.4560428 44.39145 89.31816
## regionR7 regionR8 regionR9 regionR10 regionR11 regionR12
## 1 118.606 45.27161 118.5484 44.87011 88.82884 57.77602
## 2 118.606 45.27161 118.5484 44.87011 88.82884 57.77602
## 3 118.606 45.27161 118.5484 44.87011 88.82884 57.77602
## 4 118.606 45.27161 118.5484 44.87011 88.82884 57.77602
## 5 118.606 45.27161 118.5484 44.87011 88.82884 57.77602
## 6 118.606 45.27161 118.5484 44.87011 88.82884 57.77602
## 7 118.606 45.27161 118.5484 44.87011 88.82884 57.77602
## 8 118.606 45.27161 118.5484 44.87011 88.82884 57.77602
## 9 118.606 45.27161 118.5484 44.87011 88.82884 57.77602
## 10 118.606 45.27161 118.5484 44.87011 88.82884 57.77602
##
## attr(,"class")
## [1] "coef.mer"
Now we can create post-hoc plots. The difference between control and mutant is the same in each region (about 21) because that’s how we specified our model.
library(ggeffects)
plot(ggpredict(m2, terms = c("region", "trt")))
If we thought the effect of trt varied by region then we
could fit an interaction. We do so below by adding
trt:region
to the model. We didn’t simulate this data with
interactions, so the interaction is not important.
m3 <- lmer(angle ~ trt + region + trt:region + (1|id), data = d)
Anova(m3, test.statistic = "F")
## Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)
##
## Response: angle
## F Df Df.res Pr(>F)
## trt 8.1097 1 8.00 0.02156 *
## region 4305.1717 11 802.06 < 2e-16 ***
## trt:region 1.2398 11 802.07 0.25602
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model summary is much more verbose because of the interaction:
summary(m3, corr = F)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: angle ~ trt + region + trt:region + (1 | id)
## Data: d
##
## REML criterion at convergence: 5012.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.72609 -0.68147 -0.01033 0.65661 3.01090
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 137.74 11.736
## Residual 24.17 4.916
## Number of obs: 834, groups: id, 10
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 29.65144 5.31792 8.39301 5.576 0.000443 ***
## trtmutant 22.17945 7.51851 8.38333 2.950 0.017506 *
## regionR2 45.65578 1.15525 802.09678 39.520 < 2e-16 ***
## regionR3 58.65510 1.24798 802.10177 47.000 < 2e-16 ***
## regionR4 -0.07122 1.24264 802.12931 -0.057 0.954311
## regionR5 45.21531 1.20375 802.07088 37.562 < 2e-16 ***
## regionR6 89.82184 1.13464 802.04307 79.163 < 2e-16 ***
## regionR7 118.57880 1.28713 802.03731 92.127 < 2e-16 ***
## regionR8 46.32020 1.21233 802.12584 38.208 < 2e-16 ***
## regionR9 118.71904 1.21354 802.12417 97.829 < 2e-16 ***
## regionR10 46.10092 1.32647 802.25756 34.755 < 2e-16 ***
## regionR11 88.91500 1.16250 802.06304 76.486 < 2e-16 ***
## regionR12 59.36710 1.23909 802.04973 47.912 < 2e-16 ***
## trtmutant:regionR2 -2.49499 1.64384 802.08216 -1.518 0.129465
## trtmutant:regionR3 1.55056 1.70375 802.07899 0.910 0.363049
## trtmutant:regionR4 -0.73334 1.71108 802.09364 -0.429 0.668342
## trtmutant:regionR5 -1.64453 1.71782 802.04148 -0.957 0.338687
## trtmutant:regionR6 -0.96385 1.60324 802.02589 -0.601 0.547885
## trtmutant:regionR7 -0.07428 1.72807 802.02432 -0.043 0.965725
## trtmutant:regionR8 -2.00481 1.66918 802.07557 -1.201 0.230078
## trtmutant:regionR9 -0.33940 1.66439 802.07029 -0.204 0.838471
## trtmutant:regionR10 -2.35730 1.81689 802.19004 -1.297 0.194853
## trtmutant:regionR11 -0.17059 1.62464 802.04439 -0.105 0.916401
## trtmutant:regionR12 -3.31690 1.77694 802.06898 -1.867 0.062318 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The post-hoc plot is barely distinguishable from the first plot.
plot(ggpredict(m3, terms = c("region", "trt")))