Below I simulate data with 2-way and 3-way interactions and a random intercept. The total sample size is 200 with 50 observations per group (univ).
# simulate predictors
set.seed(1)
n <- 200 # total sample size
g <- 4 # number of groups
univ <- gl(n = g, k = n/g) # grouping factor: 4 levels, 50 obs each
x1 <- factor(sample(0:1, size = n, replace = TRUE))
x2 <- factor(sample(letters[1:g], size = n, replace = TRUE))
p <- round(runif(n), 2)
# simulate noise
e <- rnorm(n, mean = 0, sd = 0.1) # within groups (Residual)
z <- rnorm(g, mean = 0, sd = 0.1) # between groups (Intercept)
# simulate outcome
y <- (0.2 + z[univ]) + 0.5 * p +
0.4 * (x1 == "1") +
-0.2 * (x2 == "b") + 0.2 * (x2 == "c") + 0.4 * (x2 == "d") +
# 2 way interactions
-0.3 * p * (x1 == "1") +
0.3 * p * (x2 == "b") + 0.5 * p * (x2 == "c") + 0.1 * p * (x2 == "d") +
-0.2 * (x1 == "1") * (x2 == "b") +
0.2 * (x1 == "1") * (x2 == "c") +
0.4 * (x1 == "1") * (x2 == "d") +
# 3 way interactions
0.7 * p * (x1 == "1") * (x2 == "b") +
0.5 * p * (x1 == "1") * (x2 == "c") +
-0.5 * p * (x1 == "1") * (x2 == "d") + e
# combine into data frame
d <- data.frame(y, x1, x2, p, univ)
summary(d)
## y x1 x2 p univ
## Min. :-0.09062 0:102 a:58 Min. :0.0000 1:50
## 1st Qu.: 0.50811 1: 98 b:48 1st Qu.:0.2900 2:50
## Median : 0.78409 c:41 Median :0.5150 3:50
## Mean : 0.83211 d:53 Mean :0.5194 4:50
## 3rd Qu.: 1.14042 3rd Qu.:0.7600
## Max. : 2.26147 Max. :1.0000
This is not quite like your data, but it allows me to demonstrate the {ggeffects} and {emmeans} packages.
The model below is the correct one to fit, since it matches the model we used to simulate the data.
library(rstanarm)
m <- stan_lmer(y ~ p * x1 * x2 + (1|univ), data = d, refresh = 0)
The model coefficients are pretty similar to the “true” values we used to simulate the data.
m
## stan_lmer
## family: gaussian [identity]
## formula: y ~ p * x1 * x2 + (1 | univ)
## observations: 200
## ------
## Median MAD_SD
## (Intercept) 0.2 0.1
## p 0.5 0.1
## x11 0.3 0.1
## x2b -0.2 0.1
## x2c 0.1 0.1
## x2d 0.3 0.1
## p:x11 -0.1 0.1
## p:x2b 0.3 0.1
## p:x2c 0.6 0.1
## p:x2d 0.2 0.1
## x11:x2b -0.2 0.1
## x11:x2c 0.3 0.1
## x11:x2d 0.5 0.1
## p:x11:x2b 0.7 0.1
## p:x11:x2c 0.3 0.2
## p:x11:x2d -0.7 0.1
##
## Auxiliary parameter(s):
## Median MAD_SD
## sigma 0.1 0.0
##
## Error terms:
## Groups Name Std.Dev.
## univ (Intercept) 0.10
## Residual 0.11
## Num. levels: univ 4
##
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
The p coefficient is about 0.5 with a MAD_SD of 0.1. Taken alone, this suggests the main effect of p is reliably positive. However, notice that several interactions involving p also appear to be "significant," in the sense that their coefficients are much larger than their MAD_SD. This makes it hard to draw any conclusions about the main effect of p on its own.
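One way to see this in the posterior itself is to extract credible intervals for every coefficient involving p. Below is a minimal sketch using rstanarm's posterior_interval() function; the regular expression simply matches coefficient names that begin with "p".
# 95% credible intervals for all coefficients whose names start with "p"
posterior_interval(m, prob = 0.95, regex_pars = "^p")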
Effect plots and estimated marginal means allow us to dig deeper.
Effect plots visualize the model. The {ggeffects} package makes this fairly easy to do.
library(ggeffects)
ggpredict(m, terms = c("p", "x1", "x2"), interval = "prediction") |> plot()
This plot helps us understand the nature of the interactions. For example, in the lower right panel, when x2 = "d" and x1 = 0, the effect of p is positive (red line). But when x2 = "d" and x1 = 1, the effect of p is slightly negative (blue line). This demonstrates why we should be careful about interpreting the main effect of p in the presence of interactions.
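To see where such conditional slopes come from, we can sum the relevant fixed effects by hand. Below is a minimal sketch using fixef(), which rstanarm provides for models with random effects; with R's default treatment contrasts, the slope of p when x1 = 1 and x2 = "d" is the base p coefficient plus the three interaction terms involving p. Note this sums posterior medians, so it only approximates the median slope that emtrends() reports below.
# approximate slope of p when x1 = 1 and x2 = "d" (treatment contrasts):
# base slope plus the relevant 2-way and 3-way interaction terms
sum(fixef(m)[c("p", "p:x11", "p:x2d", "p:x11:x2d")])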
The plot above shows the slope (ie, the effect) of p differing depending on x1 and x2, especially in the lower right panel. But how can we quantify those differences? We can use the emtrends() function from the {emmeans} package to help answer this.
library(emmeans)
emtrends(m, ~ x1 | x2, var = "p")
## x2 = a:
## x1 p.trend lower.HPD upper.HPD
## 0 0.453 0.326 0.5782
## 1 0.346 0.197 0.4747
##
## x2 = b:
## x1 p.trend lower.HPD upper.HPD
## 0 0.707 0.547 0.8812
## 1 1.284 1.137 1.4352
##
## x2 = c:
## x1 p.trend lower.HPD upper.HPD
## 0 1.025 0.846 1.1954
## 1 1.195 1.021 1.3725
##
## x2 = d:
## x1 p.trend lower.HPD upper.HPD
## 0 0.654 0.515 0.8059
## 1 -0.179 -0.318 -0.0324
##
## Point estimate displayed: median
## HPD interval probability: 0.95
When x2 = "d" and x1 = 0, the slope of p is 0.654 [0.515, 0.806], but when x2 = "d" and x1 = 1, the slope of p is -0.179 [-0.318, -0.032].
If we like, we can quantify the difference in slopes by piping the
result into the emmeans::pairs()
function:
emtrends(m, ~ x1 | x2, var = "p") |> pairs()
## x2 = a:
## contrast estimate lower.HPD upper.HPD
## x10 - x11 0.109 -0.0868 0.2836
##
## x2 = b:
## contrast estimate lower.HPD upper.HPD
## x10 - x11 -0.578 -0.7985 -0.3517
##
## x2 = c:
## contrast estimate lower.HPD upper.HPD
## x10 - x11 -0.171 -0.4324 0.0842
##
## x2 = d:
## contrast estimate lower.HPD upper.HPD
## x10 - x11 0.834 0.6371 1.0477
##
## Point estimate displayed: median
## HPD interval probability: 0.95
This tells us that when x2 = "d", the difference in slopes (ie, the difference in the p coefficient between x1 = 0 and x1 = 1) is about 0.834 [0.637, 1.048].
All of this can be done in a Frequentist framework as well.
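For example, here is a minimal sketch using {lme4} (assuming it is installed); the {ggeffects} and {emmeans} calls are unchanged, except that intervals now come from Frequentist methods rather than the posterior.
library(lme4)
# same model, fit by REML instead of MCMC
m2 <- lmer(y ~ p * x1 * x2 + (1 | univ), data = d)
# the same downstream code works on the lme4 fit
ggpredict(m2, terms = c("p", "x1", "x2")) |> plot()
emtrends(m2, ~ x1 | x2, var = "p") |> pairs()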
sess <- sessionInfo()
sess$R.version$version.string
## [1] "R version 4.4.1 (2024-06-14 ucrt)"
sess$running
## [1] "Windows 10 x64 (build 19045)"
packageVersion("ggeffects")
## [1] '1.7.0'
packageVersion("emmeans")
## [1] '1.10.3'
packageVersion("rstanarm")
## [1] '2.32.1'