Simulate Data

Let’s simulate data. Below we use gl() to generate 30 subject ids, each repeated 80 times (80 observations per subject). Then we randomly assign each subject to one of five trust categories; subsetting trust by id expands the subject-level assignment to one value per row.

set.seed(1)
N <- 30 # 30 subjects
id <- gl(n = N, k = 80)
trust <- factor(sample(1:5, size = 30, replace = TRUE))
trust <- trust[id]
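The trust[id] step works because indexing a vector by a factor uses the factor’s integer codes, expanding one value per subject into one value per observation. A minimal sketch of the same trick (the names vals and id2 are just for illustration):

```r
# Subject-level values: one entry per subject
vals <- c("low", "high")

# 2 subjects, 3 observations each
id2 <- gl(n = 2, k = 3)

# Indexing by the factor repeats each subject's value for all their rows
vals[id2]
## [1] "low"  "low"  "low"  "high" "high" "high"
```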

Check the assignment. Eight subjects have trust = 1, eight subjects have trust = 2, four have trust = 3, and so on.

table(trust)/80
## trust
## 1 2 3 4 5 
## 8 8 4 3 7

Now we generate the dependent variable, y. Notice there are two sources of variation:

  1. subject noise/variance (between subjects)
  2. observation noise/variance (within subjects)

There is nothing special about the numbers I’m using. I just made them up.

set.seed(2)
subj_noise <- rnorm(30, mean = 0, sd = 0.3)
obs_noise <- rnorm(30*80, mean = 0, sd = 0.2)
y <- (0.5 + subj_noise[id]) + 
  0.4*(trust == "2") +
  0.8*(trust == "3") +
  1.2*(trust == "4") + 
  1.6*(trust == "5") + 
  obs_noise
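With these two standard deviations, most of the unexplained spread is between subjects rather than within them. A quick back-of-the-envelope check using the true values above (the intraclass correlation; the object names here are my own):

```r
# True variance components used in the simulation
sd_subj <- 0.3   # between-subject sd
sd_obs  <- 0.2   # within-subject (observation) sd

# Intraclass correlation: share of unexplained variance due to subjects
icc <- sd_subj^2 / (sd_subj^2 + sd_obs^2)
icc
## [1] 0.6923077
```

About 69% of the noise variance comes from subjects, which is why a model that ignores the subject grouping would be a poor fit here.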

Put data into data frame:

d <- data.frame(id, y, trust)
head(d)
##   id          y trust
## 1  1 0.37871336     1
## 2  1 0.29471772     1
## 3  1 0.44615851     1
## 4  1 0.17409409     1
## 5  1 0.07559058     1
## 6  1 0.11179354     1

Analysis

Now let’s run a mixed-effects ANOVA model. Notice we are fitting the correct model: the exact model we used to simulate the data. The syntax y ~ trust + (1|id) says “model y as a function of trust and let the intercept be conditional on subject id.” In other words, the fixed effect of trust is the same for everyone, but each subject gets their own intercept. This allows the random variation due to each subject to be accountedted for in the model.
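In symbols, the model being fit looks like this (a sketch; the notation is mine, not lme4’s, but the coefficients match the numbers used in the simulation: an intercept of 0.5, trust effects of 0.4, 0.8, 1.2, 1.6, subject sd 0.3, and residual sd 0.2):

```latex
y_{ij} = \beta_0 + \sum_{k=2}^{5} \beta_k \, \mathbb{1}(\text{trust}_i = k) + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

where i indexes subjects and j indexes observations within subject. The random intercept b_i is the “subject noise” and ε_ij is the “observation noise” from the simulation.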

library(lme4)
library(car) # for Anova()
m <- lmer(y ~ trust + (1|id), data = d)
Anova(m)
## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: y
##        Chisq Df Pr(>Chisq)    
## trust 78.694  4  3.293e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA test tells us we have evidence that trust explains some of the variability in the outcome, y.

Here’s a quick diagnostic plot to check the constant variance assumption. You can see each subject as a vertical line of points. We hope to see a constant spread around 0. This looks good.

# residual versus fitted value plot
plot(m)

We can also create a QQ plot if we like, though it’s less important than the previous plot. This looks perfect because we simulated the data. Notice we have to use the qqmath() function from the {lattice} package to create this plot.

lattice::qqmath(m)

Post-hoc comparisons

The ANOVA test was significant. Where are the differences in means? The {emmeans} package can help us investigate.

library(emmeans)
em_out <- emmeans(m, pairwise ~ trust)
em_out$contrasts
##  contrast        estimate    SE df t.ratio p.value
##  trust1 - trust2   -0.292 0.191 25  -1.530  0.5536
##  trust1 - trust3   -0.776 0.233 25  -3.326  0.0209
##  trust1 - trust4   -1.153 0.258 25  -4.466  0.0013
##  trust1 - trust5   -1.592 0.197 25  -8.071  <.0001
##  trust2 - trust3   -0.485 0.233 25  -2.076  0.2612
##  trust2 - trust4   -0.861 0.258 25  -3.336  0.0204
##  trust2 - trust5   -1.301 0.197 25  -6.593  <.0001
##  trust3 - trust4   -0.376 0.291 25  -1.293  0.6980
##  trust3 - trust5   -0.816 0.239 25  -3.415  0.0169
##  trust4 - trust5   -0.440 0.263 25  -1.671  0.4687
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 5 estimates
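Because we simulated the data, we know the true group means, so we can sanity-check these contrasts. A quick sketch (true_means is just the intercept plus the trust effects used in the simulation):

```r
# True group means implied by the simulation: 0.5 plus each trust effect
true_means <- 0.5 + c(0, 0.4, 0.8, 1.2, 1.6)
true_means
## [1] 0.5 0.9 1.3 1.7 2.1
```

For example, the true trust1 - trust2 difference is 0.5 - 0.9 = -0.4, and the model estimated -0.292; the true trust1 - trust5 difference is -1.6, estimated at -1.592.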

We can see several “significant” differences. Plotting makes it a little easier to see which pairs differ. Any interval overlapping 0 corresponds to a difference we remain uncertain about.

plot(em_out$contrasts) +
  ggplot2::geom_vline(xintercept = 0)