5 Mixed-effect models

5.1 Random intercept model: continuous slope

In this model the slope coefficient represents a trend.

For power estimation based on sample size, we need the following:

Between group standard deviation (Intercept random effect)
Within group standard deviation (residual standard deviation)
Sample size (n) in each group (N)
Effect size (slope coefficient)
Intercept coefficient
Significance level of test
Direction of test: one or two-sided

5.1.1 Example

We are planning an experiment where we track the BMI of college students over their four years at school. We’ll measure their BMI at the beginning of each school year in August. We would like sufficient power to detect a year trend of at least 0.25 if we recruit 15 students to the study. The null hypothesis we’ll test is that the year (slope) coefficient is 0. Rejecting this hypothesis provides evidence of a change in BMI over time. Assume the following:

Between group standard deviation of 3
Within group standard deviation 1
Sample size in each group: 4 obs each of 15 students
Effect size of 0.25 (slope coefficient)
Intercept coefficient: 20
Significance level of test: 0.01
Direction of test: two-sided

We can use simulation to estimate the power of a hypothesis test on a specific coefficient in a model. This example uses the simr package (Green and MacLeod 2016).

First we create a data frame of 4 observations each of 15 students.

library(simr)
year <- 1:4
student <- 1:15
d <- expand.grid(year = year, student = student)
head(d, n = 8)

  year student
1    1       1
2    2       1
3    3       1
4    4       1
5    1       2
6    2       2
7    3       2
8    4       2

Now define our between and within group variance. Note that the simr package requires the between group variance be expressed as variance and the within group variance be expressed as standard deviation.

between <- 3^2 # random intercept variance
within <- 1 # residual standard deviation

Next define our effects. We need to provide intercept and slope coefficients. The slope coefficient is the effect we will test. The intercept can be interpreted as the average BMI for the students’ first year of college.

b <- c(20, 0.25) # fixed intercept and slope

Now define our model using the makeLmer() function. This function will simulate BMI values based on our hypothesized coefficients and variances.

m <- makeLmer(bmi ~ year + (1|student), 
              fixef = b, 
              VarCorr = between, 
              sigma = within, 
              data= d)
print(m)

Linear mixed model fit by REML ['lmerMod']
Formula: bmi ~ year + (1 | student)
   Data: d
REML criterion at convergence: 206.2153
Random effects:
 Groups   Name        Std.Dev.
 student  (Intercept) 3       
 Residual             1       
Number of obs: 60, groups:  student, 15
Fixed Effects:
(Intercept)         year  
      20.00         0.25

Finally use the powerSim() function to estimate the power of the test on the slope using 100 simulations. Setting progress = FALSE suppresses a progress bar which we don’t want to output in this online book. The seed argument makes the following result reproducible. The fixed() function dictates what coefficient we want to test. In this case we want to test the “year” coefficient using a “t” test.

# power of test on slope
powerSim(m, 
         test = fixed("year", "t"),
         seed = 1,
         nsim = 100, 
         alpha = 0.01, 
         progress = FALSE)

Power for predictor 'year', (95% confidence interval):
      37.00% (27.56, 47.24)

Test: t-test with Satterthwaite degrees of freedom (package lmerTest)
      Effect size for year is 0.25

Based on 100 simulations, (0 warnings, 0 errors)
alpha = 0.01, nrow = 60

Time elapsed: 0 h 0 m 6 s

Our power is estimated to be about 0.37. The confidence interval is based on a simple binomial test.

round(binom.test(x = 37, n = 100)$conf.int * 100, 2)

[1] 27.56 47.24
attr(,"conf.level")
[1] 0.95

This power is low. Let’s try increasing the sample size. We can do that with the extend() function. Below we increase the sample size to 30 and re-run the simulation.

m2 <- extend(m, along = "student", n = 30)
powerSim(m2, 
         test = fixed("year", "t"),
         seed = 2,
         nsim = 100, 
         alpha = 0.01, 
         progress = FALSE)

Power for predictor 'year', (95% confidence interval):
      64.00% (53.79, 73.36)

Test: t-test with Satterthwaite degrees of freedom (package lmerTest)
      Effect size for year is 0.25

Based on 100 simulations, (0 warnings, 0 errors)
alpha = 0.01, nrow = 120

Time elapsed: 0 h 0 m 6 s

Power is higher but still too low at 0.64.

We can generate a power curve with varying sample sizes using the powerCurve() function. First extend the data set by the maximum number of students we’re willing to entertain and then call the powerCurve() function. Below we specify a maximum of 40 students. Note this can take a while to run for large data sets and complex models. We have lowered the default number of simulations from 1000 to 100. The breaks argument says we want to try sample sizes ranging from 30 to 40 in steps of 2.

m3 <- extend(m, along = "student", n = 40)
pc <- powerCurve(m3, 
                 test = fixed("year", "t"), 
                 seed = 5,
                 along = "student",
                 breaks = seq(30,40,2),
                 nsim = 100, 
                 alpha = 0.01,
                 progress = FALSE)
plot(pc)

It appears we need about 38 students to achieve power of at least 0.80.

5.2 Random intercept and random slope model: continuous slope

In this model the slope coefficient represents a trend.

For power estimation based on sample size, we need the following:

Intercept random effect
Slope random effect
Intercept and slope random effects covariance
Within group standard deviation (residual standard deviation)
Sample size (n) in each group (N)
Effect size (slope coefficient)
Intercept coefficient
Significance level of test
Direction of test: one or two-sided

5.2.1 Example

Intercept random effect of 3
Slope random effect of 0.2
Intercept and slope random effects covariance of 0.5
Within group standard deviation 1
Sample size in each group: 4 obs each of 15 students
Effect size of 0.25 (slope coefficient)
Intercept coefficient: 20
Significance level of test: 0.01
Direction of test: two-sided

We can use simulation to estimate the power of a hypothesis test on a specific coefficient in a model. This example uses the simr package (Green and MacLeod 2016).

First we create a data frame of 4 observations each of 15 students.

year <- 1:4
student <- 1:15
d <- expand.grid(year = year, student = student)
head(d, n = 8)

  year student
1    1       1
2    2       1
3    3       1
4    4       1
5    1       2
6    2       2
7    3       2
8    4       2

Now define our intercept and slope random effects. Note that the simr package requires this be entered as a matrix. We also need to enter the random effects as variances, so we square each entry.

re <- matrix(c(3^2, 0.5^2, 0.5^2, 0.2^2), 2) 
rownames(re) <- colnames(re) <- c("intercept", "slope")
re

          intercept slope
intercept      9.00  0.25
slope          0.25  0.04

Then we define the residual standard deviation.

resid_sd <- 1 # residual standard deviation

b <- c(20, 0.25) # fixed intercept and slope

Now define our model using the makeLmer() function. This function will simulate BMI values based on our hypothesized coefficients and variances.

m <- makeLmer(bmi ~ year + (year|student), 
              fixef = b, 
              VarCorr = re, 
              sigma = resid_sd, 
              data= d)
print(m)

Linear mixed model fit by REML ['lmerMod']
Formula: bmi ~ year + (year | student)
   Data: d
REML criterion at convergence: 232.0793
Random effects:
 Groups   Name        Std.Dev. Corr
 student  (Intercept) 3.0          
          year        0.2      0.42
 Residual             1.0          
Number of obs: 60, groups:  student, 15
Fixed Effects:
(Intercept)         year  
      20.00         0.25

# power of test on slope
powerSim(m, 
         test = fixed("year", "t"),
         seed = 1,
         nsim = 100, 
         alpha = 0.01, 
         progress = FALSE)

Power for predictor 'year', (95% confidence interval):
      19.00% (11.84, 28.07)

Test: t-test with Satterthwaite degrees of freedom (package lmerTest)
      Effect size for year is 0.25

Based on 100 simulations, (1 warning, 0 errors)
alpha = 0.01, nrow = 60

Time elapsed: 0 h 0 m 6 s

Our power is estimated to be about 0.19. The confidence interval is based on a simple binomial test.

round(binom.test(x = 19, n = 100)$conf.int * 100, 2)

[1] 11.84 28.07
attr(,"conf.level")
[1] 0.95

This power is low. Let’s try increasing the sample size. We can do that with the extend() function. Below we increase the sample size to 30 and re-run the simulation.

m2 <- extend(m, along = "student", n = 30)
powerSim(m2, 
         test = fixed("year", "t"),
         seed = 2,
         nsim = 100, 
         alpha = 0.01, 
         progress = FALSE)

Power for predictor 'year', (95% confidence interval):
      55.00% (44.73, 64.97)

Test: t-test with Satterthwaite degrees of freedom (package lmerTest)
      Effect size for year is 0.25

Based on 100 simulations, (1 warning, 0 errors)
alpha = 0.01, nrow = 120

Time elapsed: 0 h 0 m 7 s

Power is higher but still too low at 0.55.

We can generate a power curve with varying sample sizes using the powerCurve() function. First extend the data set by the maximum number of students we’re willing to entertain and then call the powerCurve() function. Below we specify a maximum of 50 students. Note this can take a while to run for large data sets and complex models. We have lowered the default number of simulations from 1000 to 100. The breaks argument says we want to try sample sizes ranging from 40 to 50 in steps of 2.

m3 <- extend(m, along = "student", n = 50)
pc <- powerCurve(m3, 
                 test = fixed("year", "t"), 
                 seed = 5,
                 along = "student",
                 breaks = seq(40,50,2),
                 nsim = 100, 
                 alpha = 0.01,
                 progress = FALSE)
plot(pc)

It appears we need about 44 students to achieve power of at least 0.80.

5.3 Random intercept model: binary slope

In this model the slope coefficient represents a population treatment effect. In other words, we’re comparing means between two groups.

For power estimation based on sample size, we need the following:

Between group standard deviation (Intercept random effect)
Within group standard deviation (residual standard deviation)
Sample size (n) in each group (N)
Effect size (slope coefficient)
Intercept coefficient
Significance level of test
Direction of test: one or two-sided

5.3.1 Example: closed-form expression

We are planning a multisite experiment where we will compare two types of bug traps, an older model (control) and a newer model (treatment). At each site we will place 5 traps of each type (10 total) and return in 30 days to weigh the amount of invasive bugs captured. We think it would be meaningful if our new trap captured 3 grams of additional bugs on average. We assume there is a between site standard deviation of 2 and a within site standard deviation of 6. How powerful is our experiment if we select 10 sites to run this experiment? Assume a two-sided test and a significance level of 0.05.

Crespi (2025) describes how to estimate power for a design such as this using a noncentral t distribution. The multisite.cont() function in the powertools package (Crespi and Liu 2025) implements this formula.

Note the multisite.cont() function requires the total standard deviation of the outcome variable and the intraclass correlation coefficient (ICC) of the random intercept. We can derive both of these from the information given above.

The total standard deviation of the outcome is the square root of the sum of the between and within variances:

\[\sqrt{2^2 + 6^2} = \sqrt{40}\]

The ICC of the random intercept is the between group variance divided by total variance:

\[\text{ICC} = 4/40 = 0.1 \]

The ICC for the random intercept is provided to the function using the icc0 argument. The total standard deviation of the outcome is provided using the sd argument.

library(powertools)
multisite.cont(m = 10,          # number of traps (5 in each condition)
               J = 10,          # number of sites
               delta = 3,       # treatment effect
               sd = sqrt(40),   # total sd of outcome
               icc0 = 0.1,     # icc of random effect
               icc1 = 0,        # icc of random slope (0 in this example)
               alpha = 0.05,    # significance level of test
               power = NULL,    # set to NULL since we want power
               v = TRUE)        # verbose, return more detail


     Power for test of average treatment effect in multisite trials 

         m1, m2 = 5, 5
              J = 10
          delta = 3
             sd = 6.324555
     icc0, icc1 = 0.1, 0.0
          alpha = 0.05
          power = 0.6061384
          sides = 2

NOTE: m1, m2 are the number of subjects within site in condition 1, condition 2
      (total of m1 + m2 per site). m1, m2 OR m2, m1 produce equivalent results.

The estimated power is about 0.61.

We can try multiple sample sizes using the sapply() function. We apply the multisite.cont() function to sample sizes ranging from 10 - 20. Notice that sample size in this case is the number of sites.

power <- sapply(10:20, function(x)multisite.cont(m = 10,          
                                 J = x,
                                 delta = 3,
                                 sd = sqrt(40),
                                 icc0 = 0.1,
                                 icc1 = 0,
                                 alpha = 0.05,
                                 power = NULL))
data.frame(sample_size = 10:20, power)

   sample_size     power
1           10 0.6061384
2           11 0.6574890
3           12 0.7035244
4           13 0.7444653
5           14 0.7806200
6           15 0.8123479
7           16 0.8400344
8           17 0.8640709
9           18 0.8848413
10          19 0.9027126
11          20 0.9180288

A sample size of 15 gets us to power over 0.80, and sample size of about 19 gets us up to 0.90.

5.3.2 Example: simulation

We can also estimate power using simulation via the simr package (Green and MacLeod 2016).

Let’s restate the setup: we are planning a multisite experiment where we will compare two types of bug traps, an older model (control) and a newer model (treatment). At each site we will place 5 traps of each type (10 total) and return in 30 days to weigh the amount of invasive bugs captured. We think it would be meaningful if our new trap captured 3 grams of additional bugs on average. We assume there is a between site standard deviation of 2 and a within site standard deviation of 6. How powerful is our experiment if we select 10 sites to run this experiment? Assume a two-sided test and a significance level of 0.05.

First we create a data frame of 10 observations each at 10 sites. Then we add a sequence of five zeroes and five ones to indicate treatment. A zero means control and one means treated.

library(simr)
trap <- 1:10
site <- 1:10
d <- expand.grid(trap = trap, site = site)
d$trt <- rep(c(rep(0, 5), rep(1, 5)), 10)
head(d, n = 12)

   trap site trt
1     1    1   0
2     2    1   0
3     3    1   0
4     4    1   0
5     5    1   0
6     6    1   1
7     7    1   1
8     8    1   1
9     9    1   1
10   10    1   1
11    1    2   0
12    2    2   0

between <- 2^2 # random intercept variance
within <- 6 # residual standard deviation

Next define our effects. We need to provide intercept and slope coefficients. The slope coefficient is the effect we will test. The intercept can be interpreted as the average mass of bugs captured in the control trap.

b <- c(10, 3) # fixed intercept and slope

Now define our model using the makeLmer() function. This function will simulate mass values based on our hypothesized coefficients and variances.

m <- makeLmer(mass ~ trt + (1|site), 
              fixef = b, 
              VarCorr = between, 
              sigma = within, 
              data = d)
print(m)

Linear mixed model fit by REML ['lmerMod']
Formula: mass ~ trt + (1 | site)
   Data: d
REML criterion at convergence: 714.4632
Random effects:
 Groups   Name        Std.Dev.
 site     (Intercept) 2       
 Residual             6       
Number of obs: 100, groups:  site, 10
Fixed Effects:
(Intercept)          trt  
         10            3

# power of test on slope
powerSim(m, 
         test = fixed("trt", "t"),
         seed = 1,
         nsim = 100, 
         alpha = 0.05, 
         progress = FALSE)

Power for predictor 'trt', (95% confidence interval):
      65.00% (54.82, 74.27)

Test: t-test with Satterthwaite degrees of freedom (package lmerTest)
      Effect size for trt is 3.0

Based on 100 simulations, (0 warnings, 0 errors)
alpha = 0.05, nrow = 100

Time elapsed: 0 h 0 m 5 s

Power based on simulation is estimated to be about 0.65.

We can generate a power curve with varying sample sizes using the powerCurve() function. First extend the data set by the maximum number of sites we’re willing to entertain and then call the powerCurve() function. Below we specify a maximum of 20 sites. Note this can take a while to run for large data sets and complex models. We have lowered the default number of simulations from 1000 to 100. The breaks argument says we want to try sample sizes ranging from 10 to 20 in steps of 1.

m2 <- extend(m, along = "site", n = 20)
pc <- powerCurve(m2, 
                 test = fixed("trt", "t"), 
                 seed = 5,
                 along = "site",
                 breaks = 10:20,
                 nsim = 100, 
                 alpha = 0.05,
                 progress = FALSE)
plot(pc)

According to simulation, we need to sample 16 sites to reliably achieve power of at least 0.80.

5.4 Random intercept and random slope model: binary slope

In this model the slope coefficient represents a population treatment effect. In other words, we’re comparing means between two groups. The random slope allows the effect of the treatment to depend on the random group.

For power estimation based on sample size, we need the following:

Intercept random effect
Slope random effect
Intercept and slope random effects covariance (optional)
Within group standard deviation (residual standard deviation)
Sample size (n) in each group (N)
Effect size (slope coefficient)
Intercept coefficient
Significance level of test
Direction of test: one or two-sided

5.4.1 Example: closed-form expression

We are planning a multisite experiment where we will compare two types of bug traps, an older model (control) and a newer model (treatment). At each site we will place 5 traps of each type (10 total) and return in 30 days to weigh the amount of invasive bugs captured. We think it would be meaningful if our new trap captured 3 grams of additional bugs on average. We assume there is an Intercept random effect of 2 (between site standard deviation), a slope random effect of \(\sqrt{8}\), and a within site standard deviation of 6. How powerful is our experiment if we select 10 sites to run this experiment? Assume a two-sided test and a significance level of 0.05.

Note the multisite.cont() function requires the total standard deviation of the outcome variable and two intraclass correlation coefficients (ICC):

icc0: proportion of variance due to random groups
icc1: proportion of variance due to random slope

The total standard deviation of the outcome is calculated as follows (Crespi, page 221):

\[\sqrt{Var(Y_{ij})} = \sqrt{2^2 + \frac{1}{4}8 + 6^2} = \sqrt{42}\]

The ICC of the random intercept is the between group variance divided by total variance:

\[\text{ICC}_0 = 4/42 \approx 0.095\]

The ICC of the random slope is one-fourth the random slope variance divided by the total variance.

\[\text{ICC}_1 = \frac{1/4 * 8}{42} \approx 0.048\]

The ICC for the random intercept is provided to the function using the icc0 argument. The ICC for the random slope is provided using the icc1 argument. The total standard deviation of the outcome is provided using the sd argument.

multisite.cont(m = 10, J = 10, delta = 3, sd = sqrt(42), 
               icc0 = 0.095,
               icc1 = 0.048, 
               power = NULL, v = TRUE)


     Power for test of average treatment effect in multisite trials 

         m1, m2 = 5, 5
              J = 10
          delta = 3
             sd = 6.480741
     icc0, icc1 = 0.095, 0.048
          alpha = 0.05
          power = 0.4319213
          sides = 2

NOTE: m1, m2 are the number of subjects within site in condition 1, condition 2
      (total of m1 + m2 per site). m1, m2 OR m2, m1 produce equivalent results.

The estimated power is about 0.43.

We can try multiple sample sizes using the sapply() function. We apply the multisite.cont() function to sample sizes ranging from 12 - 30 in increments of 3. Notice that sample size in this case is the number of sites.

power <- sapply(seq(12,30,3), function(x)multisite.cont(m = 10,          
                                 J = x,
                                 delta = 3,
                                 sd = sqrt(42),
                                 icc0 = 0.095,
                                 icc1 = 0.048, 
                                 alpha = 0.05,
                                 power = NULL))
data.frame(sample_size = seq(12,30,3), power)

  sample_size     power
1          12 0.5160467
2          15 0.6260654
3          18 0.7162145
4          21 0.7879091
5          24 0.8435886
6          27 0.8859939
7          30 0.9177628

A sample size of 24 gets us to power over 0.80, and sample size of 30 gets us up to 0.90.

5.4.2 Example: simulation

We can also estimate power using simulation via the simr package.

Let’s restate the setup: We are planning a multisite experiment where we will compare two types of bug traps, an older model (control) and a newer model (treatment). At each site we will place 5 traps of each type (10 total) and return in 30 days to weigh the amount of invasive bugs captured. We think it would be meaningful if our new trap captured 3 grams of additional bugs on average. We assume there is an Intercept random effect of 2 (between site standard deviation), a slope random effect of \(\sqrt{8}\), and a within site standard deviation of 6. How powerful is our experiment if we select 10 sites to run this experiment? Assume a two-sided test and a significance level of 0.05.

First we create a data frame of 10 observations each at 10 sites. Then we add a sequence of five zeroes and five ones to indicate treatment. A zero means control and one means treated.

site <- 1:10
trap <- 1:10
d <- expand.grid(trap = trap, site = site)
d$trt <- rep(c(rep(0, 5), rep(1, 5)), 10)
head(d)

  trap site trt
1    1    1   0
2    2    1   0
3    3    1   0
4    4    1   0
5    5    1   0
6    6    1   1

Now define our intercept and slope random effects. Note that the simr package requires these be expressed as variances in a matrix. The within group variance needs to be expressed as standard deviation. To replicate the example above using a closed-form expression, we set the covariance of the random effects to 0.

between <- matrix(c(4,0,0,8), nrow = 2)
within <- 6

b <- c(10, 3) # fixed intercept and slope

Now define our model using the makeLmer() function. This function will simulate mass values based on our hypothesized coefficients and variances.

m <- makeLmer(mass ~ trt + (trt|site), 
              fixef = b, 
              VarCorr = between, 
              sigma = within, 
              data= d)
print(m)

Linear mixed model fit by REML ['lmerMod']
Formula: mass ~ trt + (trt | site)
   Data: d
REML criterion at convergence: 755.1639
Random effects:
 Groups   Name        Std.Dev. Corr
 site     (Intercept) 2.000        
          trt         2.828    0.00
 Residual             6.000        
Number of obs: 100, groups:  site, 10
Fixed Effects:
(Intercept)          trt  
         10            3

powerSim(m, 
         test = fixed("trt", "t"),
         seed = 2000,
         nsim = 100, 
         alpha = 0.05, 
         progress = FALSE)

Power for predictor 'trt', (95% confidence interval):
      41.00% (31.26, 51.29)

Test: t-test with Satterthwaite degrees of freedom (package lmerTest)
      Effect size for trt is 3.0

Based on 100 simulations, (32 warnings, 0 errors)
alpha = 0.05, nrow = 100

Time elapsed: 0 h 0 m 7 s

Power based on simulation is estimated to be about 0.41.

We can generate a power curve with varying sample sizes using the powerCurve() function. First extend the data set by the maximum number of sites we’re willing to entertain and then call the powerCurve() function. Below we specify a maximum of 30 sites. Note this can take a while to run for large data sets and complex models. We have lowered the default number of simulations from 1000 to 100. The breaks argument says we want to try sample sizes ranging from 12 to 30 in steps of 3.

m2 <- extend(m, along = "site", n = 30)
pc <- powerCurve(m2, 
                 test = fixed("trt", "t"), 
                 seed = 5,
                 along = "site",
                 breaks = seq(12, 30, 3),
                 nsim = 100, 
                 alpha = 0.05,
                 progress = FALSE)
plot(pc)

According to simulation, we need to sample 30 sites to reliably achieve power of at least 0.80. We see that the assumption of random slopes requires larger sample sizes to achieve sufficient levels of power.