library(simr)
<- 1:4
year <- 1:15
student <- expand.grid(year = year, student = student)
d head(d, n = 8)
year student
1 1 1
2 2 1
3 3 1
4 4 1
5 1 2
6 2 2
7 3 2
8 4 2
For power estimation based on sample size, we need the following:
We are planning an experiment where we track the BMI of college students over their four years at school. We’ll measure their BMI at the beginning of each school year in August. We would like sufficient power to detect a year trend of at least 0.25 if we recruit 15 students to the study. The null hypothesis we’ll test is that the year (slope) coefficient is 0. Rejecting this hypothesis provides evidence of a change in BMI over time. Assume the following:
We can use simulation to estimate the power of a hypothesis test on a specific coefficient in a model. This example uses the simr package (Green and MacLeod 2016).
First we create a data frame of 4 observations each of 15 students.
library(simr)
<- 1:4
year <- 1:15
student <- expand.grid(year = year, student = student)
d head(d, n = 8)
year student
1 1 1
2 2 1
3 3 1
4 4 1
5 1 2
6 2 2
7 3 2
8 4 2
Now define our between and within group variance. Note that the simr package requires the between group variance be expressed as variance and the within group variance be expressed as standard deviation.
<- 3^2 # random intercept variance
between <- 1 # residual standard deviation within
Next define our effects. We need to provide intercept and slope coefficients. The slope coefficient is the effect we will test. The intercept can be interpreted at the average BMI for the students’ first year of college.
<- c(20, 0.25) # fixed intercept and slope b
Now define our model using the makeLmer()
function. This function will simulate BMI values based on our hypothesized coefficients and variances.
<- makeLmer(bmi ~ year + (1|student),
m fixef = b,
VarCorr = between,
sigma = within,
data= d)
print(m)
Linear mixed model fit by REML ['lmerMod']
Formula: bmi ~ year + (1 | student)
Data: d
REML criterion at convergence: 210.8396
Random effects:
Groups Name Std.Dev.
student (Intercept) 3
Residual 1
Number of obs: 60, groups: student, 15
Fixed Effects:
(Intercept) year
20.00 0.25
Finally use the powerSim()
function to estimate the power of the test on the slope using 100 simulations. Setting progress = FALSE
suppresses a progress bar which we don’t want to output in this online book. The seed argument makes the following result reproducible. The fixed()
function dictates what coefficient we want to test. In this case we want to test the “year” coefficient using a “t” test.
# power of test on slope
powerSim(m,
test = fixed("year", "t"),
seed = 1,
nsim = 100,
alpha = 0.01,
progress = FALSE)
Power for predictor 'year', (95% confidence interval):
37.00% (27.56, 47.24)
Test: t-test with Satterthwaite degrees of freedom (package lmerTest)
Effect size for year is 0.25
Based on 100 simulations, (0 warnings, 0 errors)
alpha = 0.01, nrow = 60
Time elapsed: 0 h 0 m 6 s
Our power is estimated to be about 0.37. The confidence interval is based on a simple binomial test.
round(binom.test(x = 37, n = 100)$conf.int * 100, 2)
[1] 27.56 47.24
attr(,"conf.level")
[1] 0.95
This power is low. Let’s try increasing the sample size. We can do that with the extend()
function. Below we increase the sample size to 30 and re-run the simulation.
<- extend(m, along = "student", n = 30)
m2 powerSim(m2,
test = fixed("year", "t"),
seed = 2,
nsim = 100,
alpha = 0.01,
progress = FALSE)
Power for predictor 'year', (95% confidence interval):
64.00% (53.79, 73.36)
Test: t-test with Satterthwaite degrees of freedom (package lmerTest)
Effect size for year is 0.25
Based on 100 simulations, (0 warnings, 0 errors)
alpha = 0.01, nrow = 120
Time elapsed: 0 h 0 m 5 s
Power is higher but still too low at 0.64.
We can generate a power curve with varying sample sizes using the powerCurve()
function. First extend the data set by the maximum number of students we’re willing to entertain and then call the powerCurve()
function. Below we specify a maximum of 40 students. Note this can take a while to run for large data sets and complex models. We have lowered the default number of simulations from 1000 to 100. The breaks argument says we want to try sample sizes ranging from 30 to 40 in steps of 2.
<- extend(m, along = "student", n = 40)
m3 <- powerCurve(m3,
pc test = fixed("year", "t"),
seed = 5,
along = "student",
breaks = seq(30,40,2),
nsim = 100,
alpha = 0.01,
progress = FALSE)
plot(pc)
It appears we need about 38 students to achieve power of at least 0.80.