Suppose there is an annual 5k running race in a city. We are interested in whether we can predict an individual's time to complete the race using two variables: total hours spent training over the past three months and average pace per mile during training runs. We can perform a multiple regression to model race completion time as a function of training time and pace. We want to sample enough subjects to detect an effect when our two independent (explanatory) variables explain 45% of the variation in the dependent (response) variable, race completion time. Assume the following:
hypothesized effect size f2 = R2 / (1 - R2) = 0.45 / (1 - 0.45) ≈ 0.82, where R2 = 0.45 (see the quick check after this list)
numerator degrees of freedom u = 2 (two predictors)
significance level = 0.01
desired power = 0.9
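As a quick check, we can compute the effect size in R from the hypothesized R2 (a minimal sketch; the object names are just illustrative):
R2 <- 0.45
f2 <- R2 / (1 - R2)  # 0.45 / 0.55 = 0.8181818, about 0.82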
Using the pwr package (Champely 2020):
library(pwr)
pwr.f2.test(f2 = 0.82, u = 2, sig.level = 0.01, power = 0.9)
Multiple regression power calculation
u = 2
v = 23.17
f2 = 0.82
sig.level = 0.01
power = 0.9
From above, we know that n = v + u + 1. Rounding v = 23.17 up to 24, we need 24 + 2 + 1 = 27 subjects.
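The same arithmetic can be done in R, rounding v up before solving for n (a small sketch using the values above):
v <- 23.17
u <- 2
n <- ceiling(v) + u + 1  # 24 + 2 + 1 = 27 subjects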
Suppose there is an annual 5k running race in a city. We are interested in whether we can predict an individual's time to complete the race using two variables: total hours spent training over the past three months and average pace per mile during training runs. We can perform a multiple regression to model race completion time as a function of training time and pace. We want to know how much power we have to detect an effect when our two independent (explanatory) variables explain 45% of the variation in the dependent (response) variable, race completion time. We will have 30 participating subjects. Assume the following:
hypothesized effect size f2 = R2 / (1 - R2) = 0.45 / (1 - 0.45) ≈ 0.82, where R2 = 0.45
numerator degrees of freedom u = 2 (two predictors)
denominator degrees of freedom v = n - u - 1 = 30 - 2 - 1 = 27 (see the short check after this list)
significance level = 0.01
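As before, the denominator degrees of freedom can be computed in R before passing it to pwr.f2.test() (a minimal sketch):
n <- 30
u <- 2
v <- n - u - 1  # 27, the value supplied as v below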
Using the pwr package (Champely 2020):
library(pwr)
pwr.f2.test(f2 = 0.82, u = 2, v = 27, sig.level = 0.01)
Multiple regression power calculation
u = 2
v = 27
f2 = 0.82
sig.level = 0.01
power = 0.9491656
The power is about 0.95.
This can also be done using the powertools package (Crespi and Liu 2025). We use the 'random' argument to specify that our predictor variables are random (as opposed to fixed) and the 'v' argument to request verbose output:
library(powertools)
mlrF.overall(N = 30, p = 2, Rsq = 0.45, power = NULL, random = TRUE, v = TRUE)
Power calculation for a multiple linear regression
overall F test assuming random predictors
N = 30
p = 2
Rsq = 0.45
fsq = 0.8181818
alpha = 0.05
power = 0.983175
The power is about 0.98. Note that this call uses the function's default significance level of 0.05 (shown as alpha in the output) rather than the 0.01 used above, and treats the predictors as random rather than fixed, so the result is not directly comparable to the pwr estimate of 0.95.
A partial F test evaluates whether adding one or more predictors to a regression model significantly improves its ability to explain variation in the response variable.
It compares a reduced model (fewer predictors) with a full model (more predictors) and tests whether the additional predictors contribute significantly to R2.
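For intuition, here is a minimal sketch of a partial F test in base R using simulated data; the variable names and data are purely illustrative:
set.seed(1)
n <- 100
training_hours <- rnorm(n, 40, 10)
avg_pace <- rnorm(n, 9, 1)
weekly_distance <- rnorm(n, 20, 5)
race_time <- 30 - 0.1 * training_hours + 2 * avg_pace + 0.05 * weekly_distance + rnorm(n)
reduced <- lm(race_time ~ training_hours + avg_pace)                  # q = 2 predictors
full <- lm(race_time ~ training_hours + avg_pace + weekly_distance)   # p = 3 predictors
anova(reduced, full)  # partial F test for the added predictor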
To calculate the sample size needed for a partial F test, we need to specify:
incremental effect size f2 = (R2full - R2reduced) / (1 - R2full)
number of predictors in the full model (sometimes called p)
number of predictors in the reduced model (sometimes called q)
significance level of the test
desired power
To calculate power for a given sample size, we need to specify:
incremental effect size f2 = (R2full - R2reduced) / (1 - R2full)
number of predictors in the full model (sometimes called p)
number of predictors in the reduced model (sometimes called q)
significance level of the test
sample size (N)
Suppose there is an annual 5k running race in a city. We are interested in whether we can predict an individual's completion time using a statistical model. We already have a model fit with two variables (training hours and average pace) that explains 35% of variation in race completion time. We now want to know how many participants we need to detect whether adding a third variable, weekly distance, improves model performance by an additional 10% of explained variance (so that R2full = 0.45).
Assume the following:
incremental effect size f2 = (0.45 - 0.35) / (1 - 0.45) ≈ 0.18 (checked in the short sketch below)
number of predictors in the full model p = 3
number of predictors in the reduced model q = 2
significance level left at the function default
desired power = 0.80
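The incremental effect size can be checked in R (a small sketch for reference only; mlrF.partial() takes the R2 values directly):
Rsq.red <- 0.35
Rsq.full <- 0.45
f2 <- (Rsq.full - Rsq.red) / (1 - Rsq.full)  # 0.10 / 0.55 = 0.1818182, about 0.18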
Using the powertools package (Crespi and Liu 2025):
library(powertools)
mlrF.partial(N = NULL, p = 3, q = 2, Rsq.red = 0.35, Rsq.full = 0.45, power = 0.80)
[1] 56.27915
Rounding up, we need 57 subjects.
Suppose there is an annual 5k running race in a city. We are interested in whether we can predict an individual's completion time using a statistical model. We already have a model fit with two variables (training hours and average pace) that explains 35% of variation in race completion time. If we sample 30 participants, we now want to know how much power we have to detect whether adding a third variable, weekly distance, improves model performance by an additional 10% of explained variance (so that R2full = 0.45).
Assume the following:
incremental effect size f2 = (0.45 - 0.35) / (1 - 0.45) ≈ 0.18
number of predictors in the full model p = 3
number of predictors in the reduced model q = 2
significance level left at the function default
sample size N = 30
Using the powertools package (Crespi and Liu 2025):
library(powertools)
mlrF.partial(N = 30, p = 3, q = 2, Rsq.red = 0.35, Rsq.full = 0.45, power = NULL)
[1] 0.4874955
The power is about 0.49.