4  Multiple Regression

4.1 Multiple Regression: F test

An F test assesses whether any of the predictors in a multiple regression model are useful in predicting the response. The null hypothesis is that all of the coefficients (except the intercept) are 0. Rejecting the null implies the model has some explanatory value, though it doesn’t tell you how much.
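
As a quick illustration, here is a minimal sketch with simulated data (all variable names are hypothetical); the overall F test appears in the last line of the summary output:

set.seed(1)
n  <- 100
x1 <- rnorm(n)                            # first predictor
x2 <- rnorm(n)                            # second predictor
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)  # response
summary(lm(y ~ x1 + x2))                  # "F-statistic: ... on 2 and 97 DF, p-value: ..."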

For sample size estimation based on power, we need the following:

  • hypothesized effect size (f2 = R2 / (1 − R2))
  • numerator degrees of freedom (u = number of predictors being tested, occasionally represented as ‘p’)
  • significance level of test
  • power of test

For power estimation based on sample size, we need the following:

  • hypothesized effect size (f2 = R2 / (1 − R2))
  • numerator degrees of freedom (u = number of predictors being tested, occasionally represented as ‘p’)
  • denominator degrees of freedom (v = n − u − 1, where n is sample size). Hence, n = v + u + 1 (see the short R sketch after this list)
  • significance level of test
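
As a small sketch, these quantities can be computed directly in R (using the values from the examples below):

R2 <- 0.45            # hypothesized proportion of variance explained
u  <- 2               # number of predictors being tested
n  <- 30              # sample size
f2 <- R2 / (1 - R2)   # hypothesized effect size, about 0.82
v  <- n - u - 1       # denominator degrees of freedom, 27
c(f2 = f2, v = v)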

4.1.1 Example: sample size

Suppose there is an annual 5k running race in a city. We are interested in whether we can predict an individual’s time to complete the race using two variables: total hours spent training over the past three months and average pace per mile during training runs. We can perform a multiple regression to model race completion time as a function of training time and pace. We want to sample enough subjects to detect an effect when our two independent (explanatory) variables explain 45% of the variation in the dependent (response) variable, race completion time. Assume the following:

  • hypothesized effect size: R2 = .45, so f2 = .45 / (1 − .45) ≈ 0.82

  • numerator degrees of freedom u = 2 (two predictors)

  • significance level = 0.01

  • desired power = 0.90

Using the pwr package (Champely 2020):

library(pwr)
pwr.f2.test(f2 = 0.82, u = 2, sig.level = 0.01, power = 0.9)

     Multiple regression power calculation 

              u = 2
              v = 23.17
             f2 = 0.82
      sig.level = 0.01
          power = 0.9

From above, we know that n = v + u + 1. Rounding v up to 24, we need 24 + 2 + 1 = 27 subjects.
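
Equivalently, this arithmetic can be done on the object returned by pwr.f2.test, which stores u and v (a small sketch):

res <- pwr.f2.test(f2 = 0.82, u = 2, sig.level = 0.01, power = 0.9)
ceiling(res$v) + res$u + 1   # round v up to 24, then n = v + u + 1 = 27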

4.1.2 Example: power

Suppose there is an annual 5k running race in a city. We are interested in whether we can predict an individual’s time to complete the race using two variables: total hours spent training over the past three months and average pace per mile during training runs. We can perform a multiple regression to model race completion time as a function of training time and pace. We want to know how much power we have to detect an effect when our two independent (explanatory) variables explain 45% of the variation in the dependent (response) variable, race completion time. We will have 30 participating subjects. Assume the following:

  • hypothesized effect size: R2 = .45, so f2 = .45 / (1 − .45) ≈ 0.82

  • numerator degrees of freedom u = 2 (two predictors)

  • denominator degrees of freedom v = n − u − 1 = 30 - 2 - 1 = 27

  • significance level = 0.01

Using the pwr package (Champely 2020):

library(pwr)
pwr.f2.test(f2 = 0.82, u = 2, v = 27, sig.level = 0.01)

     Multiple regression power calculation 

              u = 2
              v = 27
             f2 = 0.82
      sig.level = 0.01
          power = 0.9491656

The power is about 0.95.

This can also be done using the powertools package (Crespi and Liu 2025). We use the ‘random’ argument to specify that our predictors are random (as opposed to fixed) and the ‘v’ argument to request verbose output:

library(powertools) 
mlrF.overall(N = 30, p = 2, Rsq = 0.45, power = NULL, random = TRUE, v = TRUE)

     Power calculation for a multiple linear regression
     overall F test assuming random predictors 

              N = 30
              p = 2
            Rsq = 0.45
            fsq = 0.8181818
          alpha = 0.05
          power = 0.983175

The power is about 0.98. Note that because we did not specify a significance level, mlrF.overall used its default of 0.05 (shown as alpha in the output) rather than the 0.01 used above, so this result is not directly comparable to the pwr result.

4.2 Multiple Regression: Partial F Tests

A partial F test evaluates whether adding one or more predictors to a regression model significantly improves its ability to explain variation in the response variable. It compares a reduced model (fewer predictors) with a full model (more predictors) and tests whether the additional predictors significantly increase R2. The null hypothesis is that there is no difference between the models; in other words, the coefficients of the additional predictors are all 0. Note the reduced model should be nested within the full model: it should not contain any predictors that are not in the full model.
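
As a minimal sketch with simulated data (all variable names and coefficients below are hypothetical), a partial F test can be carried out by fitting both models and comparing them with anova():

set.seed(1)
n <- 100
training_hours <- rnorm(n, mean = 30, sd = 5)
avg_pace       <- rnorm(n, mean = 9,  sd = 1)
weekly_miles   <- rnorm(n, mean = 15, sd = 4)
race_time <- 40 - 0.2 * training_hours + 1.5 * avg_pace -
             0.3 * weekly_miles + rnorm(n, sd = 2)

reduced <- lm(race_time ~ training_hours + avg_pace)                 # q = 2 predictors
full    <- lm(race_time ~ training_hours + avg_pace + weekly_miles)  # p = 3 predictors
anova(reduced, full)   # partial F test for the added predictor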

For sample size estimation based on power, we need the following:

  • incremental effect size, f2 = (R2full − R2reduced) / (1 − R2full); see the short R sketch below

  • number of predictors in full model (sometimes called p)

  • number of predictors in reduced model (sometimes called q)

  • significance level of test

  • desired power

For power estimation based on sample size, we need the following:

  • incremental effect size, f2 = (R2full − R2reduced) / (1 − R2full)

  • number of predictors in full model (sometimes called p)

  • number of predictors in reduced model (sometimes called q)

  • significance level of test
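
As a small sketch, the incremental effect size used in the examples below can be computed as:

R2_full    <- 0.45
R2_reduced <- 0.35
f2 <- (R2_full - R2_reduced) / (1 - R2_full)
f2   # about 0.18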

4.2.1 Example: sample size

Suppose there is an annual 5k running race in a city. We are interested in whether we can predict an individual’s completion time using a statistical model. We already have a model fit with two variables (training hours and average pace) that explains 35% of the variation in race completion time. We now want to know how many participants we need to detect whether adding a third variable, weekly distance, improves model performance by an additional 10% of explained variance (so that R2full = 0.45).

Assume the following:

  • R2full = 0.45
  • R2reduced = 0.35
  • p = 3 (predictors in full model)
  • q = 2 (predictors in reduced model)
  • significance level = 0.05
  • desired power = 0.80

Using the powertools package (Crespi and Liu 2025):

library(powertools)
mlrF.partial(N = NULL, 
             p = 3, 
             q = 2, 
             Rsq.red = 0.35, 
             Rsq.full = 0.45, 
             power = 0.80)
[1] 56.27915

Assuming our effect size is correct, we should plan on sampling at least 57 participants to have a probability of 0.80 of correctly rejecting the null of no difference between the models.

4.2.2 Example: power

Suppose there is an annual 5k running race in a city. We are interested in whether we can predict an individual’s completion time using a statistical model. We already have a model fit with two variables (training hours and average pace) that explains 35% of the variation in race completion time. If we sample 30 participants, we now want to know how much power we have to detect whether adding a third variable, weekly distance, improves model performance by an additional 10% of explained variance (so that R2full = 0.45).

Assume the following:

  • N = 30
  • R2full = 0.45
  • R2reduced = 0.35
  • p = 3 (predictors in full model)
  • q = 2 (predictors in reduced model)
  • significance level = 0.05

Using the powertools package (Crespi and Liu 2025):

library(powertools)
mlrF.partial(N = 30, 
             p = 3, 
             q = 2, 
             Rsq.red = 0.35, 
             Rsq.full = 0.45, 
             power = NULL)
[1] 0.4874955

Assuming our effect size is correct, sampling 30 participants only provides power of about 0.49 of correctly rejecting the null of no difference between the models.