Understanding the {ggeffects} predict_response() margin argument

Elaborating on the {ggeffects} vignette, Difference Between Marginalization Methods: The margin Argument.

Fit a model for examples using one factor and one numeric predictor.

library(ggeffects)
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
m <- lm(mpg ~ wt + cyl, data = mtcars)

Notice the reference level is “4”.

levels(mtcars$cyl)

[1] "4" "6" "8"

Notice the most frequently occurring level is “8”.

table(mtcars$cyl)


 4  6  8 
11  7 14

“mean_reference”

This is the default. Predictions are made holding the non-focal predictor, in this case the factor “cyl”, at its reference level.

predict_response(m, terms = "wt [2.5, 3, 3.5]", margin = "mean_reference")

# Predicted values of mpg

  wt | Predicted |       95% CI
-------------------------------
2.50 |     25.98 | 24.36, 27.59
3.00 |     24.37 | 22.45, 26.30
3.50 |     22.77 | 20.32, 25.22

Adjusted for:
* cyl = 4

Here are the predictor values used to predict the response. The data_grid() function is in the {ggeffects} package.

new_mean <- data_grid(m, terms = "wt [2.5, 3, 3.5]", typical = "mean")
new_mean

   wt cyl
1 2.5   4
2 3.0   4
3 3.5   4

Replicate the result of predict_response() using the base R predict() function:

p <- predict(m, newdata = new_mean) |> 
  round(2)
cbind(wt = c(2.5, 3, 3.5), pred = p)

   wt  pred
1 2.5 25.98
2 3.0 24.37
3 3.5 22.77

“mean_mode”

Predictions are made holding the non-focal predictor, in this case the factor “cyl”, at its mode.

predict_response(m, terms = "wt [2.5, 3, 3.5]", margin = "mean_mode")

# Predicted values of mpg

  wt | Predicted |       95% CI
-------------------------------
2.50 |     19.91 | 17.20, 22.61
3.00 |     18.30 | 16.22, 20.39
3.50 |     16.70 | 15.10, 18.30

Adjusted for:
* cyl = 8

Here are the predictor values used to predict the response. Notice we set typical = "mode".

new_mode <- data_grid(m, terms = "wt [2.5, 3, 3.5]", typical = "mode")
new_mode

   wt cyl
1 2.5   8
2 3.0   8
3 3.5   8

Replicate the result of predict_response() using the base R predict() function:

p <- predict(m, newdata = new_mode) |> 
  round(2)
cbind(wt = c(2.5, 3, 3.5), pred = p)

   wt  pred
1 2.5 19.91
2 3.0 18.30
3 3.5 16.70

“marginalmeans”

Predictions are made at all combinations of the variables, both focal and non-focal, and then the predictions are averaged by the levels of the focal predictor.

predict_response(m, terms = "wt [2.5, 3, 3.5]", margin = "marginalmeans")

# Predicted values of mpg

  wt | Predicted |       95% CI
-------------------------------
2.50 |     22.53 | 21.16, 23.91
3.00 |     20.93 | 19.95, 21.92
3.50 |     19.33 | 18.21, 20.45

To get predictor values, we can use expand.grid() to get all combinations.

new_marginalmeans <- expand.grid(wt = c(2.5, 3, 3.5), 
                                 cyl = factor(c(4,6,8)))
new_marginalmeans

A prediction is made at each combination of predictor variables.

new_marginalmeans$pred <- predict(m, newdata = new_marginalmeans)
new_marginalmeans

   wt cyl     pred
1 2.5   4 25.97676
2 3.0   4 24.37395
3 3.5   4 22.77115
4 2.5   6 21.72118
5 3.0   6 20.11837
6 3.5   6 18.51557
7 2.5   8 19.90590
8 3.0   8 18.30309
9 3.5   8 16.70029

Then we calculate the mean prediction by “wt” level to get the predicted values returned by {ggeffects}.

aggregate(pred ~ wt, data = new_marginalmeans, mean) |> 
  round(2)

   wt  pred
1 2.5 22.53
2 3.0 20.93
3 3.5 19.33

“empirical” (or “counterfactual”)

Predictions are made at each observation of the original data using the observed non-focal predictor and a level of the focal predictor. This is repeated for each level of the focal predictor. And then the predictions are averaged at each level of the focal predictor.

In this example, that means setting all 32 cars in the mtcars data frame to have wt = 2.5 and then calculating the predicted mpg for each car. Repeat this process again setting wt = 3, and then again setting wt = 3.5. When done, take the average of the predictions at each level of wt. This is what happens when we set margin = "empirical".

predict_response(m, terms = "wt [2.5, 3, 3.5]", margin = "empirical")

# Average predicted values of mpg

  wt | Predicted |       95% CI
-------------------------------
2.50 |     22.39 | 20.95, 23.83
3.00 |     20.79 | 19.80, 21.77
3.50 |     19.18 | 18.16, 20.21

To replicate the calculation, we need to first replicate the data frame once for each value of the focal predictor. Below I only keep the “cyl” column since that’s all we need. We add the “wt” variable in the next code chunk.

new_emprical <- do.call(rbind, replicate(3, mtcars[,"cyl", drop = F], 
                                         simplify = FALSE))

Now add the “wt” variable. Add one level to each replicate of the data. Notice we have 96 rows, which is three times the size of the original dataset which has 32 rows.

new_emprical$wt <- rep(c(2.5, 3, 3.5), each = nrow(mtcars))
nrow(new_emprical)

[1] 96

Now make predictions for each row, 96 in all.

new_emprical$pred <- predict(m, newdata = new_emprical)

Finally, calculate the mean prediction by “wt” level to get the predicted values returned by {ggeffects}.

aggregate(pred ~ wt, data = new_emprical, mean) |> 
  round(2)

   wt  pred
1 2.5 22.39
2 3.0 20.79
3 3.5 19.18

Understanding the {ggeffects} `predict_response()` `margin` argument

Clay Ford

2025-01-10

“mean_reference”

“mean_mode”

“marginalmeans”

“empirical” (or “counterfactual”)