predict_response()
margin
argumentElaborating on the {ggeffects} vignette, Difference
Between Marginalization Methods: The margin
Argument.
Fit a model for examples using one factor and one numeric predictor.
library(ggeffects)
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
m <- lm(mpg ~ wt + cyl, data = mtcars)
Notice the reference level is “4”.
levels(mtcars$cyl)
[1] "4" "6" "8"
Notice the most frequently occurring level is “8”.
table(mtcars$cyl)
4 6 8
11 7 14
This is the default. Predictions are made holding the non-focal predictor, in this case the factor “cyl”, at its reference level.
predict_response(m, terms = "wt [2.5, 3, 3.5]", margin = "mean_reference")
# Predicted values of mpg
wt | Predicted | 95% CI
-------------------------------
2.50 | 25.98 | 24.36, 27.59
3.00 | 24.37 | 22.45, 26.30
3.50 | 22.77 | 20.32, 25.22
Adjusted for:
* cyl = 4
Here are the predictor values used to predict the response. The
data_grid()
function is in the {ggeffects} package.
new_mean <- data_grid(m, terms = "wt [2.5, 3, 3.5]", typical = "mean")
new_mean
wt cyl
1 2.5 4
2 3.0 4
3 3.5 4
Replicate the result of predict_response()
using the
base R predict()
function:
p <- predict(m, newdata = new_mean) |>
round(2)
cbind(wt = c(2.5, 3, 3.5), pred = p)
wt pred
1 2.5 25.98
2 3.0 24.37
3 3.5 22.77
Predictions are made holding the non-focal predictor, in this case the factor “cyl”, at its mode.
predict_response(m, terms = "wt [2.5, 3, 3.5]", margin = "mean_mode")
# Predicted values of mpg
wt | Predicted | 95% CI
-------------------------------
2.50 | 19.91 | 17.20, 22.61
3.00 | 18.30 | 16.22, 20.39
3.50 | 16.70 | 15.10, 18.30
Adjusted for:
* cyl = 8
Here are the predictor values used to predict the response. Notice we
set typical = "mode"
.
new_mode <- data_grid(m, terms = "wt [2.5, 3, 3.5]", typical = "mode")
new_mode
wt cyl
1 2.5 8
2 3.0 8
3 3.5 8
Replicate the result of predict_response()
using the
base R predict()
function:
p <- predict(m, newdata = new_mode) |>
round(2)
cbind(wt = c(2.5, 3, 3.5), pred = p)
wt pred
1 2.5 19.91
2 3.0 18.30
3 3.5 16.70
Predictions are made at all combinations of the variables, both focal and non-focal, and then the predictions are averaged by the levels of the focal predictor.
predict_response(m, terms = "wt [2.5, 3, 3.5]", margin = "marginalmeans")
# Predicted values of mpg
wt | Predicted | 95% CI
-------------------------------
2.50 | 22.53 | 21.16, 23.91
3.00 | 20.93 | 19.95, 21.92
3.50 | 19.33 | 18.21, 20.45
To get predictor values, we can use expand.grid()
to get
all combinations.
new_marginalmeans <- expand.grid(wt = c(2.5, 3, 3.5),
cyl = factor(c(4,6,8)))
new_marginalmeans
wt cyl
1 2.5 4
2 3.0 4
3 3.5 4
4 2.5 6
5 3.0 6
6 3.5 6
7 2.5 8
8 3.0 8
9 3.5 8
A prediction is made at each combination of predictor variables.
new_marginalmeans$pred <- predict(m, newdata = new_marginalmeans)
new_marginalmeans
wt cyl pred
1 2.5 4 25.97676
2 3.0 4 24.37395
3 3.5 4 22.77115
4 2.5 6 21.72118
5 3.0 6 20.11837
6 3.5 6 18.51557
7 2.5 8 19.90590
8 3.0 8 18.30309
9 3.5 8 16.70029
Then we calculate the mean prediction by “wt” level to get the predicted values returned by {ggeffects}.
aggregate(pred ~ wt, data = new_marginalmeans, mean) |>
round(2)
wt pred
1 2.5 22.53
2 3.0 20.93
3 3.5 19.33
Predictions are made at each observation of the original data using the observed non-focal predictor and a level of the focal predictor. This is repeated for each level of the focal predictor. And then the predictions are averaged at each level of the focal predictor.
In this example, that means setting all 32 cars in the mtcars data
frame to have wt = 2.5 and then calculating the predicted mpg for each
car. Repeat this process again setting wt = 3, and then again setting wt
= 3.5. When done, take the average of the predictions at each level of
wt. This is what happens when we set
margin = "empirical"
.
predict_response(m, terms = "wt [2.5, 3, 3.5]", margin = "empirical")
# Average predicted values of mpg
wt | Predicted | 95% CI
-------------------------------
2.50 | 22.39 | 20.95, 23.83
3.00 | 20.79 | 19.80, 21.77
3.50 | 19.18 | 18.16, 20.21
To replicate the calculation, we need to first replicate the data frame once for each value of the focal predictor. Below I only keep the “cyl” column since that’s all we need. We add the “wt” variable in the next code chunk.
new_emprical <- do.call(rbind, replicate(3, mtcars[,"cyl", drop = F],
simplify = FALSE))
Now add the “wt” variable. Add one level to each replicate of the data. Notice we have 96 rows, which is three times the size of the original dataset which has 32 rows.
new_emprical$wt <- rep(c(2.5, 3, 3.5), each = nrow(mtcars))
nrow(new_emprical)
[1] 96
Now make predictions for each row, 96 in all.
new_emprical$pred <- predict(m, newdata = new_emprical)
Finally, calculate the mean prediction by “wt” level to get the predicted values returned by {ggeffects}.
aggregate(pred ~ wt, data = new_emprical, mean) |>
round(2)
wt pred
1 2.5 22.39
2 3.0 20.79
3 3.5 19.18