First generate data for 10 subjects (g) with 20 observations each (n). The x predictor ranges from 1 to 20. The x2 predictor equals x - 13 when x > 13 and 0 otherwise. The r, z0, and z1 vectors add noise: z0 adds subject-specific noise to the intercept, z1 adds subject-specific noise to the slopes, and r adds noise to all observations. All noise is drawn from a Normal distribution with mean 0 and a fixed standard deviation, as assumed by linear mixed-effect models.
n <- 20
g <- 10
x <- rep(1:n, g)
id <- rep(1:g, each = n)
x2 <- ifelse((x - 13) > 0, (x - 13), 0)
set.seed(2)
r <- rnorm(length(x), sd = 0.8)
z0 <- rnorm(g, sd = 0.5)
z1 <- rnorm(g, sd = 0.2)
y <- (10 + z0[id]) + (-4 + z1[id])*x + (6 + z1[id])*x2 + r
d <- data.frame(id = factor(id), y, x)
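Written out as an equation (just restating the code above), each observation for subject i is
\[y = (10 + z0_{i}) + (-4 + z1_{i})x + (6 + z1_{i})x2 + r\]
where x2 = max(x - 13, 0), and z0, z1, and r are drawn from Normal distributions with mean 0 and standard deviations 0.5, 0.2, and 0.8, respectively.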
Quick plot of the data with subjects color-coded.
library(ggplot2)
ggplot(d) +
  aes(x, y, color = id, group = id) +
  geom_point() +
  geom_line()
Quick plot of the data with a smooth trend line, which in this case is a loess curve.
ggplot(d) +
  aes(x, y) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Below we fit a mixed-effect model that says the intercept and slope are conditional on subject: (x|id). We also specify that x be modeled as a cubic spline using the ns function from the splines package. Setting df = 3 says we think the trajectory could change direction three times.
library(lme4)
library(splines)
m1 <- lmer(y ~ ns(x, df = 3) + (x|id), data = d)
The summary output is all but impossible to interpret.
summary(m1, corr = FALSE)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ ns(x, df = 3) + (x | id)
## Data: d
##
## REML criterion at convergence: 772.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2604 -0.5278 0.0792 0.6657 2.3823
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.03465 0.1862
## x 0.10797 0.3286 -1.00
## Residual 2.27348 1.5078
## Number of obs: 200, groups: id, 10
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.0940 0.3331 18.30
## ns(x, df = 3)1 -47.0004 1.2273 -38.30
## ns(x, df = 3)2 -58.9098 2.4632 -23.92
## ns(x, df = 3)3 -17.1021 1.6342 -10.46
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
So instead we plot the fitted smooth trend with the ggeffects package and interpret it visually.
library(ggeffects)
eff <- ggpredict(m1, "x[all]")
plot(eff, add.data = TRUE)
Here we fit something closer to the “correct” model. We use the segmented package to find the breakpoint. The segmented package does not work with lme4, but does (sort of) work with nlme, an older but still very powerful mixed-effect modeling package. Another quirk is that the variable with the breakpoint cannot have associated random effects, so we have to fit a random-intercept model (~ 1|id). First we fit the model using lme and then use segmented.default to estimate the breakpoint. The npsi = 1 argument says we think there is one breakpoint.
library(segmented)
library(nlme)
m2 <- lme(y ~ x, random = ~ 1|id, data = d)
sm1 <- segmented.default(m2, seg.Z = ~x, npsi = 1)
Use the fixef function to get the estimated coefficients for each line segment: the x coefficient is the slope before the breakpoint, and U1.x is the change in slope after it. The estimated values are very close to the “true” values we used above to generate the data, -4 and 6.
fixef(sm1)
## (Intercept) x U1.x psi1.x
## 10.027966 -3.949946 6.050022 0.000000
The estimated breakpoint is extracted with the confint.segmented function. The true value of 13 falls well within the 95% CI.
confint.segmented(sm1, "x", .coef = fixef(sm1))
## Est. CI(95%).low CI(95%).up
## psi1.x 13.0328 12.8203 13.2454
We might report this as follows:
\[y = 10.03 - 3.95x \hspace{1cm} x \le 13.03\]
\[y = 10.03 - 3.95x + 6.05(x - 13.03) \hspace{1cm} x > 13.03\]
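To see how the two pieces connect, here is a small sketch (using the rounded estimates above, purely for illustration) that writes the reported piecewise equation as a single function of x.
b0 <- 10.03; b1 <- -3.95; b2 <- 6.05; bp <- 13.03
# mean of y as a function of x; pmax(x - bp, 0) is 0 before the breakpoint
piecewise_mean <- function(x) b0 + b1 * x + b2 * pmax(x - bp, 0)
piecewise_mean(c(5, 13, 20))  # evaluate below, at, and above the breakpoint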
Finally we can get a basic plot with a confidence ribbon.
plot.segmented(sm1, "x", .coef = fixef(sm1), conf.level = 0.95)
There’s probably a way to extract the values and make a nicer-looking plot with ggplot2; a rough sketch of one approach is below, but I’m not polishing it further.
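A minimal sketch, assuming all we need are the fixed effects and the breakpoint estimate reported above (the confidence ribbon from plot.segmented is not reproduced):
# Rough sketch: overlay the estimated piecewise line on the raw data.
# The breakpoint (13.03) is copied from the confint.segmented() output above.
b <- fixef(sm1)
bp <- 13.03
d$fit <- b["(Intercept)"] + b["x"] * d$x + b["U1.x"] * pmax(d$x - bp, 0)
ggplot(d) +
  aes(x, y) +
  geom_point(alpha = 0.4) +
  geom_line(aes(y = fit), color = "blue")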