generate data

First generate data for 10 subjects (g) with 20 observations each (n). The x predictor ranges from 1:20. The x2 predictor is x - 13 for x > 13. The r, z0, and z1 vectors add noise. z0 adds subject-specific noise to the intercept, z1 adds subject-specific noise to the slope, and r adds noise to all observations. All noise is drawn from a Normal distribution with mean 0 and some fixed standard deviation, as is assumed by linear mixed-effect models.

n <- 20
g <- 10
x <- rep(1:n, g)
id <- rep(1:g, each = n)
x2 <- ifelse((x - 13) > 0, (x - 13), 0)
set.seed(2)
r <- rnorm(length(x), sd = 0.8)
z0 <- rnorm(g, sd = 0.5)
z1 <- rnorm(g, sd = 0.2)
y <- (10 + z0[id]) + (-4 + z1[id])*x + (6 + z1[id])*x2 + r 
d <- data.frame(id = factor(id), y, x)

plot the data

Quick plot of the data with subjects color-coded.

library(ggplot2)
ggplot(d) +
  aes(x, y, color = id, group = id) +
  geom_point() +
  geom_line()

Quick plot of the data with a smooth trend line, which in this case is a loess curve.

ggplot(d) +
  aes(x, y) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

fit mixed-effect model

Below we fit a mixed-effect model that says the intercept and slope are conditional on subject: (x|id). We also specify that x be modeled a cubic spline using the ns function from the splines package. df=3 says we think the trajectory could change direction three times.

library(lme4)
library(splines)
m1 <- lmer(y ~ ns(x, df = 3) + (x|id), data = d)

The summary output is all but impossible to interpret.

summary(m1, corr = FALSE)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ ns(x, df = 3) + (x | id)
##    Data: d
## 
## REML criterion at convergence: 772.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2604 -0.5278  0.0792  0.6657  2.3823 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 0.03465  0.1862        
##           x           0.10797  0.3286   -1.00
##  Residual             2.27348  1.5078        
## Number of obs: 200, groups:  id, 10
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)      6.0940     0.3331   18.30
## ns(x, df = 3)1 -47.0004     1.2273  -38.30
## ns(x, df = 3)2 -58.9098     2.4632  -23.92
## ns(x, df = 3)3 -17.1021     1.6342  -10.46
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

So we can plot the smooth trend line and visually interpret.

library(ggeffects)
eff <- ggpredict(m1, "x[all]")
plot(eff, add.data = TRUE)

piecewise spline

Here we fit something closer to the “correct” model. We use the segmented package to find the breakpoint. The segmented package does not work with lme4, but does (sort of) work with nlme, an older but still very powerful mixed-effect modeling package. Another quirk is that the variable with the breakpoint cannot have associated random effects, so we have to fit a random-intercept model (~1|id). First we fit the model using lme and then use segmented.default to estimate the breakpoint. The npsi = 1 argument says we think there is one breakpoint.

library(segmented)
library(nlme)
m2 <- lme(y ~ x, random = ~ 1|id, data = d)
sm1 <- segmented.default(m2, seg.Z = ~x, npsi = 1) 

Use the fixef function to get the estimated slopes for each line. The estimated values are very close to the “true” values we used above to generate the data, -4 and 6.

fixef(sm1)
## (Intercept)           x        U1.x      psi1.x 
##   10.027966   -3.949946    6.050022    0.000000

The estimated breakpoint is extracted with the confint.segmented function. The true value of 13 falls well within the 95% CI.

confint.segmented(sm1, "x", .coef=fixef(sm1))
##           Est. CI(95%).low CI(95%).up
## psi1.x 13.0328     12.8203    13.2454

We might report this as follows:

\[y = 10.03 + -3.95x \hspace{1cm} x \le 13.03\]

\[y = 10.03 + -3.95x + 6.05(x - 13.03)\hspace{1cm} x > 13.03\]

Finally we can get a basic plot with a confidence ribbon.

plot.segmented(sm1, "x", .coef=fixef(sm1), conf.level=.95)

There’s probably a way to extract the values and make a nicer looking plot with ggplot2, but I’m not pursuing that.