paired t-test as linear model

We’ll use the sleep data that comes with R. It’s not really pre-post data (each of the 10 patients was measured under two different drugs), but we’ll pretend it is.

sleep2 <- reshape(sleep, direction = "wide",
              idvar = "ID", timevar = "group")
sleep2$ID <- NULL
names(sleep2) <- c("pre", "post")

10 subjects with pre and post data:

sleep2

##     pre post
## 1   0.7  1.9
## 2  -1.6  0.8
## 3  -0.2  1.1
## 4  -1.2  0.1
## 5  -0.1 -0.1
## 6   3.4  4.4
## 7   3.7  5.5
## 8   0.8  1.6
## 9   0.0  4.6
## 10  2.0  3.4

Traditional paired t-test:

t.test(sleep2$post, sleep2$pre, paired = TRUE)

## 
##  Paired t-test
## 
## data:  sleep2$post and sleep2$pre
## t = 4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  0.7001142 2.4598858
## sample estimates:
## mean difference 
##            1.58

Now do the paired t-test as a linear model:

m <- lm(I(post - pre) ~ 1, data = sleep2)
summary(m)

## 
## Call:
## lm(formula = I(post - pre) ~ 1, data = sleep2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -1.58  -0.53  -0.28   0.12   3.02 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    1.580      0.389   4.062  0.00283 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.23 on 9 degrees of freedom
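The estimate, standard error, t value and p-value all match the t.test() output. As a quick sketch, the confidence interval can be recovered from the model as well using confint(). The data are rebuilt here only so the block runs on its own; the object names ci_lm and ci_tt are just for illustration.

```r
# Rebuild the wide data so this block is self-contained
sleep2 <- reshape(sleep, direction = "wide",
                  idvar = "ID", timevar = "group")
sleep2$ID <- NULL
names(sleep2) <- c("pre", "post")

# Intercept-only model on the differences, and the paired t-test
m <- lm(I(post - pre) ~ 1, data = sleep2)
tt <- t.test(sleep2$post, sleep2$pre, paired = TRUE)

# 95% confidence interval for the intercept vs. the t-test interval
ci_lm <- confint(m)
ci_tt <- tt$conf.int

# Compare the two after dropping names and attributes
all.equal(unname(ci_lm[1, ]), as.numeric(ci_tt))
```

Both intervals are built from the same mean difference, standard error and 9 degrees of freedom, so they should agree.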

The Intercept coefficient is the mean difference. Because the model contains only an intercept, the residual standard error is the standard deviation of the differences. We can use these to calculate Cohen’s d:

# 1.580/1.23
coef(m)/sigma(m)

## (Intercept) 
##    1.284558

We can check this using the cohens_d() function in the {rstatix} package. Notice that the data for this function needs to be in long format, so we pass the original sleep data frame:

library(rstatix)
cohens_d(data = sleep, formula = extra ~ group, paired = TRUE, 
         ref.group = "2")

## # A tibble: 1 × 7
##   .y.   group1 group2 effsize    n1    n2 magnitude
## * <chr> <chr>  <chr>    <dbl> <int> <int> <ord>    
## 1 extra 2      1         1.28    10    10 large

By the way, there’s nothing fancy about this calculation. It’s just the mean of the differences divided by the standard deviation of the differences. We can calculate it with the raw data:

# cohen's d
mean(sleep2$post - sleep2$pre)/sd(sleep2$post - sleep2$pre)

## [1] 1.284558
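Since the paired t statistic is the mean difference divided by its standard error, sd/sqrt(n), this version of Cohen’s d is also just t/sqrt(n). A minimal sketch of that relationship follows. The names diffs, d_from_t and d_direct are made up for illustration, and the data are rebuilt so the block runs on its own.

```r
# Rebuild the wide data so this block is self-contained
sleep2 <- reshape(sleep, direction = "wide",
                  idvar = "ID", timevar = "group")
sleep2$ID <- NULL
names(sleep2) <- c("pre", "post")

# Paired differences and the number of pairs
diffs <- sleep2$post - sleep2$pre
n <- length(diffs)

# A one-sample t-test on the differences is the paired t-test
tt <- t.test(diffs)

# Cohen's d from the t statistic, and directly from the differences
d_from_t <- unname(tt$statistic) / sqrt(n)
d_direct <- mean(diffs) / sd(diffs)

all.equal(d_from_t, d_direct)
```

This only holds for d defined on the differences, as used here. Other definitions of Cohen’s d for paired data use a different standardizer.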

Now let’s add some missing data to the sleep data.

sleep2[3,1] <- NA
sleep2[5,2] <- NA

Now do multiple imputation with {mice} and then do the paired t-test via lm():

library(mice)
set.seed(12)
imp <- mice(sleep2, printFlag = FALSE)
fit <- with(data = imp, expr = lm(I(post - pre) ~ 1))
summary(pool(fit))

##          term estimate std.error statistic       df     p.value
## 1 (Intercept)    1.708 0.3767605  4.533384 6.943392 0.002744606

The estimated mean difference is 1.708, but notice that the residual standard error is not returned, so we can’t do our clever calculation of Cohen’s d!

However, we can do it “by hand” using the raw imputed data. Here’s one way:

# extract all the imputed data sets
imp_d <- complete(imp, action = "all")
# apply cohen's d to all data sets
sapply(imp_d, function(x)mean(x[[2]] - x[[1]])/sd(x[[2]] - x[[1]]))

##        1        2        3        4        5 
## 1.340594 1.612120 1.411150 1.612120 1.463188

Take the mean of all those to get a single estimated Cohen’s d:

cd <- sapply(imp_d, function(x)mean(x[[2]] - x[[1]])/sd(x[[2]] - x[[1]]))
mean(cd)

## [1] 1.487834
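Another possible route, sketched here under the assumption that the object returned by with() stores each fitted lm in its analyses element, is to apply the coef()/sigma() calculation to every fit. The names d_fit and d_hand are made up for illustration, and the data, missing values and imputation are rebuilt so the block runs on its own.

```r
library(mice)

# Rebuild the wide data and reintroduce the two missing values
sleep2 <- reshape(sleep, direction = "wide",
                  idvar = "ID", timevar = "group")
sleep2$ID <- NULL
names(sleep2) <- c("pre", "post")
sleep2[3, 1] <- NA
sleep2[5, 2] <- NA

# Impute and fit the intercept-only model to each imputed data set
set.seed(12)
imp <- mice(sleep2, printFlag = FALSE)
fit <- with(imp, lm(I(post - pre) ~ 1))

# Cohen's d from each stored lm fit (assumes fit$analyses is the list of fits)
d_fit <- sapply(fit$analyses, function(f) coef(f) / sigma(f))

# Same quantity "by hand" from the completed data sets, for comparison
d_hand <- sapply(complete(imp, action = "all"),
                 function(x) mean(x[[2]] - x[[1]]) / sd(x[[2]] - x[[1]]))

all.equal(unname(d_fit), unname(d_hand))
mean(d_fit)
```

Either way, the single value is a plain average across imputations. It describes the effect size but comes with no pooled standard error.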

Clay Ford

2024-09-05