We’ll use the sleep data that comes with R. It’s not really pre-post data (it records the extra hours of sleep for 10 patients under two different drugs), but we’ll pretend it is.
sleep2 <- reshape(sleep, direction = "wide",
idvar = "ID", timevar = "group")
sleep2$ID <- NULL
names(sleep2) <- c("pre", "post")
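If reshape() feels opaque, the same wide data frame can be built by splitting on group and matching subjects on ID. This is a minimal base-R sketch, not the only way to do it; g1, g2 and sleep2_alt are just illustrative names:

```r
# Split the long-format sleep data by group (group "1" = "pre", group "2" = "post")
g1 <- sleep[sleep$group == "1", ]
g2 <- sleep[sleep$group == "2", ]

# Reorder the group-2 rows so each one lines up with the same subject ID in group 1
g2 <- g2[match(g1$ID, g2$ID), ]

# One row per subject, one column per time point
sleep2_alt <- data.frame(pre = g1$extra, post = g2$extra)
sleep2_alt
```

Matching on ID rather than relying on row order means the pairing stays correct even if the long data were sorted differently.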
10 subjects with pre and post data:
sleep2
## pre post
## 1 0.7 1.9
## 2 -1.6 0.8
## 3 -0.2 1.1
## 4 -1.2 0.1
## 5 -0.1 -0.1
## 6 3.4 4.4
## 7 3.7 5.5
## 8 0.8 1.6
## 9 0.0 4.6
## 10 2.0 3.4
Traditional paired t-test:
t.test(sleep2$post, sleep2$pre, paired = TRUE)
##
## Paired t-test
##
## data: sleep2$post and sleep2$pre
## t = 4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 0.7001142 2.4598858
## sample estimates:
## mean difference
## 1.58
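Because the test only uses the within-subject differences, the paired t-test is the same as a one-sample t-test of post - pre against zero. A quick base-R check (it rebuilds sleep2 so it runs on its own; paired and one_sample are illustrative names):

```r
# Rebuild the wide data so this snippet is self-contained
sleep2 <- reshape(sleep, direction = "wide", idvar = "ID", timevar = "group")
sleep2$ID <- NULL
names(sleep2) <- c("pre", "post")

# Paired t-test vs. one-sample t-test on the differences
paired     <- t.test(sleep2$post, sleep2$pre, paired = TRUE)
one_sample <- t.test(sleep2$post - sleep2$pre, mu = 0)

# Same t statistic (and the same df and p-value)
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
```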
Now do the paired t-test as a linear model, regressing the differences on an intercept only:
m <- lm(I(post - pre) ~ 1, data = sleep2)
summary(m)
##
## Call:
## lm(formula = I(post - pre) ~ 1, data = sleep2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.58 -0.53 -0.28 0.12 3.02
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.580 0.389 4.062 0.00283 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.23 on 9 degrees of freedom
The intercept coefficient is the mean difference. The residual standard error is the standard deviation of the differences (not a pooled standard deviation). We can use these to calculate Cohen’s d:
# 1.580/1.23
coef(m)/sigma(m)
## (Intercept)
## 1.284558
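Why does sigma(m) work as the denominator? In an intercept-only model every fitted value is the mean difference, so the residuals are just the centered differences, and the residual standard error (on n - 1 degrees of freedom) is exactly sd() of the differences. A small base-R check, assuming nothing beyond the model above:

```r
# Rebuild the wide data and the intercept-only model
sleep2 <- reshape(sleep, direction = "wide", idvar = "ID", timevar = "group")
sleep2$ID <- NULL
names(sleep2) <- c("pre", "post")
m <- lm(I(post - pre) ~ 1, data = sleep2)

d <- sleep2$post - sleep2$pre
all.equal(unname(coef(m)), mean(d))  # TRUE: intercept = mean difference
all.equal(sigma(m), sd(d))           # TRUE: residual SE = SD of differences
```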
We can check this value of Cohen’s d using the cohens_d() function in the {rstatix} package. Notice that this function needs the data in long format, so we give it the original sleep data rather than sleep2:
library(rstatix)
cohens_d(data = sleep, formula = extra ~ group, paired = TRUE,
ref.group = "2")
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 extra 2 1 1.28 10 10 large
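The original sleep data is already in long format (one row per subject per group). If you only had the wide sleep2 data frame, a minimal base-R sketch for stacking it back might look like this; sleep_long is an illustrative name. It keeps subjects in the same order within each group, since paired calculations generally assume the i-th row of one group is the same subject as the i-th row of the other:

```r
# Rebuild the wide data
sleep2 <- reshape(sleep, direction = "wide", idvar = "ID", timevar = "group")
sleep2$ID <- NULL
names(sleep2) <- c("pre", "post")

# Stack pre (group "1") on top of post (group "2"), subject order preserved
n <- nrow(sleep2)
sleep_long <- data.frame(
  ID    = factor(rep(seq_len(n), times = 2)),
  group = factor(rep(c("1", "2"), each = n)),
  extra = c(sleep2$pre, sleep2$post)
)
head(sleep_long)
```

sleep_long should then work with cohens_d() using the same arguments as above.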
By the way, there’s nothing fancy about this calculation. It’s just the mean of the differences divided by the standard deviation of the differences. We can calculate it with the raw data:
# cohen's d
mean(sleep2$post - sleep2$pre)/sd(sleep2$post - sleep2$pre)
## [1] 1.284558
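In symbols, with differences D_i = post_i - pre_i for n subjects, the calculation above is the following (this paired-data version of Cohen’s d is often written d_z). The last step uses the paired t statistic, which gives a handy check from the t-test output: 4.0621 / sqrt(10) is about 1.2845.

```latex
D_i = \text{post}_i - \text{pre}_i, \qquad
\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad
s_D = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(D_i - \bar{D}\right)^2}

t = \frac{\bar{D}}{s_D/\sqrt{n}}
\quad\Longrightarrow\quad
d_z = \frac{\bar{D}}{s_D} = \frac{t}{\sqrt{n}}
```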
Now let’s add some missing data to the sleep data.
sleep2[3,1] <- NA  # subject 3: pre is missing
sleep2[5,2] <- NA  # subject 5: post is missing
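Before imputing, it helps to see what a complete-case analysis would do: t.test() with paired = TRUE drops any subject missing either value, so subjects 3 and 5 are lost and the test runs on 8 pairs. A minimal base-R sketch (cc is an illustrative name):

```r
# Rebuild the wide data and add the same missing values
sleep2 <- reshape(sleep, direction = "wide", idvar = "ID", timevar = "group")
sleep2$ID <- NULL
names(sleep2) <- c("pre", "post")
sleep2[3, 1] <- NA  # subject 3: pre missing
sleep2[5, 2] <- NA  # subject 5: post missing

colSums(is.na(sleep2))        # one missing value per column
sum(complete.cases(sleep2))   # 8 complete pairs

# Complete-case paired t-test: only complete pairs are used
cc <- t.test(sleep2$post, sleep2$pre, paired = TRUE)
cc$parameter                  # df = 8 - 1 = 7
```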
Now do multiple imputation with {mice} and then do the paired t-test via lm():
library(mice)
set.seed(12)
imp <- mice(sleep2, print = FALSE)
fit <- with(data = imp, expr = lm(I(post - pre) ~ 1))
summary(pool(fit))
## term estimate std.error statistic df p.value
## 1 (Intercept) 1.708 0.3767605 4.533384 6.943392 0.002744606
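The pooled estimate and std.error in this output come from Rubin’s rules. Here is a minimal base-R sketch of those two quantities, assuming you already have one estimate and one standard error per imputed data set. rubin_pool() is a hypothetical helper, not part of {mice}, and it skips the small-sample degrees-of-freedom adjustment that pool() reports in the df column:

```r
# Rubin's rules for a single parameter across m imputations
rubin_pool <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)               # pooled point estimate
  ubar <- mean(se^2)              # average within-imputation variance
  b    <- var(est)                # between-imputation variance
  tvar <- ubar + (1 + 1/m) * b    # total variance
  c(estimate = qbar, std.error = sqrt(tvar))
}

# Toy inputs (hypothetical numbers, not from the sleep data)
rubin_pool(est = c(1, 2, 3), se = c(1, 1, 1))
```

With the real analysis, the individual lm() fits are stored in fit$analyses, so their coefficients and standard errors could be collected and fed in the same way.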
The estimated mean difference is 1.708, but notice that the residual standard error is not returned, so we can’t do our clever calculation of Cohen’s d!
However, we can do it “by hand” using the completed (imputed) data sets. Here’s one way:
# extract all the imputed data sets
imp_d <- complete(imp, action = "all")
# apply Cohen's d to each completed data set (column 2 = post, column 1 = pre)
sapply(imp_d, function(x)mean(x[[2]] - x[[1]])/sd(x[[2]] - x[[1]]))
## 1 2 3 4 5
## 1.340594 1.612120 1.411150 1.612120 1.463188
Take the mean of these values to get a single pooled estimate of Cohen’s d (averaging is how Rubin’s rules pool point estimates):
cd <- sapply(imp_d, function(x)mean(x[[2]] - x[[1]])/sd(x[[2]] - x[[1]]))
mean(cd)
## [1] 1.487834
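The anonymous function above relies on column positions (x[[2]] is post, x[[1]] is pre) and is written out twice. A slightly more defensive sketch refers to the columns by name; cohens_dz() is a hypothetical helper name, demonstrated here on the complete sleep data:

```r
# Cohen's d for paired data: mean of the differences / SD of the differences
cohens_dz <- function(df, pre = "pre", post = "post") {
  d <- df[[post]] - df[[pre]]
  mean(d) / sd(d)
}

# Rebuild the complete wide data (no missing values)
sleep2 <- reshape(sleep, direction = "wide", idvar = "ID", timevar = "group")
sleep2$ID <- NULL
names(sleep2) <- c("pre", "post")

cohens_dz(sleep2)  # matches the complete-data value computed earlier
```

With the imputed data sets, cd <- sapply(imp_d, cohens_dz) followed by mean(cd) should reproduce the pooled value above.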