Simulate three groups of data by sampling from three normal distributions with the same standard deviation (1) but different means (3, 3.5, and 4). The set.seed(123) line makes the simulated data reproducible.
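set.seed(123)
# 20 observations per group; only the mean differs between groups
a <- rnorm(n = 20, mean = 3, sd = 1)
b <- rnorm(n = 20, mean = 3.5, sd = 1)
c <- rnorm(n = 20, mean = 4, sd = 1)
# Stack the groups into one data frame: grp is the group label, y the response
d <- data.frame(grp = rep(c("a", "b", "c"), each = 20),
                y = c(a, b, c))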
Calculate the means for each group:
means <- aggregate(y ~ grp, data = d, mean)
means
## grp y
## 1 a 3.141624
## 2 b 3.448743
## 3 c 4.106485
Calculate the differences in means:
ba <- means$y[means$grp == "b"] - means$y[means$grp == "a"]
ca <- means$y[means$grp == "c"] - means$y[means$grp == "a"]
cb <- means$y[means$grp == "c"] - means$y[means$grp == "b"]
results <- data.frame(comparison = c("b-a", "c-a", "c-b"),
                      diff = c(ba, ca, cb))
results
## comparison diff
## 1 b-a 0.3071190
## 2 c-a 0.9648614
## 3 c-b 0.6577424
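The three subtractions above are written out by hand. With k groups there are k(k - 1)/2 pairwise differences, so this quickly becomes tedious. A minimal sketch that generates every pairwise difference with base R's combn() follows; it re-creates the simulated data so it runs on its own, and the object names grp_means, grp_pairs, and all_diffs are illustrative, not part of the original analysis.

```r
# Re-create the simulated data (same seed and same sequence of draws as above)
set.seed(123)
d <- data.frame(grp = rep(c("a", "b", "c"), each = 20),
                y = c(rnorm(20, mean = 3), rnorm(20, mean = 3.5),
                      rnorm(20, mean = 4)))

# Named vector of group means (names "a", "b", "c")
grp_means <- sapply(split(d$y, d$grp), mean)

# Each column of grp_pairs is one pair of group labels; with 3 groups
# there are choose(3, 2) = 3 columns: (a, b), (a, c), (b, c)
grp_pairs <- combn(names(grp_means), 2)

# Second label minus first label, matching the "b-a" style used above
all_diffs <- data.frame(
  comparison = paste(grp_pairs[2, ], grp_pairs[1, ], sep = "-"),
  diff = unname(grp_means[grp_pairs[2, ]] - grp_means[grp_pairs[1, ]])
)
all_diffs
```

The same code works unchanged for any number of groups, which also shows how fast the number of comparisons grows as groups are added.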
Run t-tests for all three comparisons:
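# t.test() runs Welch's two-sample t-test by default (var.equal = FALSE);
# each subset keeps only the two groups being compared.
# Note: these objects overwrite the differences stored in ba, ca, and cb above.
ba <- t.test(y ~ grp, data = d, subset = grp != "c")
ca <- t.test(y ~ grp, data = d, subset = grp != "b")
cb <- t.test(y ~ grp, data = d, subset = grp != "a")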
Add the p-values to the results table:
results$pvalue <- c(ba$p.value, ca$p.value, cb$p.value)
results
## comparison diff pvalue
## 1 b-a 0.3071190 0.289679919
## 2 c-a 0.9648614 0.003078154
## 3 c-b 0.6577424 0.025826189
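As a cross-check, base R's pairwise.t.test() can run all three comparisons in one call. This sketch assumes the same simulated data: pool.sd = FALSE makes each comparison an ordinary two-sample t-test (Welch's by default, like the t.test() calls above), and p.adjust.method = "none" turns off the function's p-value adjustment, so the p-values should agree with the table. The object name pw is illustrative.

```r
# Re-create the simulated data (same seed and same sequence of draws as above)
set.seed(123)
d <- data.frame(grp = rep(c("a", "b", "c"), each = 20),
                y = c(rnorm(20, mean = 3), rnorm(20, mean = 3.5),
                      rnorm(20, mean = 4)))

# pool.sd = FALSE: a separate two-sample t-test (Welch by default) per pair
# p.adjust.method = "none": report the raw, unadjusted p-values
pw <- pairwise.t.test(d$y, d$grp, pool.sd = FALSE, p.adjust.method = "none")

# Lower-triangular matrix: rows are groups b and c, columns are groups a and b
pw$p.value
```

Leaving p.adjust.method at its default, "holm", would apply Holm's multiple-comparison correction instead, which is a different adjustment from Tukey's HSD.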
Notice that two of the three comparisons, c-a and c-b, are “significant” at the 0.05 level.
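Why be cautious about those two results? Each test on its own has a 5% chance of a false positive when the groups truly share a mean, but the chance that at least one of several tests gives a false positive is higher. Under the simplifying assumption that the tests are independent (the three pairwise tests here share data, so this is only a rough guide), that family-wise error rate is 1 - (1 - 0.05)^m for m tests, which is about 0.14 for three tests. The names alpha, n_tests, and fwer are illustrative.

```r
alpha <- 0.05      # per-test significance level
n_tests <- 1:6     # number of tests in the family

# P(at least one false positive), assuming the tests are independent
fwer <- 1 - (1 - alpha)^n_tests
data.frame(n_tests, fwer = round(fwer, 3))
```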
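Now analyze the data using a one-way ANOVA and conduct the pairwise comparisons using Tukey’s HSD, which adjusts the p-values and confidence intervals for the fact that three comparisons are being made (hence the “95% family-wise confidence level” in the output). Notice that all the p-values are higher than their t-test counterparts and only one, c-a, is “significant” at the 0.05 level.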
# Fit the one-way ANOVA, then run Tukey's HSD on all pairwise comparisons
m <- aov(y ~ grp, data = d)
TukeyHSD(m)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = y ~ grp, data = d)
##
## $grp
## diff lwr upr p adj
## b-a 0.3071190 -0.39465579 1.008894 0.5468419
## c-a 0.9648614 0.26308661 1.666636 0.0045735
## c-b 0.6577424 -0.04403243 1.359517 0.0705975
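To see where the adjusted p-values and intervals come from, the sketch below recomputes them by hand. It assumes the balanced design used here (20 observations per group). Each difference in means is divided by sqrt(MSE/n), where MSE is the residual mean square from the ANOVA, and the result is compared with the studentized range distribution for three means via R's ptukey(); qtukey() gives the matching interval half-width. The names n, mse, grp_means, diffs, q, p_adj, and half_width are illustrative.

```r
# Re-create the simulated data and the ANOVA fit (same seed as above)
set.seed(123)
d <- data.frame(grp = rep(c("a", "b", "c"), each = 20),
                y = c(rnorm(20, mean = 3), rnorm(20, mean = 3.5),
                      rnorm(20, mean = 4)))
m <- aov(y ~ grp, data = d)

n <- 20                               # observations per group (balanced design assumed)
mse <- deviance(m) / df.residual(m)   # residual mean square from the ANOVA
grp_means <- sapply(split(d$y, d$grp), mean)

# Differences in the same order as TukeyHSD(): b-a, c-a, c-b
diffs <- c("b-a" = grp_means[["b"]] - grp_means[["a"]],
           "c-a" = grp_means[["c"]] - grp_means[["a"]],
           "c-b" = grp_means[["c"]] - grp_means[["b"]])

# Studentized range statistic; with equal n the standard error is sqrt(mse / n)
q <- abs(diffs) / sqrt(mse / n)

# Upper-tail probability of the studentized range distribution for 3 means
p_adj <- ptukey(q, nmeans = 3, df = df.residual(m), lower.tail = FALSE)

# Half-width of the 95% family-wise confidence intervals
half_width <- qtukey(0.95, nmeans = 3, df = df.residual(m)) * sqrt(mse / n)

cbind(diff = diffs, lwr = diffs - half_width, upr = diffs + half_width,
      p_adj = p_adj)
```

The reference distribution describes the range of all three means at once, which is why these p-values are larger than the separate t-test p-values. Part of the gap also comes from the ANOVA pooling the residual variance across all three groups, whereas each Welch t-test above uses only the two groups being compared.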