library(emmeans)
library(ggplot2)
library(multcomp)
data(pigs)
pigs$percent <- factor(pigs$percent)
Compact Letter Displays
In a Compact Letter Display (CLD), groups that do not differ significantly from each other share at least one letter, while groups that differ significantly have no letter in common.
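To make the reading rule concrete, here is a minimal base R sketch. The helper name `shares_letter()` is hypothetical and is not part of {emmeans} or {multcomp}; it only checks whether two letter strings overlap, ignoring the padding spaces that CLD output often contains.

```r
# Hypothetical helper (not from any package): TRUE when two CLD letter
# strings have at least one letter in common, i.e. the two groups were
# NOT shown to differ.
shares_letter <- function(a, b) {
  split_letters <- function(x) {
    chars <- strsplit(x, "")[[1]]
    chars[chars != " "]          # drop padding spaces
  }
  length(intersect(split_letters(a), split_letters(b))) > 0
}

shares_letter("A", "AB")     # share "A": no significant difference shown
shares_letter("A", "BC D")   # nothing in common: significantly different
```

A shared letter is only a statement that no difference was detected. It is not evidence that the groups are equal.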
Demonstrate using the “pigs” data included with the {emmeans} package.
- source: Source of protein in the diet (factor with 3 levels: fish meal, soybean meal, milk)
- percent: Protein percentage in the diet (numeric with 4 values: 9, 12, 15, and 18; converted to a factor above)
- conc: Concentration of free plasma leucine, in mcg/ml
Looks like most of the variability in conc is due to source, but percent contributes as well.
plot.design(pigs)
Fit a 2-way ANOVA with no interaction.
pigs.lm <- lm(conc ~ source + percent, data = pigs)
car::Anova(pigs.lm)
Anova Table (Type II tests)
Response: conc
Sum Sq Df F value Pr(>F)
source 1116.12 2 21.6596 5.141e-06 ***
percent 466.73 3 6.0383 0.003452 **
Residuals 592.60 23
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
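As a quick check on how to read the table, the F values can be approximately reproduced by hand from the printed sums of squares. This is only a sketch using the rounded numbers shown above, so the results agree with the table to about two decimal places rather than exactly.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above
ss       <- c(source = 1116.12, percent = 466.73)
dof      <- c(source = 2, percent = 3)
ss_resid <- 592.60
df_resid <- 23

# Mean square = sum of squares / degrees of freedom
ms       <- ss / dof
ms_resid <- ss_resid / df_resid

# F = effect mean square / residual mean square
f_value <- ms / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, dof, df_resid, lower.tail = FALSE)

round(f_value, 2)
signif(p_value, 3)
```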
Now use {emmeans} to estimate marginal means.
pigs.emm <- emmeans(pigs.lm, c("percent", "source"))
pigs.emm
percent source emmean SE df lower.CL upper.CL
9 fish 23.3 2.36 23 18.4 28.2
12 fish 29.7 2.18 23 25.2 34.2
15 fish 31.6 2.42 23 26.6 36.6
18 fish 34.9 2.41 23 29.9 39.9
9 soy 32.8 2.20 23 28.2 37.3
12 soy 39.1 2.16 23 34.7 43.6
15 soy 41.1 2.24 23 36.5 45.7
18 soy 44.4 2.82 23 38.6 50.2
9 skim 38.9 2.21 23 34.3 43.5
12 skim 45.2 2.18 23 40.7 49.7
15 skim 47.2 2.42 23 42.2 52.2
18 skim 50.5 2.85 23 44.6 56.4
Confidence level used: 0.95
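The confidence limits in this table are the usual t-based intervals: estimate plus or minus a t quantile times the standard error, using the 23 residual degrees of freedom. A small sketch for the first row (fish at 9 percent), using the rounded values printed above:

```r
# Values copied from the first row of the emmeans table
emmean   <- 23.3
se       <- 2.36
df_resid <- 23

# Two-sided 95% interval: estimate +/- t quantile * SE
t_crit <- qt(0.975, df_resid)
ci     <- emmean + c(-1, 1) * t_crit * se

round(ci, 1)   # compare with lower.CL and upper.CL in the table
```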
Now generate compact letter displays using the cld() function from the {multcomp} package. The {emmeans} package provides a method for running this function on {emmeans} objects. This replicates the emmeans() output above, sorted by estimated mean, with an extra column for the letters.
multcomp::cld(pigs.emm, Letters = LETTERS)
percent source emmean SE df lower.CL upper.CL .group
9 fish 23.3 2.36 23 18.4 28.2 A
12 fish 29.7 2.18 23 25.2 34.2 ABC
15 fish 31.6 2.42 23 26.6 36.6 AB DE
9 soy 32.8 2.20 23 28.2 37.3 BCD F
18 fish 34.9 2.41 23 29.9 39.9 BCD FG
9 skim 38.9 2.21 23 34.3 43.5 BCDEFGH
12 soy 39.1 2.16 23 34.7 43.6 DEFGHI
15 soy 41.1 2.24 23 36.5 45.7 C FGHI
18 soy 44.4 2.82 23 38.6 50.2 E HI
12 skim 45.2 2.18 23 40.7 49.7 GHI
15 skim 47.2 2.42 23 42.2 52.2 GHI
18 skim 50.5 2.85 23 44.6 56.4 I
Confidence level used: 0.95
P value adjustment: tukey method for comparing a family of 12 estimates
significance level used: alpha = 0.05
NOTE: If two or more means share the same grouping symbol,
then we cannot show them to be different.
But we also did not show them to be the same.
I like the note:
NOTE: If two or more means share the same grouping symbol, then we cannot show them to be different. But we also did not show them to be the same.
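It may help to see how much the letters are summarizing. The "family of 12 estimates" in the output means every pair among the 12 means is compared, and the count of pairs grows quickly. A short base R illustration:

```r
# Number of pairwise comparisons among k means is "k choose 2"
n_means <- 12
n_pairs <- choose(n_means, 2)
n_pairs   # 66 comparisons behind the single .group column

# Restricting comparisons to within one factor keeps the families small
choose(4, 2)   # 6 pairs among the four percent levels
choose(3, 2)   # 3 pairs among the three sources
```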
Save as a data frame so we can create the CLD plot.
cld_df <- multcomp::cld(pigs.emm, Letters = LETTERS)
names(cld_df)
[1] "percent" "source" "emmean" "SE" "df" "lower.CL" "upper.CL"
[8] ".group"
Now create the plot. Use show.legend = FALSE in geom_text() to suppress a letter from appearing in the legend.
ggplot(cld_df) +
  aes(x = percent, y = emmean, color = source) +
  geom_point(position = position_dodge(width = 0.9)) +
  geom_errorbar(mapping = aes(ymin = lower.CL, ymax = upper.CL),
                position = position_dodge(width = 0.9),
                width = 0.1) +
  geom_text(mapping = aes(label = .group, y = upper.CL * 1.05),
            position = position_dodge(width = 0.9),
            show.legend = FALSE)
Recall: groups that do not differ significantly from each other share at least one letter, while groups that differ significantly have no letter in common. For example, fish at 9 percent (letter A only) differs from both soy and skim at 9 percent, neither of which carries an A.
See this page for R code to create fancier CLD plots and some interesting critiques of CLD. For example, the developer of the {emmeans} package is on record saying, “Providing for CLDs at all remains one of my biggest regrets in developing this package.” In another Stack Overflow answer he states, “IMO, almost anything is better than a CLD. They display non-findings rather than findings.”
He instead suggests presenting simple comparisons in tabular form, like so:
pairs(pigs.emm, by = "source")
source = fish:
contrast estimate SE df t.ratio p.value
percent9 - percent12 -6.36 2.47 23 -2.570 0.0751
percent9 - percent15 -8.31 2.63 23 -3.156 0.0213
percent9 - percent18 -11.63 2.98 23 -3.899 0.0038
percent12 - percent15 -1.96 2.57 23 -0.763 0.8701
percent12 - percent18 -5.27 2.88 23 -1.828 0.2865
percent15 - percent18 -3.31 3.04 23 -1.088 0.7002
source = soy:
contrast estimate SE df t.ratio p.value
percent9 - percent12 -6.36 2.47 23 -2.570 0.0751
percent9 - percent15 -8.31 2.63 23 -3.156 0.0213
percent9 - percent18 -11.63 2.98 23 -3.899 0.0038
percent12 - percent15 -1.96 2.57 23 -0.763 0.8701
percent12 - percent18 -5.27 2.88 23 -1.828 0.2865
percent15 - percent18 -3.31 3.04 23 -1.088 0.7002
source = skim:
contrast estimate SE df t.ratio p.value
percent9 - percent12 -6.36 2.47 23 -2.570 0.0751
percent9 - percent15 -8.31 2.63 23 -3.156 0.0213
percent9 - percent18 -11.63 2.98 23 -3.899 0.0038
percent12 - percent15 -1.96 2.57 23 -0.763 0.8701
percent12 - percent18 -5.27 2.88 23 -1.828 0.2865
percent15 - percent18 -3.31 3.04 23 -1.088 0.7002
P value adjustment: tukey method for comparing a family of 4 estimates
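The Tukey-adjusted p-values above can be approximately reproduced in base R from the t ratios. This sketch assumes the adjustment is the standard studentized range calculation, with the t ratio rescaled by the square root of 2, a family of 4 means, and 23 residual degrees of freedom. The t ratios are the rounded values printed in the table.

```r
# t ratios copied from one of the (identical) blocks above
t_ratio <- c(-2.570, -3.156, -3.899, -0.763, -1.828, -1.088)

# Assumed Tukey adjustment: the studentized range statistic is sqrt(2) * |t|,
# referred to the distribution for 4 means and 23 degrees of freedom
p_tukey <- ptukey(sqrt(2) * abs(t_ratio), nmeans = 4, df = 23,
                  lower.tail = FALSE)

round(p_tukey, 4)   # compare with the p.value column
```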
pairs(pigs.emm, by = "percent")
percent = 9:
contrast estimate SE df t.ratio p.value
fish - soy -9.47 2.33 23 -4.059 0.0014
fish - skim -15.58 2.39 23 -6.526 <.0001
soy - skim -6.11 2.34 23 -2.613 0.0398
percent = 12:
contrast estimate SE df t.ratio p.value
fish - soy -9.47 2.33 23 -4.059 0.0014
fish - skim -15.58 2.39 23 -6.526 <.0001
soy - skim -6.11 2.34 23 -2.613 0.0398
percent = 15:
contrast estimate SE df t.ratio p.value
fish - soy -9.47 2.33 23 -4.059 0.0014
fish - skim -15.58 2.39 23 -6.526 <.0001
soy - skim -6.11 2.34 23 -2.613 0.0398
percent = 18:
contrast estimate SE df t.ratio p.value
fish - soy -9.47 2.33 23 -4.059 0.0014
fish - skim -15.58 2.39 23 -6.526 <.0001
soy - skim -6.11 2.34 23 -2.613 0.0398
P value adjustment: tukey method for comparing a family of 3 estimates
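The contrasts are identical in every block because the model was fit with no interaction: in an additive model, the difference between two percent levels is the same for every source, and vice versa. The sketch below checks this against the rounded marginal means printed earlier, so the differences match the contrast estimates only up to rounding.

```r
# Estimated marginal means copied from the emmeans table (rounded)
emm <- matrix(c(23.3, 29.7, 31.6, 34.9,
                32.8, 39.1, 41.1, 44.4,
                38.9, 45.2, 47.2, 50.5),
              nrow = 3, byrow = TRUE,
              dimnames = list(source  = c("fish", "soy", "skim"),
                              percent = c("9", "12", "15", "18")))

# percent9 - percent12 within each source: all close to -6.36
diff_9_12 <- emm[, "9"] - emm[, "12"]
diff_9_12

# fish - soy within each percent level: all close to -9.47
diff_fish_soy <- emm["fish", ] - emm["soy", ]
diff_fish_soy
```

With an interaction term in the model, these within-group differences would generally vary from block to block.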