Compact Letter Displays

Author

Clay Ford

Published

March 13, 2024

In a Compact Letter Display (CLD), groups that do not differ significantly from each other share at least one letter, while groups that do differ significantly have no letter in common.

We demonstrate using the “pigs” data included with the {emmeans} package. The percent variable is converted to a factor so that it is treated as categorical.

library(emmeans)
library(ggplot2)
library(multcomp)
data(pigs)
pigs$percent <- factor(pigs$percent)

Looks like most of the variability in conc is due to source, but percent contributes as well.

plot.design(pigs)
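As a rough numeric check on what the design plot suggests, the group means can also be tabulated directly. This is a minimal sketch using base R's aggregate(); the object names source_means and percent_means are introduced here for illustration only.

```r
library(emmeans)  # provides the pigs data

data(pigs)
pigs$percent <- factor(pigs$percent)

# Mean conc for each level of source and of percent
source_means  <- aggregate(conc ~ source,  data = pigs, FUN = mean)
percent_means <- aggregate(conc ~ percent, data = pigs, FUN = mean)

source_means
percent_means

# Spread of the group means within each factor
diff(range(source_means$conc))
diff(range(percent_means$conc))
```

A wider spread among the source means than among the percent means would be consistent with the reading of the plot above.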

Fit a 2-way ANOVA with no interaction.

pigs.lm <- lm(conc ~ source + percent, data = pigs)
car::Anova(pigs.lm)
Anova Table (Type II tests)

Response: conc
           Sum Sq Df F value    Pr(>F)    
source    1116.12  2 21.6596 5.141e-06 ***
percent    466.73  3  6.0383  0.003452 ** 
Residuals  592.60 23                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now use {emmeans} to estimate marginal means.

pigs.emm <- emmeans(pigs.lm, c("percent", "source"))
pigs.emm
 percent source emmean   SE df lower.CL upper.CL
 9       fish     23.3 2.36 23     18.4     28.2
 12      fish     29.7 2.18 23     25.2     34.2
 15      fish     31.6 2.42 23     26.6     36.6
 18      fish     34.9 2.41 23     29.9     39.9
 9       soy      32.8 2.20 23     28.2     37.3
 12      soy      39.1 2.16 23     34.7     43.6
 15      soy      41.1 2.24 23     36.5     45.7
 18      soy      44.4 2.82 23     38.6     50.2
 9       skim     38.9 2.21 23     34.3     43.5
 12      skim     45.2 2.18 23     40.7     49.7
 15      skim     47.2 2.42 23     42.2     52.2
 18      skim     50.5 2.85 23     44.6     56.4

Confidence level used: 0.95 

Now generate compact letter displays using the cld() function from the {multcomp} package. The {emmeans} package provides a method for running this function on {emmeans} objects. This replicates the emmeans() output above, sorted by estimated mean, with an extra column (.group) for the letters.

multcomp::cld(pigs.emm, Letters = LETTERS)
 percent source emmean   SE df lower.CL upper.CL .group    
 9       fish     23.3 2.36 23     18.4     28.2  A        
 12      fish     29.7 2.18 23     25.2     34.2  ABC      
 15      fish     31.6 2.42 23     26.6     36.6  AB DE    
 9       soy      32.8 2.20 23     28.2     37.3   BCD F   
 18      fish     34.9 2.41 23     29.9     39.9   BCD FG  
 9       skim     38.9 2.21 23     34.3     43.5   BCDEFGH 
 12      soy      39.1 2.16 23     34.7     43.6     DEFGHI
 15      soy      41.1 2.24 23     36.5     45.7    C  FGHI
 18      soy      44.4 2.82 23     38.6     50.2      E  HI
 12      skim     45.2 2.18 23     40.7     49.7        GHI
 15      skim     47.2 2.42 23     42.2     52.2        GHI
 18      skim     50.5 2.85 23     44.6     56.4          I

Confidence level used: 0.95 
P value adjustment: tukey method for comparing a family of 12 estimates 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same. 

I like the note:

NOTE: If two or more means share the same grouping symbol, then we cannot show them to be different. But we also did not show them to be the same.
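With all 12 means compared at once, the letter strings get long. One possible variation is to build a separate display within each source, using the by argument of the {emmeans} cld() method. The sketch below assumes that argument behaves as documented; cld_by_source is a name introduced here for illustration.

```r
library(emmeans)
library(multcomp)

data(pigs)
pigs$percent <- factor(pigs$percent)
pigs.lm  <- lm(conc ~ source + percent, data = pigs)
pigs.emm <- emmeans(pigs.lm, c("percent", "source"))

# One letter display per source: comparisons (and the Tukey adjustment)
# are made among the 4 percent levels within each source,
# rather than among all 12 means at once.
cld_by_source <- multcomp::cld(pigs.emm, by = "source", Letters = LETTERS)
cld_by_source
```

Letters should then only be compared within a source, not across sources.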

Save as a data frame so we can create the CLD plot.

cld_df <- multcomp::cld(pigs.emm, Letters = LETTERS)
names(cld_df)
[1] "percent"  "source"   "emmean"   "SE"       "df"       "lower.CL" "upper.CL"
[8] ".group"  

Now create the plot. Use show.legend = FALSE in geom_text() to suppress a letter from appearing in the legend.

ggplot(cld_df) +
  aes(x = percent, y = emmean, color = source) +
  geom_point(position = position_dodge(width = 0.9)) +
  geom_errorbar(mapping = aes(ymin = lower.CL, ymax = upper.CL),
                position = position_dodge(width = 0.9),
                width = 0.1) +
  geom_text(mapping = aes(label = .group, y = upper.CL * 1.05),
            position = position_dodge(width = 0.9),
            show.legend = FALSE)

Recall: groups that do not differ significantly from each other share at least one letter, while groups that do differ significantly have no letter in common. For example, fish at 9 percent (A) shares no letter with soy (BCD F) or skim (BCDEFGH) at 9 percent, so it differs significantly from both.
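The reading rule can be written out as a small check. The function shares_letter() below is a hypothetical helper, not part of {emmeans} or {multcomp}; it only tests whether two .group strings have a letter in common.

```r
# Hypothetical helper (not part of {emmeans} or {multcomp}):
# TRUE if two .group strings have at least one letter in common.
shares_letter <- function(a, b) {
  # Drop the padding spaces, then split each string into single letters
  letters_a <- strsplit(gsub(" ", "", a), "")[[1]]
  letters_b <- strsplit(gsub(" ", "", b), "")[[1]]
  length(intersect(letters_a, letters_b)) > 0
}

# .group values copied from the cld() output above
fish_9  <- " A        "
fish_12 <- " ABC      "
soy_9   <- "  BCD F   "
skim_9  <- "  BCDEFGH "

shares_letter(fish_9, soy_9)    # FALSE: significantly different
shares_letter(fish_9, skim_9)   # FALSE: significantly different
shares_letter(fish_9, fish_12)  # TRUE: cannot show a difference
```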

See this page for R code to create fancier CLD plots and some interesting critiques of CLD. For example, the developer of the {emmeans} package is on record saying, “Providing for CLDs at all remains one of my biggest regrets in developing this package.” In another Stack Overflow answer he states, “IMO, almost anything is better than a CLD. They display non-findings rather than findings.”

He instead suggests presenting simple comparisons in tabular form, like so:

pairs(pigs.emm, by = "source")
source = fish:
 contrast              estimate   SE df t.ratio p.value
 percent9 - percent12     -6.36 2.47 23  -2.570  0.0751
 percent9 - percent15     -8.31 2.63 23  -3.156  0.0213
 percent9 - percent18    -11.63 2.98 23  -3.899  0.0038
 percent12 - percent15    -1.96 2.57 23  -0.763  0.8701
 percent12 - percent18    -5.27 2.88 23  -1.828  0.2865
 percent15 - percent18    -3.31 3.04 23  -1.088  0.7002

source = soy:
 contrast              estimate   SE df t.ratio p.value
 percent9 - percent12     -6.36 2.47 23  -2.570  0.0751
 percent9 - percent15     -8.31 2.63 23  -3.156  0.0213
 percent9 - percent18    -11.63 2.98 23  -3.899  0.0038
 percent12 - percent15    -1.96 2.57 23  -0.763  0.8701
 percent12 - percent18    -5.27 2.88 23  -1.828  0.2865
 percent15 - percent18    -3.31 3.04 23  -1.088  0.7002

source = skim:
 contrast              estimate   SE df t.ratio p.value
 percent9 - percent12     -6.36 2.47 23  -2.570  0.0751
 percent9 - percent15     -8.31 2.63 23  -3.156  0.0213
 percent9 - percent18    -11.63 2.98 23  -3.899  0.0038
 percent12 - percent15    -1.96 2.57 23  -0.763  0.8701
 percent12 - percent18    -5.27 2.88 23  -1.828  0.2865
 percent15 - percent18    -3.31 3.04 23  -1.088  0.7002

P value adjustment: tukey method for comparing a family of 4 estimates 

pairs(pigs.emm, by = "percent")
percent = 9:
 contrast    estimate   SE df t.ratio p.value
 fish - soy     -9.47 2.33 23  -4.059  0.0014
 fish - skim   -15.58 2.39 23  -6.526  <.0001
 soy - skim     -6.11 2.34 23  -2.613  0.0398

percent = 12:
 contrast    estimate   SE df t.ratio p.value
 fish - soy     -9.47 2.33 23  -4.059  0.0014
 fish - skim   -15.58 2.39 23  -6.526  <.0001
 soy - skim     -6.11 2.34 23  -2.613  0.0398

percent = 15:
 contrast    estimate   SE df t.ratio p.value
 fish - soy     -9.47 2.33 23  -4.059  0.0014
 fish - skim   -15.58 2.39 23  -6.526  <.0001
 soy - skim     -6.11 2.34 23  -2.613  0.0398

percent = 18:
 contrast    estimate   SE df t.ratio p.value
 fish - soy     -9.47 2.33 23  -4.059  0.0014
 fish - skim   -15.58 2.39 23  -6.526  <.0001
 soy - skim     -6.11 2.34 23  -2.613  0.0398

P value adjustment: tukey method for comparing a family of 3 estimates
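Because the model has no interaction, the comparisons above are identical within every level of the other factor. In that case a single table per factor may be enough. The following is a sketch, assuming marginal means averaged over the other factor are an acceptable summary; pct_pairs and src_pairs are names introduced here for illustration.

```r
library(emmeans)

data(pigs)
pigs$percent <- factor(pigs$percent)
pigs.lm <- lm(conc ~ source + percent, data = pigs)

# Pairwise comparisons of percent, averaged over source
pct_pairs <- pairs(emmeans(pigs.lm, "percent"))
pct_pairs

# Pairwise comparisons of source, averaged over percent
src_pairs <- pairs(emmeans(pigs.lm, "source"))
src_pairs
```

If the model included a source by percent interaction, the by-group tables shown earlier would no longer repeat, and the conditional comparisons would be the safer presentation.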