Forced one-way ANOVA

Author

Clay Ford

Generate data with an interaction

Notice there are two grouping variables: x1 and x2. But also notice I use the interaction() function to combine them into a single factor called x12, whose levels are all observed combinations of x1 and x2.

n <- 200
x1 <- sample(x = letters[1:2], size = n, replace = TRUE)
x2 <- sample(x = LETTERS[4:5], size = n, replace = TRUE)
y <- 10 + (x1=="b")*3 + (x2=="E")*4 + (x1=="b")*(x2=="E")*-4 + 
  rnorm(n = n, mean = 0, sd = 3)
d <- data.frame(y, x1, x2, x12 = interaction(x1, x2)) 
# show a few rows
d[sample(200, size = 10),]
            y x1 x2 x12
92  14.451262  b  D b.D
152  9.541859  a  D a.D
121  6.362552  b  E b.E
104  7.016574  a  D a.D
99   8.579846  b  E b.E
85  15.329268  a  E a.E
164 14.049043  a  E a.E
145 13.979861  b  E b.E
166 10.234367  a  D a.D
171 17.494604  a  E a.E
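Before fitting anything, a quick sanity check on the simulated data: the cell means implied by the formula above are 10 (a.D), 13 (b.D), 14 (a.E), and 13 (b.E). This sketch re-creates the data so it runs on its own; the set.seed() call is my addition (the original draws were unseeded), so the exact numbers will differ from the output shown above.

```r
set.seed(1)  # my addition; the original code did not set a seed
x1 <- sample(letters[1:2], size = 200, replace = TRUE)
x2 <- sample(LETTERS[4:5], size = 200, replace = TRUE)
y <- 10 + (x1=="b")*3 + (x2=="E")*4 + (x1=="b")*(x2=="E")*-4 + rnorm(200, 0, 3)
d <- data.frame(y, x1, x2, x12 = interaction(x1, x2))
# sample means per cell should land near 10 (a.D), 13 (b.D), 14 (a.E), 13 (b.E)
aggregate(y ~ x12, data = d, FUN = mean)
```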

Fit model using main effects and interaction

Notice the last line of output, the omnibus F test. The null hypothesis is that all coefficients (except the intercept) are 0.

m <- lm(y ~ x1*x2, data = d)
summary(m)

Call:
lm(formula = y ~ x1 * x2, data = d)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.4024 -2.0411 -0.2701  1.7720 10.7515 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  10.0287     0.4336  23.128  < 2e-16 ***
x1b           2.6144     0.6132   4.263 3.13e-05 ***
x2E           4.7935     0.6487   7.389 4.16e-12 ***
x1b:x2E      -5.0409     0.8888  -5.672 5.01e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.127 on 196 degrees of freedom
Multiple R-squared:  0.2203,    Adjusted R-squared:  0.2084 
F-statistic: 18.46 on 3 and 196 DF,  p-value: 1.371e-10
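The omnibus F test reported by summary() is equivalent to comparing the fitted model against an intercept-only model, which we can make explicit with anova(). A self-contained sketch (the seed is my addition, so the numbers differ from the output above):

```r
set.seed(1)  # my addition; the original code did not set a seed
x1 <- sample(letters[1:2], size = 200, replace = TRUE)
x2 <- sample(LETTERS[4:5], size = 200, replace = TRUE)
y <- 10 + (x1=="b")*3 + (x2=="E")*4 + (x1=="b")*(x2=="E")*-4 + rnorm(200, 0, 3)
d <- data.frame(y, x1, x2, x12 = interaction(x1, x2))

m  <- lm(y ~ x1*x2, data = d)
m0 <- lm(y ~ 1, data = d)  # null model: all coefficients except intercept are 0
anova(m0, m)  # F statistic and p-value match the last line of summary(m)
```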

Fit model using only the x12 variable

This is basically a one-way ANOVA using a single variable formed from the interaction of two factor variables. Notice the last line of output, the omnibus F test, is identical to that of the previous model.

m2 <- lm(y ~ x12, data = d)
summary(m2)

Call:
lm(formula = y ~ x12, data = d)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.4024 -2.0411 -0.2701  1.7720 10.7515 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  10.0287     0.4336  23.128  < 2e-16 ***
x12b.D        2.6144     0.6132   4.263 3.13e-05 ***
x12a.E        4.7935     0.6487   7.389 4.16e-12 ***
x12b.E        2.3670     0.6075   3.896 0.000134 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.127 on 196 degrees of freedom
Multiple R-squared:  0.2203,    Adjusted R-squared:  0.2084 
F-statistic: 18.46 on 3 and 196 DF,  p-value: 1.371e-10
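That the two omnibus tests agree is no accident: y ~ x1*x2 and y ~ x12 are two parameterizations of the same four-group model, so their fitted values (and hence residuals, R-squared, and F statistic) are identical. A self-contained check (the seed is my addition):

```r
set.seed(1)  # my addition; the original code did not set a seed
x1 <- sample(letters[1:2], size = 200, replace = TRUE)
x2 <- sample(LETTERS[4:5], size = 200, replace = TRUE)
y <- 10 + (x1=="b")*3 + (x2=="E")*4 + (x1=="b")*(x2=="E")*-4 + rnorm(200, 0, 3)
d <- data.frame(y, x1, x2, x12 = interaction(x1, x2))

m  <- lm(y ~ x1*x2, data = d)
m2 <- lm(y ~ x12, data = d)
all.equal(fitted(m), fitted(m2))  # TRUE: same model, different coding
```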

Pairwise comparisons

Post hoc pairwise comparisons are identical for both models.

library(emmeans)
emmeans(m, specs = c("x1", "x2")) |>
  contrast("pairwise")
 contrast  estimate    SE  df t.ratio p.value
 a D - b D   -2.614 0.613 196  -4.263  0.0002
 a D - a E   -4.794 0.649 196  -7.389  <.0001
 a D - b E   -2.367 0.608 196  -3.896  0.0008
 b D - a E   -2.179 0.649 196  -3.359  0.0052
 b D - b E    0.247 0.608 196   0.407  0.9771
 a E - b E    2.427 0.643 196   3.772  0.0012

P value adjustment: tukey method for comparing a family of 4 estimates 
emmeans(m2, specs = c("x12")) |>
  contrast("pairwise")
 contrast  estimate    SE  df t.ratio p.value
 a.D - b.D   -2.614 0.613 196  -4.263  0.0002
 a.D - a.E   -4.794 0.649 196  -7.389  <.0001
 a.D - b.E   -2.367 0.608 196  -3.896  0.0008
 b.D - a.E   -2.179 0.649 196  -3.359  0.0052
 b.D - b.E    0.247 0.608 196   0.407  0.9771
 a.E - b.E    2.427 0.643 196   3.772  0.0012

P value adjustment: tukey method for comparing a family of 4 estimates 

Non-parametric Kruskal-Wallis

We can use the same combined-factor approach with the Kruskal-Wallis rank sum test, which expects a single grouping variable:

kruskal.test(y ~ x12, data = d)

    Kruskal-Wallis rank sum test

data:  y by x12
Kruskal-Wallis chi-squared = 49.488, df = 3, p-value = 1.027e-10
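If the Kruskal-Wallis test rejects, a common nonparametric follow-up (my suggestion; not part of the original analysis) is pairwise Wilcoxon rank sum tests on the x12 groups with a multiplicity adjustment. A sketch, again re-creating the data with an arbitrary seed:

```r
set.seed(1)  # my addition; the original code did not set a seed
x1 <- sample(letters[1:2], size = 200, replace = TRUE)
x2 <- sample(LETTERS[4:5], size = 200, replace = TRUE)
y <- 10 + (x1=="b")*3 + (x2=="E")*4 + (x1=="b")*(x2=="E")*-4 + rnorm(200, 0, 3)
d <- data.frame(y, x1, x2, x12 = interaction(x1, x2))

# Holm-adjusted pairwise Wilcoxon rank sum tests across the four groups
pairwise.wilcox.test(d$y, d$x12, p.adjust.method = "holm")
```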