2 x 2 Contigency Table

Bilder and Loughin (2015) present the following problem:

Do students whose native language is not English pick up on authors’ efforts to use humor in illustrating points in textbooks? Students at one university were shown a passage from a textbook that was meant to be humorous. Of the 211 students in the study where English was not their native language, 118 thought the passage was humorous. Of the 206 students in the study where English was their native language, 155 thought the passage was humorous. Analyze these data to answer the research question.

To begin we manually enter the data into a matrix and save the object as d. Setting byrow = TRUE allows us to enter the data by row. Otherwise the matrix is filled column-wise. Notice the second field in each row is calculated on-the-fly by subtracting the humorous count from the total count.

d <- matrix(c(118, 211 - 118,
              155, 206 - 155), nrow = 2, byrow = TRUE)
d
     [,1] [,2]
[1,]  118   93
[2,]  155   51

Our data will be easier to work with if we name the rows and columns. We can do this with the dimnames() function. This allows us to not only name the rows and columns, but also name the dimensions. This requires using a list object.

dimnames(d) <- list("English" = c("non-native", "native"),
                    "Humor" = c("humorous", "not humorous"))
d
            Humor
English      humorous not humorous
  non-native      118           93
  native          155           51

Finally we convert our matrix to a table class. While this does not change the way the data is displayed, this will provide additional data wrangling capabilities later in this example.

d <- as.table(d)
d
            Humor
English      humorous not humorous
  non-native      118           93
  native          155           51

One question we might want to immediately ask is if there is any association between English and Humor. Does knowing whether or not someone is a non-native speaker provide information about whether or not they’ll find authors’ attempts at humor funny? One way to assess this is with the chi-square test of independence. This is easy to implement with the chisq.test().

chisq.test(d)

    Pearson's Chi-squared test with Yates' continuity correction

data:  d
X-squared = 16.363, df = 1, p-value = 5.229e-05

The small p-value provides evidence against the null of no association. These two variables seem to be associated in some way. But how are they associated?

One way to investigate is to calculate and compare proportions. Specifically we want to calculate proportions conditional on English. This is because we consider English to be the independent variable and Humor to be the dependent variable. To do this we use the proportions() function with margin = 1. (Rows are the first margin, columns are the second margin.)

proportions(d, margin = 1)
            Humor
English       humorous not humorous
  non-native 0.5592417    0.4407583
  native     0.7524272    0.2475728

Of the non-native speakers, about 0.56 found the passage humorous. Of the native speakers, about 0.75 found the passage humorous. That’s a difference of -0.191

The difference in proportions, 0.56 versus 0.75, seems large. What is the probability we would get a difference this big or bigger if there really is no difference in the population? We can answer this with a 2-sample proportion test. Base R implements this test with the prop.test() function. To use it we need marginal totals in addition to the table counts. We can get marginal totals with the addmargins() function.

addmargins(d)
            Humor
English      humorous not humorous Sum
  non-native      118           93 211
  native          155           51 206
  Sum             273          144 417

Now we have what we need to implement the 2-sample proportion test. The first argument is the total number of “successes” for each group as a vector, or the numerators of the proportions. The next argument is the total number of “trials” for each group as a vector, or the denominators of the proportions.

prop.test(x = c(118, 155), n = c(211, 206))

    2-sample test for equality of proportions with continuity correction

data:  c(118, 155) out of c(211, 206)
X-squared = 16.363, df = 1, p-value = 5.229e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.28720861 -0.09916235
sample estimates:
   prop 1    prop 2 
0.5592417 0.7524272 

Notice the p-value of this test is the same as the chi-square test above. In fact, the two tests are identical. Of more interest is the 95 percent confidence interval on the difference in proportions. This tells us that the difference in proportions between non native speakers and native speakers who understand the authors’ humor is about -0.19, 95% CI [-0.29, -0.10].

One way to visualize this data is by creating a Cleveland dot plot. The dotchart() function in R allows us to easily create this plot as long as your data is a matrix or table.

dotchart(d)

The dot plot shows how counts of English change conditional on Humor. It would make more sense to show Humor counts conditional on English. We can accomplish that by transposing the table using the t() function.

dotchart(t(d))

We can clearly see the counts of students finding humor in the textbook increasing dramatically for those students who are native English speakers.

Bar plots are also an effective way to visualize data of this nature. A basic bar plot can be created using the barplot() function. Notice we transpose the table again to ensure counts are shown conditional on English.

barplot(t(d), beside = TRUE, legend.text = TRUE)

To use the {ggplot2} package to create a bar plot (Wickham 2016), we need to convert our table into a data frame. Since our data is a table object, this is easy to do with the as.data.frame() function.

df <- as.data.frame(d)
df
     English        Humor Freq
1 non-native     humorous  118
2     native     humorous  155
3 non-native not humorous   93
4     native not humorous   51

The main function to use when our data frame already contains counts is geom_col(). The scale_fill_brewer(palette = "Set2") line sets the color palette to a colorblind-friendly palette courtesy of the {RColorBrewer} package (Neuwirth 2022).

library(ggplot2)
ggplot(df) +
  aes(x = English, y = Freq, fill = Humor) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()

To visualize proportions, we need to calculate the proportions and then convert to a data frame.

df2 <- proportions(d, margin = 1) |> 
  as.data.frame(responseName = "Prop")
df2
     English        Humor      Prop
1 non-native     humorous 0.5592417
2     native     humorous 0.7524272
3 non-native not humorous 0.4407583
4     native not humorous 0.2475728

And now we can use the same {ggplot2} code above substituting “Prop” for “Freq”.

ggplot(df2) +
  aes(x = English, y = Prop, fill = Humor) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()

To compare just the proportions who found the textbook humorous, we can subset the data frame prior to creating the plot as follows:

ggplot(subset(df2, Humor == "humorous")) +
  aes(x = English, y = Prop) +
  geom_col(position = "dodge", width = 1/2) +
  theme_minimal()

To conclude this example, let’s restructure our original data to be a data frame with one observation per subject. This is often how raw data will look in real life. The uncount() function from the {tidyr} package (Wickham and Girlich 2022) allows us to accomplish this. Notice we need to work with the “df” object we created using as.data.frame() above.

library(tidyr)
df_all <- uncount(df, weights = Freq)
head(df_all)
     English    Humor
1 non-native humorous
2 non-native humorous
3 non-native humorous
4 non-native humorous
5 non-native humorous
6 non-native humorous

With our data in this structure, we can use the modified prop.test() function from the {mosaic} package (Pruim, Kaplan, and Horton 2017) to carry out the 2-sample proportion test using R’s modeling syntax. The syntax Humor | English says compare proportions of Humor conditional on English.

library(mosaic)
prop.test(~ Humor | English, data = df_all)

    2-sample test for equality of proportions with continuity correction

data:  tally(Humor ~ English)
X-squared = 16.363, df = 1, p-value = 5.229e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.28720861 -0.09916235
sample estimates:
   prop 1    prop 2 
0.5592417 0.7524272 

To carry out the chi-square test of independence, we need to create a table again. We can do that with the xtabs() function before piping into the chisq.test() function.

xtabs(~ English + Humor, data = df_all) |> 
  chisq.test()

    Pearson's Chi-squared test with Yates' continuity correction

data:  xtabs(~English + Humor, data = df_all)
X-squared = 16.363, df = 1, p-value = 5.229e-05

To visualize count data in this format using {ggplot2} we need to use the geom_bar() function.

ggplot(df_all) +
  aes(x = English, fill = Humor) +
  geom_bar(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()

To visualize proportions, it makes sense to create a table first, calculate the proportions on the desired margin, convert back to a data frame, and then create the plot using geom_col(). Thanks to the pipe operator, we can do all that on-the-fly before piping into the {ggplot2} code.

xtabs(~ English + Humor, data = df_all) |> 
  proportions(margin = 1) |> 
  as.data.frame(responseName = "Prop") |> 
  ggplot() +
  aes(x = English, y = Prop, fill = Humor) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()

It’s worth noting that a 2-sample proportion test can also be analyzed using logistic regression. This is easiest to do when our data has one row per subject. Before doing this it’s important to know which levels of the categorical variables will be considered as the reference level. By default, order is determined alphabetically. So if we model Humor as a function of English, we will be modeling the probability of “not humorous”. It probably makes more sense to model the probability of “humorous”, so we relevel the variable so that “not humorous” is the reference level.

df_all$Humor <- relevel(df_all$Humor, "not humorous")

Next we model Humor as a function of English.

m <- glm(Humor ~ English, data = df_all, family = binomial)
summary(m)

Call:
glm(formula = Humor ~ English, family = binomial, data = df_all)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)     0.2381     0.1387   1.717    0.086 .  
Englishnative   0.8735     0.2128   4.105 4.05e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 537.52  on 416  degrees of freedom
Residual deviance: 520.12  on 415  degrees of freedom
AIC: 524.12

Number of Fisher Scoring iterations: 4

The coefficient for English is relative to “non-native”. The fact it is positive and “significant” tells us that native speakers are more likely to find the textbook humorous. To interpret it, we can exponentiate to obtain an odds ratio.

exp(coef(m)["Englishnative"])
Englishnative 
     2.395314 

This says the odds of a native speaker finding the textbook humorous are about 2.4 times higher than the odds of a non-native speaker finding the textbook humorous. This is a relative comparison.

To carry out a 2-sample test, which is an absolute comparison, we can use the emmeans() and pairs() functions from the {emmeans} package (Lenth 2022). We set regrid = "response" to tell the emmeans() function to calculate probabilities instead of odds ratios.

library(emmeans)
emmeans(m, specs = ~ English, regrid = "response") |> 
  pairs() 
 contrast              estimate     SE  df z.ratio p.value
 (non-native) - native   -0.193 0.0455 Inf  -4.244  <.0001

References

Bilder, Christopher R., and Thomas M. Loughin. 2015. Analysis of Categorical Data in R. Boca Raton: CRC Press.
Lenth, Russell V. 2022. Emmeans: Estimated Marginal Means, Aka Least-Squares Means. https://CRAN.R-project.org/package=emmeans.
Neuwirth, Erich. 2022. RColorBrewer: ColorBrewer Palettes. https://CRAN.R-project.org/package=RColorBrewer.
Pruim, Randall, Daniel T Kaplan, and Nicholas J Horton. 2017. “The Mosaic Package: Helping Students to ’Think with Data’ Using R.” The R Journal 9 (1): 77–102. https://journal.r-project.org/archive/2017/RJ-2017-024/index.html.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, and Maximilian Girlich. 2022. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.

  1. If we calculate proportions down the columns (margin = 2), we get the proportions conditional on Humor. Of the students who found the passage humorous, about 0.43 were non-native while 0.57 were native.↩︎