Example Analysis using Simulated Data

Author

Clay Ford

Look at data

I have simulated some data based on our conversation using R. I have named the data “d” for simplicity. We only have one data set and it’s less typing. Below I use the head() function to look at the “head” of the data (ie, first 6 rows). We see subject 1 went through all 6 rotations, has “red” personality type, and got scores between 90 and 93. The scores are probably lower than what you anticipate. But for sake of demonstration we’ll use this data.

head(d)

  rotation id personality score
1        A  1         red    90
2        B  1         red    91
3        C  1         red    91
4        D  1         red    93
5        E  1         red    93
6        F  1         red    92

How many rows does our data have? We can use the nrow() function

nrow(d)

[1] 2160

How many subjects do we have? We need to get the unique count of ID. First we use the unique() function to get the distinct values of the ID column. We use the $ operator to reach into our data frame, d, and extract the ID column. Then we use the length() function to get the “length” of the result. We have 360 subjects.

length(unique(d$id))

[1] 360

Let’s tally up the unique personality types using the table() function.

table(d$personality)


  blue  green    red yellow 
   474    534    594    558

That’s too many. Remember, we have one row per subject per rotation. Since we have 6 rotations we can divide by 6 to get the unique count of personality types.

table(d$personality)/6


  blue  green    red yellow 
    79     89     99     93

Calculate mean scores by rotation and personality type. Below we use tapply() to create a table of results. The first argument is the variable we want to summarize. The second is a list of categorical variables that determine the groups, and the last is the function we want to apply to all the groups.

tapply(d$score, list(d$rotation, d$personality), mean)

      blue    green      red   yellow
A 90.12658 92.03371 92.35354 90.79570
B 88.18987 90.26966 91.71717 88.84946
C 88.15190 90.42697 90.85859 89.59140
D 90.54430 94.13483 94.05051 92.89247
E 88.63291 90.52809 90.69697 89.69892
F 90.84810 92.44944 93.53535 91.25806

A visualization of raw data might be easier to digest. Below we use an R package, called ggplot2, that provides additional functionality to R. You can think of an R package like an app you might install on your phone.

library(ggplot2)
ggplot(d) +
  aes(x = rotation, y = score) +
  geom_jitter(width = 0.1) +
  facet_wrap(~personality)

Analysis

Fit a multilevel model and test each predictor’s contribution to the model, including the interaction. The lme4 package provides the lmer() function for fitting multilevel models. The car package provides the Anova() function for testing contributions of predictors in a model. All predictors appear “significant”.

library(lme4)
library(car)
m <- lmer(score ~ rotation + personality + rotation:personality + (1|id), 
          data = d)
Anova(m)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: score
                       Chisq Df Pr(>Chisq)    
rotation             577.317  5  < 2.2e-16 ***
personality          185.432  3  < 2.2e-16 ***
rotation:personality  45.363 15   6.71e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Calculate means from the model. The emmeans package makes this easier. The syntax pairwise ~ personality | rotation says to perform pairwise comparisons of scores between personality types, conditional on rotation. The highest means are green and red personality types in rotation D.

library(emmeans)
means <- emmeans(m, pairwise ~ personality | rotation)
means$emmeans

rotation = A:
 personality emmean    SE   df lower.CL upper.CL
 blue          90.1 0.296 1855     89.5     90.7
 green         92.0 0.279 1855     91.5     92.6
 red           92.4 0.264 1855     91.8     92.9
 yellow        90.8 0.273 1855     90.3     91.3

rotation = B:
 personality emmean    SE   df lower.CL upper.CL
 blue          88.2 0.296 1855     87.6     88.8
 green         90.3 0.279 1855     89.7     90.8
 red           91.7 0.264 1855     91.2     92.2
 yellow        88.8 0.273 1855     88.3     89.4

rotation = C:
 personality emmean    SE   df lower.CL upper.CL
 blue          88.2 0.296 1855     87.6     88.7
 green         90.4 0.279 1855     89.9     91.0
 red           90.9 0.264 1855     90.3     91.4
 yellow        89.6 0.273 1855     89.1     90.1

rotation = D:
 personality emmean    SE   df lower.CL upper.CL
 blue          90.5 0.296 1855     90.0     91.1
 green         94.1 0.279 1855     93.6     94.7
 red           94.1 0.264 1855     93.5     94.6
 yellow        92.9 0.273 1855     92.4     93.4

rotation = E:
 personality emmean    SE   df lower.CL upper.CL
 blue          88.6 0.296 1855     88.1     89.2
 green         90.5 0.279 1855     90.0     91.1
 red           90.7 0.264 1855     90.2     91.2
 yellow        89.7 0.273 1855     89.2     90.2

rotation = F:
 personality emmean    SE   df lower.CL upper.CL
 blue          90.8 0.296 1855     90.3     91.4
 green         92.4 0.279 1855     91.9     93.0
 red           93.5 0.264 1855     93.0     94.1
 yellow        91.3 0.273 1855     90.7     91.8

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95

Comparisons between all personality means conditional on rotation. There are many “significant” differences. Are they meaningful or practical differences? That’s for you to decide, I suppose. For example, the difference between blue and green personality types in rotation A is -1.9 (first row below). It’s statistically significant, but is it important?

means$contrasts

rotation = A:
 contrast       estimate    SE   df t.ratio p.value
 blue - green    -1.9071 0.406 1855  -4.692  <.0001
 blue - red      -2.2270 0.397 1855  -5.613  <.0001
 blue - yellow   -0.6691 0.402 1855  -1.663  0.3437
 green - red     -0.3198 0.384 1855  -0.833  0.8390
 green - yellow   1.2380 0.390 1855   3.175  0.0083
 red - yellow     1.5578 0.380 1855   4.102  0.0002

rotation = B:
 contrast       estimate    SE   df t.ratio p.value
 blue - green    -2.0798 0.406 1855  -5.116  <.0001
 blue - red      -3.5273 0.397 1855  -8.891  <.0001
 blue - yellow   -0.6596 0.402 1855  -1.639  0.3567
 green - red     -1.4475 0.384 1855  -3.768  0.0010
 green - yellow   1.4202 0.390 1855   3.642  0.0016
 red - yellow     2.8677 0.380 1855   7.552  <.0001

rotation = C:
 contrast       estimate    SE   df t.ratio p.value
 blue - green    -2.2751 0.406 1855  -5.597  <.0001
 blue - red      -2.7067 0.397 1855  -6.823  <.0001
 blue - yellow   -1.4395 0.402 1855  -3.578  0.0020
 green - red     -0.4316 0.384 1855  -1.124  0.6749
 green - yellow   0.8356 0.390 1855   2.143  0.1401
 red - yellow     1.2672 0.380 1855   3.337  0.0048

rotation = D:
 contrast       estimate    SE   df t.ratio p.value
 blue - green    -3.5905 0.406 1855  -8.833  <.0001
 blue - red      -3.5062 0.397 1855  -8.838  <.0001
 blue - yellow   -2.3482 0.402 1855  -5.836  <.0001
 green - red      0.0843 0.384 1855   0.220  0.9963
 green - yellow   1.2424 0.390 1855   3.186  0.0080
 red - yellow     1.1580 0.380 1855   3.049  0.0124

rotation = E:
 contrast       estimate    SE   df t.ratio p.value
 blue - green    -1.8952 0.406 1855  -4.662  <.0001
 blue - red      -2.0641 0.397 1855  -5.203  <.0001
 blue - yellow   -1.0660 0.402 1855  -2.649  0.0405
 green - red     -0.1689 0.384 1855  -0.440  0.9716
 green - yellow   0.8292 0.390 1855   2.126  0.1451
 red - yellow     0.9980 0.380 1855   2.628  0.0429

rotation = F:
 contrast       estimate    SE   df t.ratio p.value
 blue - green    -1.6013 0.406 1855  -3.939  0.0005
 blue - red      -2.6873 0.397 1855  -6.774  <.0001
 blue - yellow   -0.4100 0.402 1855  -1.019  0.7384
 green - red     -1.0859 0.384 1855  -2.827  0.0245
 green - yellow   1.1914 0.390 1855   3.055  0.0122
 red - yellow     2.2773 0.380 1855   5.997  <.0001

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 4 estimates

We can make a plot of estimated means with 95% confidence intervals.

plot(means)

An effect plot can help us visualize the interaction in the model. Below we use yet another package, ggeffects, to do this. The interactions seem slight. If there was no interaction we would expect parallel lines. But notice how the lines diverge in places, particularly the red and green personality types. This says the effect of personality depends on rotation, and vice versa.

library(ggeffects)
plot(ggpredict(m, terms = c("rotation", "personality")), 
     connect.lines = TRUE) + 
  scale_color_manual(values = c("blue", "green4", "red", "yellow2"))

The blue personality type appears to perform worse through all rotations, while rotations D and F seem to have higher scores on average.

Code to generate data

If interested, here is the R code I used to generate the fake data we just analyzed. If you’ve never used R before it may look cryptic. Happy to explain it if you want to learn more.

Show the code

# simulate data
n <- 360
id <- 1:n
rotation <- LETTERS[1:6]
set.seed(1)
personality <- sample(c("red","yellow","blue","green"), 
                      size = n, replace = TRUE)
d <- expand.grid(rotation = rotation, id = id)
d$personality <- factor(personality[d$id])
# subject-specific noise
subj_error <- rnorm(n = n, mean = 0, sd = 2)
# add some fixed effects and a couple of interactions
d$score <- (90 + subj_error[d$id]) + 2*(d$personality == "green") +
  3*(d$personality == "red") + -3*(d$personality == "blue") +
  -4*(d$rotation == "B") + -3*(d$rotation == "C") +
  4*(d$rotation == "D") + -3*(d$rotation == "E") + 1.5*(d$rotation == "F") +
  2*(d$personality == "red")*(d$rotation == "B") +
  -3*(d$personality == "blue")*(d$rotation == "D") +
  rnorm(n = nrow(d), mean = 0, sd = 5)
# rescale scores to 80-100 and round to whole numbers
# https://stats.stackexchange.com/a/281165/46334
d$score <- (100 - 80)*((d$score - min(d$score))/(max(d$score) - min(d$score))) + 80
d$score <- round(d$score)