Statistical Analysis Approaches

Author

Clay Ford

Request

KL: I’ve recently run into some issues regarding statistical analyses of some data. I have been troubleshooting by Googling and asking a colleague, but we would like to bounce ideas off someone who has more knowledge than us. I was wondering if it was possible to set up an appointment? Following are some details of the data I’m working with:

  • Data from two different measurements were collected for two different genotypes

    • organized by measurement, comparing one genotype against anothee
  • For both measurements, data is non-normal and has unequal variance

  • For data collected it may be possible that we need to further subset the measurement data as a possible difference between genotypes may be from a subset of data points

  • Have been using GraphPad Prism v9.5.0

  • Unsure of:

    • whether to log transform the data and how that affects the biological & statistical interpretation
    • whether/how to parse out outliers versus a potential subset of data points
    • how to address the unequal variance

Import and visualize data

d_count <- read.csv("63x 24hpf # Phago (by cell).csv")
d_avg <- read.csv("63x 24hpf Avg Phago Vol (by cell).csv")

# reshape data
library(tidyr)
d_count <- pivot_longer(d_count, cols = 1:2, 
                        names_to = "genotype", values_to = "count")
d_count$genotype <- factor(d_count$genotype)
d_count <- na.omit(d_count)
d_avg <- pivot_longer(d_avg, cols = 1:2, 
                      names_to = "genotype", values_to = "avg")
d_avg$genotype <- factor(d_avg$genotype)
d_avg <- na.omit(d_avg)
library(ggplot2)
ggplot(d_count) +
  aes(x = genotype, y = count, fill = genotype) +
  geom_jitter(width = 0.05) +
  geom_violin(alpha = 1/4) 

ggplot(d_avg) +
  aes(x = genotype, y = avg, fill = genotype) +
  geom_jitter(width = 0.05) +
  geom_violin(alpha = 1/4) 

Boxplots identify outliers as those observations that are more than 1.5 times the IQR (Interquartile Range) away from the median. The IQR is the range (max - min) of the middle 50% of the data. Multiply that value by 1.5. Values beyond that are labeled a statistical “outlier” relative to the rest of the data. They may not actually be an outlier in any scientific sense.

ggplot(d_count) +
  aes(x = genotype, y = count) +
  geom_boxplot()

ggplot(d_avg) +
  aes(x = genotype, y = avg) +
  geom_boxplot()

We can use the boxplot() function to extract the outlying values and which “group” they belong to. Group 1 is blb, group 2 is WT.het.

box1 <- boxplot(count ~ genotype, data = d_count, plot = FALSE)
cbind(box1$group, box1$out)
      [,1] [,2]
 [1,]    1   12
 [2,]    1   11
 [3,]    1   14
 [4,]    2   13
 [5,]    2   17
 [6,]    2   19
 [7,]    2   16
 [8,]    2   18
 [9,]    2   27
[10,]    2   14
[11,]    2   14
[12,]    2   25
[13,]    2   19
[14,]    2   13
[15,]    2   31
[16,]    2   16
box2 <- boxplot(avg ~ genotype, data = d_avg, plot = FALSE)
cbind(box2$group, box2$out)
      [,1]      [,2]
 [1,]    1  9.928139
 [2,]    1 11.007095
 [3,]    1  9.605910
 [4,]    1  9.124490
 [5,]    1 17.307765
 [6,]    1  8.460956
 [7,]    1 11.392602
 [8,]    2  8.217241
 [9,]    2  9.123331
[10,]    2  9.516083
[11,]    2  7.909019
[12,]    2  7.529833
[13,]    2  8.358252
[14,]    2 11.727533
[15,]    2 10.826498
[16,]    2 11.093107
[17,]    2 28.236500
[18,]    2  8.752085

Some analysis approaches

Traditional T-test

Null: means of two groups are the same. Small p-values provide against this hypothesis. The confidence interval on the difference of means is more informative.

t_out_count <- t.test(count ~ genotype, data = d_count)
t_out_count

    Welch Two Sample t-test

data:  count by genotype
t = -0.60912, df = 177.56, p-value = 0.5432
alternative hypothesis: true difference in means between group blb and group WT.het is not equal to 0
95 percent confidence interval:
 -1.1955546  0.6315816
sample estimates:
   mean in group blb mean in group WT.het 
            4.560606             4.842593 
t_out_avg <- t.test(avg ~ genotype, data = d_avg)
t_out_avg

    Welch Two Sample t-test

data:  avg by genotype
t = 2.0846, df = 98.367, p-value = 0.0397
alternative hypothesis: true difference in means between group blb and group WT.het is not equal to 0
95 percent confidence interval:
 0.04576414 1.85876651
sample estimates:
   mean in group blb mean in group WT.het 
            3.292480             2.340214 

Log-transformed T-test

t_out_Lcount <- t.test(log(count) ~ genotype, data = d_count)
t_out_Lcount

    Welch Two Sample t-test

data:  log(count) by genotype
t = 1.0507, df = 143.54, p-value = 0.2952
alternative hypothesis: true difference in means between group blb and group WT.het is not equal to 0
95 percent confidence interval:
 -0.08519907  0.27855733
sample estimates:
   mean in group blb mean in group WT.het 
            1.344350             1.247671 
t_out_Lavg <- t.test(log(avg) ~ genotype, data = d_avg)
t_out_Lavg

    Welch Two Sample t-test

data:  log(avg) by genotype
t = 2.5549, df = 119.19, p-value = 0.01188
alternative hypothesis: true difference in means between group blb and group WT.het is not equal to 0
95 percent confidence interval:
 0.1145066 0.9033791
sample estimates:
   mean in group blb mean in group WT.het 
          0.56216633           0.05322348 

The average of log-transformed values is called the geometric mean. We exponentiate to return to the original scale of the data.

exp(t_out_Lavg$estimate)
   mean in group blb mean in group WT.het 
            1.754469             1.054665 

The exponentiated confidence interval on the difference in log-transformed means is for the ratio of the geometric means. The blb genotype geometric mean is estimated to be anywhere from 12% to 2.5 times bigger than the mean for the WT genotype.

exp(t_out_Lavg$conf.int)
[1] 1.121320 2.467928
attr(,"conf.level")
[1] 0.95

Wilcoxon test

Null: the distributions of the two groups are the same. A small p-value provides evidence against this hypothesis. The confidence interval estimates the median of the difference between a sample from one genotype and a sample from the other genotype.

w_out_count <- wilcox.test(count ~ genotype, data = d_count, conf.int = TRUE)
w_out_count

    Wilcoxon rank sum test with continuity correction

data:  count by genotype
W = 7661, p-value = 0.3543
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -4.284189e-05  9.999645e-01
sample estimates:
difference in location 
          4.392141e-05 
w_out_avg <- wilcox.test(avg ~ genotype, data = d_avg, conf.int = TRUE)
w_out_avg

    Wilcoxon rank sum test with continuity correction

data:  avg by genotype
W = 8624, p-value = 0.009903
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 0.142138 1.266976
sample estimates:
difference in location 
             0.6963284 

The results agree with the previous two approaches.

Permutation test

In a permutation test we hold the group labels fixed, shuffle the values around, and then calculate the difference in means between the two groups. If there is no difference in group means then it shouldn’t matter how we label the values. We do this many times and then see how often shuffling the values resulted in a larger test statistic (ie, difference in means) than what we observed. If there is no difference in group means, then we should see this happen a lot. If there is a difference in group means, then reshuffling the labels will make a difference and result in few instance of exceeding the observed difference in means.

wt <- d_avg$genotype == "WT.het"  # the wt labels
blb <- !wt  # The blb labels
avg <- d_avg$avg  # the observed values
# observed difference in means
test_stat <- mean(avg[wt]) - mean(avg[blb])
test_stat
[1] -0.9522653

Now do a for loop 9999 times randomly reshuffling the values and calculate the difference in means each time.

meandiffs <- double(9999)
for(i in 1:length(meandiffs)){
  savg <- sample(avg)
  meandiffs[i] <- mean(savg[wt] - mean(savg[blb]))
}

Calculate proportion of test statistics larger than or smaller than -0.9522653.

greater <- abs(meandiffs) > abs(test_stat)
mean(greater)
[1] 0.02410241

The proportion of test statistics is anologous to a p-value and is the area in the histogram to the left and the right of the straight lines, which are the positive and negative values of the observed difference in means.

hist(meandiffs)
abline(v = test_stat)
abline(v = -test_stat)

This is fairly close, but a bit larger, than the p-value calculated by the first t-test.

The coin package in R will do this for us if you don’t feel like coding a for loop.

library(coin)
independence_test(avg ~ genotype, data= d_avg)

    Asymptotic General Independence Test

data:  avg by genotype (blb, WT.het)
Z = 2.2052, p-value = 0.02744
alternative hypothesis: two.sided

The result for the count data agrees with all previous analyses.

independence_test(count ~ genotype, data= d_count)

    Asymptotic General Independence Test

data:  count by genotype (blb, WT.het)
Z = -0.47708, p-value = 0.6333
alternative hypothesis: two.sided

Bootstrap analysis : difference in means

In a bootstrap analysis we resample the data (with replacement) over and over and calculate a statistic of interest such as a difference in means or medians. Below we resample the row numbers and then use those row numbers to select observations from the data frame using indexing brackets in the call to aggregate(). We then take the difference means. We use the replicate() function to replicate this chunk of code 999 times. When finished we find the 2.5 and 97.5 percentiles to form a 95% confidence interval on the difference in means.

boot_out <- replicate(n = 999, expr = {
  i <- sample(nrow(d_avg), replace = TRUE)
  d <- aggregate(avg ~ genotype, data = d_avg[i,], FUN = mean)
  d$avg[1] - d$avg[2] 
  })
quantile(boot_out, probs = c(0.025, 0.975))
      2.5%      97.5% 
0.09464155 1.81291284 

The boot package makes this a little easier and provides more options for the confidence interval. The “bca” type is the adjusted bootstrap percentile (BCa) interval.

library(boot)
diffmeans <- function(d, i){
  d <- aggregate(avg ~ genotype, data = d[i,], FUN = mean)
  d$avg[1] - d$avg[2]
}
b_out <- boot(data = d_avg, statistic = diffmeans, R = 999)
boot.ci(b_out, type = c("perc", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates

CALL : 
boot.ci(boot.out = b_out, type = c("perc", "bca"))

Intervals : 
Level     Percentile            BCa          
95%   ( 0.0968,  1.9252 )   ( 0.1224,  1.9846 )  
Calculations and Intervals on Original Scale

Again the differences between the means in the count data is so small we can’t tell which one is bigger.

diffmeans <- function(d, i){
  d <- aggregate(count ~ genotype, data = d[i,], FUN = mean)
  d$count[1] - d$count[2]
}
b_out2 <- boot(data = d_count, statistic = diffmeans, R = 999)
boot.ci(b_out2, type = c("perc", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates

CALL : 
boot.ci(boot.out = b_out2, type = c("perc", "bca"))

Intervals : 
Level     Percentile            BCa          
95%   (-1.2952,  0.6346 )   (-1.2584,  0.6798 )  
Calculations and Intervals on Original Scale

Bootstrap analysis : difference in medians

We might also want to look at a difference in medians. We simply change mean to median and re-run the code.

diffmedian <- function(d, i){
  d <- aggregate(avg ~ genotype, data = d[i,], FUN = median)
  d$avg[1] - d$avg[2]
}
b_out <- boot(data = d_avg, statistic = diffmedian, R = 999)
boot.ci(b_out, type = c("perc", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates

CALL : 
boot.ci(boot.out = b_out, type = c("perc", "bca"))

Intervals : 
Level     Percentile            BCa          
95%   ( 0.174,  1.835 )   ( 0.272,  1.969 )  
Calculations and Intervals on Original Scale
diffmedian <- function(d, i){
  d <- aggregate(count ~ genotype, data = d[i,], FUN = median)
  d$count[1] - d$count[2]
}
b_out2 <- boot(data = d_count, statistic = diffmedian, R = 999)
boot.ci(b_out2, type = c("perc", "bca"))
Warning in norm.inter(t, adj.alpha): extreme order statistics used as endpoints
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates

CALL : 
boot.ci(boot.out = b_out2, type = c("perc", "bca"))

Intervals : 
Level     Percentile            BCa          
95%   (-1.0,  1.5 )   (-1.0,  0.0 )  
Calculations and Intervals on Original Scale
Warning : BCa Intervals used Extreme Quantiles
Some BCa intervals may be unstable

Notice the warning issued when this is tried with the count data. I think this is probably due to the fact that the counts are highly distinct and skewed and that there just isn’t much difference in the medians, as seen when we compare summaries of counts between the two genotypes.

tapply(d_count$count, d_count$genotype, summary)
$blb
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   3.000   4.000   4.561   5.750  14.000 

$WT.het
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   4.000   4.843   6.000  31.000 

Conclusion

All of these results agree in substance. All show a difference in means/medians between the averages and show no differences in means/medians between the counts. In my opinion it hardly makes a difference which one you pick, though I suppose because of the skewed nature of your data reviewers will probably want to see something besides a basic t-test on the raw data.

However, as we discussed in the meeting, I think your analysis could be improved by using all the data in mixed-effect model. Some of the “significant” results of the average data could be due to a few large raw values skewing the average value for a cell.