UVA StatLab Statistical Notes

Reports generated for various consultations for the UVA StatLab.

Unless otherwise noted, reports are by Clay Ford.

2025

Johnson-Neyman intervals (2025-11-17)
Demonstration of how to calculate Johnson-Neyman intervals using the {modelbased} package. Uses simulated data. Based on an example in one of the {modelbased} vignettes.
Measurement Invariance Demo (2025-06-26)
Example from Chapter 4 of Latent Variable Modeling Using R (Beaujean 2014), Latent Variable Models with Multiple Groups. The example examines if the structure of the WISC-III scale is the same in children with and without manic symptoms. Book code
Extracting data from UpSet Plot (2025-04-10)
How to extract specific interactions from a data frame identified in an UpSet plot.
Demo of Tukey HSD (2025-04-02)
Quick demo of how Tukey HSD corrects p-values for multiple comparisons.
Understanding the log link in a Poisson model (2025-03-21)
Why we use the log link with Poisson models (i.e., family = poisson(link = "log")).
Investigating Multicolinearity (2025-03-19)
Investigating multicolinearity and challenging idea of inspecting pairwise correlations before fitting a model and dropping highly correlated predictors.
analyze correlational data with treatments (2025-02-03)
Assessing correlation between multiple sets of variables stratified by a treatment variable. Uses the r.test() function from the {psych} package.
Repeated Measures ANOVA examples (2025-01-29)
Two examples of repeated measures ANOVA: (1) Using raw follow-up data as the outcome with baseline as a covariate, and (2) Using change scores as the outcome. I prefer the first method.
Comparing self-starting Michaelis-Menten Models (2025-01-15)
How to compare two fitted curves using an interaction in a nlme::gnls() model.
Understanding the {ggeffects} predict_response() margin argument (2025-01-10)
Elaborating on the {ggeffects} vignette, Difference Between Marginalization Methods: The margin Argument.

2024

Intro to Mixed-Effect Modeling (2024-12-18)
Introduces mixed-effect modeling using simulated data and compares the results to wrong and older methods.
Paired t-tests as mixed-effect model (2024-11-22)
How to run many paired t-tests as a single mixed-effect model. Uses simulated data.
tidyLPA notes (2024-11-01, Clay Ford and Lauren Brideau)
Notes on using the {tidyLPA} package and how it relates to the {mclust} package.
Mixed-Effect Modeling for ANOVA (2024-10-08)
Demo of implementing ANOVA analysis for multilevel data.
Using polycor::hetcor (2024-09-27)
Demonstration of using the hetcor() function in the {polycor} package using simulated data.
Polyserial and Polychoric Correlation (2024-09-24)
An intro to polyserial and polychoric correlation using the {polycor} package.
Paired t-test as linear model (2024-09-05)
How to perform a paired t-test as a linear model and calculate Cohen’s d. Also includes an example of calculating Cohen’s d following multiple imputation.
Marginal effects and plots for 2-way and 3-way interactions (2024-08-19)
Demo of Bayesian modeling of 2-way and 3-way interactions (via {rstanarm}) using the {emmeans} and {ggeffects} packages. Uses simulated data.
Post-hoc Comparisons with non-linear models (2024-07-15)
Simple demo of using emmeans() on a non-linear predictor.
Demo of Term plots (2024-06-25)
Demonstration of using term plots to detect departures from linearity for coefficients. Shows the base R termplot() and car::crPlots() functions.
Assessing residuals with lineup lots (2024-04-05)
Using lineup plots to better interpret a QQ plot of model residuals.
Compact Letter Displays (2024-03-24)
How to create and interpret Compact Letter Displays, though some people don’t like them.
Complex reshaping example (2024-02-15)
The data set in question had six different sets of variables that needed to be reshaped into “long” format. Each set was collected over five waves. This shows how to do the reshaping using pivot_longer().
Multivariate Paired Hotelling T Square Test (2024-02-27)
How to implement a Multivariate Paired Hotelling T Square Test in R and SPSS.
VarCov Matrices for Random Effects (2024-02-14)
In this note I show how to suppress estimation of covariance random effects when estimating random effects for a categorical predictor.

2023

Forced One-way ANOVA (2023-12-01)
Convert a two-way ANOVA into a one-way ANOVA.
Multilevel Quantile Regression (2023-10-23)
Example of Multilevel Quantile Regression using {lqmm} and simulated data.
Data wrangling nc4 files (2023-10-18)
Example of extracting values from nc4 object (netCDF) and converting to data frame.
Recoding example (2023-10-06)
Recoding example using a lookup table.
Multi-level model example using simulated data (2023-10-03)
Example of multi-level model written for someone who was brand new to R. Includes demo of {emmeans} and {ggeffects}.
Multilevel model example with 3 nested levels (2023-09-15)
Example of a multilevel model with 3 levels of nested groups. Uses simulated data.
Data wrangling consultation (2023-08-28)
Data wrangling tasks: (1) drop any groups with less than 10 rows, (2) drop rows with runs of zeroes greater than or equal to 5.
Examples of emmeans (2023-03-24)
Examples of using emmeans after fitting mixed-effect model, including estimating differences in differences.
Various t-test alternatives (2023-02-21)
Shows alternatives to t-tests, including log-transformed t-tests, Wilcoxon, permutation tests, and bootstrap.
Dummy coding versus effects coding (2023-01-07)
Demonstrates differences in coefficient interpretations for dummy coding versus effects coding. Uses simulated data.

2022

Multilevel model for discrete ordered data (2022-11-04)
Demo of fitting a multilevel model for an ordered categorical outcome. Uses the {ordinal} and {emmeans} packages.
Split violin plots (2022-10-18)
Demo of how to create split violin plots using introdataviz::geom_split_violin()
The stats::filter() function (2022-10-12)
How to use the the base R filter() function to calculate moving averages.
The normality assumption (2022-09-21, Jacob Goldstein-Greenwood)
For t-tests and linear models, we’re not worried about the overall normality of the dependent variable. We care about normality around conditional means.
Logistic regression interactions (2022-09-19)
How to interpret and visualize 2-level categorical variable interactions in a logistic regression model.
Nested Factorial Experiment (2022-09-19)
Example of a Nested Factorial Experiment from Statistical Models in S.
How to adjust p-values using R (2022-09-13)
Quick demo of how to use Holm method to adjust p-values.
EFA with binary data (2022-07-28)
Example of performing exploratory factor analysis with binary data. Uses simulated data.
Three approaches to logistic regression in R (2022-07-18)
Demo of three ways to fit a logistic regression model in R.
Comparing a mediation analysis and with a t-test (2022-05-25)
The difference in means from a t-test is the “Total Effect” in mediation analysis.
Dangers of only looking at R-squared (2022-05-09)
Demo of how R-squared can be misleadingly high despite a poor-fitting model. Includes demonstrations of non-linear models.
Simulate beta regression model (2022-05-22)
Simulate data from a beta distribution, show how to analyze the data using beta regression, and then how to repeatedly simulate and analyze the data to estimate power given a sample size.
Mediation Analysis with Multiple Imputation (2022-04-25)
An attempt at mediation analysis with multiply imputed data using the {mice} and {mediation} packages.
Plotting logistic regression probabilities in Python (2022-04-15)
How to create an effect plot for a binary predictor using a logistic regression model in Python. Uses matplotlib and statsmodels.
Splines in mixed-effect models (2022-04-13)
Demo of fitting splines in mixed-effect models. Also includes demo of piecewise splines using the {segmented} package. Uses simulated data.
Explaining large confidence intervals (2022-04-05)
Student wondered why confidence intervals on marginal means were so large. Due to small estimated degrees of freedom.
Nomogram demo (2022-03-31)
This demonstration is a modification of the example available on the nomogram() help page from the {rms} package. It demonstrates a nomogram for a logistic regression model using simulated data.
mixed-effect proportional odds mode (2022-02-25)
An example of a mixed-effect proportional odds mode using simulated data and the clmm() function from the {ordinal} package.
Robustness of t-test (2022-01-07)
Simulation demonstrating robustness of t-test to non-normal data.
ANCOVA example (2022-01-07)
Quick demo of using Analysis of Covariance to compare trends over time.

2021

Identifying peaks in a line plot (2021-06-14)
One possible method of identifying peaks in a line plot using the findpeaks() function in the {pracma} package.
Bootstrapped diff in means (2021-12-11)
Example of implementing a bootstrapped difference in means as an alternative to a t-test using the {boot} package in R.
Multilevel Model power analysis (2021-10-04)
Power analysis for a multilevel model with a Likert outcome ranging from 1-5 and a simple binary treatment variable. Presents two approaches: (1) manual simulation, and (2) the {simr} package.
logistic regression for a 2x2 table (2021-06-09)
Example of using logistic regression to get odds ratio, confidence interval on OR, and the p-value testing null if OR is 1.
Plotting Derivatives (2021-05-17)
Calculating and plotting first and second derivatives.
Comparing Trends with ANCOVA (2021-04-22)
Comparing slopes via ANCOVA and {emmeans} using simulated data.
Demo of Generalized Linear Mixed Model (2021-02-12)
A demonstration of binomial GLMM using simulated data.
CIs for Cramer’s V (2021-01-20)
Demo of calculating CIs for Cramer’s V for a small sample.

2020

Normality is for Residuals (2020-11-02)
The normality assumption in linear modeling (multiple regression) is on the residuals, not the dependent or independent variables.
Stepwise Model Selection performing poorly (2020-09-01)
Demo of stepwise model selection performing poorly in the presence of collinearity. Stepwise selection should be avoided anyway.