Demonstration of using the hetcor()
function in the
{polycor} package using simulated data. From the help page: “hetcor
computes a heterogenous correlation matrix, consisting of Pearson
product-moment correlations between numeric variables, polyserial
correlations between numeric and ordinal variables, and polychoric
correlations between ordinal variables.”
Load some packages.
library(lavaan) # for getCov() function
library(mvtnorm) # for generating multivariate data
library(scales) # for rescale() function
library(polycor) # for hetcor() function
First create a covariance matrix. The {lavaan} package has a nice
function for this. First create the lower portion of the matrix as a
text string with line breaks, then use the getCov()
function. I’m creating a correlation matrix to make life easier.
lower <- '
1
0.5 1
0.6 -0.3 1
0.8 0.2 0.7 1'
S <- getCov(lower)
S
## V1 V2 V3 V4
## V1 1.0 0.5 0.6 0.8
## V2 0.5 1.0 -0.3 0.2
## V3 0.6 -0.3 1.0 0.7
## V4 0.8 0.2 0.7 1.0
(See also the examples on the help page for hetcor()
for
an alternative way to generate a random covariance matrix using matrix
algebra.)
Next generate some multivariate normal data using our covariance matrix and convert to data frame
set.seed(123)
d <- as.data.frame(rmvnorm(n = 1000 * 4, mean = rep(1, 4), sigma = S))
Create ordered likert-type variables that range from 1 - 5. The
rescale()
function is handy for this.
d$V1 <- factor(round(rescale(d$V1, to = c(1,5))))
d$V2 <- factor(round(rescale(d$V2, to = c(1,5))))
Glance at the data.
head(d)
## V1 V2 V3 V4
## 1 3 3 2.2048383 1.3610639
## 2 3 4 0.5550635 0.3054864
## 3 3 3 2.0378567 1.4166686
## 4 4 3 1.2660335 2.4787838
## 5 3 2 2.0807018 0.8691973
## 6 2 3 -0.4157929 -0.4213679
Now we’re ready to use the hetcor()
function to generate
a correlation matrix for this data. It will automatically calculate
either polychoric, polyserial, or pearson correlations depending on data
type.
cor_d <- hetcor(d)
cor_d
##
## Two-Step Estimates
##
## Correlations/Type of Correlation:
## V1 V2 V3 V4
## V1 1 Polychoric Polyserial Polyserial
## V2 0.4622 1 Polyserial Polyserial
## V3 0.6193 -0.305 1 Pearson
## V4 0.8053 0.1762 0.7117 1
##
## Standard Errors:
## V1 V2 V3
## V1
## V2 0.01591
## V3 0.01039 0.01595
## V4 0.006368 0.01723 0.007805
##
## n = 4000
##
## P-values for Tests of Bivariate Normality:
## V1 V2 V3
## V1
## V2 0.9733
## V3 0.916 0.6441
## V4 0.988 0.1464 0.862
If desired we can extract the correlation matrix from the
hetcor
object.
cor_d$correlations
## V1 V2 V3 V4
## V1 1.0000000 0.4622263 0.6192562 0.8053214
## V2 0.4622263 1.0000000 -0.3050213 0.1761603
## V3 0.6192562 -0.3050213 1.0000000 0.7116983
## V4 0.8053214 0.1761603 0.7116983 1.0000000
This can be useful creating visualizations such as the kind available with the {corrplot} package.
library(corrplot)
corrplot(cor_d$correlations, type = 'upper', diag = FALSE)