Demonstration of using the hetcor() function in the {polycor} package using simulated data. From the help page: “hetcor computes a heterogenous correlation matrix, consisting of Pearson product-moment correlations between numeric variables, polyserial correlations between numeric and ordinal variables, and polychoric correlations between ordinal variables.”

Load some packages.

library(lavaan) # for getCov() function
library(mvtnorm) # for generating multivariate data
library(scales) # for rescale() function
library(polycor) # for hetcor() function

First create a covariance matrix. The {lavaan} package has a nice function for this. First create the lower portion of the matrix as a text string with line breaks, then use the getCov() function. I’m creating a correlation matrix to make life easier.

lower <- '
1
0.5 1
0.6 -0.3 1
0.8 0.2 0.7 1'
S <- getCov(lower)
S
##     V1   V2   V3  V4
## V1 1.0  0.5  0.6 0.8
## V2 0.5  1.0 -0.3 0.2
## V3 0.6 -0.3  1.0 0.7
## V4 0.8  0.2  0.7 1.0

(See also the examples on the help page for hetcor() for an alternative way to generate a random covariance matrix using matrix algebra.)

Next generate some multivariate normal data using our covariance matrix and convert to data frame

set.seed(123)
d <- as.data.frame(rmvnorm(n = 1000 * 4, mean = rep(1, 4), sigma = S))

Create ordered likert-type variables that range from 1 - 5. The rescale() function is handy for this.

d$V1 <- factor(round(rescale(d$V1, to = c(1,5))))
d$V2 <- factor(round(rescale(d$V2, to = c(1,5))))

Glance at the data.

head(d)
##   V1 V2         V3         V4
## 1  3  3  2.2048383  1.3610639
## 2  3  4  0.5550635  0.3054864
## 3  3  3  2.0378567  1.4166686
## 4  4  3  1.2660335  2.4787838
## 5  3  2  2.0807018  0.8691973
## 6  2  3 -0.4157929 -0.4213679

Now we’re ready to use the hetcor() function to generate a correlation matrix for this data. It will automatically calculate either polychoric, polyserial, or pearson correlations depending on data type.

cor_d <- hetcor(d)
cor_d
## 
## Two-Step Estimates
## 
## Correlations/Type of Correlation:
##        V1         V2         V3         V4
## V1      1 Polychoric Polyserial Polyserial
## V2 0.4622          1 Polyserial Polyserial
## V3 0.6193     -0.305          1    Pearson
## V4 0.8053     0.1762     0.7117          1
## 
## Standard Errors:
##          V1      V2       V3
## V1                          
## V2  0.01591                 
## V3  0.01039 0.01595         
## V4 0.006368 0.01723 0.007805
## 
## n = 4000 
## 
## P-values for Tests of Bivariate Normality:
##        V1     V2    V3
## V1                    
## V2 0.9733             
## V3  0.916 0.6441      
## V4  0.988 0.1464 0.862

If desired we can extract the correlation matrix from the hetcor object.

cor_d$correlations
##           V1         V2         V3        V4
## V1 1.0000000  0.4622263  0.6192562 0.8053214
## V2 0.4622263  1.0000000 -0.3050213 0.1761603
## V3 0.6192562 -0.3050213  1.0000000 0.7116983
## V4 0.8053214  0.1761603  0.7116983 1.0000000

This can be useful creating visualizations such as the kind available with the {corrplot} package.

library(corrplot)
corrplot(cor_d$correlations, type = 'upper', diag = FALSE)