# Confirmatory Factor Analysis with Ordinal Data

## Introduction

Factor analysis is arguably one of the most utilized multivariate techniques in the social, behavioral, and psychological sciences for uncovering hidden structures in data. At its core, factor analysis is a data reduction technique that aims to describe the pattern of correlations among a set of $P$ variables with a smaller set of $K$ latent variables, commonly called factors. The traditional factor analysis model is designed for multivariate normal data; however, many applications of factor analysis in the social and behavioral sciences come from self-report measures where individuals provide responses to a set of categories. Previous research has found that there are situations in which treating categorical data as continuous is inappropriate (see ******).

The purpose of this post is to describe the statistical and computational machinery underlying factor analysis for ordinal/categorical response data. I’ll start by describing the notion that categorical data are *discretized manifestations* of a *continuous* random variable. I’ll then discuss the polychoric correlation, arguably one of the greatest statistical ideas to ever be presented (imo). Following this, I’ll present some `R` code that we can use to estimate the polychoric correlations, and we’ll see how the estimation algorithm performs in simulation conditions. This will take us to the SEM parameterization of the polychoric correlation matrix, which is the key we need to fit factor analytic models with ordinal data. Finally, I’ll present `R` code that you can use to further your learning of the polychoric correlation matrix and factor analysis with ordinal/categorical data.

## Categorical Responses as *Discretized Manifestations* of a Continuous Random Variable

In ***, Pearson was working on problems related to (***). He devised an ingenious plan. The genius of this approach is that you can mathematically prove that, in $2 \times 2$ contingency tables, there is **always** a tetrachoric correlation $\rho \in (-1, 1)$ that will yield a partitioning of the continuous space such that the *area* of each quadrant will equal the observed *probabilities* from the data.
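To make this claim concrete, here is a quick simulation check (my own sketch; the thresholds 0.2 and -0.3, and $\rho = 0.5$, are arbitrary choices for illustration):

```r
set.seed(42)
rho <- 0.5                                   # latent correlation, chosen for illustration
n   <- 1e5
x <- rnorm(n)
z <- rho * x + sqrt(1 - rho^2) * rnorm(n)    # (x, z) is bivariate normal with correlation rho

# dichotomize each latent variable at a threshold
y1 <- as.integer(x > 0.2)
y2 <- as.integer(z > -0.3)

# the observed 2 x 2 cell proportions approximate the areas of the four
# quadrants of the bivariate normal density carved out by the thresholds
round(prop.table(table(y1, y2)), 3)
```

With a positive $\rho$, the (1, 1) cell proportion exceeds the product of the margins, exactly the dependence the tetrachoric correlation is built to capture.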

The observed response is $Y$ and the underlying latent response is $Y^\star$.

Moreover, because the construction of categorical data through a partitioning of the latent continuous space requires ordered thresholds, this explains why there don’t exist factor model techniques for **nominal** data (though there are models under the item response theory framework).
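A minimal `R` sketch of this discretization (the thresholds here are arbitrary, chosen only for illustration):

```r
set.seed(1)
ystar <- rnorm(1000)          # latent continuous responses Y*
tau   <- c(-1, 0, 1)          # ordered thresholds partitioning the latent space

# observed ordinal response Y: the category is determined by which interval
# between consecutive thresholds the latent response falls into
y <- findInterval(ystar, tau) + 1   # categories 1, 2, 3, 4
table(y)
```

Only the ordering of `tau` matters here; that ordering requirement is precisely what rules out a nominal-data analogue.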

## Limited Information Estimation

Because the goal of factor analysis with ordinal/categorical items is to *reproduce* the tetrachoric correlation matrix $\Sigma_\rho$, estimation necessarily incorporates less information than full-information methods. Hence, factor models with ordinal/categorical items utilize *limited-information* estimation.
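To get a feel for how the pieces fit together, here is a base-`R` sketch of tetrachoric estimation for a single $2 \times 2$ table. This is my own toy illustration, matching one quadrant’s area to the observed cell proportion, not the estimator implemented in SEM software:

```r
# upper-tail probability P(X > a, Z > b) for a standard bivariate normal
# with correlation rho, via one-dimensional numerical integration
pbiv_upper <- function(a, b, rho) {
  integrate(function(x)
    dnorm(x) * pnorm((b - rho * x) / sqrt(1 - rho^2), lower.tail = FALSE),
    lower = a, upper = Inf)$value
}

# simulate dichotomized bivariate normal data with a known correlation
set.seed(7)
rho_true <- 0.5
n <- 1e5
x <- rnorm(n)
z <- rho_true * x + sqrt(1 - rho_true^2) * rnorm(n)
y1 <- as.integer(x > 0)
y2 <- as.integer(z > 0)

# thresholds implied by the observed margins, and the observed (1, 1) cell
a   <- qnorm(mean(y1), lower.tail = FALSE)
b   <- qnorm(mean(y2), lower.tail = FALSE)
p11 <- mean(y1 == 1 & y2 == 1)

# find the rho whose quadrant area matches the observed cell proportion
rho_hat <- uniroot(function(r) pbiv_upper(a, b, r) - p11,
                   interval = c(-0.99, 0.99))$root
rho_hat   # should be close to 0.5
```

Note that the estimator only ever sees the margins and cell proportions of the table, never the raw response vectors, which is the sense in which the information is “limited.”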


Here is a rejection sampler in `R`; it assumes `Kx()`, the unnormalized kernel of the target density, is defined elsewhere, and uses a scaled $N(0, 0.5)$ envelope:

```r
RS <- function(nDraws)
{
  r <- 1.85                                 # envelope scaling constant
  draws <- NULL
  nTotal <- nAccept <- 0
  repeat
  {
    nTotal <- nTotal + 1
    x <- rnorm(1, 0, sqrt(0.5))             # propose from the N(0, 0.5) envelope
    rgx <- r * dnorm(x, 0, sqrt(0.5))       # scaled envelope density at x
    kx <- Kx(x)                             # unnormalized target kernel at x
    if (runif(1, 0, rgx) < kx) {            # accept with probability kx / rgx
      draws <- c(draws, x)
      nAccept <- nAccept + 1
    }
    if (length(draws) == nDraws) break
  }
  list(draws = draws, acceptRate = nAccept / nTotal)
}
```