Applied Multivariate Statistical Analysis Chapter 9 - Factor Analysis and Inference for Structured Covariance Matrices
by Arpon Sarker
Introduction
This chapter deals with factor analysis, which can be seen as an extension of principal components. Where principal components constructs linear combinations of the random variables to produce each component $Y_i$, factor analysis uses a linear combination of factors to approximate $\textbf{X}$ itself. Factors are underlying, unobservable random quantities that describe the covariance relationships among the observed variables. Like principal components, factor analysis is a tool for data reduction and interpretation.
The Orthogonal Factor Model
The factor analysis model is \(X - \mu = LF + \varepsilon\)
Here $L$ is the $p \times m$ matrix of factor loadings, $F$ is the vector of common factors, and $\varepsilon$ is the vector of specific factors (each $\varepsilon_i$ is associated only with the response $X_i$). We assume that $F$ and $\varepsilon$ both have expectation 0, that the covariance of $F$ is the identity matrix, that the covariance of $\varepsilon$ is the diagonal matrix $\Psi$ of specific variances, and that the covariance between $F$ and $\varepsilon$ is 0. This is the orthogonal factor model; another model in use is the oblique factor model, in which the factors $F$ are correlated. The total variance of each $X_i$ is \(Var(X_i) = h_i^2 + \psi_i, \quad h_i^2 = \ell_{i1}^2 + \ldots + \ell_{im}^2\)
$h_i^2$ is the communality, the sum of squared loadings of the $i$th variable on the $m$ common factors, while the specific variance $\psi_i$ is the part of $Var(X_i)$ not shared with the common factors. The factor model implies the following structure for the covariance matrix: \(\Sigma = LL^T + \Psi\)
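As a small check of this structure, here is a minimal sketch (with made-up loadings, not taken from the book) that builds $\Sigma = LL^T + \Psi$ for a one-factor model and confirms that each diagonal entry splits into a communality plus a specific variance:

```python
import numpy as np

# Toy one-factor model: p = 3 variables, m = 1 common factor.
L = np.array([[0.9], [0.7], [0.5]])                   # p x m matrix of factor loadings
Psi = np.diag([1 - 0.9**2, 1 - 0.7**2, 1 - 0.5**2])   # diagonal matrix of specific variances

Sigma = L @ L.T + Psi                                 # implied covariance matrix

communalities = (L**2).sum(axis=1)                    # h_i^2 = sum_j l_ij^2
print(np.allclose(np.diag(Sigma), communalities + np.diag(Psi)))  # True: Var(X_i) = h_i^2 + psi_i
```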
Methods of Estimation
There are two main methods for estimating the factor loadings $\ell_{ij}$ and specific variances $\psi_i$. The first is the principal component (and principal factor) method, which uses the spectral decomposition of the sample covariance (or correlation) matrix: the factor loadings are just the scaled coefficients of the first few principal components, i.e. the eigenvectors scaled by the square roots of their eigenvalues. You compute the matrix of loadings first, and then the estimated specific variances $\tilde{\psi}_i$ are obtained by subtracting the sum of squared loadings (the communality) from the corresponding sample variance.
Using this definition, the proportion of total variance due to the $j$th factor is the estimated eigenvalue $\hat{\lambda}_j$ divided by the total variance, which is either $s_{11} + \ldots + s_{pp}$ (for a factor analysis of $S$) or $p$ (for a factor analysis of $R$).
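A rough sketch of the principal component method under these definitions, assuming `X` is an $n \times p$ data matrix and `m` is the chosen number of factors (the function and variable names are just illustrative):

```python
import numpy as np

def pc_factor_solution(X, m):
    R = np.corrcoef(X, rowvar=False)           # factor-analyse R (could use S instead)
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]           # sort eigenvalues in descending order
    eigval, eigvec = eigval[order], eigvec[:, order]

    # Loadings: eigenvectors scaled by the square roots of the eigenvalues
    L_hat = eigvec[:, :m] * np.sqrt(eigval[:m])

    # Specific variances: diagonal of R minus the communalities
    psi_hat = np.diag(R) - (L_hat**2).sum(axis=1)

    # Proportion of total variance due to the j-th factor (total variance = p for R)
    prop_var = eigval[:m] / R.shape[0]
    return L_hat, psi_hat, prop_var
```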
The principal factor solution is the iterative variant used with $R$, where the diagonal constraint $1 = h_i^2 + \psi_i$ is enforced: the unit diagonal of $R$ is replaced by current communality estimates and the loadings are re-extracted until they stabilise.
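A sketch of that iteration, assuming the correlation matrix `R` is available and using squared multiple correlations as one common choice of initial communality estimates:

```python
import numpy as np

def principal_factor_solution(R, m, n_iter=50):
    # Initial communalities: squared multiple correlations, 1 - 1/r^{ii}
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_r = R.copy()
        np.fill_diagonal(R_r, h2)              # "reduced" correlation matrix
        eigval, eigvec = np.linalg.eigh(R_r)
        order = np.argsort(eigval)[::-1]
        L = eigvec[:, order[:m]] * np.sqrt(np.clip(eigval[order[:m]], 0, None))
        h2 = (L**2).sum(axis=1)                # updated communalities
    return L, 1 - h2                           # loadings and specific variances
```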
The second method is the maximum likelihood method, which maximises the likelihood $L(\mu, \Sigma)$ over the factor model parameters, with the additional uniqueness restriction \(L^T\Psi^{-1}L = \Delta \text{ (a diagonal matrix)}\)
The principal component method usually accounts for a higher proportion of the total variance attributed to the first $m$ factors, because it is optimised for finding the directions of largest variance, but the maximum likelihood solution is usually a much better fit: checking the residual matrix $R - \hat{L}\hat{L}^T - \hat{\Psi}$, its entries are much closer to 0.
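One way to carry out this comparison in code is sketched below, using scikit-learn's `FactorAnalysis` (which fits the factor model by maximum likelihood, though not with exactly the uniqueness restriction written above) and inspecting the residual matrix; `X` and `m` are assumed as before:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def ml_residual_check(X, m):
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise so we work with R
    fa = FactorAnalysis(n_components=m).fit(Z)
    L_hat = fa.components_.T                            # p x m estimated loadings
    Psi_hat = np.diag(fa.noise_variance_)               # estimated specific variances
    R = np.corrcoef(Z, rowvar=False)
    return R - L_hat @ L_hat.T - Psi_hat                # residual matrix, entries near 0 for a good fit
```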
In interpreting the common factors, the variables with the highest absolute loadings on a factor indicate that the factor has grouped these variables into one underlying construct. For large samples, a test of the null hypothesis $H_0: \Sigma = LL^T + \Psi$ is used to assess the adequacy of the $m$ common factor model, which helps decide how many common factors are actually necessary. This uses the likelihood ratio statistic \(-2 \ln \Lambda = n \ln \left(\frac{|\hat{\Sigma}|}{|S_n|}\right)\) where $\hat{\Sigma} = \hat{L}\hat{L}^T + \hat{\Psi}$; with Bartlett's correction it is compared against a $\chi^2$-distribution with $[(p-m)^2 - p - m]/2$ degrees of freedom.
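A sketch of this test, assuming maximum likelihood estimates `L_hat` and `psi_hat` and the sample covariance matrix `S_n` are already available, and using Bartlett's correction factor in place of $n$:

```python
import numpy as np
from scipy.stats import chi2

def lr_test(L_hat, psi_hat, S_n, n):
    p, m = L_hat.shape
    Sigma_hat = L_hat @ L_hat.T + np.diag(psi_hat)      # reproduced covariance under H0
    stat = (n - 1 - (2 * p + 4 * m + 5) / 6) * np.log(   # Bartlett-corrected statistic
        np.linalg.det(Sigma_hat) / np.linalg.det(S_n)
    )
    df = ((p - m) ** 2 - p - m) / 2
    return stat, df, chi2.sf(stat, df)                   # statistic, degrees of freedom, p-value
```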
Factor Rotation
Factor rotation does not change the communalities, the estimated covariance matrix, or the residual matrix, since multiplication by an orthogonal matrix $T$ satisfies $TT^T = I$: \(\hat{L}\hat{L}^T + \hat{\Psi} = \hat{L}TT^T\hat{L}^T + \hat{\Psi} = \hat{L}^*\hat{L}^{*T} + \hat{\Psi}\) where $\hat{L}^* = \hat{L}T$. We rotate purely for interpretation: rotating the coordinate axes can make the pattern of loadings easier to read, by looking at how high or low each variable loads with respect to the new axes (the rotated loadings). To select a matrix $T$, the varimax criterion is used, where $V$ is proportional to the sum over factors of the variance of the squared scaled loadings for the $j$th factor.
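A compact sketch of a raw varimax rotation is below; the book applies the criterion to loadings scaled by the communalities (Kaiser normalisation), which this simplified version omits:

```python
import numpy as np

def varimax(L, n_iter=100, tol=1e-6):
    """Rotate a p x m loading matrix to (approximately) maximise the raw varimax criterion."""
    p, m = L.shape
    T = np.eye(m)
    d_old = 0.0
    for _ in range(n_iter):
        Lr = L @ T
        # Kaiser's update: the SVD of this matrix yields the orthogonal T that
        # increases the variance of the squared loadings within each column.
        B = L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        d_new = s.sum()
        if d_old != 0 and d_new / d_old < 1 + tol:       # criterion has stopped improving
            break
        d_old = d_new
    return L @ T, T                                      # rotated loadings and rotation matrix
```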
Factor Scores
Factor scores are the estimated values of the common factors for each observation. These quantities are used for diagnostic purposes and for further analysis. The two methods for estimating them are the weighted least squares method and the regression method; their matrix expressions can be written in terms of each other.
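Both formulas are short enough to sketch directly, assuming estimated loadings `L_hat` ($p \times m$), specific variances `psi_hat`, and a data matrix `X` whose rows are the original observations; `method` selects between the weighted least squares (Bartlett) and regression expressions:

```python
import numpy as np

def factor_scores(X, L_hat, psi_hat, method="regression"):
    Xc = X - X.mean(axis=0)                        # centred observations
    Psi_inv = np.diag(1.0 / psi_hat)
    if method == "wls":                            # weighted least squares: (L' Psi^-1 L)^-1 L' Psi^-1 (x - xbar)
        W = np.linalg.inv(L_hat.T @ Psi_inv @ L_hat) @ L_hat.T @ Psi_inv
    else:                                          # regression method: L' (L L' + Psi)^-1 (x - xbar)
        Sigma_hat = L_hat @ L_hat.T + np.diag(psi_hat)
        W = L_hat.T @ np.linalg.inv(Sigma_hat)
    return Xc @ W.T                                # n x m matrix of factor scores
```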
Perspectives and a Strategy for Factor Analysis
- Perform a principal component factor analysis. Look for suspicious observations by plotting the factor scores or by calculating the standardised scores and squared distances, and try a varimax rotation.
- Perform a maximum likelihood factor analysis, including a varimax rotation.
- Compare the solutions from the two factor analyses. Plot factor scores for both methods against each other.
- Repeat the first three steps for other numbers of common factors $m$.
- For large data sets, split them in half and perform a factor analysis on each part.
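A hedged sketch of the split-half idea, reusing the illustrative `pc_factor_solution` and `varimax` helpers sketched earlier; loadings that broadly agree across the two halves support the stability of the chosen $m$-factor solution:

```python
import numpy as np

def split_half_check(X, m, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                  # random split into two halves
    half = len(X) // 2
    halves = [X[idx[:half]], X[idx[half:]]]
    loadings = []
    for H in halves:
        L_hat, _, _ = pc_factor_solution(H, m)     # principal component loadings for this half
        L_rot, _ = varimax(L_hat)                  # rotate for easier comparison
        loadings.append(L_rot)
    return loadings[0], loadings[1]                # compare these two loading matrices
```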
A vast majority of factor analyses do not provide clear-cut results; the process is quite subjective, and a solution should be judged on whether you can understand the factors obtained and why they take the form they do.
tags: mathematics - statistics