Applied Multivariate Statistical Analysis Chapter 8 - Principal Components
by Arpon Sarker
Introduction
Principal component analysis is primarily used for dimensionality reduction and interpretation. It explains the variance-covariance structure of a set of variables through a few linear combinations of these variables. Unfortunately, the geometric interpretation, rotating the coordinate axes to align with the directions of greatest variance, was not thoroughly explained and I still do not find it intuitive. PCA also serves as an intermediate step for multiple regression, cluster analysis, and factor analysis.
Population Principal Components
We want to find the first principal component: the linear combination $a_1^TX$ that maximises $Var(a_1^TX)$ subject to $a_1^Ta_1=1$. This is simply the linear combination with the highest variance (relative to the total variance explained, discussed later); constraining the coefficient vectors to unit length stops them from being scaled up arbitrarily to inflate the variance. Every principal component after the first has the additional constraint $Cov(a_i^TX, a_k^TX) = 0 \text{ for } k < i$, i.e. each component must be uncorrelated with all the ones before it (equivalently, its coefficient vector is perpendicular to theirs).
The i-th principal component is defined as the linear combination $Y_i = e_i^TX = e_{i1}X_1 + \ldots + e_{ip}X_p$, where $Var(Y_i) = e_i^T\Sigma e_i = \lambda_i$ and $Cov(Y_i, Y_k) = e_i^T\Sigma e_k = 0 \text{ for } i \neq k$. In other words, the variance of the i-th principal component is just the i-th eigenvalue of $\Sigma$, and $e_i$ is the corresponding i-th eigenvector. The total population variance is $\sigma_{11} + \ldots + \sigma_{pp} = \sum_{i=1}^p \sigma_{ii} = \lambda_1 + \ldots + \lambda_p = \sum_{i=1}^p Var(Y_i)$. Hence the proportion of total population variance due to the kth principal component is \(\frac{\lambda_k}{\lambda_1 + \ldots + \lambda_p}\).
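A minimal numpy sketch of these definitions (the covariance matrix below is made up purely for illustration):

```python
import numpy as np

# A made-up population covariance matrix (p = 3), purely for illustration.
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

# eigh returns eigenvalues in ascending order for symmetric matrices,
# so reverse to get lambda_1 >= lambda_2 >= ... >= lambda_p.
eigvals, eigvecs = np.linalg.eigh(Sigma)
lam = eigvals[::-1]
E = eigvecs[:, ::-1]                 # column i holds the eigenvector e_{i+1}

# The eigenvalues sum to the total population variance tr(Sigma).
assert np.isclose(lam.sum(), np.trace(Sigma))

# Proportion of total variance explained by the kth component.
print(lam / lam.sum())
```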
The correlation coefficients between the components $Y_i$ and the variables $X_k$ are $\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}$, which lets us see how much influence each variable has on a principal component's linear combination by comparing the absolute value of its correlation coefficient against those of the other variables. You can also draw constant density ellipses, assuming each $X_j$ is normally distributed. Mean-centring the ellipse changes nothing, since the principal components depend only on the covariance structure.
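Continuing the sketch above, the whole matrix of correlations $\rho_{Y_i, X_k}$ comes out of one broadcast (reusing the made-up `Sigma`, `lam`, and `E`):

```python
# rho[i, k] = e_{ik} * sqrt(lambda_i) / sqrt(sigma_kk): the correlation
# between component Y_{i+1} and variable X_{k+1} (0-based indices).
sigma_kk = np.diag(Sigma)
rho = E.T * np.sqrt(lam)[:, None] / np.sqrt(sigma_kk)[None, :]
print(rho)
```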
Principal Components Obtained from Standardised Variables
We have $\textbf{Z} = (V^{1/2})^{-1}(X-\mu)$, the vector of standardised variables, which is analogous to the univariate $\frac{x-\mu}{\sigma}$. Since the eigenvalues now come from the correlation matrix, whose trace is $p$, the proportion of standardised population variance due to the kth principal component is \(\frac{\lambda_k}{p}\).
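A sketch of the same computation on standardised variables, using a made-up data matrix; working with $\textbf{Z}$ is equivalent to extracting the eigenvalues of the correlation matrix $\textbf{R}$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])  # made-up data

# z = V^{-1/2}(x - mu): subtract the mean, divide by each standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# The covariance matrix of Z is the correlation matrix R of X.
R = np.cov(Z, rowvar=False)
lam = np.linalg.eigvalsh(R)[::-1]

# tr(R) = p, so the proportions lambda_k / p sum to 1.
p = R.shape[0]
print(lam / p)
```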
Summarising Sample Variation by Principal Components
Instead of analysing population principal components, we analyse sample principal components, built from the sample mean $a_1^T\bar{x}$, the sample variance $a_1^TSa_1$, and the sample covariance $a_1^TSa_2$ between two such combinations. Using the same logic, finding a linear combination $a_i$ that maximises the sample variance subject to the restriction $a_i^Ta_i=1$ for the first sample principal component, plus the additional restriction for the i-th component of being orthogonal to the previous ones, we get exactly the same formulas, now with the eigenvalues and eigenvectors of $S$ in place of those of $\Sigma$.
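A minimal sketch of sample principal components from a made-up $n \times p$ data matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))        # made-up n x p data matrix

S = np.cov(X, rowvar=False)         # sample covariance matrix
lam_hat, E_hat = np.linalg.eigh(S)
lam_hat, E_hat = lam_hat[::-1], E_hat[:, ::-1]

# Sample PC scores: y_i = e_hat_i' (x - x_bar) for each observation x.
Y = (X - X.mean(axis=0)) @ E_hat

# The sample variance of the ith score column is exactly lambda_hat_i.
print(np.var(Y, axis=0, ddof=1))
print(lam_hat)
```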
A scree plot using the elbow method is a visual aid in determining the appropriate number of principal components.
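A quick matplotlib sketch of a scree plot, with made-up eigenvalues:

```python
import numpy as np
import matplotlib.pyplot as plt

lam_hat = np.array([3.2, 1.1, 0.4, 0.2, 0.1])   # made-up eigenvalues

# Plot eigenvalue against component number and look for the "elbow"
# where the curve flattens out.
plt.plot(np.arange(1, len(lam_hat) + 1), lam_hat, "o-")
plt.xlabel("component number $i$")
plt.ylabel(r"eigenvalue $\hat{\lambda}_i$")
plt.title("Scree plot")
plt.show()
```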
When a plot of the constant density ellipses for some principal components is spherical, the data cannot be represented well in fewer than p dimensions: the variance is the same in every direction, so no direction (and hence no subset of components) dominates.
When the sample variables are standardised, the sample variance of each principal component is still the respective eigenvalue and the covariance between two distinct principal components is still 0, but the correlation simplifies to $r_{\hat{y}_i, z_k} = \hat{e}_{ik}\sqrt{\hat{\lambda}_i}$, since each $z_k$ has unit variance.
Note: An unusually small value for the last eigenvalue from either the sample covariance or correlation matrix can indicate an unnoticed linear dependency in the data set.
Graphing the Principal Components
- To help check the normal assumption, construct scatter diagrams for pairs of the first few principal components. Also, make Q-Q plots from the sample values generated by each principal component.
- Construct scatter diagrams and Q-Q plots for the last few principal components. These help identify suspect observations; a plotting sketch follows this list.
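A plotting sketch for these checks, with made-up data standing in for real observations:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))                  # made-up data
lam_hat, E_hat = np.linalg.eigh(np.cov(X, rowvar=False))
Y = (X - X.mean(axis=0)) @ E_hat[:, ::-1]     # scores, largest variance first

fig, axes = plt.subplots(1, 2, figsize=(8, 4))

# Scatter of the first two components as a check on the normal assumption.
axes[0].scatter(Y[:, 0], Y[:, 1])
axes[0].set_xlabel(r"$\hat{y}_1$")
axes[0].set_ylabel(r"$\hat{y}_2$")

# Q-Q plot of the first component's scores against normal quantiles.
stats.probplot(Y[:, 0], dist="norm", plot=axes[1])
plt.show()
```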
Large Sample Inferences
To test for an equal correlation structure (where all the correlations are the same), set the null hypothesis as \(H_0: \boldsymbol{\rho} = \boldsymbol{\rho}_0 = \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{bmatrix}\)
To test this, one may use a likelihood ratio statistic, or an equivalent test procedure can be constructed from the off-diagonal elements of $\textbf{R}$ (Lawley).
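A hedged sketch of Lawley's procedure as I recall it from the text; the exact statistic and its $\chi^2_{(p+1)(p-2)/2}$ reference distribution should be verified against the book before use:

```python
import numpy as np
from scipy import stats

def lawley_equal_correlation_test(R, n):
    """Test H0: all off-diagonal correlations of R are equal.

    Statistic as I recall it from Johnson & Wichern; verify the exact
    form and degrees of freedom against the text before relying on it."""
    p = R.shape[0]
    off = R[np.triu_indices(p, k=1)]               # off-diagonal correlations
    r_bar = off.mean()                             # overall average correlation
    r_k = (R.sum(axis=0) - 1.0) / (p - 1)          # average correlation per column
    gamma = (p - 1) ** 2 * (1 - (1 - r_bar) ** 2) / (
        p - (p - 2) * (1 - r_bar) ** 2
    )
    T = ((n - 1) / (1 - r_bar) ** 2) * (
        np.sum((off - r_bar) ** 2) - gamma * np.sum((r_k - r_bar) ** 2)
    )
    df = (p + 1) * (p - 2) / 2                     # chi-square degrees of freedom
    return T, stats.chi2.sf(T, df)
```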
Monitoring Quality with Principal Components
This subchapter deals with quality control, as in Chapter 6, and talks about stability, using an $\alpha\%$ control ellipse based on the first few principal components to check for outliers or other unusual observations against quality metrics. Before the principal components are used for future predictions, any outliers should be dealt with; this can also be done by analysing the $T^2$-chart.
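A sketch of a $T^2$ statistic built from the first $r$ sample principal components; the $\chi^2$ control limit here is a common large-sample approximation, not necessarily the book's exact recipe:

```python
import numpy as np
from scipy import stats

def pc_t2(x_new, x_bar, E_hat, lam_hat, r=2, alpha=0.05):
    """T^2 from the first r sample principal components:
    T^2 = sum_{i<=r} y_i^2 / lambda_hat_i, compared to a chi-square
    upper control limit (a common large-sample approximation)."""
    y = E_hat[:, :r].T @ (x_new - x_bar)       # scores on the first r PCs
    T2 = np.sum(y ** 2 / lam_hat[:r])
    ucl = stats.chi2.ppf(1 - alpha, df=r)      # upper control limit
    return T2, T2 > ucl                        # statistic and out-of-control flag
```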
tags: mathematics - statistics