3 January 2025

Applied Multivariate Statistical Analysis Chapter 3 - Sample Geometry and Random Sampling

by Arpon Sarker

Introduction

Using the vector concepts introduced in the previous chapter, the descriptive statistics $\bar{x}$, $S_n$, and $R$ can be given geometrical interpretations. The entire chapter rests on the assumption of random sampling.

The chapter then presents three different ways of obtaining a generalised variance as a single summary of overall variability, and finally uses all of these concepts to calculate the descriptive statistics directly from the data matrix $X$, including for linear combinations of the variables.

The Geometry of the Sample

For random sampling, which is the setting for this whole chapter, the rows of the data matrix $X$ are the items, treated as random vectors $X_1, \ldots, X_n$. Viewing the $p$ columns as points in $n$ dimensions instead, each column vector carries complete information on one variable; in other words, each vector describes the behaviour of that variable across all $n$ items. When $n$ is large, we can still plot these variable vectors in 2 or 3 dimensions by choosing which two or three items we would like to analyse.

Take $\bar{x}$ to be the sample mean vector, whose $i$-th entry is the arithmetic mean of the $i$-th variable over the $n$ items. Geometrically, if $y_i$ denotes the $i$-th column vector of $X$, it is related to the mean by $\bar{x_i} = \frac{y_i^T\textbf{1}}{n}$, where $\textbf{1}$ is a vector of 1s. The vector $\bar{x_i}\textbf{1}$ is the projection of $y_i$ onto $\textbf{1}$, and subtracting it gives the deviation (residual) vector $d_i = y_i - \bar{x_i}\textbf{1}$. The squared length of $d_i$ is $ns_{ii}$, and the dot product of $d_i$ and $d_k$ is $ns_{ik}$; both are scaled by $n$ because the vector lengths do not include the $\frac{1}{n}$ term. The sample correlation $r_{ik}$ is the cosine of the angle between $d_i$ and $d_k$.
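A minimal numpy sketch of these relations (the small data matrix is made up for illustration): each column mean comes from projecting onto $\textbf{1}$, and the deviation vectors reproduce $S_n$ and the sample correlation.

```python
# Sample geometry: projections onto 1, deviation vectors, and correlations.
import numpy as np

X = np.array([[4.0, 2.0],
              [1.0, 3.0],
              [3.0, 5.0],
              [2.0, 4.0]])
n, p = X.shape
ones = np.ones(n)

xbar = X.T @ ones / n                  # xbar_i = y_i' 1 / n for each column y_i
D = X - np.outer(ones, xbar)           # deviation vectors d_i = y_i - xbar_i 1 (columns)

Sn = D.T @ D / n                       # covariance matrix S_n with divisor n
assert np.allclose(D.T @ D, n * Sn)    # d_i' d_k = n * s_ik

d1, d2 = D[:, 0], D[:, 1]
cos_angle = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
r12 = Sn[0, 1] / np.sqrt(Sn[0, 0] * Sn[1, 1])
assert np.isclose(cos_angle, r12)      # r_ik is the cosine of the angle between d_i, d_k
```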

Random Samples and the Expected Values of the Sample Mean and Covariance Matrix

Random samples are needed to make inferences about the data matrix $X$. We treat each row vector as a random vector, so that $X_{11}$ to $X_{1p}$, the first row of $X$, is the random vector $X_1$. Two significant points are:

Without the assumption of independence, the quality of statistical inferences will be much poorer (some calculations output 0 if the observations are dependent).

We can use the statistics $\bar{X}$ and $S_n$ as point estimators of the corresponding population mean $\mu$ and covariance matrix $\Sigma$. No further assumptions need to be made about the underlying joint distribution of the variables. \(E(\bar{X}) = \mu \\ Cov(\bar{X}) = \frac{1}{n}\Sigma \\ E(S_n) = \Sigma - \frac{1}{n} \Sigma \text{, so} \\ E\left(\frac{n}{n-1} S_n\right) = \Sigma \text{, an unbiased estimator}\)

The proof for $\bar{X}$ is much simpler than the proof for $S_n$, which is given below:

Proof: \(\sum_{j=1}^n (X_j - \bar{X})(X_j - \bar{X})^T \\ = \sum_{j=1}^n (X_j - \bar{X})X_j^T + \left(\sum_{j=1}^n (X_j - \bar{X})\right)(-\bar{X}^T) \text{, dist. law} \\ = \sum_{j=1}^n (X_j - \bar{X})X_j^T \text{, since } \sum_{j=1}^n (X_j - \bar{X}) = 0 \\ = \sum_{j=1}^n X_jX_j^T - n \bar{X} \bar{X}^T \text{, since } \sum_{j=1}^n X_j^T = n \bar{X}^T \\ E\left(\sum_{j=1}^n X_jX_j^T - n \bar{X} \bar{X}^T\right) = \sum_{j=1}^n E(X_jX_j^T) - E(n \bar{X} \bar{X}^T) \\ = n(\Sigma + \mu \mu^T) - n\left(\frac{1}{n}\Sigma + \mu \mu^T\right) \text{, using } E(VV^T) = \Sigma_V + \mu_V \mu_V^T \\ = (n-1) \Sigma \\ S_n = \frac{1}{n} \left(\sum_{j=1}^n X_jX_j^T - n \bar{X} \bar{X}^T\right) \text{, the sample covariance formula} \\ \therefore E(S_n) = \frac{n-1}{n} \Sigma\)

The unbiased sample covariance matrix is: \(S = \left(\frac{n}{n-1}\right) S_n = \frac{1}{n-1}\left(\sum_{j=1}^n X_jX_j^T - n \bar{X} \bar{X}^T\right)\) Note: the above formula is for the matrix itself, not its expectation. $E(S)$ is exactly $\Sigma$, which is why $S$ is unbiased, as there are no extra terms.
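As a quick sanity check (not from the text), the bias of $S_n$ can be made visible with a small Monte Carlo simulation; the choices of $\mu$, $\Sigma$, $n$, and the number of repetitions below are arbitrary:

```python
# Monte Carlo check of E(S_n) = ((n-1)/n) * Sigma and the unbiasedness of
# S = (n/(n-1)) * S_n. All parameters here are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.7],
                  [0.7, 1.0]])
n, reps = 5, 20_000

Sn_sum = np.zeros((2, 2))
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)   # n independent observations
    Sn_sum += np.cov(X, rowvar=False, bias=True)     # S_n uses divisor n
Sn_avg = Sn_sum / reps

print(Sn_avg)                   # close to ((n-1)/n) * Sigma = 0.8 * Sigma
print(Sn_avg * n / (n - 1))     # close to Sigma, matching the unbiased S
```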

Generalised Variance

For a single numerical value that expresses the variation in $S$ (which will be used as the covariance matrix from now on), we use the generalised variance, where \(\text{Generalised Sample Variance } = |S|\) You can think of the determinant in terms of volume (or area when $p = 2$). For $p=2$ the two deviation vectors generate a parallelogram, and for $p=3$ the three deviation vectors generate a parallelepiped (a solid); in general \(|S| = \frac{(\text{volume})^2}{(n-1)^p}\) We can also create ellipsoids under some constant $c$ in $p$-dimensional space: \((x-\bar{x})^TS^{-1}(x-\bar{x}) = c^2 \\ \text{volume of } \{x:(x-\bar{x})^TS^{-1}(x-\bar{x}) \le c^2\} = k_p|S|^{1/2}c^p\) That is, the volume of the ellipsoid equals a constant $k_p$ multiplied by the square root of the generalised sample variance and by $c^p$. These geometric interpretations make the variance and covariance easy to interpret, but they have a weakness: many different datasets can have the same generalised variance because their deviation vectors span the same volume, while the orientation of the data is not taken into account.
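To illustrate this weakness, here is a small numpy sketch (the data and the rotation are made up): rotating the data changes its orientation and its covariances, but leaves the generalised variance $|S|$ unchanged.

```python
# Two datasets with different orientations but the same generalised variance |S|.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2)) @ np.array([[2.0, 0.0],
                                         [0.6, 0.5]])   # arbitrary 2-variable data

theta = np.pi / 3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])          # rotation matrix
Y = X @ Q.T                                              # same data, new orientation

S_x = np.cov(X, rowvar=False)
S_y = np.cov(Y, rowvar=False)

print(np.linalg.det(S_x), np.linalg.det(S_y))   # equal generalised variances
print(S_x)
print(S_y)                                      # yet different covariance structure
```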

Situations where Generalised Sample Variance is 0

A generalised sample variance of 0 indicates extreme degeneracy: at least one column of the matrix of deviations can be expressed as a linear combination of the others, i.e. the deviation vectors are linearly dependent. Geometrically, one deviation vector lies in the space spanned by the others, so the solid they generate is flat and has zero volume.

Note: When datasets are sent electronically, some are augmented with a column that is an aggregate of the original data columns, such as a total score obtained by adding the maths and English scores. However, this creates linear dependence, and an inverse of $S$ cannot be found since its determinant is 0! (A numerical sketch of this situation is shown after the list below.)

For a generalised variance to be 0, the following equivalent conditions hold:

  1. $Sa = 0a = 0$, where $a$ is a (suitably scaled) eigenvector of $S$ with eigenvalue 0.
  2. $a^T(x_j-\bar{x}) = 0, \quad \forall j$, i.e. the linear combination of the mean-corrected data with coefficients $a$ is 0 for every item.
  3. $a^Tx_j = c, \quad \forall j \quad (c = a^T\bar{x})$, i.e. every observation lies on the same hyperplane with normal vector $a$.
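The following numpy sketch (with made-up scores) shows the degenerate case described above: adding a total column makes the deviation vectors linearly dependent, so $|S| = 0$, $S$ has an eigenvector $a$ with eigenvalue 0, and $a^Tx_j$ is the same constant for every item.

```python
# Degeneracy from an aggregate column: total = maths + english.
import numpy as np

maths   = np.array([70.0, 82.0, 65.0, 90.0, 75.0])
english = np.array([60.0, 71.0, 80.0, 68.0, 73.0])
total   = maths + english                      # linearly dependent third column

X = np.column_stack([maths, english, total])
S = np.cov(X, rowvar=False)

print(np.linalg.det(S))                        # 0 up to rounding error
eigvals, eigvecs = np.linalg.eigh(S)           # eigenvalues in ascending order
a = eigvecs[:, 0]                              # eigenvector with (near-)zero eigenvalue
print(eigvals[0])                              # approximately 0
print(X @ a)                                   # a' x_j is the same constant for all j
```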

Generalised Variance determined by $|R|$ and its Geometrical Interpretation

The above generalised variance based on $|S|$ is affected by the variability of the measurements of any single variable. For instance, very large or very small variability in one variable gives a long or short deviation vector, which changes the size of the volume. Hence we scale the deviation vectors to constant length, and the generalised sample variance of the standardised variables is $|R| = \frac{(\text{volume})^2}{(n-1)^p}$. Here each standardised deviation vector has squared length $(n-1)$, since $S$ is computed with $(n-1)$ as the divisor from now on. Geometrically, the deviation vectors all have the same length, so $|R|$ depends only on their orientation, with perpendicular vectors giving a larger generalised variance.

Total Sample Variance

Lastly, the total sample variance is the sum of the diagonal elements of the covariance matrix $S$, i.e. $s_{11} + s_{22} + \cdots + s_{pp}$, but it pays no attention to the orientation (correlation structure) of the deviation vectors.
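A brief numpy illustration of both summaries (with arbitrary data): rescaling a variable, e.g. measuring it in different units, changes $|S|$ and the total sample variance dramatically, but $|R|$ is unaffected.

```python
# Generalised variance |S|, standardised generalised variance |R|, total variance.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
X_rescaled = X.copy()
X_rescaled[:, 0] *= 100.0                 # same variable, different units

for data in (X, X_rescaled):
    S = np.cov(data, rowvar=False)
    R = np.corrcoef(data, rowvar=False)
    print(np.linalg.det(S),               # |S|: unit-dependent
          np.linalg.det(R),               # |R|: unchanged by rescaling
          np.trace(S))                    # total sample variance s_11 + ... + s_pp
```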

Sample Mean, Covariance, and Correlation as Matrix Operations

This section expresses the sample mean, covariance, and correlation as matrix operations, so that these matrices can be calculated directly from the data matrix $X$.

\[\bar{x} = \frac{1}{n}X^T\textbf{1} \\ S = \frac{1}{n-1} X^T(I-\frac{1}{n}\textbf{11}^T)X \text{, where expanding the terms gives} \\ X^T(I-\frac{1}{n}\textbf{11}^T)X = (X-\frac{1}{n}\textbf{11}^TX)^T(X-\frac{1}{n}\textbf{11}^TX) \\ R = D^{-1/2}SD^{-1/2} \text{, where } D^{1/2} \text{ is the standard deviation matrix}\]
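A direct numpy transcription of these formulas, checked against numpy's built-ins (the data matrix is arbitrary):

```python
# Sample mean, covariance, and correlation computed purely by matrix operations.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 3))
n = X.shape[0]
ones = np.ones((n, 1))

xbar = X.T @ ones / n                               # mean vector: (1/n) X' 1
C = np.eye(n) - ones @ ones.T / n                   # centring matrix I - (1/n) 1 1'
S = X.T @ C @ X / (n - 1)                           # covariance: (1/(n-1)) X' C X
D_inv_half = np.diag(1.0 / np.sqrt(np.diag(S)))     # D^{-1/2}
R = D_inv_half @ S @ D_inv_half                     # correlation: D^{-1/2} S D^{-1/2}

assert np.allclose(xbar.ravel(), X.mean(axis=0))
assert np.allclose(S, np.cov(X, rowvar=False))
assert np.allclose(R, np.corrcoef(X, rowvar=False))
```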

Sample Values of Linear Combinations

The sample values of the linear combination \(c^TX = c_1X_1 + \ldots + c_pX_p\) are $c^Tx_1, \ldots, c^Tx_n$, and their descriptive statistics reduce to the following.

Sample Mean: \(\frac{c^Tx_1 + \ldots + c^Tx_n}{n} = c^T(x_1+ \ldots + x_n)\frac{1}{n} = c^T\bar{x}\)

Sample Variance: \(\frac{(c^Tx_1 -c^T\bar{x})^2 + \ldots + (c^Tx_n -c^T\bar{x})^2}{n-1} \\ = \frac{c^T(x_1-\bar{x})(x_1-\bar{x})^Tc + \ldots + c^T(x_n-\bar{x})(x_n-\bar{x})^Tc}{n-1} \\ = c^TSc\)

Sample Covariance between the linear combinations $b^TX$ and $c^TX$: \(\frac{(b^Tx_1 - b^T\bar{x})(c^Tx_1 - c^T\bar{x}) + \ldots + (b^Tx_n - b^T\bar{x})(c^Tx_n - c^T\bar{x})}{n-1} \\ = \frac{b^T(x_1 - \bar{x})(x_1 - \bar{x})^Tc + \ldots + b^T(x_n - \bar{x})(x_n - \bar{x})^Tc}{n-1} \\ = b^TSc\)
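A quick numerical check of these three identities (the data and the coefficient vectors $b$ and $c$ are arbitrary):

```python
# Sample mean, variance, and covariance of the linear combinations b'X and c'X.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
b = np.array([1.0, -1.0, 0.5])
c = np.array([2.0, 0.0, 1.0])

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

u, v = X @ b, X @ c                                  # observed values of b'X and c'X
assert np.isclose(v.mean(), c @ xbar)                # sample mean       = c' xbar
assert np.isclose(v.var(ddof=1), c @ S @ c)          # sample variance   = c' S c
assert np.isclose(np.cov(u, v)[0, 1], b @ S @ c)     # sample covariance = b' S c
```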

Exercises were not done.

tags: mathematics - statistics