Linear Algebra and its Applications Chapter 2 - Matrix Algebra
by Arpon Sarker
Introduction
The introductory example uses 3D modelling and computational fluid dynamics (CFD) on Boeing's aircraft: the computer models rely on graphics built on linear algebra principles, and the CFD simulations involve solving systems of linear equations with millions of equations and variables. Two tools that assist in solving these systems are partitioned matrices and matrix factorisations.
Definitions
Transpose:
Matrix whose columns are formed from the corresponding rows, denoted by $A^T$
Singular Matrix:
A matrix that is not invertible
Nonsingular Matrix:
A matrix that is invertible
Elementary Matrix:
A matrix that is obtained by performing a single elementary row operation on the identity matrix.
Factorisation:
an equation that expresses A as a product of two or more matrices.
Subspace:
A subspace of $\R^n$ is any set $H$ in $\R^n$ that has three properties:
- The zero vector is in $H$
- For each $\textbf{u}$ and $\textbf{v}$ in $H$, the sum $\textbf{u}+\textbf{v}$ is in $H$ (closure under vector addition)
- For each $\textbf{u}$ in $H$ and each scalar $c$, the vector $c\textbf{u}$ is in $H$ (closure under scalar multiplication)
Column Space:
The column space of a matrix $A$ is the set $\textrm{Col}\space A$ of all linear combinations of the columns of $A$ (i.e. Span{$\textbf{a}_1,\ldots,\textbf{a}_n$}).
Null Space:
The null space of a matrix $A$ is the set $\textrm{Nul} \space A$ of all solutions to the homogeneous equation $A\textbf{x}=\textbf{0}$
Basis:
A basis for a subspace $H$ of $\R^n$ is a linearly independent set in $H$ that spans $H$. Since a subspace can contain an infinite number of vectors, it's best to work with a small finite set of vectors that span the subspace, and it can be shown that the smallest possible spanning set must be linearly independent. The set {$\textbf{e}_1,\ldots,\textbf{e}_n$} is called the standard basis for $\R^n$.
Coordinate vector of $\textbf{x}$ (relative to $\beta$), or $\beta$-coordinate vector of $\textbf{x}$:
Suppose the set $\beta = \{\textbf{b}_1,\ldots,\textbf{b}_p\}$ is a basis for a subspace $H$. For each $\textbf{x}$ in $H$, the coordinates of $\textbf{x}$ relative to the basis $\beta$ are the weights $c_1,\ldots,c_p$ such that $\textbf{x}=c_1\textbf{b}_1+\ldots+c_p\textbf{b}_p$, and the vector in $\R^p$ is \([\textbf{x}]_\beta = \begin{bmatrix}c_1 \\ \vdots \\ c_p\end{bmatrix}\)
This describes a vector in a subspace by its plain coordinates in the new coordinate system (the basis) rather than in the standard basis.
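As a quick illustration (my own sketch, not from the book), the $\beta$-coordinates can be computed numerically by solving $B\textbf{c}=\textbf{x}$, where $B$ has the basis vectors as columns; the basis and vector below are made up:

```python
import numpy as np

# Hypothetical basis beta = {b1, b2} for a 2D subspace H of R^3
b1 = np.array([3.0, 6.0, 2.0])
b2 = np.array([-1.0, 0.0, 1.0])
B = np.column_stack([b1, b2])          # basis vectors as columns

x = np.array([3.0, 12.0, 7.0])         # a vector assumed to lie in H

# Coordinates c satisfy B @ c = x; use least squares since B is not square
c, residual, rank, _ = np.linalg.lstsq(B, x, rcond=None)
print(c)                               # [x]_beta; here [2. 3.]
```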
Dimension of a Subspace:
The dimension of a nonzero subspace $H$, denoted $\textrm{dim}\space H$, is the number of vectors in any basis for $H$. The dimension of the zero subspace $\{\textbf{0}\}$ is defined to be zero.
Rank:
The rank of a matrix $A$, denoted by $\textrm{Rank}\space A$, is the dimension of the column space of $A$.
Theorems
Theorem 1:
Let $A,B,C$ be matrices of the same size and $r,s$ be scalars:
- $A + B = B + A$
- $(A + B) + C = A + (B + C)$
- $A + 0 = A$
- $r(A+B) = rA + rB$
- $(r+s)A = rA + sA$
- $r(sA) = (rs)A$
Theorem 2:
Let $A$ be an m x n matrix and let B and C have sizes for which the indicated sums and products are defined.
- $A(BC) = (AB)C$
- $A(B+C) = AB + AC$
- $(B+C)A = BA + CA$
- $r(AB) = (rA)B = A(rB)$
- $I_mA = A = AI_n$
Theorem 3:
- $(A^T)^T = A$
- $(A+B)^T = A^T+B^T$
- $(rA)^T = rA^T$
- $(AB)^T = B^TA^T$ (This requires some proof)
Theorem 4:
Let $A = \begin{bmatrix}a & b \\ c & d\end{bmatrix}$. Then $A$ is invertible if $ad-bc \neq 0$ (the reason will be stated in Chapter 3 - Introduction to Determinants and Properties of Determinants), and
\[A^{-1} = \frac{1}{ad-bc}\begin{bmatrix}d & -b \\ -c & a\end{bmatrix}\]
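A quick sanity check of this formula (my own sketch, not from the book), compared against numpy's general-purpose inverse:

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2x2 matrix via the ad - bc formula; raises if singular."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (ad - bc = 0)")
    return (1.0 / det) * np.array([[d, -b],
                                   [-c, a]])

A = np.array([[3.0, 4.0],
              [5.0, 6.0]])
print(inverse_2x2(A))          # [[-3.   2. ] [ 2.5 -1.5]]
print(np.linalg.inv(A))        # should agree
```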
Theorem 5:
If $A$ is an invertible matrix then for each $\textbf{b}$, the equation $A\textbf{x}=\textbf{b}$ has the unique solution $\textbf{x}=A^{-1}\textbf{b}$.
Theorem 6:
- $(A^{-1})^{-1} = A$
- $(AB)^{-1} = B^{-1}A^{-1}$
- $(A^T)^{-1} = (A^{-1})^T$
Proof of b)
\[(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I\]
Proof of c)
\[(A^{-1})^TA^T = (AA^{-1})^T = I\]
The proof can also be done on the other side, since an inverse works from both directions.
Theorem 7:
An n x n matrix $A$ is invertible iff $A$ is row equivalent to $I_n$, and in this case, any sequence of elementary row operations that reduces $A$ to $I_n$ also transforms $I_n$ into $A^{-1}$.
Proof:
Case: Suppose $A$ is invertible. Then $A\textbf{x}=\textbf{b}$ has a solution for each $\textbf{b}$ (Theorem 5), so every row has a pivot position, and because $A$ is square, the $n$ pivot positions must lie on the main diagonal, which implies the reduced echelon form of $A$ is $I_n$. That is, $A \sim I_n$
Case: Conversely, suppose $A \sim I_n$, since each step of the row reduction of $A$ corresponds to a left-multiplication by an elementary matrix, there exists: \(A \sim E_1A \sim \ldots \sim E_p \ldots E_1A = I_n\)
Since each of the elementary matrices is invertible (just reverse their operation to get back $I$), the product of matrices is invertible:
\[A = (E_p\ldots E_1)^{-1} I_n = (E_p\ldots E_1)^{-1}\]
Thus $A$ is invertible, since it is the inverse of an invertible matrix, and
\[A^{-1} = E_p \ldots E_1\]
This says that applying the same sequence $E_1,\ldots,E_p$ to $I_n$ produces $A^{-1}$, which is the basis for finding the inverse by row reducing $\begin{bmatrix} A & I \end{bmatrix} \sim \begin{bmatrix} I & A^{-1} \end{bmatrix}$.
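A minimal sketch (my own) of this $\begin{bmatrix} A & I \end{bmatrix} \sim \begin{bmatrix} I & A^{-1} \end{bmatrix}$ procedure, using Gauss-Jordan elimination with a simple row-swap safeguard and assuming $A$ is invertible:

```python
import numpy as np

def inverse_by_row_reduction(A):
    """Row reduce the augmented matrix [A | I] to [I | A^-1]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])    # augmented matrix [A | I]
    for i in range(n):
        # Swap in the row with the largest entry in column i (partial pivoting)
        pivot_row = i + np.argmax(np.abs(M[i:, i]))
        M[[i, pivot_row]] = M[[pivot_row, i]]
        M[i] = M[i] / M[i, i]                      # scale pivot row so pivot is 1
        for j in range(n):
            if j != i:
                M[j] -= M[j, i] * M[i]             # eliminate column i in other rows
    return M[:, n:]                                # right half is now A^-1

A = np.array([[2.0, 1.0], [5.0, 3.0]])
print(inverse_by_row_reduction(A))   # [[ 3. -1.] [-5.  2.]]
```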
Theorem 8: The Invertible Matrix Theorem
Let $A$ be a square n x n matrix. Then the following statements are equivalent.
a. $A$ is an invertible matrix.
b. $A$ is row equivalent to $I_n$
c. $A$ has $n$ pivot positions
d. The equation $A\textbf{x} = \textbf{0}$ has only the trivial solution
e. The columns of $A$ form a linearly independent set.
f. The linear transformation $\textbf{x} \mapsto A\textbf{x}$ is one-to-one
g. The equation $A\textbf{x} = \textbf{b}$ has at least one solution for each $\textbf{b}\in \R^n$
h. The columns of $A$ span $\R^n$
i. The linear transformation $\textbf{x} \mapsto A\textbf{x}$ maps $\R^n$ onto $\R^n$
j. There is an n x n matrix $C$ s.t. $CA=I$
k. There is an n x n matrix $D$ s.t. $AD=I$
l. $A^T$ is an invertible matrix
Proof:
If $(a)$ is true then $A^{-1}$ works for $C$, so $(a) \implies (j)$. $(j) \implies (d)$: if $A\textbf{x}=\textbf{0}$ and $CA = I_n$, then $\textbf{x} = I\textbf{x} = CA\textbf{x} = C\textbf{0} = \textbf{0}$, so the trivial solution is the only solution. $(d) \implies (c)$ since there are no free variables in this equation, so $A$ must have $n$ pivot columns. $(c)\implies(b)$ since $A$ is square and has $n$ pivots, which means the pivots must lie on the main diagonal, so the reduced echelon form of $A$ is $I_n$. $(b)\implies(a)$ by Theorem 7.
$(a)\implies(k)$ since $A^{-1}$ works for $D$. $(k)\implies(g)$: $A(D\textbf{b})=(AD)\textbf{b}=I\textbf{b}=\textbf{b}$, so $\textbf{x}=D\textbf{b}$ is a solution and $A\textbf{x}=\textbf{b}$ has a solution for each $\textbf{b}$. $(g)\implies(a)$ since $A$ has a pivot position in each row by Theorem 4 (Chapter 1); because $A$ is square, the pivots must be on the diagonal and $A$ is thus row equivalent to $I_n$. $(k)$ and $(g)$ are linked to the circle by $(a)$.
$(g), (h), (i)$ are all equivalent by Theorems 4 and 12a (chapter 1). Thus, $(h)$ and $(i)$ are linked to the circle through $(g)$.
$(d), (e), (f)$ are all equivalent by Theorem 12b (chapter 1), so $(e)$ and $(f)$ are linked to the circle through $(d)$.
$(a)\implies(l)$ by theorem 6c and $(l)\implies(a)$ from the same theorem with $A$ and $A^T$ interchanged.
Continuation of Theorem 8 (from Dimension and Rank)
Let $A$ be an n x n matrix. Then the following statements are equivalent to the statement that $A$ is an invertible matrix.
m. The columns of $A$ form a basis of $\R^n$.
n. $\textrm{Col}\space A=\R^n$
o. $\textrm{dim Col}\space A=n$
p. $\textrm{rank}\space A = n$
q. $\textrm{Nul}\space A = \{\textbf{0}\}$
r. $\textrm{dim Nul}\space A = 0$
(PRACTICE PROOF)
Theorem 9
Let $T : \R^n \rightarrow \R^n$ be a linear transformation and let $A$ be the standard matrix for $T$. Then $T$ is invertible iff $A$ is an invertible matrix. In that case, the linear transformation $S$ given by $S(\textbf{x}) =A^{-1}\textbf{x}$ is the unique function satisfying: \(S(T(\textbf{x})) = \textbf{x} \quad (1)\\ T(S(\textbf{x})) = \textbf{x} \quad (2)\)
Proof:
Case: Suppose $T$ is invertible. Then (2) shows that $T$ is onto $\R^n$, for if $\textbf{b}$ is in $\R^n$ and $\textbf{x}=S(\textbf{b})$ then $T(\textbf{x})=T(S(\textbf{b}))=\textbf{b}$ so each $\textbf{b}$ is in the range of $T$. Thus $A$ is invertible by Theorem 8, statement $(i)$.
Case: Conversely, suppose $A$ is invertible and let $S(\textbf{x})=A^{-1}\textbf{x}$. Then $S$ is a linear transformation and $S$ satisfies $(1)$, since $S(T(\textbf{x}))=S(A\textbf{x})=A^{-1}(A\textbf{x})=\textbf{x}$; equation $(2)$ follows the same way from $T(S(\textbf{x}))=A(A^{-1}\textbf{x})=\textbf{x}$. Thus $T$ is invertible.
The proof that $S$ is unique: suppose $U$ is another function satisfying $(1)$ and $(2)$, and write $\textbf{v}=T(\textbf{x})$, which covers every $\textbf{v}\in\R^n$ because $T$ is an onto mapping. The assumed properties of $S$ and $U$ show that $S(\textbf{v})= S(T(\textbf{x})) =\textbf{x}$ and $U(\textbf{v})=U(T(\textbf{x}))=\textbf{x}$. So $S(\textbf{v})$ and $U(\textbf{v})$ are equal for each $\textbf{v}$, and $S$ and $U$ are the same function from $\R^n$ into $\R^n$.
Theorem 10: Column-Row Expansion of $AB$
If $A$ is m x n and $B$ is n x p, then \(AB = [\textrm{col}_1(A)\ \ldots\ \textrm{col}_n(A)]\begin{bmatrix}\textrm{row}_1(B)\\ \vdots\\ \textrm{row}_n(B)\end{bmatrix} = \textrm{col}_1(A)\textrm{row}_1(B) + \ldots + \textrm{col}_n(A)\textrm{row}_n(B)\)
Proof:
For each row index $i$ and column index $j$, the $(i,j)$-entry in $\textrm{col}_k(A)\textrm{row}_k(B)$ is the product of $a_{ik}$ from $\textrm{col}_k(A)$ and $b_{kj}$ from $\textrm{row}_k(B)$. Hence the $(i,j)$-entry in the sum shown above is \(a_{i1}b_{1j} + \ldots + a_{in}b_{nj}\), which is the row-column rule.
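A quick numerical check of the column-row expansion (my own sketch): summing the outer products $\textrm{col}_k(A)\,\textrm{row}_k(B)$ reproduces the ordinary product $AB$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 4)).astype(float)   # m x n
B = rng.integers(-5, 5, size=(4, 2)).astype(float)   # n x p

# Sum of outer products col_k(A) * row_k(B), k = 1..n
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

print(np.allclose(outer_sum, A @ B))   # True
```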
Theorem 11:
Let $C$ be the consumption matrix for an economy, and let $\textbf{d}$ be the final demand. If $C$ and $\textbf{d}$ have nonnegative entries and if each column sum of $C$ is less than 1 then $(I-C)^{-1}$ exists and the production vector \(\textbf{x} = (I-C)^{-1}\textbf{d}\) has nonnegative entries and is the unique solution of $\textbf{x}=C\textbf{x}+\textbf{d}$
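A small numerical sketch of Theorem 11 (my own; the 3-sector consumption matrix and demand vector are made up), computing $\textbf{x}=(I-C)^{-1}\textbf{d}$ and checking $\textbf{x}=C\textbf{x}+\textbf{d}$:

```python
import numpy as np

# Hypothetical 3-sector consumption matrix; every column sum is < 1
C = np.array([[0.50, 0.40, 0.20],
              [0.20, 0.30, 0.10],
              [0.10, 0.10, 0.30]])
d = np.array([50.0, 30.0, 20.0])        # final demand

I = np.eye(3)
x = np.linalg.solve(I - C, d)           # production vector (I - C)^-1 d

print(x)                                # nonnegative, as the theorem guarantees
print(np.allclose(x, C @ x + d))        # x satisfies x = Cx + d -> True
```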
Theorem 12:
The null space of an m x n matrix $A$ is a subspace of $\R^n$. Equivalently, the set of all solutions to a system $A\textbf{x}=\textbf{0}$ of $m$ homogeneous linear equations in $n$ unknowns is a subspace of $\R^n$.
(PRACTICE PROOF)
Theorem 13:
The pivot columns of a matrix $A$ form a basis for the column space of $A$. Warning: use the pivot columns of $A$ itself, not the pivot columns of an echelon form of $A$, since row operations change the column space.
Theorem 14: The Rank Theorem
If a matrix $A$ has $n$ columns, then $\textrm{rank}\space A+\textrm{dim Nul}\space A = n$
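A quick numerical check of the Rank Theorem (my own sketch; it assumes scipy is available for a null space basis):

```python
import numpy as np
from scipy.linalg import null_space

# A 3 x 5 matrix whose third row is the sum of the first two (rank 2)
A = np.array([[1.0, 2.0, 0.0, 4.0, 1.0],
              [0.0, 1.0, 3.0, 1.0, 2.0],
              [1.0, 3.0, 3.0, 5.0, 3.0]])

rank = np.linalg.matrix_rank(A)
dim_nul = null_space(A).shape[1]         # number of basis vectors for Nul A

print(rank, dim_nul, rank + dim_nul)     # 2 3 5  ->  rank A + dim Nul A = n
```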
Theorem 15: The Basis Theorem
Let $H$ be a p-dimensional subspace of $\R^n$. Any linearly independent set of exactly p elements in $H$ is automatically a basis for $H$. Also, any set of p elements of $H$ that spans $H$ is automatically a basis for $H$.
Matrix Multiplication
The product $AB$ (the composition of the mappings/linear transformations given by $A$ and $B$) is the matrix whose columns are $A\textbf{b}_1, \ldots, A\textbf{b}_p$, where $\textbf{b}_1,\ldots,\textbf{b}_p$ are the columns of $B$, or
\[AB = A[\textbf{b}_1 \ldots \textbf{b}_p] = [A\textbf{b}_1 \ldots A\textbf{b}_p]\]
My whole education has been based on the row-column rule for computing $AB$, but the most intuitive view is splitting $B$ into its column vectors and using the above, since $A$ transforms each of $B$'s columns. The row-column rule pairs $\textrm{row}_i(A)$ with $\textrm{col}_j(B)$ to give the $(i,j)$-entry of $AB$; to compute an individual row,
\[\textrm{row}_i(AB) = \textrm{row}_i(A)\cdot B\]
There is also a column-row rule which uses outer products.
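A small numpy sketch (my own) of these two views of the product: column by column as $A\textbf{b}_j$, and row by row as $\textrm{row}_i(A)\cdot B$.

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [1.0, -5.0]])
B = np.array([[4.0, 3.0, 6.0],
              [1.0, -2.0, 3.0]])

# Column view: j-th column of AB is A times the j-th column of B
by_columns = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# Row view: i-th row of AB is the i-th row of A times B
by_rows = np.vstack([A[i, :] @ B for i in range(A.shape[0])])

print(np.allclose(by_columns, A @ B))   # True
print(np.allclose(by_rows, A @ B))      # True
```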
Partitioned Matrices
The previous sections have looked at $A$ as a collection of column vectors, but in this section we consider other partitions. In a real-world case (VLSI microchips), the submatrices $A_{11}, A_{22}, A_{33}$ on the main diagonal of the matrix below concern three VLSI chips, and the other submatrices depend on the interconnections among the chips.
\[A = \left[\begin{array}{c|c|c}A_{11} & A_{12} & A_{13} \\ \hline A_{21} & A_{22} & A_{23} \\ \hline A_{31} & A_{32} & A_{33}\end{array}\right]\]
Multiplication works by the same row-column rule if the matrices are conformable for block multiplication.
The inverse of partitioned matrices is found by expanding \(\begin{bmatrix}A_{11}&A_{12}\\0&A_{22}\end{bmatrix} \begin{bmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{bmatrix} = \begin{bmatrix}I_{p}&0\\0&I_{q}\end{bmatrix}\)
where you expand the LHS, match each block to the corresponding block on the right, and write the inverse in terms of the inverses of the submatrices of $A$.
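Carrying out that expansion for this block upper triangular case gives $A_{22}B_{21}=0$ so $B_{21}=0$, then $B_{22}=A_{22}^{-1}$, $B_{11}=A_{11}^{-1}$, and $B_{12}=-A_{11}^{-1}A_{12}A_{22}^{-1}$. A small numpy check (my own, with made-up invertible blocks):

```python
import numpy as np

# Made-up invertible blocks for a block upper triangular matrix
A11 = np.array([[2.0, 1.0],
                [1.0, 3.0]])
A12 = np.array([[1.0, 2.0, 0.0],
                [0.0, 1.0, 1.0]])
A22 = np.array([[4.0, 0.0, 1.0],
                [0.0, 5.0, 2.0],
                [1.0, 2.0, 6.0]])

A = np.block([[A11, A12],
              [np.zeros((3, 2)), A22]])

# Blocks of the inverse, read off from the block expansion
B11 = np.linalg.inv(A11)
B22 = np.linalg.inv(A22)
B12 = -B11 @ A12 @ B22
A_inv = np.block([[B11, B12],
                  [np.zeros((3, 2)), B22]])

print(np.allclose(A_inv, np.linalg.inv(A)))   # True
```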
Views of Products
- Definition of $A\textbf{x}$ using the columns of $A$
- The column definition of $AB=A[\textbf{b}_1\ldots\textbf{b}_p]=[A\textbf{b}_1\ldots A\textbf{b}_p]$
- The row-column rule for computing $AB$
- the rows of AB as the product of the rows of $A$ and matrix $B$
- the column-row rule for computing $AB$ (theorem 10)
Matrix Factorisations
Using a factorisation amounts to preprocessing the data in $A$, organising that data into two or more parts whose structures are more useful in some way, such as computational efficiency.
LU Factorisation Motivation
There is a fairly common industrial and business problem of solving a sequence of equations, all with the same coefficient matrix: \(A\textbf{x} = \textbf{b}_1, \quad A\textbf{x} = \textbf{b}_2,\quad \ldots \quad A\textbf{x} = \textbf{b}_p\)
If $A$ is invertible, you could compute $A^{-1}\textbf{b}_i$ for each of these. However, a more efficient way is to solve the first equation by row reduction and obtain an $LU$ factorisation in the process. For the rest of the equations, reuse the $LU$ factorisation, which is computationally more efficient.
$L$ is an m x m lower triangular matrix with 1s on the diagonal and $U$ is an m x n echelon form of $A$.
\[A\textbf{x} = \textbf{b} \implies L(U\textbf{x}) = \textbf{b} \implies \begin{cases} L\textbf{y} = \textbf{b} \\ U\textbf{x} = \textbf{y} \end{cases}\]
Why is it more efficient? (PRACTICE THIS) The mapping is split into a pair of mappings, and each equation is easy to solve since both matrices are in triangular form!
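A minimal sketch (my own) of the two triangular solves, assuming $L$ and $U$ for a square $A$ are already in hand:

```python
import numpy as np

def forward_substitution(L, b):
    """Solve Ly = b where L is unit lower triangular."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def back_substitution(U, y):
    """Solve Ux = y where U is upper triangular with nonzero diagonal."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0], [2.0, 1.0]])
U = np.array([[4.0, 3.0], [0.0, -2.0]])
b = np.array([7.0, 10.0])

y = forward_substitution(L, b)
x = back_substitution(U, y)
print(np.allclose(L @ U @ x, b))   # True
```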
Algorithm for LU Factorisation: (PRACTICE THIS)
- Reduce $A$ to echelon form $U$ by a sequence of row replacement operations, if possible.
- Place entries in $L$ such that the same sequence of row operations reduces $L$ to $I$.
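A minimal sketch (my own) of this algorithm for a square $A$ that can be reduced by row replacements alone (no row interchanges, so zero pivots are not handled):

```python
import numpy as np

def lu_no_pivot(A):
    """LU factorisation by row replacement only; assumes nonzero pivots."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for j in range(n):                       # eliminate below pivot U[j, j]
        for i in range(j + 1, n):
            multiplier = U[i, j] / U[j, j]
            L[i, j] = multiplier             # record the multiplier in L
            U[i, :] -= multiplier * U[j, :]  # row replacement on U
    return L, U

A = np.array([[2.0, 4.0, -1.0],
              [-4.0, -5.0, 3.0],
              [2.0, 5.0, -4.0]])
L, U = lu_no_pivot(A)
print(np.allclose(L @ U, A))                                     # True
print(np.allclose(np.tril(L), L), np.allclose(np.triu(U), U))    # both True
```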
Proof:
Suppose $A$ can be reduced to echelon form $U$ using only row replacements (row swapping is another case, handled below). In this case, there exist unit lower triangular elementary matrices $E_1,\ldots,E_p$ such that
\[E_p\ldots E_1A=U, \qquad A = (E_p\ldots E_1)^{-1}U = LU, \qquad L = (E_p\ldots E_1)^{-1}\]
It can be shown that products and inverses of unit lower triangular matrices are also unit lower triangular. Therefore, $L$ is unit lower triangular.
(PRACTICE THIS) For row interchanges (since partial pivoting is used for high accuracy), the $LU$ factorisation can be modified to produce an $L$ that is permuted lower triangular where the rearrangement of $L$ makes a unit lower triangular matrix. The permuted LU factorisation works the same way as above, except the reduction follows the order of pivots in $L$ from left to right.
Subspaces of $\R^n$
Dimension and Rank
Watch 3B1B for intuition!
Applications
- Matrix Factorisation in Electrical Engineering
- Transfer Matrix
- Ladder Network
- Series Circuit
- Shunt Circuit
- Leontief Input-Output Model
- $\textbf{x} = C\textbf{x}+\textbf{d}$
- amount produced = intermediate demand + final demand
- $(I-C)^{-1} \approx I+C+C^2 + \ldots + C^m$
- Computer Graphics
- Storing objects as a bunch of vertices (book omits details of which points are connected) since standard transformations map line segments onto other line segments: \(D = \begin{bmatrix}x\textrm{-coord}_1 & \ldots \\ y\textrm{-coord}_1 & \ldots \end{bmatrix}\)
- Can create composite transformations by multiplying two transformation matrices together
- Homogeneous Coordinates: why? -> Since translating an object on screen is not a linear transformation (see the sketch after this list)
- Adds an extra coordinate of 1 (e.g. $(x,y,1)$)
- Homogeneous 3D Coordinates for 3D modelling: $(X,Y,Z,H)$, $x=\frac{X}{H}$,… (INTUITION?)
- Perspective Projections (DON’T UNDERSTAND THIS)
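A small sketch (my own) of why homogeneous coordinates help: a 2D translation is not linear in $(x,y)$, but it becomes a single matrix multiplication on $(x,y,1)$, so it composes with rotations and scalings by matrix products.

```python
import numpy as np

def translation(tx, ty):
    """3x3 homogeneous matrix that translates 2D points by (tx, ty)."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    """3x3 homogeneous matrix for a rotation about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Vertices stored as columns of D, with an extra row of 1s (homogeneous form)
D = np.array([[0.0, 1.0, 1.0],    # x-coordinates of three vertices
              [0.0, 0.0, 1.0],    # y-coordinates
              [1.0, 1.0, 1.0]])   # homogeneous 1s

# Rotate by 90 degrees, then translate by (2, 3): one combined matrix
T = translation(2.0, 3.0) @ rotation(np.pi / 2)
print(T @ D)    # transformed vertices, still in homogeneous coordinates
```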
Numerical Notes
- Fastest way to obtain $AB$ depends on the way the computer stores matrices in memory. Standard high-performance algorithms (such as LAPACK) calculate $AB$ by columns as in the definition above in Matrix Multiplication.
- Definition of $AB$ as vectors of $B$ multiplied by $A$ is great for parallel processing
- Practically, $A^{-1}$ is seldom computed. Computing $A^{-1}$ and then $A^{-1}\textbf{b}$ takes about 3 times as many operations as solving $A\textbf{x}=\textbf{b}$ by row reduction, and row reduction may be more accurate.
- In practical work, you may encounter a "nearly singular" or ill-conditioned matrix, which is an invertible matrix that can become singular if some of its entries are changed slightly. In this case, row reduction may produce fewer than $n$ pivot positions. Roundoff error can also make a singular matrix appear to be invertible. Some matrix programs compute a condition number for a square matrix: the larger the number, the closer the matrix is to being singular.
- When matrices are too large to fit in a computer's high-speed memory, partitioning permits the computer to work with only two or three submatrices at a time.
- Some high-speed computers, especially with vector pipeline architecture, perform matrix calculations more efficiently when the algorithms use partitioned matrices.
The following operation counts apply to an n x n dense (most entries nonzero) matrix $A$ for n moderately large ($n \geq 30$)
- Computing an $LU$ Factorisation takes about $2n^3/3$ flops (about the same as row reducing $[A\quad\textbf{b}]$), whereas finding $A^{-1}$ requires $2n^3$ flops
- Solving $L\textbf{y}=\textbf{b}$ and $U\textbf{x}=\textbf{y}$ requires about $2n^2$ flops, since each n x n triangular system requires about $n^2$ flops.
- $A^{-1}\textbf{b}$ also requires $2n^2$ flops, but the result may not be as accurate as that obtained from $L$ and $U$ (because of roundoff error in computing $A^{-1}$ and $A^{-1}\textbf{b}$)
- If $A$ is sparse (mostly zero entries), then $L$ and $U$ may be sparse, whereas $A^{-1}$ is likely to be dense. In this case, $LU$ factorisation is much faster than solving with $A^{-1}$.
- $A\textbf{x}=\textbf{b}$ can be written as $(I-C)\textbf{x}=\textbf{b}$ with $C=I-A$. If the system is large and sparse, it can happen that the column sums of the absolute values in $C$ are less than 1. In this case, $C^m\rightarrow0$, and we can approximate $A^{-1}$ very quickly using the series formula for $(I-C)^{-1}$
- Continuous movement of graphical 3D objects requires intensive computation with 4x4 matrices; high-end computer graphics boards have 4x4 matrix operations and graphics algorithms embedded in their circuitry.
- Rank determination is often unsuitable for large-scale problems since row operations can change the apparent rank of a matrix. In practical applications, the effective rank of a matrix is determined by singular value decomposition.