QR Factorization
The QR factorization of a matrix constructs an orthogonal basis for its column space. It’s also the standard computational tool for solving linear least-squares problems.
This page uses the LinearAlgebra package, which contains norm and qr. We also use the Random package to seed the random number generator.
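A minimal setup block might look like the following; the seed value is arbitrary and only ensures reproducibility.

```julia
using LinearAlgebra, Random   # provides norm, qr, I, and Random.seed!
Random.seed!(1234)            # arbitrary seed for reproducible random matrices
```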
Orthogonal and orthonormal vectors
For any two \(n\)-vectors \(x\) and \(y\), the cosine identity \[ x\T y = \|x\|\|y\|\cos(\theta) \] relates the inner product of the vectors to the angle between them. Thus, the vectors \(x\) and \(y\) are orthogonal when \[ x\T y = 0, \quad\text{or equivalently,}\quad \cos(\theta) = 0. \] Orthogonal vectors are orthonormal when they’re also normalized: \[ x\T y = 0, \quad \norm{x}=1, \quad \norm{y}=1. \]
These definitions extend readily to a collection of vectors: the set of \(k\) vectors \(\{x_1,\ldots,x_k\}\) in \(\R^n\) is orthogonal (and orthonormal) whenever \[ x_i\T x_j = 0 \quad\text{for all}\quad i\ne j \qquad (\text{and}\ \norm{x_i} = 1) \]
The 2-dimensional canonical unit vectors \(e_1 = (1,0)\) and \(e_2 = (0,1)\) are orthonormal:
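A quick numerical check of both conditions:

```julia
e₁, e₂ = [1, 0], [0, 1]
e₁'e₂                 # inner product: 0, so the vectors are orthogonal
norm(e₁), norm(e₂)    # both lengths equal 1, so they are orthonormal
```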
Orthogonal matrices
A square \(n\)-by-\(n\) matrix \(Q\) is orthogonal when its columns are orthonormal: \[ Q = \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix} \quad\text{and}\quad Q\T Q = QQ\T = I. \] Because a matrix \(B\) is the inverse of a matrix \(A\) if and only if \(BA =AB = I\), the inverse of an orthogonal matrix is its transpose: \[ Q^{-1} = Q\T. \]
An orthogonal matrix rotates or reflects a vector, but doesn’t change its length: inner products are invariant under orthogonal transformations, i.e., for any vectors \(x\) and \(y\), \[ (Qx)\T(Qy) = x\T Q\T Qy = x\T y, \] and so \[ \| Q x \|_2 = \sqrt{(Qx)\T (Qx)} = \sqrt{x\T x} = \| x \|_2. \]
Another property of orthogonal matrices is that their determinant is either \(1\) or \(-1\). This can be observed from the fact that for any compatible matrices \(A\) and \(B\), \(\det( A B ) = \det( A )\det( B )\) and \(\det( A )= \det( A\T )\). Hence, \[ \det( Q \T Q ) = \det( I ) \iff \det( Q )^2 = 1 \iff \det( Q ) = \pm 1. \]
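For example, a 2-by-2 rotation is orthogonal with determinant \(+1\), while a reflection (here, a permutation) has determinant \(-1\); the matrices below are illustrative and not taken from the original page.

```julia
θ = π/6
Qrot = [cos(θ) -sin(θ); sin(θ) cos(θ)]   # rotation: orthogonal, det ≈ +1
Qref = [0 1; 1 0]                        # reflection (row swap): det = -1
det(Qrot), det(Qref)
```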
QR factorization
Every \(m\)-by-\(n\) matrix \(A\) has a QR factorization, which means it can be decomposed as the product \[ A = Q R, \] where \[ Q = \begin{bmatrix} q_1 & \cdots & q_m \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{mn} \end{bmatrix}, \] respectively, are an \(m\)-by-\(m\) orthogonal matrix and an \(m\)-by-\(n\) upper-triangular matrix. This factorization isn’t unique, unless we impose additional conditions on the factors \(Q\) and \(R\).
Householder reflectors and Givens rotations are the preferred methods for computing the QR factorization. For an \(m\)-by-\(n\) matrix with \(m\ge n\), these methods require \(\mathcal{O}(mn^2)\) flops to complete.
The columns of the orthogonal matrix \(Q\) reveal an orthogonal basis for the range of \(A,\) and the triangularity of \(R\) gives a simple “recipe” for reconstructing the columns of \(A=[a_1,\ldots,a_n]\) from those of \(Q\): \[ \begin{align} a_1 &= r_{11} q_1 \\ a_2 &= r_{12} q_1 + r_{22} q_2 \\ a_3 &= r_{13} q_1 + r_{23} q_2 + r_{33} q_3 \\ &\ \ \vdots \\ a_n &= r_{1n} q_1 + r_{2n} q_2 + \cdots + r_{nn} q_n. \end{align} \] From these formulas we deduce a key property of this factorization: for any index \(k < n\), the span of the leading \(k\) columns of \(Q\) contains the span of the leading \(k\) columns of \(A\), i.e., \[ \span\{a_1,\ldots,a_k\} \subseteq \span\{q_1,\ldots,q_k\}. \] If we further assume that \(A\) has full column rank, this inclusion tightens to an equality: the leading \(k\) columns of \(Q\) form an orthonormal basis for the span of the leading \(k\) columns of \(A\).
Theorem 1 (QR factorization for full column rank matrices) If \(A=QR\) is the QR factorization of an \(m\)-by-\(n\) matrix \(A\) with full column rank, then the first \(n\) columns of \(Q\) form a basis for the range of \(A\) and the remaining \(m-n\) columns form a basis for its orthogonal complement: \[ \begin{aligned} \range(A) &= \range(Q_1)\\ \range(A)^\perp=\Null(A^T) &= \range(Q_2)\end{aligned} \] where \[ \begin{aligned} Q_1 &= \begin{bmatrix}q_1 & \cdots & q_n\end{bmatrix},\\ Q_2 &= \begin{bmatrix}q_{n+1} & \cdots & q_m\end{bmatrix}, \end{aligned} \] and for \(k=1,\ldots,n\), \[ \span\{a_1,\ldots,a_k\} = \span\{q_1,\ldots,q_k\} \] where the diagonal elements \(r_{kk}\ne 0.\)
This theorem tells us that a full column rank matrix can then be decomposed as \[ A = Q R = \begin{bmatrix}Q_1 & Q_2\end{bmatrix} \begin{bmatrix}R_1 \\ 0 \end{bmatrix} = Q_1 R_1, \tag{1}\] where the triangular submatrix \(R_1 = R[1{:}n,1{:}n]\) has no zeros on its diagonal. The factorization \(A = Q_1 R_1\) is the thin or economy QR.
In Julia, we can compute the full QR decomposition of a matrix via the qr function:
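For example, something along these lines; the matrix and its 4-by-3 dimensions are arbitrary choices for illustration.

```julia
A = randn(4, 3)                       # a small random matrix (dimensions arbitrary)
F = qr(A)                             # QR factorization object
Q = F.Q * Matrix{Float64}(I, 4, 4)    # materialize the full 4-by-4 orthogonal factor
R = F.R                               # the upper-triangular block R₁
```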
Verify the orthogonality of Q:
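With Q materialized as above, the check looks like this (the exact values below depend on the random matrix):

```julia
@show norm(Q' * Q - I)
@show norm(Q * Q' - I);
```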
norm(Q' * Q - I) = 7.387315390615699e-16
norm(Q * Q' - I) = 5.46115636976745e-16
Julia stores the factor Q as a series of Householder reflectors.
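We can see this by inspecting the type of the factor (the exact type parameters depend on the Julia version):

```julia
typeof(F.Q)   # a compact representation of Householder reflectors, not a dense matrix
```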
Thin QR
To extract the “thin” QR factorization from qr, use the Matrix conversion function:
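Continuing the 4-by-3 example from above:

```julia
Q₁ = Matrix(F.Q)        # 4-by-3 matrix with orthonormal columns
R₁ = F.R                # 3-by-3 upper-triangular factor
norm(A - Q₁ * R₁)       # reconstruction error near machine precision
```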
Solving least squares via QR
The QR factorization can be used to solve the least squares problem \[ \min_{x\in\R^n}\ \| A x -b\|^2. \tag{2}\]
Assume throughout this section that \(A\) has full column rank, and hence \(m\geq n\). (The QR factorization can also be used to solve the more general problem, but we don’t consider that case here.) Let \(A = QR\) be the QR factorization shown in Equation 1. Because the 2-norm is invariant under orthogonal rotation, \[ \begin{align} \|Ax -b\|^2 & = \| Q \T ( Ax -b)\|^2\\ &=\left\|\begin{bmatrix}R_1\\0\end{bmatrix} x -\begin{bmatrix}Q_1^T\\Q_2^T\end{bmatrix}b\right\|^2\\ & = \|R_1x - Q_1^T b\|^2+\|Q_2^T b\|^2. \end{align} \tag{3}\]
Hence, minimizing \(\|R_1 x - Q_1^T b\|\) also minimizes \(\|Ax-b\|\). Because \(A\) is full rank, \(R_1\) is nonsingular, and the least-squares solution is obtained as the unique solution of the system \[ R_1 x = Q_1^T b. \tag{4}\]
The procedure for solving least-squares problems via the QR factorization can then be summarized as follows:
1. Compute the thin QR factorization \(A = Q_1 R_1\).
2. Form the vector \(Q_1^T b\).
3. Solve the triangular system \(R_1 x = Q_1^T b\) by back substitution.
Here’s a simple random example in Julia:
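One possible version of such an example (the dimensions are arbitrary):

```julia
m, n = 20, 4
A = randn(m, n)
b = randn(m)
F = qr(A)
Q₁, R₁ = Matrix(F.Q), F.R              # thin factors
x = UpperTriangular(R₁) \ (Q₁' * b)    # back substitution on R₁x = Q₁ᵀb (Equation 4)
```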
We can verify that x is the least-squares solution by checking that the residual \(r=b-Ax\) is orthogonal to the columns of \(A\):
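A sketch of that check:

```julia
r = b - A * x
norm(A' * r)    # ≈ 0 up to roundoff, so r is orthogonal to range(A)
```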
The last line of the derivation shown in Equation 3 asserts that the norm of the residual is equal to the norm of \(Q_2^T b\). Let’s check:
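One way to check this is to materialize the full orthogonal factor and take its trailing \(m-n\) columns as \(Q_2\):

```julia
Q  = F.Q * Matrix{Float64}(I, m, m)   # full m-by-m orthogonal factor
Q₂ = Q[:, n+1:end]                    # basis for the orthogonal complement of range(A)
norm(r), norm(Q₂' * b)                # the two norms should agree to machine precision
```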
Accuracy of QR versus normal equations
A solution \(x^*\) of a full-rank least-squares problem Equation 2 can be obtained as the solution of the normal equations: \[ A\T A x = A\T b. \] This approach, however, can in some cases suffer from numerical instability, which means that floating-point computations used by most computers yield solutions with unacceptable errors. The QR approach, on the other hand, often allows us to solve a wider range of problems. The next example, borrowed from the course on Fundamentals of Numerical Computation, illustrates this point.
The Pythagorean identity asserts that \[ \sin^2(θ) + \cos^2(θ) = 1 \] for all values of \(\theta\). This implies that the matrix \[ \begin{bmatrix} \sin^2(\theta_1) & \cos^2(\theta_1) & 1 \\ \vdots & \vdots & \vdots \\\sin^2(\theta_m) & \cos^2(\theta_m) & 1 \end{bmatrix}, \] for any values \(\theta_1,\ldots,\theta_m\), doesn’t have full column rank. Let’s define a slightly perturbed version of this matrix that does have full column rank:
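Following the pattern of that example, the matrix might be built as follows; the angle grid and the perturbation size 1e-7 are assumptions, not values taken from the original page.

```julia
θ = range(0, 3, length=400)                        # sample angles (grid chosen arbitrarily)
ε = 1e-7                                           # small perturbation breaking exact rank deficiency
A = [ sin.(θ).^2  cos.((1 + ε) .* θ).^2  θ.^0 ]    # 400-by-3, nearly rank deficient
```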
We can check that this matrix has full column rank:
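For example:

```julia
rank(A)    # expect 3: full column rank despite being nearly rank deficient
```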
Now create a right-hand side that corresponds to a known solution xᵉ:
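The relative errors reported below are consistent with the exact solution \(x^e = (1, 2, 1)\), so a plausible construction is:

```julia
xᵉ = [1.0, 2.0, 1.0]   # known exact solution
b = A * xᵉ             # right-hand side with zero residual
```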
Solution via the normal equations
Here’s the solution obtained via the normal equations, and its corresponding relative error:
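A sketch of the normal-equations solve:

```julia
xⁿ = (A'A) \ (A'b)                # solve AᵀA x = Aᵀb
eⁿ = norm(xⁿ - xᵉ) / norm(xᵉ)     # relative error
@show xⁿ eⁿ;
```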
xⁿ = [0.9124057128050078, 1.9124057125688128, 1.0875942872529518]
eⁿ = 0.06193851453811387
This solution has only about 1 digit of accuracy.
Solution via QR factorization
Here’s the solution obtained via the QR factorization, and its corresponding relative error:
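A sketch of the QR-based solve, using the thin factors:

```julia
F = qr(A)
Q₁, R₁ = Matrix(F.Q), F.R
xʳ = UpperTriangular(R₁) \ (Q₁' * b)   # solve R₁x = Q₁ᵀb (Equation 4)
eʳ = norm(xʳ - xᵉ) / norm(xᵉ)          # relative error
@show xʳ eʳ;
```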
xʳ = [0.999999999633748, 1.999999999633748, 1.0000000003662528]
eʳ = 2.5897945013010657e-10
Clearly, this solution is much more accurate than that obtained via the normal equations. The cost is essentially the same, so there’s no real downside to using QR. In general, we can use QR implicitly via the backslash operator, which organizes its computations in a slightly different way, and obtains even better accuracy:
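For instance:

```julia
xᵇ = A \ b                      # for rectangular A, `\` solves least squares via pivoted QR
norm(xᵇ - xᵉ) / norm(xᵉ)        # relative error, typically at least as small as eʳ
```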