UBC CPSC 406 2022-T2

QR Factorization

The QR factorization of a matrix constructs an orthogonal basis for its column space. It's also one of the best computational tools for solving least-squares problems.

Required Reading

⚠ Note

This page uses the LinearAlgebra package, which contains norm and qr.

using LinearAlgebra

Orthogonal and orthonormal vectors

For any two $n$-vectors $x$ and $y$, the cosine identity

$$ x^T\! y = \|x\|\|y\|\cos(\theta) $$

relates the inner product of the vectors to the angle between them. Thus, the vectors $x$ and $y$ are orthogonal when

$$ x^T\! y = 0, \quad\text{or equivalently,}\quad \cos(\theta) = 0. $$

Orthogonal vectors are orthonormal when they're also normalized:

$$ x^T\! y = 0, \quad \|x\| = 1, \quad \|y\| = 1. $$

These definitions extend readily to a collection of vectors: the set of $k$ vectors $\{x_1,\ldots,x_k\}$ in $\mathbb R^n$ is orthogonal (and orthonormal) whenever

$$ x_i^T\! x_j = 0 \quad\text{for all}\quad i\ne j \qquad (\text{and}\ \|x_i\| = 1). $$

The 2-dimensional canonical unit vectors $e_1 = (1,0)$ and $e_2 = (0,1)$ are orthonormal:

e₁ = [1, 0];  e₂ = [0, 1]
@show e₁'e₂
@show norm(e₁)
@show norm(e₂)
e₁' * e₂ = 0
norm(e₁) = 1.0
norm(e₂) = 1.0

Orthogonal matrices

A square $n$-by-$n$ matrix $Q$ is orthogonal when its columns are orthonormal:

$$ Q = \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix} \quad\text{and}\quad Q^T\! Q = QQ^T\! = I. $$

Because a matrix $B$ is the inverse of a matrix $A$ if and only if $BA = AB = I$, the inverse of an orthogonal matrix is its transpose:

$$ Q^{-1} = Q^T\!. $$

An orthogonal matrix rotates or reflects a vector, but doesn't change its length. Thus inner products are invariant under orthogonal transformations, i.e., for any vectors $x$ and $y$,

$$ (Qx)^T\!(Qy) = x^T\! Q^T\! Q y = x^T\! y, $$

and so

$$ \| Qx \|_2 = \sqrt{(Qx)^T\!(Qx)} = \sqrt{x^T\! x} = \|x\|_2. $$

Another property of orthogonal matrices is that their determinant is either $1$ or $-1$. This can be observed from the fact that, for any compatible matrices $A$ and $B$, $\det(AB) = \det(A)\det(B)$ and $\det(A) = \det(A^T)$. Hence,

$$ \det(Q^T\! Q) = \det(I) \iff \det(Q)^2 = 1 \iff \det(Q) = \pm 1. $$
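As a quick numerical check of these three properties, here is a minimal sketch using a 2-by-2 rotation matrix (the matrix and vector below are illustrative and aren't part of the course example):

α = π/6
Q = [cos(α) -sin(α); sin(α) cos(α)]   # a rotation by α is orthogonal
x = randn(2)
@show norm(Q'Q - I)         # ≈ 0, so Q⁻¹ = Qᵀ
@show norm(Q*x) ≈ norm(x)   # lengths are preserved
@show det(Q)                # determinant is ±1 (here +1)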

QR Factorization

Every $m$-by-$n$ matrix $A$ has a QR factorization, which means it can be decomposed as the product

$$ A = QR, $$

where

$$ Q = \begin{bmatrix} q_1 & \cdots & q_m \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{mn} \end{bmatrix}, $$

respectively, are an $m$-by-$m$ orthogonal matrix and an $m$-by-$n$ upper-triangular matrix. This factorization isn't unique unless we impose additional conditions on the factors $Q$ and $R$.

Householder reflectors and Givens rotations are the preferred methods for computing the QR factorization. These methods require $O(m^2 n)$ flops to complete.

The columns of the orthogonal matrix $Q$ reveal an orthogonal basis for the range of $A$, and the triangularity of $R$ gives a simple "recipe" for reconstructing the columns of $A = [a_1,\ldots,a_n]$ from those of $Q$:

$$ \begin{aligned} a_1 &= r_{11} q_1 \\ a_2 &= r_{12} q_1 + r_{22} q_2 \\ a_3 &= r_{13} q_1 + r_{23} q_2 + r_{33} q_3 \\ &\ \ \vdots \\ a_n &= r_{1n} q_1 + r_{2n} q_2 + \cdots + r_{nn} q_n. \end{aligned} $$

From these formulas we deduce a key property of this factorization: for any index $k < n$, the span of the leading $k$ columns of $Q$ contains the span of the leading $k$ columns of $A$, i.e.,

$$ \hbox{span}\{a_1,\ldots,a_k\} \subseteq \hbox{span}\{q_1,\ldots,q_k\}. $$

If we further assume that $A$ has full column rank, this inclusion tightens to an equality, which tells us that the leading $k$ columns of $Q$ provide an orthonormal basis for the span of the corresponding columns of $A$.

Theorem: (QR factorization for full column rank matrices) If $A = QR$ is the QR factorization of an $m$-by-$n$ matrix $A$ with full column rank, then the first $n$ columns of $Q$ form a basis for the range of $A$ and the remaining $m-n$ columns form a basis for its orthogonal complement:
$$ \begin{aligned} \hbox{range}(A) &= \hbox{range}(Q_1) \quad\text{where}\quad Q_1 = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix}, \\ \hbox{range}(A)^\perp = \hbox{null}(A^T) &= \hbox{range}(Q_2) \quad\text{where}\quad Q_2 = \begin{bmatrix} q_{n+1} & \cdots & q_m \end{bmatrix}, \end{aligned} $$
and for $k = 1,\ldots,n$,
$$ \hbox{span}\{a_1,\ldots,a_k\} = \hbox{span}\{q_1,\ldots,q_k\}, $$
where the diagonal elements $r_{kk} \ne 0$.

This theorem tells us that a matrix with full column rank can be decomposed as

$$ A = QR = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1 R_1, $$

where the triangular submatrix $R_1 = R[1{:}n, 1{:}n]$ has no zeros on its diagonal. The factorization $A = Q_1 R_1$ is the thin or economy QR.

In Julia, we can compute the full QR decomposition of a matrix via the qr function:

m, n = 4, 3
A = randn(m,n)
Q, R = qr(A)
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}
Q factor:
4×4 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}:
 -0.589037   -0.609377    0.391674   0.35817
  0.594233   -0.0982621   0.0108708  0.798194
 -0.547289    0.568937   -0.37926    0.482646
  0.0199064   0.543429    0.838233   0.0406633
R factor:
3×3 Matrix{Float64}:
 -1.64787  -1.95628   1.26521
  0.0       2.8769   -0.0479272
  0.0       0.0       1.30317

and we can verify the orthogonality of Q:

@show norm(Q'*Q - I)
@show norm(Q*Q' - I)
norm(Q' * Q - I) = 5.928593550334434e-16
norm(Q * Q' - I) = 4.896838930763021e-16

Julia stores the factor Q as a series of Householder reflectors.
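We can also spot-check the column "recipe" above using the factors just computed: each column $a_k$ of $A$ should be a combination of the first $k$ columns of $Q$. (The index $k = 2$ below is an arbitrary choice for illustration.)

k = 2
aₖ = Matrix(Q)[:, 1:k] * R[1:k, k]   # reconstruct column k of A from q₁,…,qₖ
@show norm(A[:, k] - aₖ)             # ≈ 0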

Thin QR

To extract the "thin" QR factorization from qr, use the Matrix conversion function:

F = qr(A)
Q₁ = Matrix(F.Q)
R₁ = F.R
@show size(Q₁)
@show size(R₁)
@show norm(A - Q₁*R₁)
size(Q₁) = (4, 3)
size(R₁) = (3, 3)
norm(A - Q₁ * R₁) = 7.494005416219807e-16
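The factors also let us verify the theorem above numerically: projecting the columns of $A$ onto the range of $Q_1$ leaves them unchanged, while the trailing $m-n$ columns of the full factor $Q$ are orthogonal to the range of $A$. (This is a quick illustrative check, reusing the factorization F from above.)

Q₂ = F.Q[:, n+1:end]        # trailing m-n columns of the full Q
@show norm(Q₁*(Q₁'A) - A)   # ≈ 0 because range(A) = range(Q₁)
@show norm(Q₂'A)            # ≈ 0 because the columns of Q₂ are orthogonal to range(A)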

Solving least squares via QR

The QR factorization can be used to solve the least squares problem

$$ \min_{x\in\mathbb R^n}\ \|Ax - b\|^2. $$

Assume throughout this section that $A$ has full column rank, and hence $m \geq n$. (The QR factorization can also be used to solve the more general problem, but we don't consider that case here.) Let $A = QR$ be the QR factorization partitioned as above. Because the 2-norm is invariant under orthogonal rotation,

$$ \begin{aligned} \|Ax - b\|^2 &= \|Q^T\!(Ax - b)\|^2 \\ &= \left\| \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} b \right\|^2 \\ &= \|R_1 x - Q_1^T b\|^2 + \|Q_2^T b\|^2. \end{aligned} $$

Hence, minimizing $\|R_1 x - Q_1^T b\|$ also minimizes $\|Ax - b\|$. Because $A$ is full rank, $R_1$ is nonsingular, and the least-squares solution is obtained as the unique solution of the system

$$ R_1 x = Q_1^T b. $$

The procedure for solving least-squares problems via the QR factorization can then be summarized as follows:

Algorithm: (Least-squares via QR) Given an $m$-by-$n$ matrix $A$ with full column rank and an $m$-vector $b$:
1. Compute the thin QR factorization $A = Q_1 R_1$.
2. Form the vector $y = Q_1^T b$.
3. Solve the upper-triangular system $R_1 x = y$ by back substitution.

Here's a simple random example in Julia:

m, n = 4, 3
A = randn(m, n)
b = randn(m)
F = qr(A)
Q₁, R₁ = Matrix(F.Q), F.R
x = R₁ \ Q₁'b   # <--- do NOT be tempted to use x = inv(R₁)*Q₁'b

and we can check that x is the least-squares solution by verifying that the residual $r = b - Ax$ is orthogonal to the columns of $A$:

r = b - A*x
@show norm(A'r)
norm(A' * r) = 3.820983142562135e-15

The last line of the derivation above asserts that the norm of the residual is equal to the norm of $Q_2^T b$. Let's check:

Q₂ = F.Q[:,n+1:end]
norm(Q₂'b) ≈ norm(r)
true

Accuracy of QR versus normal equations

A solution $x^*$ of the full-rank least-squares problem described above can be obtained as the solution of the normal equations:

$$ A^T\! A x = A^T\! b. $$

This approach, however, can in some cases suffer from numerical instability, which means that floating-point computations used by most computers yield solutions with unacceptable errors. The QR approach, on the other hand, often allows us to accurately solve a wider range of problems. The next example, borrowed from the course on Fundamentals of Numerical Computation, illustrates this point.
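One way to see why the normal equations can lose accuracy: the 2-norm condition number of the Gram matrix $A^T A$ is the square of the condition number of $A$, so rounding errors are amplified far more when we solve with $A^T A$. Here's a small illustrative sketch with a synthetic ill-conditioned matrix (the matrix B and its scaling are made up for this check and aren't part of the example below):

B = randn(100, 3) * Diagonal([1.0, 1e-3, 1e-6])   # deliberately ill-conditioned test matrix
@show cond(B)
@show cond(B'B)   # ≈ cond(B)^2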

The Pythagorean identity asserts that

$$ \sin^2(\theta) + \cos^2(\theta) = 1 $$

for all values of $\theta$. This implies that the matrix

$$ \begin{bmatrix} \sin^2(\theta_1) & \cos^2(\theta_1) & 1 \\ \vdots & \vdots & \vdots \\ \sin^2(\theta_m) & \cos^2(\theta_m) & 1 \end{bmatrix}, $$

for any values $\theta_1,\ldots,\theta_m$, doesn't have full column rank. Let's define a slightly perturbed version of this matrix that does have full column rank:

θ = LinRange(0,3,400)
ε = 1e-7
A = @. [sin(θ)^2   cos(θ+ε)^2   θ^0]

We can check that this matrix has full column rank:

rank(A) == 3
true

Now create a right-hand side that corresponds to a known solution xᵉ:

xᵉ = [1., 2., 1.]
b  = A*xᵉ

Solution via the normal equations

Here's the solution obtained via the normal equations, and its corresponding relative error:

xⁿ = A'A \ A'b
eⁿ = norm(xᵉ - xⁿ) / norm(xᵉ)
@show xⁿ
@show eⁿ
xⁿ = [1.0769230768721778, 2.0769230770796003, 0.9230769230769231]
eⁿ = 0.054392829346937996

This solution has only about 1 digit of accuracy.

Solution via QR factorization

Here's the solution obtained via the QR factorization, and its corresponding relative error:

Q, R = qr(A); Q = Matrix(Q)
xʳ = R \ (Q'b)
eʳ = norm(xᵉ - xʳ) / norm(xᵉ)
@show xʳ
@show eʳ
xʳ = [0.9999999962497133, 1.9999999962497135, 1.000000003750286]
eʳ = 2.65185298142961e-9

Clearly, this solution is much more accurate than that obtained via the normal equations. The cost is essentially the same, so there's no real downside to using QR. In general, we can use QR implicitly via the backslash operator, which organizes its computations in a slightly different way, and obtains even better accuracy:

xᵇ = A \ b
eᵇ = norm(xᵉ - xᵇ) / norm(xᵉ)
@show xᵇ
@show eᵇ
xᵇ = [0.9999999998372602, 1.9999999998372604, 1.0000000001627405]
eᵇ = 1.1507453610876192e-10