Orthogonal projection and QR decomposition
UBC Math 307 lecture notes
This page uses the LinearAlgebra package, which contains norm and qr.
using LinearAlgebra
For any two $n$-vectors $x$ and $y$, the cosine identity
$$x^T y = \|x\|\,\|y\|\cos\theta$$
relates the inner product of the vectors to the angle $\theta$ between them. Thus, the vectors $x$ and $y$ are orthogonal when
$$x^T y = 0.$$
Orthogonal vectors are orthonormal when they're also normalized:
$$x^T y = 0, \qquad \|x\| = \|y\| = 1.$$
These definitions extend readily to a collection of vectors: the set of vectors $\{x_1,\ldots,x_k\}$ in $\mathbb{R}^n$ is orthogonal (and orthonormal) whenever
$$x_i^T x_j = 0 \ \text{ for all } i \ne j \qquad (\text{and } \|x_i\| = 1 \text{ for all } i).$$
The 2-dimensional canonical unit vectors $e_1$ and $e_2$ are orthonormal:
e₁ = [1, 0]; e₂ = [0, 1]
@show e₁'e₂
@show norm(e₁)
@show norm(e₂)
e₁' * e₂ = 0
norm(e₁) = 1.0
norm(e₂) = 1.0
A square $n$-by-$n$ matrix $Q$ is orthogonal when its columns are orthonormal:
$$Q^T Q = I.$$
Because a matrix $B$ is the inverse of a matrix $A$ if and only if $AB = BA = I$, the inverse of an orthogonal matrix is its transpose:
$$Q^{-1} = Q^T.$$
An orthogonal matrix rotates a vector, but doesn't change its length. Thus inner products are invariant under orthogonal transformations, i.e., for any vectors $x$ and $y$,
$$(Qx)^T (Qy) = x^T Q^T Q y = x^T y,$$
and so
$$\|Qx\|_2 = \|x\|_2.$$
Another property of orthogonal matrices is that their determinant is either $+1$ or $-1$. This can be observed from the fact that for any compatible matrices $A$ and $B$, $\det(AB) = \det(A)\det(B)$ and $\det(A^T) = \det(A)$. Hence,
$$1 = \det(I) = \det(Q^T Q) = \det(Q^T)\det(Q) = \det(Q)^2.$$
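As a quick numerical sketch of these properties, we can use a 2-by-2 rotation matrix (orthogonal by construction) and a couple of random vectors; the angle θ here is arbitrary:
θ = 0.7
Q = [cos(θ) -sin(θ); sin(θ) cos(θ)]   # a rotation, hence orthogonal
x, y = randn(2), randn(2)
@show norm(Q'Q - I)          # QᵀQ = I, so Q⁻¹ = Qᵀ
@show (Q*x)'*(Q*y) ≈ x'y     # inner products are preserved
@show norm(Q*x) ≈ norm(x)    # ... and so are lengths
@show det(Q) ≈ 1             # determinant is ±1 (here +1, a pure rotation)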
Every $m$-by-$n$ matrix $A$ has a QR factorization, which means it can be decomposed as the product
$$A = QR,$$
where $Q$ and $R$, respectively, are an $m$-by-$m$ orthogonal matrix and an $m$-by-$n$ upper-triangular matrix. This factorization isn't unique, unless we impose additional conditions on the factors $Q$ and $R$.
Householder reflectors and Givens rotations are the preferred methods for computing the QR factorization. These methods require $O(mn^2)$ flops to complete.
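To make the idea concrete, here is a minimal, unoptimized sketch of Householder QR, for illustration only (Julia's qr calls optimized LAPACK routines that apply the same reflectors in a blocked, compact form):
function householder_qr(A)
    m, n = size(A)
    R = float(copy(A))
    Q = Matrix{Float64}(I, m, m)
    for k in 1:min(n, m-1)
        x = R[k:m, k]
        v = copy(x)
        v[1] += (x[1] ≥ 0 ? 1 : -1) * norm(x)   # choose the sign to avoid cancellation
        v /= norm(v)
        R[k:m, k:n] -= 2v * (v' * R[k:m, k:n])  # apply the reflector I - 2vvᵀ on the left
        Q[:, k:m]   -= (Q[:, k:m] * v) * (2v')  # accumulate Q = H₁H₂⋯
    end
    return Q, R
end

A = randn(4, 3)
Q, R = householder_qr(A)
@show norm(A - Q*R)      # ≈ 0: a valid factorization
@show norm(Q'Q - I)      # ≈ 0: Q is orthogonal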
The columns of the orthogonal matrix $Q$ reveal an orthogonal basis for the range of $A$, and the triangularity of $R$ gives a simple "recipe" for reconstructing the columns of $A$ from those of $Q$:
$$a_1 = r_{11} q_1, \qquad a_2 = r_{12} q_1 + r_{22} q_2, \qquad \ldots, \qquad a_n = r_{1n} q_1 + r_{2n} q_2 + \cdots + r_{nn} q_n.$$
From these formulas we deduce a key property of this factorization: for any index $k$, the span of the leading $k$ columns of $Q$ contains the span of the leading $k$ columns of $A$, i.e.,
$$\operatorname{span}\{a_1,\ldots,a_k\} \subseteq \operatorname{span}\{q_1,\ldots,q_k\}.$$
If we further assume that $A$ has full column rank, this inclusion tightens to an equality, which tells us that the leading $k$ columns of $Q$ provide an orthonormal basis for the span of the leading $k$ columns of $A$.
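As a quick numerical check of the recipe and the span property, here is a sketch with a random 4-by-3 matrix, using Julia's qr (discussed below); Matrix(F.Q) materializes the leading columns of $Q$:
m, n = 4, 3
A = randn(m, n)
F = qr(A)
Q₁, R₁ = Matrix(F.Q), F.R                 # leading n columns of Q and the n×n block of R
@show A[:, 2] ≈ Q₁[:, 1:2] * R₁[1:2, 2]   # a₂ = r₁₂q₁ + r₂₂q₂
@show rank([A[:, 1:2] Q₁[:, 1:2]]) == 2   # a₁, a₂ lie in span{q₁, q₂}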
This observation tells us that a full column rank matrix $A$ can then be decomposed as
$$A = QR = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1 R_1, \tag{1}$$
where the triangular submatrix $R_1$ has no zeros on its diagonal. The factorization $A = Q_1 R_1$ is the thin (or economy) QR.
In Julia, we can compute the full QR decomposition of a matrix via the qr function:
m, n = 4, 3
A = randn(m,n)
Q, R = qr(A)
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}
Q factor:
4×4 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}:
-0.589037 -0.609377 0.391674 0.35817
0.594233 -0.0982621 0.0108708 0.798194
-0.547289 0.568937 -0.37926 0.482646
0.0199064 0.543429 0.838233 0.0406633
R factor:
3×3 Matrix{Float64}:
-1.64787 -1.95628 1.26521
0.0 2.8769 -0.0479272
0.0 0.0 1.30317
and we can verify the orthogonality of Q:
@show norm(Q'*Q - I)
@show norm(Q*Q' - I)
norm(Q' * Q - I) = 5.928593550334434e-16
norm(Q * Q' - I) = 4.896838930763021e-16
Julia stores the factor Q as a series of Householder reflectors.
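A small sketch of what this means in practice: F.Q is not a dense array, yet it can be applied to vectors without ever forming the full matrix (reusing A and m from above):
F = qr(A)
@show typeof(F.Q)              # a compact Householder representation, not a plain Matrix
y = F.Q * ones(m)              # applies the reflectors; the full 4×4 Q is never formed
@show norm(y) ≈ norm(ones(m))  # orthogonal transformations preserve the 2-norm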
To extract the "thin" QR factorization from qr, use the Matrix conversion function:
F = qr(A)
Q₁ = Matrix(F.Q)
R₁ = F.R
@show size(Q₁)
@show size(R₁)
@show norm(A - Q₁*R₁)
size(Q₁) = (4, 3)
size(R₁) = (3, 3)
norm(A - Q₁ * R₁) = 7.494005416219807e-16
The QR factorization can be used to solve the least squares problem
$$\min_{x} \|Ax - b\|_2. \tag{2}$$
Assume throughout this section that $A$ has full column rank, and hence $m \ge n$. (The QR factorization also can be used to solve the more general problem, but we don't consider that case here.) Let $A = Q_1 R_1$ be the QR factorization shown in (1). Because the 2-norm is invariant under orthogonal rotation,
$$\|Ax - b\|^2 = \|Q^T(Ax - b)\|^2 = \left\| \begin{bmatrix} R_1 x - Q_1^T b \\ -\,Q_2^T b \end{bmatrix} \right\|^2 = \|R_1 x - Q_1^T b\|^2 + \|Q_2^T b\|^2. \tag{3}$$
Hence, minimizing $\|R_1 x - Q_1^T b\|$ also minimizes $\|Ax - b\|$. Because $A$ is full rank, $R_1$ is nonsingular, and the least-squares solution is obtained as the unique solution of the triangular system
$$R_1 x = Q_1^T b.$$
The procedure for solving least-squares problems via the QR factorization can then be summarized as follows:
Algorithm: (Least-squares via QR)
compute the thin QR factorization $A = Q_1 R_1$
backsolve the triangular linear system $R_1 x = Q_1^T b$ for $x$
Here's a simple random example in Julia:
m, n = 4, 3
A = randn(m, n)
b = randn(m)
F = qr(A)
Q₁, R₁ = Matrix(F.Q), F.R
x = R₁ \ Q₁'b # <--- do NOT be tempted to use x = inv(R₁)*Q₁'b
and we can verify that x is the least-squares solution by checking that the residual $r = b - Ax$ is orthogonal to the columns of $A$:
r = b - A*x
@show norm(A'r)
norm(A' * r) = 3.820983142562135e-15
The last line of the derivation shown in (3) asserts that the norm of the residual is equal to the norm of $Q_2^T b$. Let's check:
Q₂ = F.Q[:,n+1:end]
norm(Q₂'b) ≈ norm(r)
true
A solution of a full-rank least-squares problem (2) can be obtained as the solution of the normal equations:
$$A^T A x = A^T b.$$
This approach, however, can in some cases suffer from numerical instability, which means that the floating-point computations used by most computers yield solutions with unacceptable errors. The QR approach, on the other hand, often allows us to solve a wider range of problems accurately. The next example, borrowed from the course Fundamentals of Numerical Computation, illustrates this point.
The Pythagorean identity asserts that
$$\sin^2\theta + \cos^2\theta = 1$$
for all values of $\theta$. This implies that the matrix
$$A = \begin{bmatrix} \sin^2\theta_1 & \cos^2\theta_1 & 1 \\ \vdots & \vdots & \vdots \\ \sin^2\theta_m & \cos^2\theta_m & 1 \end{bmatrix},$$
for any values $\theta_1,\ldots,\theta_m$, doesn't have full column rank. Let's define a slightly perturbed version of this matrix that does have full column rank:
θ = LinRange(0,3,400)
ε = 1e-7
A = @. [sin(θ)^2 cos(θ+ε)^2 θ^0]
We can check that this matrix has full column rank:
rank(A) == 3
true
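Although the perturbed matrix technically has full column rank, it is extremely close to being rank deficient. One way to quantify this is the condition number; note that forming $A^T A$ roughly squares it, which foreshadows the loss of accuracy we observe below:
@show cond(A)      # very large: A is nearly rank deficient
@show cond(A'A)    # roughly the square of cond(A)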
Now create a right-hand side that corresponds to a known solution xᵉ:
xᵉ = [1., 2., 1.]
b = A*xᵉ
Here's the solution obtained via the normal equations, and its corresponding relative error:
xⁿ = A'A \ A'b
eⁿ = norm(xᵉ - xⁿ) / norm(xᵉ)
@show xⁿ
@show eⁿ
xⁿ = [1.0769230768721778, 2.0769230770796003, 0.9230769230769231]
eⁿ = 0.054392829346937996
This solution has only about 1 digit of accuracy.
Here's the solution obtained via the QR factorization, and its corresponding relative error:
Q, R = qr(A); Q = Matrix(Q)
xʳ = R \ (Q'b)
eʳ = norm(xᵉ - xʳ) / norm(xᵉ)
@show xʳ
@show eʳ
xʳ = [0.9999999962497133, 1.9999999962497135, 1.000000003750286]
eʳ = 2.65185298142961e-9
Clearly, this solution is much more accurate than that obtained via the normal equations. The cost is essentially the same, so there's no real downside to using QR. In general, we can use QR implicitly via the backslash operator, which organizes its computations in a slightly different way, and obtains even better accuracy:
xᵇ = A \ b
eᵇ = norm(xᵉ - xᵇ) / norm(xᵉ)
@show xᵇ
@show eᵇ
xᵇ = [0.9999999998372602, 1.9999999998372604, 1.0000000001627405]
eᵇ = 1.1507453610876192e-10