QR Factorization

CPSC 406 – Computational Optimization

Overview

\[ \def\argmin{\operatorname*{argmin}} \def\Ball{\mathbf{B}} \def\bmat#1{\begin{bmatrix}#1\end{bmatrix}} \def\Diag{\mathbf{Diag}} \def\half{\tfrac12} \def\ip#1{\langle #1 \rangle} \def\maxim{\mathop{\hbox{\rm maximize}}} \def\maximize#1{\displaystyle\maxim_{#1}} \def\minim{\mathop{\hbox{\rm minimize}}} \def\minimize#1{\displaystyle\minim_{#1}} \def\norm#1{\|#1\|} \def\Null{{\mathbf{null}}} \def\proj{\mathbf{proj}} \def\R{\mathbb R} \def\Rn{\R^n} \def\rank{\mathbf{rank}} \def\range{{\mathbf{range}}} \def\span{{\mathbf{span}}} \def\st{\hbox{\rm subject to}} \def\T{^\intercal} \def\textt#1{\quad\text{#1}\quad} \def\trace{\mathbf{trace}} \]

  • Orthogonality
  • QR properties
  • Solution of linear least-squares problems

Orthogonal vectors

Two vectors \(x\) and \(y\) in \(\Rn\)

  • recall cosine identity \[ x\T y = \|x\|_2\|y\|_2\cos\theta \]

  • \(x\) and \(y\) in \(\mathbb{R}^n\) are orthogonal if \[ x\T y = 0 \quad (\cos\theta = 0) \]

  • \(x\) and \(y\) are orthonormal if \[ x\T y = 0, \quad x\T x = 1, \quad y\T y = 1 \]

  • a set of nonzero orthogonal \(n\)-vectors \(\{q_1,\ldots,q_m\}\) is linearly independent

    • if \(m=n\) then it’s a basis for \(\Rn\)
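
These facts are easy to check numerically; here is a quick Julia sketch with two hand-picked vectors (the specific vectors are just an example):

using LinearAlgebra

x = [1.0, 2.0, 2.0]
y = [-2.0, 2.0, -1.0]               # x'y = -2 + 4 - 2 = 0

@show x'y                            # 0.0  ⇒  x ⊥ y
@show acos(x'y / (norm(x)*norm(y)))  # cosine identity ⇒ θ = π/2

qx, qy = x / norm(x), y / norm(y)    # normalize ⇒ orthonormal pair
@show qx'qx qy'qy qx'qy              # ≈ 1, 1, 0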

Orthogonal matrices

An \(n\times r\) matrix \(Q\) is orthonormal if its columns are pairwise orthonormal: \[ Q = [q_1\ | \cdots |\ q_r], \qquad Q\T Q = \begin{bmatrix} q_1\T q_1 & q_2\T q_1 & \cdots & q_r\T q_1 \\ q_1\T q_2 & q_2\T q_2 & \cdots & q_r\T q_2 \\ \vdots & \vdots & \ddots & \vdots \\ q_1\T q_r & q_2\T q_r & \cdots & q_r\T q_r \end{bmatrix} = I_r \]

  • if \(r=n\) (i.e., \(Q\) is square) then \(Q\) is orthogonal \[ Q^{-1} = Q\T \quad\text{and}\quad Q\T Q = QQ\T = I_n \]

  • orthogonal transformations preserve lengths and angles \[ \|x\|_2 = \|Qx\|_2 \textt{and} x\T y = x\T Q\T Qy = (Qx)\T(Qy) \textt{and} \det(Q) = \pm 1 \]
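
A quick numerical illustration of these properties (here the orthogonal \(Q\) is generated from the QR factorization of a random matrix):

using LinearAlgebra

Q = Matrix(qr(randn(4, 4)).Q)   # a random 4×4 orthogonal matrix
x, y = randn(4), randn(4)

@show norm(Q'Q - I(4))          # ≈ 0 : QᵀQ = I
@show norm(Q*x) ≈ norm(x)       # lengths preserved
@show (Q*x)'*(Q*y) ≈ x'y        # inner products (angles) preserved
@show abs(det(Q)) ≈ 1           # det(Q) = ±1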

QR Factorization

For \(A\) \(m\times n\) with \(m\geq n\), full rank

\[ A = QR = \bmat{\hat Q & \bar Q} \bmat{\hat R \\ 0} = \hat Q \hat R, \qquad \hat Q\in\R^{m\times n}, \quad \bar Q\in\R^{m\times(m-n)} \]

where

  1. \(Q\) is orthogonal (\(Q\T Q = Q Q\T = I_m\))
  2. \(\hat R\) is upper triangular (\(\hat R_{ij} = 0\) for \(i > j\))
  3. \(\range(\hat{Q})=\range(A)\)
  4. \(\range(\bar{Q})=\range(A)^\perp \equiv \Null(A\T)\)
using LinearAlgebra
Q, R = qr(A)   # A is m×n; Q is m×m (stored implicitly), R = R̂ is n×n upper triangular
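
A sketch of how to recover the blocks \(\hat Q\) and \(\bar Q\) named above (the sizes are an arbitrary example):

using LinearAlgebra

A = randn(6, 3)                 # example sizes: m = 6, n = 3
m, n = size(A)
F = qr(A)
Q = F.Q * Matrix(1.0I, m, m)    # materialize the full m×m Q
Qhat, Qbar = Q[:, 1:n], Q[:, n+1:m]   # Q̂ and Q̄

@show norm(A - Qhat * F.R)      # ≈ 0 : A = Q̂R̂
@show norm(A' * Qbar)           # ≈ 0 : range(Q̄) = null(Aᵀ)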


Reduced QR Factorization

For \(A\) \(m\times n\) with \(m\geq n\), full rank

\[ A = \hat Q \hat R, \qquad \hat Q\in\R^{m\times n} \text{ with } \hat Q\T \hat Q = I_n, \qquad \hat R\in\R^{n\times n} \text{ upper triangular and nonsingular} \]

taking column by column, with \(\hat Q=[q_1\ |\ \cdots\ |\ q_n]\) \[ \begin{align*} a_1 &= r_{11}q_1 \\ a_2 &= r_{12}q_1 + r_{22}q_2 \\ a_3 &= r_{13}q_1 + r_{23}q_2 + r_{33}q_3 \\ &\vdots \\ a_n &= r_{1n}q_1 + r_{2n}q_2 + \cdots + r_{nn}q_n \end{align*} \]
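
Read column by column, these relations are exactly the classical Gram–Schmidt process: project each \(a_j\) against the previous \(q_i\)'s and normalize what remains. A minimal sketch (gram_schmidt is our own illustrative helper; the built-in qr instead uses the more stable Householder reflections):

using LinearAlgebra

# classical Gram–Schmidt: build Q̂ and R̂ column by column
function gram_schmidt(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:, j]
        for i in 1:j-1
            R[i, j] = Q[:, i]' * A[:, j]   # r_ij = q_iᵀ a_j
            v -= R[i, j] * Q[:, i]         # remove the q_i component
        end
        R[j, j] = norm(v)                  # r_jj normalizes what's left
        Q[:, j] = v / R[j, j]
    end
    return Q, R
end

A = randn(5, 3)
Q, R = gram_schmidt(A)
@show norm(A - Q*R)      # ≈ 0
@show norm(Q'Q - I(3))   # ≈ 0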

Nonsingular equations with QR

Given an \(n\times n\) full-rank matrix \(A\), solve \[ A x = b \] by the QR factorization \(A = QR\), using \(Q\T Q = I\)

mathematically

\[ x = A^{-1}b = (QR)^{-1}b = R^{-1}Q^{-1}b = R^{-1}Q\T b \]

computationally


using LinearAlgebra
Q, R = qr(A)   # O(n^3) <-- dominant cost
y = Q'b        # O(n^2) <-- matrix-vector multiply 
x = R \ y      # O(n^2) <-- triangular solve
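
As a sanity check, the same recipe on a random system (an illustrative example, not part of the original snippet):

using LinearAlgebra

n = 100
A, b = randn(n, n), randn(n)   # random nonsingular system
Q, R = qr(A)
x = R \ (Q'b)
@show norm(A*x - b)            # ≈ 0 up to roundoff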


Solving Least-Squares via QR

\[ \min_{x\in\Rn}\ \|Ax-b\|^2, \qquad A = QR \]

\[ \begin{align} \|Ax-b\|^2 &= (Ax-b)\T(Ax-b) \\ &= (Ax-b)\T Q Q\T(Ax-b) \\ &= \|Q\T(Ax-b)\|^2 \\ &= \left\|\begin{bmatrix} \hat R \\ 0 \end{bmatrix}x - \begin{bmatrix} \hat Q\T \\ \bar Q\T \end{bmatrix}b\right\|^2 \\ &= \underbrace{\color{DarkRed}{\|\hat R x - \hat Q\T b\|^2}}_{\small (1)} + \underbrace{\color{Green}{\|\bar Q\T b\|^2}}_{\small (2)} \end{align} \]

where (1) is minimized (driven to zero) by solving \(\hat Rx=\hat Q\T b\), and (2) does not depend on \(x\): it is the squared norm of the least-squares residual

Solving Least-Squares via QR

\[ \min_{x\in\Rn}\ \|Ax-b\|^2, \qquad A = QR \]

mathematically

\[ \begin{align} A\T A x &= A\T b \\ R\T Q\T Q R x &= R\T Q\T b \\ R\T R x &= R\T Q\T b \\ Rx &= Q\T b\\ x &= R^{-1}Q\T b \end{align} \]

where \(R\T\) cancels in the third step because it is nonsingular (\(A\) has full rank)

computationally


using LinearAlgebra
F = qr(A)                # O(mn^2) <-- dominant cost
Q, R = Matrix(F.Q), F.R  # extract _thin_ Q (m×n), and R (n×n)
y = Q'b                  # O(mn)   <-- matrix-vector multiply
x = R \ y                # O(n^2)  <-- triangular solve


more numerically stable than solving \(A\T Ax = A\T b\) directly, because forming \(A\T A\) squares the condition number of \(A\)

Accuracy of QR vs Normal Equations

For \(\epsilon = 0\) the third column equals the sum of the first two, since \(\sin^2(\theta)+\cos^2(\theta)=1\); for small \(\epsilon > 0\) this matrix has full rank but is badly conditioned \[ A = \begin{bmatrix} \sin^2(\theta_1) & \cos^2(\theta_1+\epsilon) & 1 \\ \sin^2(\theta_2) & \cos^2(\theta_2+\epsilon) & 1 \\ \vdots & \vdots & \vdots \\ \sin^2(\theta_m) & \cos^2(\theta_m+\epsilon) & 1 \\ \end{bmatrix} \]

using LinearAlgebra
θ = LinRange(0,3,400)
ε = 1e-7
A = @. [sin(θ)^2   cos(θ+ε)^2   θ^0]
xᵉ = [1., 2., 1.]
b  = A*xᵉ

xn = A'A \ A'b               # Compute xn via normal equations

Q, R = qr(A); Q = Matrix(Q)  # Compute xr via QR
xr = R \ (Q'b)

xb = A \ b                   # Compute xb via backslash

@show xn xr xb;
xn = [0.9124057128050078, 1.9124057125688128, 1.0875942872529518]
xr = [0.999999999633748, 1.999999999633748, 1.0000000003662528]
xb = [0.9999999998426099, 1.99999999984261, 1.0000000001573903]
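
The normal-equations solution xn matches the true coefficients \(x^e = (1, 2, 1)\) to only about one digit, while the QR and backslash solutions are accurate to roughly ten digits, reflecting the squared condition number of \(A\T A\).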