Singular Value Decomposition

Abstract

The singular value decomposition (SVD) is a fundamental matrix factorization that generalizes the eigendecomposition to rectangular matrices. Like the QR factorization, it provides a robust foundation for solving least-squares problems, and it can also be used to compute matrix ranks, norms, and pseudoinverses.

View slides in full screen / PDF

Note

This page uses the LinearAlgebra package, which contains norm and svd. We also use the Random package to seed the random number generator.

Library import code
import Random; Random.seed!(1234)
using LinearAlgebra

Brief History

The singular value decomposition (SVD) is one of the most important matrix factorizations in linear algebra. Its theoretical roots extend back to the 19th century with work by mathematicians such as Eugenio Beltrami and Camille Jordan (Stewart 1993). The modern computational approach to the SVD is often credited to Gene Golub and William Kahan, who published a landmark paper in 1965 showing how to compute the SVD in a stable manner using the Golub-Kahan bidiagonalization algorithm.

For additional background and applications of the SVD, see the Wikipedia article on the Singular Value Decomposition.

Existence of the SVD

Theorem 1 (Existence) Let \(A\) be an \(m \times n\) matrix (real or complex) of rank \(r\). Then there exist orthogonal (or unitary in the complex case) matrices \[ U \in \mathbb{R}^{m \times m}, \quad V \in \mathbb{R}^{n \times n} \] and a (rectangular) diagonal matrix \[ \Sigma \in \mathbb{R}^{m \times n} \] such that \[ A = U\,\Sigma\,V^T. \] The first \(r\) diagonal entries \(\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0\) of \(\Sigma\) are called the singular values of \(A\).

  • If \(A\) is real, then \(U\) and \(V\) are real orthogonal matrices.
  • If \(A\) is complex, the factorization is written \(A = U \Sigma V^*\), and \(U\), \(V\) are unitary.

Key points about existence

  1. The singular values \(\sigma_1, \sigma_2, \ldots, \sigma_r\) are the square roots of the non-zero eigenvalues of \(A^T A\) (or \(A^* A\) in the complex case).
  2. The columns of \(U\) are called the left singular vectors; the columns of \(V\) are the right singular vectors.
  3. The rank \(r\) of \(A\) is exactly the number of non-zero singular values.
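
The first and third points are easy to check numerically. The sketch below, on a small hand-built matrix chosen for illustration, verifies that the singular values equal the square roots of the eigenvalues of \(A^T A\) and that the number of non-zero singular values equals the rank.

using LinearAlgebra

A = [2.0 0.0; 0.0 -3.0; 0.0 0.0]        # a simple 3×2 matrix of rank 2

# Singular values from the SVD
σ = svdvals(A)

# Square roots of the eigenvalues of AᵀA, sorted to match the SVD ordering
λ = sort(eigvals(Symmetric(A' * A)), rev=true)
sqrt.(λ) ≈ σ                             # true

# The rank equals the number of non-zero singular values
count(>(1e-12), σ) == rank(A)            # true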

Properties of the SVD

Suppose that \(A\) is \(m \times n\). Given the factorization \(A = U \,\Sigma\, V^T\), where \(U\) is \((m \times m)\), \(\Sigma\) is \((m \times n)\), and \(V\) is \((n \times n)\), we can list several important properties:

  1. Orthogonality (or unitarity): \[U^T U = I_m \quad\text{and}\quad V^T V = I_n.\]

  2. Connection to rank:

    • The rank of \(A\), \(\mathrm{rank}(A)\), is equal to the number of non-zero singular values.
    • Equivalently, it is the number of non-zero diagonal entries of \(\Sigma\).
  3. Spectral norm: \[ \|A\|_2 := \max_{\|x\|_2 = 1} \|Ax\|_2 = \sigma_1, \] the largest singular value of \(A\). Geometrically, this is the maximum stretching factor of the linear transformation \(A\).

  4. Frobenius norm: \[ \|A\|_F := \sqrt{\sum_{i=1}^r \sigma_i^2} = \sqrt{\mathrm{trace}(A^T A)}, \] a measure of the “size” of all entries of \(A\).
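
A quick numerical check of properties 1, 3, and 4 on a random matrix. Note that Julia's svd returns the thin factorization by default, so U here is 5×3 rather than 5×5.

using LinearAlgebra

A = rand(5, 3)
U, S, V = svd(A)

# Orthogonality of the factors (up to round-off)
U' * U ≈ Matrix(1.0I, 3, 3) && V' * V ≈ Matrix(1.0I, 3, 3)   # true

# Spectral norm equals the largest singular value
opnorm(A, 2) ≈ S[1]                                            # true

# Frobenius norm equals the square root of the sum of squared singular values
norm(A) ≈ sqrt(sum(S .^ 2))                                    # true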

Four Fundamental Subspaces

The Singular Value Decomposition (SVD) provides valuable insights into the four fundamental subspaces associated with a matrix \(A\). These subspaces are:

  1. Column Space of \(A\) (\(\range(A)\)):
    • Spanned by the left singular vectors corresponding to the non-zero singular values.
    • These vectors form an orthonormal basis for \(\range(A)\).
  2. Null Space of \(A\) (\(\Null(A)\)):
    • Spanned by the right singular vectors corresponding to zero singular values.
    • These vectors form an orthonormal basis for \(\Null(A)\).
  3. Row Space of \(A\) (\(\range(A^T)\)):
    • Spanned by the right singular vectors corresponding to the non-zero singular values.
    • These vectors form an orthonormal basis for \(\range(A^T)\).
  4. Left Null Space of \(A\) (\(\Null(A^T)\)):
    • Spanned by the left singular vectors corresponding to zero singular values.
    • These vectors form an orthonormal basis for \(\Null(A^T)\).

Understanding these subspaces is crucial for comprehending the structure of \(A\) and solving related linear algebra problems.
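
As a small illustration, the sketch below builds a rank-deficient matrix (the construction is ours, purely for demonstration) and reads off orthonormal bases for all four subspaces from the full SVD.

using LinearAlgebra

# A 4×3 matrix of rank 2 (the third column is the sum of the first two)
A = rand(4, 2)
A = [A A[:, 1] + A[:, 2]]

F = svd(A, full=true)            # full SVD: U is 4×4, V is 3×3
r = count(>(1e-10), F.S)         # numerical rank (should be 2)

col_space  = F.U[:, 1:r]         # orthonormal basis for the column space of A
left_null  = F.U[:, r+1:end]     # orthonormal basis for the left null space of A
row_space  = F.V[:, 1:r]         # orthonormal basis for the row space of A
null_space = F.V[:, r+1:end]     # orthonormal basis for the null space of A

# Sanity checks: A annihilates its null space, Aᵀ annihilates the left null space
norm(A * null_space), norm(A' * left_null)    # both ≈ 0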

Matrix rank

The Eckart-Young-Mirsky theorem asserts that the best rank-\(k\) approximation of \(A\) (with \(k < \mathrm{rank}(A)\)), in terms of the spectral and Frobenius norms, is given by the truncated SVD: \[A_k = \sum_{j=1}^k \sigma_j\, u_j \, v_j^T.\]

Below we illustrate the best rank-\(k\) approximation for a simple image matrix. The original image is shown on the top-left, followed by lower rank approximations.

Code
using Plots, Images, FileIO, LinearAlgebra

# Load the image and convert it to a plain floating-point matrix
img = load("../assets/img/e-small.jpg")
A = Float64.(Gray.(img))
U, s, V = svd(A)

# Rank-r reconstruction from the leading r singular triplets
function plot_approx(U, s, V, r; kwargs...)
    B = U[:, 1:r] * diagm(s[1:r]) * V[:, 1:r]'
    B = reverse(B, dims=(1, 2))    # reorient the matrix for heatmap's coordinate system
    heatmap(B, color=:greys, aspect_ratio=:equal, showaxis=false, colorbar=false; kwargs...)
end

# Show the full-rank image alongside three truncated approximations
l = @layout [a b; c d]
p = [plot_approx(U, s, V, r; title="rank=$r") for r in [length(s), 50, 25, 5]]
plot(p..., layout=l, fmt=:png)
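
As a numerical check of the Eckart-Young-Mirsky bound, the short sketch below (on a random matrix, for illustration only) verifies that the spectral-norm error of the rank-\(k\) truncation equals \(\sigma_{k+1}\).

using LinearAlgebra

A = rand(8, 6)
U, S, V = svd(A)

k = 3
Ak = U[:, 1:k] * Diagonal(S[1:k]) * V[:, 1:k]'   # truncated SVD (rank k)

opnorm(A - Ak, 2) ≈ S[k+1]                        # true: the error is σ_{k+1}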

Minimum-Norm Least Squares Solution

One of the most important consequences of the SVD is that it provides a direct expression for the minimum-norm solution to the least-squares problem: \[ \min_x \, \|A x - b\|_2. \]

  • If \(A = U \Sigma V^T\) is the SVD of \(A\), and we write \(b\) in the basis of \(U\) as \(b = U \hat{b}\), then the least-squares solution is \[ \bar{x} \;=\; \sum_{j=1}^r \frac{\hat{b}_j}{\sigma_j} \, v_j \quad \text{where} \quad \hat{b}_j = u_j^T \, b. \]
  • In matrix form, we often write \(\bar{x} = V \Sigma^+ U^T b\), which leads us to the notion of the pseudo-inverse.
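
The sketch below builds a rank-deficient least-squares problem (the construction is ours, for illustration), computes the minimum-norm solution directly from the SVD formula above, and compares it with the pseudoinverse solution.

using LinearAlgebra

# Rank-deficient least-squares problem: the third column repeats the first
A = rand(6, 2); A = [A A[:, 1]]
b = rand(6)

U, S, V = svd(A)
r = count(>(1e-10), S)            # numerical rank

# x̄ = Σ_{j=1}^r (uⱼᵀ b / σⱼ) vⱼ
xbar = sum((U[:, j]' * b) / S[j] * V[:, j] for j in 1:r)

xbar ≈ pinv(A) * b                # true: both give the minimum-norm solution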

Pseudoinverse of a Matrix

The pseudoinverse of a matrix \(A\), denoted \(A^+\), is defined via the SVD: \[ A^+ \;=\; V \,\Sigma^+\, U^T, \] where \(\Sigma^+\) is constructed by inverting the non-zero singular values on the diagonal of \(\Sigma\) and transposing the shape. Concretely, the \(i\)th diagonal entry is \[ (\Sigma^+)_{ii} = \begin{cases} \displaystyle \frac{1}{\sigma_i}, & \text{if } \sigma_i > 0,\\[6pt] 0, & \text{if } \sigma_i = 0. \end{cases} \]

Key Properties

  • \(A A^+\) is the orthogonal projector onto the column space of \(A\).
  • \(A^+ A\) is the orthogonal projector onto the row space of \(A\).
  • The pseudoinverse exists for any matrix \(A\), regardless of its shape or rank.
  • The minimum-norm least-squares solution is given by \(x = A^+ b\).
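
A short check of these properties on a small rank-deficient matrix (built here only for illustration): the projectors leave \(A\) and \(A^+\) unchanged and are symmetric, as orthogonal projectors must be.

using LinearAlgebra

A = rand(5, 2); A = [A A[:, 1] + A[:, 2]]   # 5×3 matrix of rank 2
Ap = pinv(A)

# A*A⁺ projects onto the column space of A, so it leaves A unchanged
A * Ap * A ≈ A                               # true

# A⁺*A projects onto the row space of A, so it leaves A⁺ unchanged
Ap * A * Ap ≈ Ap                             # true

# Both projectors are symmetric (orthogonal projectors)
A * Ap ≈ (A * Ap)' && Ap * A ≈ (Ap * A)'     # true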

Computational Examples in Julia

Below are a few simple Julia examples demonstrating how to use the SVD in practice.

using LinearAlgebra

# Example 1: Basic SVD
# Generate a random 5 x 3 matrix
A = rand(5, 3)

# Perform the SVD
U, S, V = svd(A);

Verify orthogonality of the columns of \(U\) and \(V\):

U'*U
3×3 Matrix{Float64}:
  1.0          1.75662e-16  -2.15147e-16
  1.75662e-16  1.0           1.72283e-16
 -2.15147e-16  1.72283e-16   1.0
V'*V
3×3 Matrix{Float64}:
  1.0           5.55112e-17  -1.66533e-16
  5.55112e-17   1.0          -1.11022e-16
 -1.66533e-16  -1.11022e-16   1.0

The singular values are stored in the vector S and indicate the rank of \(A\):

numerical_rank = sum(S .> 1e-10)   # count singular values above a small tolerance
numerical_rank == rank(A)
true
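
Since Julia's svd returns the thin factorization, the factors from Example 1 reconstruct \(A\) directly:

# Example 2: Reconstruct A from its SVD factors
U * Diagonal(S) * V' ≈ A            # true (up to round-off)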

References

Stewart, G. W. 1993. “On the Early History of the Singular Value Decomposition.” SIAM Review 35 (4): 551–66. https://www.jstor.org/stable/2132388.