Singular Value Decomposition

Abstract

The singular value decomposition (SVD) is a fundamental matrix factorization that generalizes the eigendecomposition to rectangular matrices. Like the QR factorization, it provides a robust foundation for solving least-squares problems, and it can also be used to compute matrix ranks, norms, and pseudoinverses.

View slides in full screen / PDF

Note

This page uses the LinearAlgebra package, which contains norm and svd. We also use the Random package to seed the random number generator.

Library import code
import Random; Random.seed!(1234)
using LinearAlgebra

Brief History

The singular value decomposition (SVD) is one of the most important matrix factorizations in linear algebra. Its theoretical roots extend back to the 19th century with work by mathematicians such as Eugenio Beltrami and Camille Jordan (Stewart 1993). The modern computational approach to the SVD is often credited to Gene Golub and William Kahan, who published a landmark paper in 1965 showing how to compute the SVD in a stable manner using the Golub-Kahan bidiagonalization algorithm.

For additional background and applications of the SVD, see the Wikipedia article on the Singular Value Decomposition.

Existence of the SVD

Theorem 1 (Existence) Let \(A\) be an \(m \times n\) matrix (real or complex) of rank \(r\). Then there exist orthogonal (or unitary in the complex case) matrices \[ U \in \mathbb{R}^{m \times m}, \quad V \in \mathbb{R}^{n \times n} \] and a (rectangular) diagonal matrix \[ \Sigma \in \mathbb{R}^{m \times n} \] such that \[ A = U\,\Sigma\,V^T. \] The first \(r\) diagonal entries \(\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0\) of \(\Sigma\) are called the singular values of \(A\).

  • If \(A\) is real, then \(U\) and \(V\) are real orthogonal matrices.
  • If \(A\) is complex, the factorization is written \(A = U \Sigma V^*\), and \(U\), \(V\) are unitary.

Key points about existence

  1. The singular values \(\sigma_1, \sigma_2, \ldots, \sigma_r\) are the square roots of the non-zero eigenvalues of \(A^T A\) (or \(A^* A\) in the complex case).
  2. The columns of \(U\) are called the left singular vectors; the columns of \(V\) are the right singular vectors.
  3. The rank \(r\) of \(A\) is exactly the number of non-zero singular values.
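
The first and third points are easy to check numerically. The sketch below, on a small hand-built matrix chosen for illustration, verifies that the singular values equal the square roots of the eigenvalues of \(A^T A\) and that the number of non-zero singular values equals the rank.

using LinearAlgebra

A = [2.0 0.0; 0.0 -3.0; 0.0 0.0]        # a simple 3×2 matrix of rank 2

# Singular values from the SVD
σ = svdvals(A)

# Square roots of the eigenvalues of AᵀA, sorted to match the SVD ordering
λ = sort(eigvals(Symmetric(A' * A)), rev=true)
sqrt.(λ) ≈ σ                             # true

# The rank equals the number of non-zero singular values
count(>(1e-12), σ) == rank(A)            # true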

Properties of the SVD

Suppose that \(A\) is \(m \times n\). Given the factorization \(A = U \,\Sigma\, V^T\), where \(U\) is \((m \times m)\), \(\Sigma\) is \((m \times n)\), and \(V\) is \((n \times n)\), we can list several important properties:

  1. Orthogonality (or unitarity): \[U^T U = I_m \quad\text{and}\quad V^T V = I_n.\]

  2. Connection to rank:

    • The rank of \(A\), \(\mathrm{rank}(A)\), is equal to the number of non-zero singular values.
    • Equivalently, it is the number of non-zero diagonal entries of \(\Sigma\).
  3. Spectral norm: \[ \|A\|_2 := \max_{\|x\|_2 = 1} \|Ax\|_2 = \sigma_1, \] the largest singular value of \(A\). Geometrically, this is the maximum stretching factor of the linear transformation \(A\).

  4. Frobenius norm: \[ \|A\|_F := \sqrt{\sum_{i=1}^r \sigma_i^2} = \sqrt{\mathrm{trace}(A^T A)}, \] a measure of the “size” of all entries of \(A\).
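
A quick numerical check of properties 1, 3, and 4 on a random matrix. Note that Julia's svd returns the thin factorization by default, so U here is 5×3 rather than 5×5.

using LinearAlgebra

A = rand(5, 3)
U, S, V = svd(A)

# Orthogonality of the factors (up to round-off)
U' * U ≈ Matrix(1.0I, 3, 3) && V' * V ≈ Matrix(1.0I, 3, 3)   # true

# Spectral norm equals the largest singular value
opnorm(A, 2) ≈ S[1]                                            # true

# Frobenius norm equals the square root of the sum of squared singular values
norm(A) ≈ sqrt(sum(S .^ 2))                                    # true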

Four Fundamental Subspaces

The Singular Value Decomposition (SVD) provides valuable insights into the four fundamental subspaces associated with a matrix \(A\). These subspaces are:

  1. Column Space of \(A\) (\(\range(A)\)):
    • Spanned by the left singular vectors corresponding to the non-zero singular values.
    • These vectors form an orthonormal basis for \(\range(A)\).
  2. Null Space of \(A\) (\(\Null(A)\)):
    • Spanned by the right singular vectors corresponding to zero singular values.
    • These vectors form an orthonormal basis for \(\Null(A)\).
  3. Row Space of \(A\) (\(\range(A^T)\)):
    • Spanned by the right singular vectors corresponding to the non-zero singular values.
    • These vectors form an orthonormal basis for \(\range(A^T)\).
  4. Left Null Space of \(A\) (\(\Null(A^T)\)):
    • Spanned by the left singular vectors corresponding to zero singular values.
    • These vectors form an orthonormal basis for \(\Null(A^T)\).

Understanding these subspaces is crucial for comprehending the structure of \(A\) and solving related linear algebra problems.
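
As a small illustration, the sketch below builds a rank-deficient matrix (the construction is ours, purely for demonstration) and reads off orthonormal bases for all four subspaces from the full SVD.

using LinearAlgebra

# A 4×3 matrix of rank 2 (the third column is the sum of the first two)
A = rand(4, 2)
A = [A A[:, 1] + A[:, 2]]

F = svd(A, full=true)            # full SVD: U is 4×4, V is 3×3
r = count(>(1e-10), F.S)         # numerical rank (should be 2)

col_space  = F.U[:, 1:r]         # orthonormal basis for the column space of A
left_null  = F.U[:, r+1:end]     # orthonormal basis for the left null space of A
row_space  = F.V[:, 1:r]         # orthonormal basis for the row space of A
null_space = F.V[:, r+1:end]     # orthonormal basis for the null space of A

# Sanity checks: A annihilates its null space, Aᵀ annihilates the left null space
norm(A * null_space), norm(A' * left_null)    # both ≈ 0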

Matrix rank

The Eckart-Young-Mirsky theorem asserts that the best rank-\(k\) approximation of \(A\) (with \(k < \mathrm{rank}(A)\)), in terms of the spectral and Frobenius norms, is given by the truncated SVD: \[A_k = \sum_{j=1}^k \sigma_j\, u_j \, v_j^T.\]

Below we illustrate the best rank-\(k\) approximation for a simple image matrix. The original image is shown on the top-left, followed by lower rank approximations.

Code
using Plots, Images, FileIO, LinearAlgebra

# Load the image and convert it to a plain floating-point matrix
img = load("../assets/img/e-small.jpg")
A = Float64.(Gray.(img))
U, s, V = svd(A)

# Rank-r reconstruction from the leading r singular triplets
function plot_approx(U, s, V, r; kwargs...)
    B = U[:, 1:r] * diagm(s[1:r]) * V[:, 1:r]'
    B = reverse(B, dims=(1, 2))    # reorient the matrix for heatmap's coordinate system
    heatmap(B, color=:greys, aspect_ratio=:equal, showaxis=false, colorbar=false; kwargs...)
end

# Show the full-rank image alongside three truncated approximations
l = @layout [a b; c d]
p = [plot_approx(U, s, V, r; title="rank=$r") for r in [length(s), 50, 25, 5]]
plot(p..., layout=l, fmt=:png)
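
As a numerical check of the Eckart-Young-Mirsky bound, the short sketch below (on a random matrix, for illustration only) verifies that the spectral-norm error of the rank-\(k\) truncation equals \(\sigma_{k+1}\).

using LinearAlgebra

A = rand(8, 6)
U, S, V = svd(A)

k = 3
Ak = U[:, 1:k] * Diagonal(S[1:k]) * V[:, 1:k]'   # truncated SVD (rank k)

opnorm(A - Ak, 2) ≈ S[k+1]                        # true: the error is σ_{k+1}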

Minimum-Norm Least Squares Solution

One of the most important consequences of the SVD is that it provides a direct expression for the minimum-norm solution to the least-squares problem: \[ \min_x \, \|A x - b\|_2. \]

  • If \(A = U \Sigma V^T\) is the SVD of \(A\), and we write \(b\) in the basis of \(U\) as \(b = U \hat{b}\), then the least-squares solution is \[ \bar{x} \;=\; \sum_{j=1}^r \frac{\hat{b}_j}{\sigma_j} \, v_j \quad \text{where} \quad \hat{b}_j = u_j^T \, b. \]
  • In matrix form, we often write \(\bar{x} = V \Sigma^+ U^T b\), which leads us to the notion of the pseudo-inverse.
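
The sketch below builds a rank-deficient least-squares problem (the construction is ours, for illustration), computes the minimum-norm solution directly from the SVD formula above, and compares it with the pseudoinverse solution.

using LinearAlgebra

# Rank-deficient least-squares problem: the third column repeats the first
A = rand(6, 2); A = [A A[:, 1]]
b = rand(6)

U, S, V = svd(A)
r = count(>(1e-10), S)            # numerical rank

# x̄ = Σ_{j=1}^r (uⱼᵀ b / σⱼ) vⱼ
xbar = sum((U[:, j]' * b) / S[j] * V[:, j] for j in 1:r)

xbar ≈ pinv(A) * b                # true: both give the minimum-norm solution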

Pseudoinverse of a Matrix

The pseudoinverse of a matrix \(A\), denoted \(A^+\), is defined via the SVD: \[ A^+ \;=\; V \,\Sigma^+\, U^T, \] where \(\Sigma^+\) is constructed by inverting the non-zero singular values on the diagonal of \(\Sigma\) and transposing the shape. Concretely, the \(i\)th diagonal entry is \[ (\Sigma^+)_{ii} = \begin{cases} \displaystyle \frac{1}{\sigma_i}, & \text{if } \sigma_i > 0,\\[6pt] 0, & \text{if } \sigma_i = 0. \end{cases} \]

Key Properties

  • \(A A^+\) is the orthogonal projector onto the column space of \(A\).
  • \(A^+ A\) is the orthogonal projector onto the row space of \(A\).
  • The pseudoinverse exists for any matrix \(A\), regardless of its shape or rank.
  • The minimum-norm least-squares solution is given by \(x = A^+ b\).
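
A short check of these properties on a small rank-deficient matrix (built here only for illustration): the projectors leave \(A\) and \(A^+\) unchanged and are symmetric, as orthogonal projectors must be.

using LinearAlgebra

A = rand(5, 2); A = [A A[:, 1] + A[:, 2]]   # 5×3 matrix of rank 2
Ap = pinv(A)

# A*A⁺ projects onto the column space of A, so it leaves A unchanged
A * Ap * A ≈ A                               # true

# A⁺*A projects onto the row space of A, so it leaves A⁺ unchanged
Ap * A * Ap ≈ Ap                             # true

# Both projectors are symmetric (orthogonal projectors)
A * Ap ≈ (A * Ap)' && Ap * A ≈ (Ap * A)'     # true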

Computational Examples in Julia

Below are a few simple Julia examples demonstrating how to use the SVD in practice.

using LinearAlgebra

# Example 1: Basic SVD
# Generate a random 5 x 3 matrix
A = rand(5, 3)

# Perform the SVD
U, S, V = svd(A);

Verify orthogonality of the columns of \(U\) and \(V\):

U'*U
3×3 Matrix{Float64}:
  1.0          1.75662e-16  -2.15147e-16
  1.75662e-16  1.0           1.72283e-16
 -2.15147e-16  1.72283e-16   1.0
V'*V
3×3 Matrix{Float64}:
  1.0           5.55112e-17  -1.66533e-16
  5.55112e-17   1.0          -1.11022e-16
 -1.66533e-16  -1.11022e-16   1.0

The singular values are stored in the vector S and indicate the rank of \(A\):

numerical_rank = sum(S .> 1e-10)   # count singular values above a small tolerance
numerical_rank == rank(A)
true
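
Since Julia's svd returns the thin factorization, the factors from Example 1 reconstruct \(A\) directly:

# Example 2: Reconstruct A from its SVD factors
U * Diagonal(S) * V' ≈ A            # true (up to round-off)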

References

Stewart, G. W. 1993. “On the Early History of the Singular Value Decomposition.” SIAM Review 35 (4): 551–66. https://www.jstor.org/stable/2132388.