CPSC 406 – Computational Optimization
\[ \def\argmin{\operatorname*{argmin}} \def\Ball{\mathbf{B}} \def\bmat#1{\begin{bmatrix}#1\end{bmatrix}} \def\Diag{\mathbf{Diag}} \def\half{\tfrac12} \def\int{\mathop{\rm int}} \def\ip#1{\langle #1 \rangle} \def\maxim{\mathop{\hbox{\rm maximize}}} \def\maximize#1{\displaystyle\maxim_{#1}} \def\minim{\mathop{\hbox{\rm minimize}}} \def\minimize#1{\displaystyle\minim_{#1}} \def\norm#1{\|#1\|} \def\Null{{\mathbf{null}}} \def\proj{\mathbf{proj}} \def\R{\mathbb R} \def\Re{\mathbb R} \def\Rn{\R^n} \def\rank{\mathbf{rank}} \def\range{{\mathbf{range}}} \def\sign{{\mathbf{sign}}} \def\span{{\mathbf{span}}} \def\st{\hbox{\rm subject to}} \def\T{^\intercal} \def\textt#1{\quad\text{#1}\quad} \def\trace{\mathbf{trace}} \]
A function \(f:\mathbb{R}^n\to\mathbb{R}\) is \(L\)-smooth (ie, has an \(L\)-Lipschitz gradient) if
\[ \|\nabla f(x) - \nabla f(y)\| \le L\|x-y\| \quad \forall x,y \]
If \(f\) is twice continuously differentiable, then \(f\) is \(L\)-smooth if and only if for all \(x\)
\[ -LI \preceq \nabla^2 f(x) \preceq L I \quad\text{ie,}\quad \|\nabla^2 f(x)\|_2 \le L \]
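The Hessian bound can be probed numerically. The sketch below uses the softplus function \(f(x)=\log(1+e^x)\) as an assumed test case and samples \(f''(x)\) on a grid; the sampled maximum approaches \(\sup_x|f''(x)| = 1/4\), so \(L=1/4\) for this \(f\).

```python
import numpy as np

# Softplus f(x) = log(1 + e^x); f''(x) = sigma(x) * (1 - sigma(x)) <= 1/4,
# so f is L-smooth with L = 1/4.
def fpp(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

xs = np.linspace(-10.0, 10.0, 10001)
print(fpp(xs).max())   # ~0.25: sampled estimate of sup_x |f''(x)| = L
```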
If \(f\) is \(L\)-smooth, then for all \(x,z\)
\[ f(z) \le f(x) + \nabla f(x)^T(z-x) + \frac{L}{2}\|z-x\|^2 \]
This inequality, known as the descent lemma, says that any \(L\)-smooth function is globally majorized by the quadratic approximation built at any point \(x\).
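A minimal numerical check of this majorization, reusing the softplus example above with \(L=1/4\):

```python
import numpy as np

# Check f(z) <= f(x) + f'(x)(z - x) + (L/2)(z - x)^2 for softplus with L = 1/4.
f  = lambda x: np.logaddexp(0.0, x)        # log(1 + e^x), numerically stable
fp = lambda x: 1.0 / (1.0 + np.exp(-x))    # f'(x) = sigma(x)
L  = 0.25

rng = np.random.default_rng(0)
x, z = rng.normal(size=10_000), rng.normal(size=10_000)
upper = f(x) + fp(x) * (z - x) + 0.5 * L * (z - x) ** 2
print(bool(np.all(f(z) <= upper + 1e-12)))  # True
```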
Projected gradient method for minimizing \(L\)-smooth \(f\) over a convex set \(C\) \[ x_{k+1} = \proj_C(x_k - \alpha \nabla f(x_k)) \]
By the descent lemma, for any \(\alpha\in(0,\frac1L]\),
\[ \begin{aligned} f(z) &\le f(x) + \nabla f(x)^T(z-x) + \frac{L}{2}\|z-x\|^2 \le f(x) + \nabla f(x)^T(z-x) + \frac{1}{2\alpha}\|z-x\|^2 \end{aligned} \]
The (projected) gradient descent step minimizes this quadratic upper bound over \(C\):
\[ \begin{aligned} \proj_C(x - \alpha \nabla f(x)) &= \argmin_{z\in C} \frac{1}{2\alpha}\|z - (x-\alpha\nabla f(x))\|^2 \\ &= \argmin_{z\in C} \frac{\alpha}2\|\nabla f(x)\|^2 + \nabla f(x)^T(z-x)+\frac{1}{2\alpha}\|z-x\|^2 \\ &= \argmin_{z\in C} f(x) + \nabla f(x)^T(z-x) + \frac{1}{2\alpha}\|z-x\|^2 \end{aligned} \]
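A minimal sketch of the projected gradient method, assuming a convex quadratic objective and a box constraint \(C=[-1,1]^n\) (both chosen only for illustration); projection onto a box is coordinate-wise clipping, and the step size is \(\alpha = 1/L\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.normal(size=(n, n))
Q = M @ M.T + np.eye(n)                  # positive definite Hessian
c = rng.normal(size=n)

f    = lambda x: 0.5 * x @ Q @ x - c @ x
grad = lambda x: Q @ x - c
L    = np.linalg.norm(Q, 2)              # spectral norm of the Hessian
proj = lambda x: np.clip(x, -1.0, 1.0)   # projection onto the box [-1, 1]^n

x, alpha = np.zeros(n), 1.0 / L
for k in range(200):
    x = proj(x - alpha * grad(x))        # projected gradient step
print(x, f(x))
```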
Applying the descent lemma with \(x = x_k\) and \(z = x_{k+1}\) (writing \(f_k = f(x_k)\) and \(\nabla f_k = \nabla f(x_k)\)),
\[ f_{k+1} \le f_k + \nabla f_k^T(x_{k+1}-x_k) + \frac{L}{2}\|x_{k+1}-x_k\|^2 \]
For the unconstrained gradient step \(x_{k+1} = x_k - \alpha \nabla f_k\),
\[ \begin{aligned} f_{k+1} &\le f_k - \alpha \nabla f_k^T\nabla f_k + \frac{L}{2}\|- \alpha \nabla f_k\|^2 \\ &= f_k - \alpha \|\nabla f_k\|^2 + \frac{L\alpha^2}{2}\|\nabla f_k\|^2 \\ &= f_k - \alpha\left(1-\frac{\alpha L}{2}\right) \|\nabla f_k\|^2 \end{aligned} \]
\[ f_{k+1} < f_k \quad\text{if}\quad \alpha\in(0,2/L) \quad\text{and}\quad \nabla f_k\ne 0 \]
if \(\alpha\in(0,2/L]\) then \(f_{k+1} \le f_k - \alpha\left(1-\frac{\alpha L}{2}\right) \|\nabla f_k\|^2\)
Minimizing the RHS over \(\alpha\in(0,2/L]\) gives \(\alpha^* = 1/L\) and
\[ f_{k+1} \le f_k - \frac{1}{2L} \|\nabla f_k\|^2 \]
Summing over \(k=0,\dots,T-1\) and telescoping,
\[ \frac1{2L}\sum_{k=0}^{T-1}\|\nabla f_k\|^2 \le f(x_0) - f(x_T) \le f(x_0) - f^*, \quad\text{where $f^*$ is the minimum value,} \]
so \(\min_{0\le k<T}\|\nabla f_k\|^2 \le 2L\,(f(x_0)-f^*)/T\).
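A quick check of the per-step decrease and the summed bound, assuming an unconstrained convex quadratic so that \(f^*\) is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.normal(size=(n, n))
Q = M @ M.T + np.eye(n)
c = rng.normal(size=n)

f     = lambda x: 0.5 * x @ Q @ x - c @ x
grad  = lambda x: Q @ x - c
L     = np.linalg.norm(Q, 2)
fstar = f(np.linalg.solve(Q, c))                        # minimizer solves Qx = c

x, alpha, T = np.zeros(n), 1.0 / L, 100
f0, sum_sq = f(x), 0.0
for k in range(T):
    g = grad(x)
    sum_sq += g @ g
    x_new = x - alpha * g
    assert f(x_new) <= f(x) - g @ g / (2 * L) + 1e-10   # per-step decrease
    x = x_new

# Telescoped bound: (1/2L) * sum_k ||grad f_k||^2 <= f(x_0) - f^*
print(sum_sq / (2 * L) <= f0 - fstar)                   # True
```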
The least-squares objective \[ f(x) = \half\|Ax-b\|^2 \] is \(L\)-smooth. What is \(L\)?
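One way to confirm a candidate constant numerically: since \(\nabla f(x) = A^T(Ax-b)\) and \(\nabla^2 f(x) = A^TA\), the candidate is \(L = \|A\|_2^2 = \sigma_{\max}(A)^2\). The sketch below checks the Lipschitz inequality on random points (the matrix \(A\) and vector \(b\) are arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(20, 8))
b = rng.normal(size=20)

grad = lambda x: A.T @ (A @ x - b)       # gradient of 0.5 * ||Ax - b||^2
L = np.linalg.norm(A, 2) ** 2            # sigma_max(A)^2 = ||A^T A||_2

x, y = rng.normal(size=8), rng.normal(size=8)
lhs = np.linalg.norm(grad(x) - grad(y))
print(lhs <= L * np.linalg.norm(x - y) + 1e-12)   # True
```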