CPSC 406 – Computational Optimization
\[ \def\argmin{\operatorname*{argmin}} \def\Ball{\mathbf{B}} \def\bmat#1{\begin{bmatrix}#1\end{bmatrix}} \def\Diag{\mathbf{Diag}} \def\half{\tfrac12} \def\ip#1{\langle #1 \rangle} \def\maxim{\mathop{\hbox{\rm maximize}}} \def\maximize#1{\displaystyle\maxim_{#1}} \def\minim{\mathop{\hbox{\rm minimize}}} \def\minimize#1{\displaystyle\minim_{#1}} \def\norm#1{\|#1\|} \def\Null{{\mathbf{null}}} \def\proj{\mathbf{proj}} \def\R{\mathbb R} \def\Rn{\R^n} \def\rank{\mathbf{rank}} \def\range{{\mathbf{range}}} \def\span{{\mathbf{span}}} \def\st{\hbox{\rm subject to}} \def\T{^\intercal} \def\textt#1{\quad\text{#1}\quad} \def\trace{\mathbf{trace}} \]
\[ \min_{x\in\Rn}\ f(x), \quad f_k = f(x^k), \quad g_k = \nabla f(x^k), \quad H_k = \nabla^2 f(x^k) \]
\[ \begin{aligned} q_k(x) &:= f_k + g_k^T(x-x^k) + \half(x-x^k)^T H_k (x-x^k) \\ &\le f_k + g_k^T(x-x^k) + \half L \|x-x^k\|^2 =: \hat q_k(x) \end{aligned} \]
where \(L\) bounds the curvature, i.e., \(H_k \preceq L I\)
\[ 0 = \nabla \hat q_k(\hat x) = g_k + L(\hat x-x^k) \]
\[ \hat x = x^k - \frac1L g_k \]
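A minimal Julia sketch of the resulting iteration \(x^{k+1} = x^k - \tfrac1L g_k\); the quadratic objective and its curvature bound \(L\) are illustrative assumptions:

```julia
using LinearAlgebra

# a minimal sketch of the iteration x ↦ x − (1/L)∇f(x); the quadratic objective
# and its curvature bound L below are illustrative assumptions
A = [2.0 0.0; 0.0 10.0]                  # ∇²f ≡ A, so L = λmax(A) = 10
b = [1.0, 1.0]
∇f(x) = A * x - b                        # gradient of f(x) = ½xᵀAx − bᵀx
L = maximum(eigvals(A))

function gradient_method(x; its = 100)
    for _ in 1:its
        x = x - (1/L) * ∇f(x)            # exact minimizer of the upper model q̂
    end
    return x
end

gradient_method(zeros(2)) ≈ A \ b        # true: converges to the minimizer
```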
\[ q_k(x) = f_k + g_k^T(x-x^k) + \half(x-x^k)^T H_k (x-x^k), \quad g_k = \nabla f(x^k), \quad H_k = \nabla^2 f(x^k)\succ 0 \]
\[ 0 = \nabla q_k(\bar x) = g_k + H_k(\bar x - x^k) \quad\Longleftrightarrow\quad \bar x = x^k - H_k^{-1} g_k \]
\[ x^{k+1} = x^k + \underbrace{d_N^k}_{=\rlap{\text{Newton direction}}}, \qquad H_k d_N^k = -g_k \]
\[ x^{k+1} = x^k + \alpha d_N^k, \qquad H_k d_N^k = -g_k \]
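A minimal Julia sketch of a single pure Newton step (\(\alpha=1\)); on a quadratic objective one step lands exactly on the minimizer (the data \(A\), \(b\) are assumptions):

```julia
using LinearAlgebra

# one pure Newton step (α = 1) on a quadratic objective; A, b are assumed data
A = [3.0 1.0; 1.0 2.0]                   # ∇²f ≡ A ≻ 0
b = [1.0, -1.0]
∇f(x)  = A * x - b                       # gradient of f(x) = ½xᵀAx − bᵀx
∇²f(x) = A

x = [10.0, -7.0]                         # arbitrary starting point
d = -(∇²f(x) \ ∇f(x))                    # Newton direction: H d = −g
x = x + d                                # one full step
x ≈ A \ b                                # true: already at the minimizer
```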
Example
\[ f(x) = \sqrt{1+x^2}, \quad \nabla f(x) = \frac{x}{\sqrt{1+x^2}}, \quad \nabla^2 f(x) = \frac{1}{(1+x^2)^{3/2}} \]
\[ x^{k+1} = x^k - \frac{f'(x^k)}{f''(x^k)} = -(x^k)^3 \]
\[ x^k \to \begin{cases} 0 & \text{if } |x^0| < 1 \\ \pm 1 & \text{if } |x^0| =1 \\ \infty & \text{if } |x^0| > 1 \end{cases} \]
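A quick numerical check of these three regimes (the starting points are arbitrary):

```julia
# iterate the Newton update x ↦ -x³ derived above for f(x) = √(1+x²)
function newton_1d(x; its = 10)
    for _ in 1:its
        x = -x^3
    end
    return x
end

newton_1d(0.5)    # ≈ 0        (|x⁰| < 1: converges)
newton_1d(1.0)    # ±1         (|x⁰| = 1: oscillates)
newton_1d(1.2)    # huge/±Inf  (|x⁰| > 1: diverges)
```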
Suppose \(f:\Rn\to\R\) is twice continuously differentiable, \(\nabla^2 f\) is Lipschitz continuous with constant \(L\), and \(\nabla^2 f(x)\succeq \epsilon I\) for all \(x\) near a local minimizer \(x^*\). Then the Newton iterates satisfy
\[ \|x^{k+1}-x^*\| \le \frac{L}{2\epsilon} \|x^k-x^*\|^2 \]
If, in addition, \(\norm{x^0-x^*}\le \epsilon/L\), the error decreases doubly exponentially:
\[ \|x^{k+1}-x^*\| \le \left(\frac{2\epsilon}L\right)\left(\frac14\right)^{2^k} \]
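A short derivation: with \(a_k := \tfrac{L}{2\epsilon}\norm{x^k-x^*}\), the quadratic bound above reads \(a_{k+1}\le a_k^2\), and \(\norm{x^0-x^*}\le\epsilon/L\) means \(a_0\le\tfrac12\), so
\[ a_{k+1} \le a_k^2 \le a_{k-1}^4 \le \cdots \le a_0^{2^{k+1}} \le \left(\tfrac12\right)^{2^{k+1}} \quad\Longrightarrow\quad \norm{x^{k+1}-x^*} = \frac{2\epsilon}{L}\,a_{k+1} \le \left(\frac{2\epsilon}{L}\right)\left(\frac14\right)^{2^k} \]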
\[ f(x) = 100(x_2-x_1^2)^2+(1-x_1)^2 \]
Newton's method applied to the Rosenbrock function, objective value by iteration:

k    f(x^k)
1 1.0100e+02
2 6.7230e+01
3 1.9074e+00
4 1.5506e+00
5 1.1674e+00
6 8.3524e-01
7 6.1188e-01
8 3.8893e-01
9 3.8636e-01
10 1.3032e-01
11 9.0166e-02
12 3.1699e-02
13 2.9670e-02
14 1.3869e-03
15 1.7446e-04
16 3.6871e-08
17 1.3610e-13
18 2.2550e-26
For comparison, a gradient (first-order) method on the same problem makes little progress even after 1000 iterations:

k    f(x^k)
1 1.0100e+02
100 1.4702e+00
200 1.4543e+00
300 1.4345e+00
400 1.4200e+00
500 1.4059e+00
600 1.3918e+00
700 1.3776e+00
800 1.3633e+00
900 1.3490e+00
1000 1.3347e+00
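A minimal Julia sketch of damped Newton with an Armijo backtracking line search on the Rosenbrock function; the starting point and line-search constants are assumptions, so the printed values need not match the tables above exactly:

```julia
using LinearAlgebra

# damped Newton with a backtracking (Armijo) line search on the Rosenbrock function
f(x)  = 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
∇f(x) = [-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]), 200 * (x[2] - x[1]^2)]
function ∇²f(x)
    h11 = 1200 * x[1]^2 - 400 * x[2] + 2
    h12 = -400 * x[1]
    return [h11 h12; h12 200.0]
end

function damped_newton(x; maxits = 30)
    for k in 1:maxits
        g, H = ∇f(x), ∇²f(x)
        d = -(H \ g)                                        # Newton direction: H d = −g
        if dot(g, d) >= 0
            d = -g                                          # safeguard: fall back to steepest descent
        end
        α = 1.0
        while f(x + α * d) > f(x) + 1e-4 * α * dot(g, d)    # Armijo backtracking
            α /= 2
        end
        x = x + α * d
        println(k, "  ", f(x))
    end
    return x
end

damped_newton([0.0, 1.0])        # assumed starting point with f(x⁰) = 101
```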
\[ x^T A x > 0 \textt{for all} x\ne 0 \]
if \(Ax=\lambda x\) with \(x\ne 0\), then
\[ 0 < x^T A x = x^T(\lambda x) = \lambda x^T x = \lambda \|x\|^2 \]
so every eigenvalue \(\lambda\) of \(A\) is positive
if \(A\succ 0\), then \(A=R^T R\) for some nonsingular upper triangular \(R\) \[ A = \begin{bmatrix} a_{11} & w^T \\ w & K \end{bmatrix} = \underbrace{\begin{bmatrix} \alpha & \\ w/\alpha & I \end{bmatrix}}_{R_1^T} \underbrace{\begin{bmatrix} 1 & \\ & K - ww^T/\alpha^2 \end{bmatrix}}_{A_1} \underbrace{\begin{bmatrix} \alpha & w^T/\alpha \\ & I \end{bmatrix}}_{R_1} \quad \alpha := \sqrt{a_{11}} \]
\(A\succ 0 \ \Longleftrightarrow\ K-ww^T/\alpha^2 \succ 0\), thus apply above factorization to \(K-ww^T/\alpha^2\): \[ K - ww^T/\alpha^2 = \bar{R}^T_2 \bar{A}_2\bar{R}_2, \]
recursively apply to obtain \(A=R^T R\) \[ \begin{aligned} A = R_1^T \begin{bmatrix} 1 & \\ & \bar{R}_2^T\bar{A}_2\bar{R}_2 \end{bmatrix} R_1 &= R_1^T\underbrace{\begin{bmatrix} 1 & \\ & \bar{R}_2^T\end{bmatrix}}_{R_2^T} \underbrace{\begin{bmatrix} 1 & \\ & \bar{A}_2\end{bmatrix}}_{A_2} \underbrace{\begin{bmatrix} 1 & \\ & \bar{R}_2 \end{bmatrix}}_{R_2} R_1 \\ &= R_1^T R_2^T A_2 R_2 R_1 = \cdots = \underbrace{(R_1^T R_2^T \cdots R_n^T)}_{R^T} \underbrace{(R_n \cdots R_2 R_1)}_R \end{aligned} \]
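A minimal Julia sketch of this recursion (no pivoting or error checking; purely to mirror the derivation):

```julia
using LinearAlgebra

# split off the first row/column, then factor the smaller block K − wwᵀ/α²
function chol_rtr(A::AbstractMatrix)
    n = size(A, 1)
    R = zeros(n, n)
    α = sqrt(A[1, 1])                            # requires A[1,1] > 0 (true if A ≻ 0)
    R[1, 1] = α
    n == 1 && return R
    w = A[2:end, 1]
    R[1, 2:end] = w ./ α                         # first row of R is [α  wᵀ/α]
    R[2:end, 2:end] = chol_rtr(A[2:end, 2:end] - w * w' / A[1, 1])
    return R                                     # upper triangular, A = RᵀR
end

A = [4.0 2.0; 2.0 3.0]                           # small SPD example (assumed data)
R = chol_rtr(A)
R' * R ≈ A                                       # true
```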
\[ A = R^T R \qquad \text{for some nonsingular upper triangular } R \]
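In Julia, the `cholesky` routine from the `LinearAlgebra` standard library computes this factor; a sketch consistent with the output shown below, where the input matrix is an assumption recovered from the displayed factor via \(A = LL^T\):

```julia
using LinearAlgebra

# the input matrix is an assumption, recovered from the displayed factor via A = L·Lᵀ
A = [  4.0  12.0  -16.0
      12.0  37.0  -43.0
     -16.0 -43.0   98.0 ]

F = cholesky(A)          # throws PosDefException if A is not positive definite
F.L                      # lower-triangular factor, with A = L·Lᵀ (L = Rᵀ above)
```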
```
3×3 LowerTriangular{Float64, Matrix{Float64}}:
  2.0   ⋅    ⋅
  6.0  1.0   ⋅
 -8.0  5.0  3.0
```
\[ H_k d_N^k = -g_k, \textt{where} H_k = \nabla^2 f(x^k), \quad g_k = \nabla f(x^k) \]
choose \(\epsilon>0\) small
\(H_k = U \Lambda U^T, \quad \Lambda = \Diag(\lambda_1,\ldots,\lambda_n)\)
\(\bar\lambda_i = \max(\lambda_i, \epsilon)\)
\(\bar\Lambda = \Diag(\bar\lambda_1,\ldots,\bar\lambda_n)\)
solve \(U\bar\Lambda U^T d_N^k = -g_k\)
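A minimal Julia sketch of this eigenvalue modification; the function name and default \(\epsilon\) are illustrative:

```julia
using LinearAlgebra

function modified_newton_direction(H, g; ϵ = 1e-8)
    E = eigen(Symmetric(H))                  # Hₖ = U Λ Uᵀ
    λbar = max.(E.values, ϵ)                 # λ̄ᵢ = max(λᵢ, ϵ)
    U = E.vectors
    return -(U * ((U' * g) ./ λbar))         # solves U Λ̄ Uᵀ d = −g
end

H = [1.0 0.0; 0.0 -2.0]                      # indefinite Hessian (assumed example)
g = [1.0, 1.0]
d = modified_newton_direction(H, g)
dot(g, d) < 0                                # true: still a descent direction
```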
\(A=QR\) can be used to solve linear systems or least-squares problems \[ Ax = b \quad\Longleftrightarrow\quad Rx = Q^T b \]
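A minimal sketch with assumed data:

```julia
using LinearAlgebra

# solve Ax = b via A = QR
A = [2.0 1.0; 1.0 3.0]
b = [1.0, 2.0]

F = qr(A)                    # A = Q·R
x = F.R \ (F.Q' * b)         # back substitution on Rx = Qᵀb
x ≈ A \ b                    # true
```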
if \(A\succ0\), other factorizations are available, e.g., the Cholesky factorization \(A=R^TR\) derived above
why?
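For example, with the Cholesky factorization \(A=R^TR\) from above, \(Ax=b\) reduces to two triangular solves (a sketch with assumed data):

```julia
using LinearAlgebra

# for A ≻ 0, solve Ax = b via A = RᵀR and two triangular solves
A = [4.0 12.0 -16.0; 12.0 37.0 -43.0; -16.0 -43.0 98.0]
b = [1.0, 2.0, 3.0]

R = cholesky(A).U            # upper triangular, A = RᵀR
x = R \ (R' \ b)             # forward substitution with Rᵀ, then backward with R
x ≈ A \ b                    # true
```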