CPSC 406 – Computational Optimization
\[ \def\argmin{\operatorname*{argmin}} \def\Ball{\mathbf{B}} \def\bmat#1{\begin{bmatrix}#1\end{bmatrix}} \def\Diag{\mathbf{Diag}} \def\half{\tfrac12} \def\int{\mathop{\rm int}} \def\ip#1{\langle #1 \rangle} \def\maxim{\mathop{\hbox{\rm maximize}}} \def\maximize#1{\displaystyle\maxim_{#1}} \def\minim{\mathop{\hbox{\rm minimize}}} \def\minimize#1{\displaystyle\minim_{#1}} \def\norm#1{\|#1\|} \def\Null{{\mathbf{null}}} \def\proj{\mathbf{proj}} \def\R{\mathbb R} \def\Re{\mathbb R} \def\Rn{\R^n} \def\rank{\mathbf{rank}} \def\range{{\mathbf{range}}} \def\sign{{\mathbf{sign}}} \def\span{{\mathbf{span}}} \def\st{\hbox{\rm subject to}} \def\T{^\intercal} \def\textt#1{\quad\text{#1}\quad} \def\trace{\mathbf{trace}} \]
\[ \min_x \set{f(x) \mid x \in C} \]
\(f: \Rn \to \Re\) is convex and differentiable
\(C \subseteq \Rn\) is convex
\(x^*\) is optimal if and only if \(f\) is non-decreasing along every feasible direction at \(x^*\)
if \(C=\Rn\) the problem is unconstrained
\[ x^* \in \argmin_{x\in\Rn} f(x) \iff 0\le f'(x^*, d) = \nabla f(x^*)^Td \quad\text{for all}\quad d\in\Rn \]
\[ x^* \in \argmin_{x\in C} f(x) \iff 0\le f'(x^*, x-x^*) = \nabla f(x^*)^T(x-x^*) \quad \forall x\in C \]
The normal cone to the set \(C\subset\Rn\) at the point \(x\in C\) is the set
\[ \mathcal{N}_C(x) = \set{d\in\Rn \mid d^T(z-x) \leq 0 \quad \forall z\in C} \]
\[ \min_{x\in\R_+^2} \half(x_1-1)^2 + \half(x_2+1)^2 \]
Solution and gradient:
\[ x^* = \begin{bmatrix}1\\0\end{bmatrix} \qquad \nabla f(x^*) = \begin{bmatrix}x^*_1-1\\x^*_2+1\end{bmatrix} = \begin{bmatrix}0\\1\end{bmatrix} \]
Normal cone at \(x^* = (1, 0)\):
\[ \mathcal{N}_{\R_+^2}(x^*) = \Set{\lambda\begin{bmatrix}\phantom-0\\-1\end{bmatrix} : \lambda\ge0} \]
Optimality:
\[ -\nabla f(x^*) \in \mathcal{N}_{\R_+^2}(x^*) \]
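The example can be checked numerically. The sketch below is an illustration rather than part of the original notes: it evaluates the gradient at \(x^*\) and tests the normal cone inequality \(d^T(z-x^*)\le 0\) on a finite grid of points \(z\in\R_+^2\). A sampled check of this kind can only refute membership, not prove it. The helper names `grad_f` and `in_normal_cone_sampled` and the grid itself are assumptions made for the illustration.

```python
from itertools import product

def grad_f(x):
    """Gradient of f(x) = 0.5*(x1 - 1)^2 + 0.5*(x2 + 1)^2."""
    return (x[0] - 1.0, x[1] + 1.0)

def in_normal_cone_sampled(d, x, samples, tol=1e-12):
    """Test d^T (z - x) <= 0 for every sampled z in C (necessary check only)."""
    return all(
        d[0] * (z[0] - x[0]) + d[1] * (z[1] - x[1]) <= tol
        for z in samples
    )

# Finite grid of feasible points in the nonnegative orthant (assumed sample set).
grid = [i * 0.25 for i in range(13)]          # 0, 0.25, ..., 3
samples = list(product(grid, grid))

x_star = (1.0, 0.0)
g = grad_f(x_star)                            # gradient at the claimed solution
neg_g = (-g[0], -g[1])
optimal_ok = in_normal_cone_sampled(neg_g, x_star, samples)

# A feasible but non-optimal point for comparison: the condition fails there.
x_bad = (1.0, 1.0)
gb = grad_f(x_bad)
bad_ok = in_normal_cone_sampled((-gb[0], -gb[1]), x_bad, samples)

print(optimal_ok, bad_ok)
```

At \(x^*\) every sampled inequality holds, while at the feasible point \((1,1)\) the sample \(z=(1,0)\) violates it, so the optimality condition separates the two points.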
A point \(x^*\) belongs to \(\argmin_{x\in C} f(x)\) if and only if
\[ \nabla f(x^*)^T(x-x^*) \geq 0 \quad \forall x\in C \]
Use the definition of the normal cone to deduce the equivalent condition
\[ -\nabla f(x^*) \in \mathcal{N}_C(x^*) \]
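If \(x\) lies in the interior of \(C\), then
\[ x + \epsilon d\in C \quad \text{$\forall d\in\Rn$ and $\epsilon>0$ small} \]
so any \(g\in\mathcal{N}_C(x)\) satisfies
\[ \begin{aligned} 0 \ge g^T(z-x) &= \phantom+\epsilon g^T d & \text{for all}\quad z=x+\epsilon d\in C \\0 \ge g^T(z-x) &= -\epsilon g^T d & \text{for all}\quad z=x-\epsilon d\in C \end{aligned} \]
Together these force \(g^Td=0\) for every \(d\in\Rn\), so \(g=0\):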
\[ x\in\mathop{\rm int} C \implies \mathcal{N}_C(x) = \set{0} \]
[Aside: the converse also holds, but its proof requires the supporting hyperplane theorem.]
\[ x^*\in\argmin_{x\in\Rn} f(x) \quad \iff\quad -\nabla f(x^*)\in\mathcal{N}_{\Rn}(x^*)=\set{0} \quad\iff\quad \nabla f(x^*) = 0 \]
\[ C = \set{x\in\Rn \mid Ax=b}, \quad A\in\R^{m\times n}, \quad b\in\R^m \]
For any \(x\in C\), define the translated set
\[ C_x = \set{z-x\mid z\in C} = \Null(A) \]
Then, \[ \begin{aligned} \mathcal{N}_C(x) &= \set{g\mid g^T(z-x) \leq 0 \quad \forall z\in C}\\[10pt] &= \set{g\mid g^Td \leq 0 \quad \forall d\in C_x}\\[10pt] &= \set{g\mid g^Td \leq 0 \quad \forall d\in \Null(A)}\\[10pt] &= \set{g\mid g^Td = 0 \quad \forall d\in \Null(A)}\\[10pt] &= \range(A^T) \end{aligned} \]
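The identity \(\mathcal{N}_C(x)=\range(A^T)\) can be illustrated on a small instance. The sketch below is not from the original notes: the \(2\times 3\) matrix `A`, the null-space vector `d`, and the helper `At_times` are assumptions chosen for the illustration. It shows that a vector of the form \(A^Ty\) is orthogonal to \(\Null(A)\), while a vector outside \(\range(A^T)\) has a nonzero inner product with `d` and therefore fails \(g^Td\le 0\) for either `d` or `-d`.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

A = [(1.0, 0.0, 1.0),
     (0.0, 1.0, 1.0)]           # illustrative 2x3 matrix
d = (1.0, 1.0, -1.0)            # spans null(A) for this particular A

# Confirm that d really lies in null(A).
d_in_null = all(dot(row, d) == 0.0 for row in A)

def At_times(y):
    """Return the vector A^T y."""
    return tuple(sum(y[i] * A[i][j] for i in range(len(A))) for j in range(3))

g_in = At_times((2.0, -3.0))    # an element of range(A^T)
g_out = (1.0, 0.0, 0.0)         # not orthogonal to null(A)

# g is normal iff g^T d <= 0 for both d and -d, i.e. iff g^T d == 0.
print(d_in_null, g_in, dot(g_in, d), dot(g_out, d))
```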
\[ \min_{x\in\Rn} \set{f(x) \mid Ax=b} \]
A point \(x^*\in C=\set{x\mid Ax=b}\) is optimal if and only if
\[ - \nabla f(x^*) \in \mathcal{N}_C(x^*) = \range(A^T) \]
or, equivalently,
\[ \nabla f(x^*) = A^T y \quad \text{for some $y\in\R^m$} \]
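As a small illustration (not from the original notes), take \(f(x)=\half\norm{x}^2\) with a single constraint \(a^Tx=b\), so that \(A=a^T\) has one row. Under these assumptions the minimizer is \(x^* = a\,b/(a^Ta)\), and its gradient \(\nabla f(x^*)=x^*\) equals \(A^Ty\) with \(y=b/(a^Ta)\). The sketch checks feasibility, the multiplier condition, and that moving along the null space does not decrease the objective; the data `a`, `b` and the step sizes are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def f(x):
    """Objective f(x) = 0.5 * ||x||^2."""
    return 0.5 * dot(x, x)

a = (1.0, 1.0)      # single row of A (illustrative data)
b = 1.0

# Multiplier and minimizer of 0.5*||x||^2 subject to a^T x = b.
y = b / dot(a, a)
x_star = tuple(y * ai for ai in a)

grad = x_star                                              # gradient of f is x itself
residual = tuple(gi - y * ai for gi, ai in zip(grad, a))   # grad - A^T y

d = (1.0, -1.0)     # spans null(A), since a^T d = 0

# Feasible perturbations x* + t d never improve the objective.
no_descent = all(
    f(tuple(xi + t * di for xi, di in zip(x_star, d))) >= f(x_star)
    for t in (-1.0, -0.5, 0.5, 1.0)
)

print(x_star, dot(a, x_star), residual, dot(grad, d), no_descent)
```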