Convex Optimality

CPSC 406 – Computational Optimization

Convex Optimality

  • optimality for convex problems
  • normal cone
  • Lagrange multipliers for linearly constrained problems

Optimality

\[ \min_x \set{f(x) \mid x \in C} \]

  • \(f: \Rn \to \R\) is convex and differentiable

  • \(C \subseteq \Rn\) is convex

  • \(x^*\) is optimal if \(f\) is nondecreasing along every feasible direction at \(x^*\)

  • if \(C=\Rn\) the problem is unconstrained

\[ x^* \in \argmin_{x\in\Rn} f(x) \iff 0\le f'(x^*, d) = \nabla f(x^*)^Td \quad\text{for all}\quad d\in\Rn \]

  • implies \(\nabla f(x^*) = 0\)
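As a quick numerical sanity check (a sketch in Python; the quadratic \(f(x)=\half\|x-c\|^2\) and the vector \(c\) are arbitrary choices for illustration), gradient descent on a smooth convex function drives the gradient to zero at the minimizer:

```python
import numpy as np

# f(x) = 1/2 ||x - c||^2 is convex and differentiable; c is arbitrary.
c = np.array([1.0, -2.0])

def grad_f(x):
    return x - c

# Gradient descent with a fixed step size.
x = np.zeros(2)
for _ in range(100):
    x = x - 0.5 * grad_f(x)

print(np.linalg.norm(grad_f(x)))  # ≈ 0: the gradient vanishes at the minimizer
```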

Optimality – constrained

\[ x^* \in \argmin_{x\in C} f(x) \iff 0\le f'(x^*, x-x^*) = \nabla f(x^*)^T(x-x^*) \quad \forall x\in C \]

  • does not imply \(\nabla f(x^*) = 0\)

Normal cone

The normal cone to the set \(C\subset\Rn\) at the point \(x\in C\) is the set

\[ \mathcal{N}_C(x) = \set{d\in\Rn \mid d^T(z-x) \leq 0 \quad \forall z\in C} \]

[Figure: a convex set \(C\) with normal cones drawn at a smooth boundary point \(x_1\), an interior point \(x_2\), and a vertex \(x_3\).]

  • \(\mathcal{N}_C(x_1)\) is the normal to supporting hyperplane \(H_1 = \set{z\in\Rn\mid d^T z\le d^T x_1}\)
  • \(\mathcal{N}_C(x_2) = \set{0}\) because \(x_2\) is an interior point
  • \(\mathcal{N}_C(x_3)\) is the cone of normals at the vertex \(x_3\)
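The definition can be probed numerically. The sketch below (Python, using the box \([0,1]^2\) as a stand-in for \(C\)) tests membership \(d \in \mathcal{N}_C(x)\) by checking \(d^T(z-x)\le 0\) over a dense random sample of points \(z\in C\), matching the three cases above:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.uniform(0.0, 1.0, size=(2000, 2))  # dense sample of C = [0, 1]^2

def in_normal_cone(d, x, tol=1e-12):
    """Test d^T (z - x) <= 0 against every sampled z in C."""
    return bool(np.all((Z - x) @ d <= tol))

# interior point: no nonzero d passes
print(in_normal_cone(np.array([0.1, 0.0]), np.array([0.5, 0.5])))  # False

# boundary point on the face x1 = 1: the outward normal (1, 0) passes
print(in_normal_cone(np.array([1.0, 0.0]), np.array([1.0, 0.5])))  # True

# vertex (1, 1): any d >= 0 passes
print(in_normal_cone(np.array([1.0, 2.0]), np.array([1.0, 1.0])))  # True
```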

Example

\[ \min_{x\in\R_+^2} \half(x_1-1)^2 + \half(x_2+1)^2 \]

Solution and gradient:

\[ x^* = \begin{bmatrix}1\\0\end{bmatrix} \qquad \nabla f(x^*) = \begin{bmatrix}x^*_1-1\\x^*_2+1\end{bmatrix} = \begin{bmatrix}0\\1\end{bmatrix} \]

Normal cone at \(x^* = (1, 0)\):

\[ \mathcal{N}_{\R_+^2}(x^*) = \Set{\lambda\begin{bmatrix}\phantom-0\\-1\end{bmatrix} : \lambda\ge0} \]

Optimality:

\[ -\nabla f(x^*) \in \mathcal{N}_{\R_+^2}(x^*) \]
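This example can be reproduced numerically (a Python sketch; projected gradient descent is used here only as a convenient way to compute \(x^*\), it is not part of the derivation above):

```python
import numpy as np

def grad_f(x):
    # gradient of 1/2 (x1 - 1)^2 + 1/2 (x2 + 1)^2
    return np.array([x[0] - 1.0, x[1] + 1.0])

# Projected gradient descent on R_+^2; projection = componentwise max with 0.
x = np.array([2.0, 2.0])
for _ in range(200):
    x = np.maximum(x - 0.5 * grad_f(x), 0.0)

print(x)          # ≈ [1, 0]
print(grad_f(x))  # ≈ [0, 1], so -grad f(x*) = (0, -1) lies in the normal cone
```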

Necessary and sufficient optimality

A point \(x^*\in C\) satisfies \(x^*\in\argmin_{x\in C} f(x)\) if and only if

\[ \nabla f(x^*)^T(x-x^*) \geq 0 \quad \forall x\in C \]

Use the definition of the normal cone to deduce the equivalent condition

\[ -\nabla f(x^*) \in \mathcal{N}_C(x^*) \]

Interior point

  • a point \(x\) is in the interior of \(C\) (i.e., \(x\in\mathop{\rm int} C\)) if all directions are feasible, i.e.,

\[ x + \epsilon d\in C \quad \text{$\forall d\in\Rn$ and $\epsilon>0$ small} \]

  • if \(g\in\mathcal{N}_C(x)\) and \(x\in\mathop{\rm int} C\) then for every direction \(d\),

\[ \begin{aligned} 0 \le g^T(z-x) &= \phantom+\epsilon g^T d & \text{for all}\quad z=x+\epsilon d\in C \\0 \le g^T(z-x) &= -\epsilon g^T d & \text{for all}\quad z=x-\epsilon d\in C \end{aligned} \]

  • together, these imply \(g=0\), and thus

\[ x\in\mathop{\rm int} C \implies \mathcal{N}_C(x) = \set{0} \]

[Aside: the converse implication also holds, but requires the supporting hyperplane theorem.]

  • unconstrained optimality:

\[ x^*\in\argmin_{x\in\Rn} f(x) \quad \iff\quad -\nabla f(x^*)\in\mathcal{N}_{\Rn}(x^*)=\set{0} \quad\iff\quad \nabla f(x^*) = 0 \]

Normal cone to an affine set

\[ C = \set{x\in\Rn \mid Ax=b}, \quad A\in\R^{m\times n}, \quad b\in\R^m \]

For any \(x\in C\), define the translated set

\[ C_x = \set{z-x\mid z\in C} = \Null(A) \]

Then, \[ \begin{aligned} \mathcal{N}_C(x) &= \set{g\mid g^T(z-x) \leq 0 \quad \forall z\in C}\\[10pt] &= \set{g\mid g^Td \leq 0 \quad \forall d\in C_x}\\[10pt] &= \set{g\mid g^Td \leq 0 \quad \forall d\in \Null(A)}\\[10pt] &= \set{g\mid g^Td = 0 \quad \forall d\in \Null(A)}\\[10pt] &= \range(A^T) \end{aligned} \]

The inequality tightens to an equality in the second-to-last step because \(\Null(A)\) is a subspace: if \(d\in\Null(A)\), then \(-d\in\Null(A)\) as well, so both \(g^Td\le 0\) and \(-g^Td\le 0\) must hold.
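A small numerical check of this identity (a Python sketch with a random full-row-rank \(A\); the null-space basis comes from the SVD):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))   # A in R^{2x5}; full row rank with prob. 1
y = rng.standard_normal(2)
g = A.T @ y                       # any g in range(A^T)

# Orthonormal basis for Null(A): the last n - m right singular vectors.
_, _, Vt = np.linalg.svd(A)
N = Vt[2:].T                      # 5 x 3, columns span Null(A)

print(np.linalg.norm(A @ N))      # ≈ 0: columns of N lie in Null(A)
print(np.linalg.norm(N.T @ g))    # ≈ 0: g is orthogonal to Null(A)
```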

Application: Linearly constrained optimization

\[ \min_{x\in\Rn} \set{f(x) \mid Ax=b} \]

a point \(x^*\in C=\set{x\mid Ax=b}\) is optimal if and only if

\[ - \nabla f(x^*) \in \mathcal{N}_C(x^*) = \range(A^T) \]

or, equivalently,

\[ \nabla f(x^*) = A^T y \quad \text{for some $y\in\R^m$} \]

  • the vector \(y=(y_1,\ldots,y_m)\) contains the Lagrange multipliers for each constraint \(a_i^T x = b_i\)
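For a concrete instance, take \(f(x)=\half\|x-c\|^2\) (the data \(A\), \(b\), \(c\) below are arbitrary). Stationarity \(\nabla f(x)=A^Ty\) and feasibility \(Ax=b\) stack into one linear system, which recovers both \(x^*\) and the multipliers \(y\) (a sketch; this KKT-system approach works for any strongly convex quadratic):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

# min 1/2 ||x - c||^2  s.t.  Ax = b.  Stationarity: x - c = A^T y.
# Stack stationarity and feasibility into one linear (KKT) system:
#   [ I  -A^T ] [x]   [c]
#   [ A   0   ] [y] = [b]
K = np.block([[np.eye(n), -A.T],
              [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([c, b]))
x, y = sol[:n], sol[n:]

print(np.linalg.norm(A @ x - b))           # ≈ 0: x is feasible
print(np.linalg.norm((x - c) - A.T @ y))   # ≈ 0: grad f(x) = A^T y
```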