Projected Gradient Descent

CPSC 406 – Computational Optimization

Projected gradient descent

  • projection onto a convex set
  • gradient projection method

Orthogonal projection

For a closed convex set \(C\subset\Rn\), the projection of a point \(x\in\Rn\) onto \(C\) is the point

\[ \proj_C(x) = \argmin_{z\in C} \|x-z\| \]

Properties:

  1. if \(x\in C\), then \(\proj_C(x)=x\)
  2. \(\proj_C(x)\) is unique (objective is strictly convex)
  3. \(z=\proj_C(x) \iff z\in C \quad\text{and}\quad (x-z)^T(y-z)\le 0 \quad\forall y\in C\)

proof of (3):

  • Let \(g(z) = \half\|x-z\|^2\). By the optimality conditions for minimizing a convex function over a convex set,

\[ z=\proj_C(x) \iff -\nabla g(z) = x-z \in \mathcal{N}_C(z) \]

  • By definition of the normal cone, \(x-z\in\mathcal{N}_C(z)\) means \((x-z)^T(y-z)\le 0\) for all \(y\in C\), which is exactly (3).

Projection onto 2-norm ball

\[ C = \set{x\in\Rn \mid \|x\|_2 \le \alpha} = \alpha\mathbb{B}_2 \quad (\alpha\ge0) \]


\[ \proj_C(x) = \begin{cases} x & \text{if } \|x\|_2 \le \alpha \\ \alpha\frac{x}{\|x\|_2} & \text{if } \|x\|_2 > \alpha \end{cases} \]
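A minimal NumPy sketch of this case split (the function name and test point are illustrative):

```python
import numpy as np

def proj_2norm_ball(x, alpha):
    """Project x onto the 2-norm ball of radius alpha."""
    nrm = np.linalg.norm(x)
    if nrm <= alpha:
        return x                 # already inside the ball: fixed by the projection
    return alpha * x / nrm       # outside: rescale onto the boundary

# a point outside the unit ball is pulled back to the boundary
print(proj_2norm_ball(np.array([3.0, 4.0]), 1.0))   # [0.6 0.8]
```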


Projection onto positive orthant

\[ C = \Rn_+ = \set{x\in\Rn \mid x\ge 0} \]


\[ \proj_C(x) = \begin{bmatrix} \max\set{0, x_1}\\ \vdots\\ \max\set{0, x_n} \end{bmatrix} \]
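Componentwise, this is a single thresholding operation; a minimal NumPy sketch (names illustrative):

```python
import numpy as np

def proj_orthant(x):
    """Project x onto {z : z >= 0} by zeroing the negative entries."""
    return np.maximum(x, 0.0)

print(proj_orthant(np.array([1.5, -2.0, 0.0])))   # [1.5 0.  0. ]
```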


Projection onto infinity-norm ball

\[ C = \set{x\in\Rn \mid \|x\|_\infty \le \alpha} = \alpha\mathbb{B}_\infty \quad (\alpha\ge0) \qquad \|x\|_\infty = \max_{i=1,\ldots,n} |x_i| \]


\[ \proj_C(x) = \begin{bmatrix} \sign(x_1)\cdot\min\set{\alpha,|x_1|}\\ \vdots\\ \sign(x_n)\cdot\min\set{\alpha,|x_n|}\\ \end{bmatrix} \]
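Componentwise, this is clipping each entry to the interval \([-\alpha,\alpha]\); a minimal NumPy sketch (names illustrative):

```python
import numpy as np

def proj_inf_ball(x, alpha):
    """Project x onto {z : ||z||_inf <= alpha} by clipping each entry."""
    return np.clip(x, -alpha, alpha)   # equals sign(x_i) * min(alpha, |x_i|)

print(proj_inf_ball(np.array([0.3, -2.0, 5.0]), 1.0))   # [ 0.3 -1.   1. ]
```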


Projection onto affine set

\[ C = \set{x\in\Rn \mid Ax=b} \quad A\in\R^{m\times n}, b\in\R^m \]


\[ \proj_C(x) = \argmin\set{\half\|z-x\|^2\mid Az=b} \]

Because \(\mathcal{N}_C(z) = \range(A^T)\) for every \(z\in C\), the optimality condition is \[ x - \proj_C(x) \in \range(A^T) \]
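Writing \(x - \proj_C(x) = A^Ty\) and enforcing \(A\,\proj_C(x) = b\) gives \(AA^Ty = Ax - b\), so when \(A\) has full row rank, \(\proj_C(x) = x - A^T(AA^T)^{-1}(Ax-b)\). A minimal NumPy sketch of this formula (the full-row-rank assumption and the test data are illustrative):

```python
import numpy as np

def proj_affine(x, A, b):
    """Project x onto {z : A z = b}, assuming A has full row rank."""
    # optimality: x - z = A^T y with A z = b  =>  (A A^T) y = A x - b
    y = np.linalg.solve(A @ A.T, A @ x - b)
    return x - A.T @ y

# quick check: the projected point satisfies the constraint
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
z = proj_affine(np.array([3.0, -1.0, 2.0]), A, b)
print(A @ z - b)   # ~ [0. 0.]
```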

Example: de-biasing

\[ C = \set{x\in\Rn\mid e^Tx = 0}, \quad e = (1,1,\ldots,1), \quad e^T x = \sum_{i=1}^n x_i \]

By the optimality conditions of the problem
\[ \proj_C(x) = \argmin\set{\half\|z-x\|^2\mid e^Tz=0}, \]
we have
\[ x - \proj_C(x) \in \range(e), \quad\text{i.e.,}\quad x - \proj_C(x) = \alpha e \quad\text{for some $\alpha\in\R$.} \]
To find \(\alpha\), premultiply by \(e^T\) and use \(e^T\proj_C(x) = 0\):
\[ e^Tx = e^T(x - \proj_C(x)) = \alpha e^Te = \alpha n \quad\implies\quad \alpha = \frac{e^Tx}{n} = \mathbf{avg}(x). \]
Thus,
\[ \proj_C(x) = x - \frac{e^Tx}{n}\,e. \]
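A minimal NumPy sketch of the de-biasing projection (names and test vector illustrative): subtracting the average leaves a vector whose entries sum to zero.

```python
import numpy as np

def debias(x):
    """Project x onto {z : sum(z) = 0} by removing the mean."""
    return x - np.mean(x)

z = debias(np.array([2.0, 4.0, 9.0]))
print(z, z.sum())   # [-3. -1.  4.]  0.0
```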

Projected gradient descent

\[ \min_{x} \set{f(x) \mid x\in C} \]

  • \(f:\Rn\to\R\) convex and smooth
  • \(C\subset\Rn\) closed and convex

Algorithm (sketched in code below):

  • Start from \(x_0\in C\)
  • For \(k=0,1,2,\ldots\)
    • \(g_k = \nabla f(x_k)\)
    • linesearch on \(\phi(\alpha)= f(\proj_C(x_k - \alpha g_k))\) (e.g., backtracking or a constant stepsize)
    • \(x_{k+1} = \proj_C(x_k - \alpha_k g_k)\)
    • stop if \(\|x_{k+1}-x_k\|\) is small
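A minimal sketch of the loop with a constant stepsize (the function names, toy problem, and stepsize \(\alpha=0.1\) are illustrative; a backtracking search on \(\phi\) could replace the fixed stepsize):

```python
import numpy as np

def projected_gradient(grad, proj, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Minimize a smooth convex f over C: x_{k+1} = proj_C(x_k - alpha * grad f(x_k))."""
    x = proj(x0)                                 # start from a feasible point
    for _ in range(max_iter):
        x_new = proj(x - alpha * grad(x))        # gradient step, then project back onto C
        if np.linalg.norm(x_new - x) <= tol:     # stop when the step is small
            return x_new
        x = x_new
    return x

# toy problem: minimize ||x - c||^2 over the unit 2-norm ball
c = np.array([2.0, 2.0])
grad = lambda x: 2.0 * (x - c)
proj = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)
print(projected_gradient(grad, proj, np.zeros(2)))   # ~ [0.707 0.707]
```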

Stationarity

\[ x^*\in\argmin_{x\in C} f(x) \quad\iff\quad x^* = \proj_C(x^* - \alpha\nabla f(x^*))\quad\forall \alpha>0 \]

By the projection theorem (property 3 above), applied to the point \(x^* - \alpha\nabla f(x^*)\) and its projection \(x^*\):

\[ (x^* - \alpha\nabla f(x^*) - x^*)^T(z-x^*) \le 0 \quad\forall z\in C \]

equivalently,

\[ -\alpha \nabla f(x^*)^T(z-x^*) \le 0 \quad\forall z\in C \]

Divide by \(\alpha>0\) and use the definition of the normal cone to deduce the equivalent condition:

\[ -\nabla f(x^*) \in \mathcal{N}_C(x^*) \]
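As a quick numerical check of this fixed-point characterization, take the toy problem above, \(f(x)=\|x-c\|_2^2\) over the unit ball, whose minimizer is \(x^\star = c/\|c\|_2\) (values illustrative):

```python
import numpy as np

c = np.array([2.0, 2.0])
x_star = c / np.linalg.norm(c)                  # minimizer of ||x - c||^2 over the unit ball
proj = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)

for alpha in (0.1, 1.0, 10.0):
    g = 2.0 * (x_star - c)                      # gradient of f at x*
    print(np.allclose(proj(x_star - alpha * g), x_star))   # True for every alpha > 0
```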