Strong Convexity

CPSC 406 – Computational Optimization

Strong convexity

A differentiable function $f:\mathbb{R}^n\to\mathbb{R}$ is $\mu$-strongly convex (with $\mu>0$) if for all $x,z$

\[ f(z) \ge f(x) + \nabla f(x)^T(z-x) + \frac{\mu}{2}\|z-x\|^2 \]

If $f$ is twice continuously differentiable, then $f$ is $\mu$-strongly convex if and only if for all $x$

\[ d^T\nabla^2 f(x) d \ge \mu\|d\|^2 \quad \forall d\in\mathbb{R}^n \quad\iff\quad \nabla^2 f(x) \succeq \mu I \]

Example (Quadratic functions) For a symmetric positive definite matrix $A$, the function \[ f(x) = \frac{1}{2}x^TAx + b^Tx + \gamma \] is $\mu$-strongly convex with $\mu=\lambda_{\min}(A)$.
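The quadratic example can be checked numerically. The sketch below is only an illustration: the $2\times 2$ matrix, the vector $b$, the constant $\gamma$, and all helper names are hypothetical choices, not part of the notes. It computes $\lambda_{\min}(A)$ in closed form and evaluates the gap in the defining inequality at random pairs of points.

```python
import math
import random

# Hypothetical 2x2 symmetric positive definite example (not from the notes).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, -1.0]
gamma = 0.5

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def f(x):
    # f(x) = 1/2 x^T A x + b^T x + gamma
    return 0.5 * dot(x, matvec(A, x)) + dot(b, x) + gamma

def grad(x):
    # Gradient of the quadratic: A x + b
    Ax = matvec(A, x)
    return [Ax[0] + b[0], Ax[1] + b[1]]

def eig_sym_2x2(M):
    """Return (lambda_min, lambda_max) of a symmetric 2x2 matrix in closed form."""
    mean = 0.5 * (M[0][0] + M[1][1])
    radius = math.hypot(0.5 * (M[0][0] - M[1][1]), M[0][1])
    return mean - radius, mean + radius

def strong_convexity_gap(x, z, mu):
    """f(z) - [f(x) + grad f(x)^T (z-x) + mu/2 ||z-x||^2]; nonnegative when the inequality holds."""
    d = [z[0] - x[0], z[1] - x[1]]
    return f(z) - f(x) - dot(grad(x), d) - 0.5 * mu * dot(d, d)

mu, lam_max = eig_sym_2x2(A)

random.seed(0)
pairs = [
    ([random.uniform(-5, 5), random.uniform(-5, 5)],
     [random.uniform(-5, 5), random.uniform(-5, 5)])
    for _ in range(1000)
]
# Smallest gap over the sample; it should be nonnegative up to rounding error.
min_gap = min(strong_convexity_gap(x, z, mu) for x, z in pairs)
```

Replacing `mu` with a larger constant makes the gap negative along the eigenvector of $\lambda_{\min}(A)$, which suggests that $\lambda_{\min}(A)$ is the largest valid strong convexity constant for this quadratic.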

Alternative characterization

A function $f$ is $\mu$-strongly convex if and only if the function

\[ g(x) = f(x) - \frac{\mu}{2}\|x\|^2 \] is convex.

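This implies that Tikhonov regularization induces strong convexity: adding $\frac{\mu}{2}\|x\|^2$ to any convex function yields a $\mu$-strongly convex function.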
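As a worked instance of this remark, consider a regularized least-squares objective. The matrix $M$, vector $c$, and weight $\lambda>0$ are illustrative symbols, not taken from the notes above:

\[ f(x) = \frac{1}{2}\|Mx-c\|^2 + \frac{\lambda}{2}\|x\|^2 \]

Here $g(x) = f(x) - \frac{\lambda}{2}\|x\|^2 = \frac{1}{2}\|Mx-c\|^2$ is convex, so $f$ is $\lambda$-strongly convex even when $M$ has a nontrivial null space. Equivalently,

\[ \nabla^2 f(x) = M^TM + \lambda I \succeq \lambda I \]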

Distance to solution

Lemma 1 (Lipschitz smooth) If $f$ is $L$-smooth, then for all $x$ and all minimizers $x^*$ with $f^*=f(x^*)$, \[ \frac{1}{2L}\|\nabla f(x)\|^2 \le f(x) - f^* \le \frac{L}{2}\|x-x^*\|^2 \]

Note that under smoothness alone, the gradient norm does not bound the distance to the solution.

Lemma 2 (Strongly convex) If $f$ is $\mu$-strongly convex, then for all $x$ and the (unique) minimizer $x^*$ with $f^*=f(x^*)$, \[ \frac{\mu}{2}\|x-x^*\|^2 \le f(x) - f^* \le \frac{1}{2\mu}\|\nabla f(x)\|^2 \]
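Both lemmas can be sanity-checked on a simple function. The sketch below assumes a hypothetical diagonal quadratic $f(x)=\frac{1}{2}(\mu x_1^2 + L x_2^2)$, which is $L$-smooth and $\mu$-strongly convex with $x^*=0$ and $f^*=0$. The constants and function names are illustrative choices only.

```python
import random

# Hypothetical test function (an assumption, not from the notes):
# f(x) = 0.5 * (mu * x1^2 + L * x2^2), minimized at x* = (0, 0) with f* = 0.
mu, L = 0.5, 4.0

def f(x):
    return 0.5 * (mu * x[0] ** 2 + L * x[1] ** 2)

def grad_sq_norm(x):
    # ||grad f(x)||^2 with grad f(x) = (mu * x1, L * x2)
    return (mu * x[0]) ** 2 + (L * x[1]) ** 2

def dist_sq(x):
    # ||x - x*||^2 with x* = (0, 0)
    return x[0] ** 2 + x[1] ** 2

def check_bounds(x, tol=1e-9):
    """Return True if both sandwich inequalities hold at x (up to tol)."""
    gap = f(x)  # f(x) - f*, since f* = 0
    lemma1 = grad_sq_norm(x) / (2 * L) - tol <= gap <= 0.5 * L * dist_sq(x) + tol
    lemma2 = 0.5 * mu * dist_sq(x) - tol <= gap <= grad_sq_norm(x) / (2 * mu) + tol
    return lemma1 and lemma2

random.seed(1)
points = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(1000)]
all_hold = all(check_bounds(x) for x in points)
```

On this function the bounds are tight along the coordinate axes: points on the $x_1$ axis attain equality in Lemma 2, and points on the $x_2$ axis attain equality in Lemma 1.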

Smoothness and strong convexity

$L$-smoothness implies

\[ f(y) \le f(x) + \nabla f(x)^T(y-x) + \frac{L}{2}\|y-x\|^2 \]

$\mu$-strong convexity implies

\[ f(y) \ge f(x) + \nabla f(x)^T(y-x) + \frac{\mu}{2}\|y-x\|^2 \]

Together, for all $x,y$,

\[ \frac{\mu}{2}\|y-x\|^2 \le f(y) - f(x) - \nabla f(x)^T(y-x) \le \frac{L}{2}\|y-x\|^2 \]

For twice continuously differentiable $f$, this implies that the Hessian eigenvalues are bounded above and below:

\[ \mu I \preceq \nabla^2 f(x) \preceq L I \quad \forall x \]

Linear convergence

Linear convergence with strong convexity

Assume $L$-smoothness and $\mu$-strong convexity. Earlier we deduced

\[ f_{k+1} \le f_k - \frac{1}{2L} \|\nabla f_k\|^2 \]

Under strong convexity, Lemma 2 gives $\|\nabla f_k\|^2 \ge 2\mu(f_k - f^*)$, hence

\[ f_{k+1} \le f_k - \frac{\mu}{L}(f_k - f^*) \quad\iff\quad f_{k+1} - f^* \le \left(1-\frac{\mu}{L}\right)(f_k - f^*) \]

Applying this bound recursively for $k=T-1, T-2, \ldots, 0$ gives

\[ f_T - f^* \le \left(1-\frac{\mu}{L}\right)^T(f_0 - f^*) \le \exp\left(-\frac{\mu}{L}T\right)(f_0 - f^*) \]
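The second inequality uses the elementary bound $1-t\le e^{-t}$, valid for all real $t$. Spelled out with $t=\mu/L$, where $0<\mu\le L$ ensures $1-\frac{\mu}{L}\ge 0$ so that raising both sides to the power $T$ preserves the inequality:

\[ \left(1-\frac{\mu}{L}\right)^T \le \left(e^{-\mu/L}\right)^T = \exp\left(-\frac{\mu}{L}T\right) \]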

To guarantee $f_T - f^* \le \epsilon$, it suffices to run $T$ iterations with

\[ T \ge \frac{L}{\mu}\log\left(\frac{f_0 - f^*}{\epsilon}\right) \quad \text{where $\frac{L}{\mu}$ is the condition number} \]
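The rate and the iteration bound can be observed on a small problem. The sketch below makes several assumptions that are not stated in the notes: gradient descent with constant step size $1/L$ (the usual setting for the descent inequality quoted earlier), a hypothetical diagonal quadratic with $f^*=0$, and illustrative values for $\mu$, $L$, $x_0$, and $\epsilon$.

```python
import math

# Hypothetical problem (an assumption for illustration):
# f(x) = 0.5 * (mu * x1^2 + L * x2^2), with minimizer x* = 0 and f* = 0.
mu, L = 1.0, 10.0

def f(x):
    return 0.5 * (mu * x[0] ** 2 + L * x[1] ** 2)

def grad(x):
    return (mu * x[0], L * x[1])

def gradient_descent(x0, num_iters):
    """Run gradient descent with constant step size 1/L; return f at every iterate."""
    x = x0
    values = [f(x)]
    for _ in range(num_iters):
        g = grad(x)
        x = (x[0] - g[0] / L, x[1] - g[1] / L)
        values.append(f(x))
    return values

x0 = (5.0, 5.0)
eps = 1e-6
f0 = f(x0)  # f* = 0, so f0 is also the initial optimality gap

# Iteration count suggested by T >= (L / mu) * log((f0 - f*) / eps).
T = math.ceil((L / mu) * math.log(f0 / eps))
values = gradient_descent(x0, T)

# Check f_k - f* <= (1 - mu/L)^k (f_0 - f*) at every iterate.
rate = 1.0 - mu / L
bound_holds = all(values[k] <= rate ** k * f0 + 1e-12 for k in range(T + 1))
reached = values[T] <= eps
```

The bound is a worst-case guarantee, so the observed gap can shrink faster than $(1-\mu/L)^k$. Doubling the condition number $L/\mu$ doubles the iteration count the bound prescribes for the same $\epsilon$.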