Strong Convexity

CPSC 406 – Computational Optimization

Strong convexity

A function \(f:\mathbb{R}^n\to\mathbb{R}\) is \(\mu\)-strongly convex (with \(\mu>0\)) if for all \(x,z\)

\[ f(z) \ge f(x) + \nabla f(x)^T(z-x) + \frac{\mu}{2}\|z-x\|^2 \]


If \(f\) is twice continuously differentiable, then \(f\) is \(\mu\)-strongly convex if and only if for all \(x\)

\[ d^T\nabla^2 f(x) d \ge \mu\|d\|^2 \quad \forall d\in\mathbb{R}^n \quad\iff\quad \nabla^2 f(x) \succeq \mu I \]

Example (Quadratic functions) For a positive definite matrix \(A\), the function \[ f(x) = \frac{1}{2}x^TAx + b^Tx + \gamma \] is \(\mu\)-strongly convex with \(\mu=\lambda_{\min}(A)\).
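As a rough numerical sanity check of this example, the sketch below evaluates the strong convexity inequality at random pairs \((x,z)\) for one small quadratic. The particular \(A\), \(b\), \(\gamma\), and the helper names are illustrative assumptions, not part of the course material; the closed-form eigenvalue formula is specific to symmetric 2x2 matrices.

```python
import math
import random

# Hypothetical 2x2 positive definite example (values chosen for illustration).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, -1.0]
gamma = 0.5

def matvec(M, x):
    """Product of a 2x2 matrix with a 2-vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def f(x):
    """f(x) = 0.5 x^T A x + b^T x + gamma."""
    return 0.5 * dot(x, matvec(A, x)) + dot(b, x) + gamma

def grad(x):
    """Gradient A x + b (valid because A is symmetric)."""
    Ax = matvec(A, x)
    return [Ax[0] + b[0], Ax[1] + b[1]]

def lambda_min_2x2(M):
    """Smallest eigenvalue of a symmetric 2x2 matrix in closed form."""
    a, off, c = M[0][0], M[0][1], M[1][1]
    return 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + off ** 2)

mu = lambda_min_2x2(A)

def strong_convexity_gap(x, z):
    """f(z) minus the quadratic lower bound built at x; should be nonnegative."""
    d = [z[0] - x[0], z[1] - x[1]]
    lower = f(x) + dot(grad(x), d) + 0.5 * mu * dot(d, d)
    return f(z) - lower

rng = random.Random(0)

def random_point():
    return [rng.uniform(-5, 5), rng.uniform(-5, 5)]

min_gap = min(strong_convexity_gap(random_point(), random_point())
              for _ in range(1000))
```

Up to floating-point rounding, `min_gap` should be nonnegative. Replacing `mu` with a value larger than \(\lambda_{\min}(A)\) should make the gap negative for some pairs, which is one way to see that the constant cannot be improved for this function.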

Alternative characterization

A function \(f\) is \(\mu\)-strongly convex if and only if the function

\[ g(x) = f(x) - \frac{\mu}{2}\|x\|^2 \] is convex.

  • Implies that Tikhonov regularization induces strong convexity
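One way to see this, sketched under the assumption that the regularized objective is written as a convex function \(h\) plus the penalty \(\frac{\mu}{2}\|x\|^2\) (the name \(h\) and this scaling of the penalty are illustrative choices, not taken from the slides):

\[ f(x) = h(x) + \frac{\mu}{2}\|x\|^2 \quad\Longrightarrow\quad g(x) = f(x) - \frac{\mu}{2}\|x\|^2 = h(x) \ \text{is convex} \quad\Longrightarrow\quad f \ \text{is $\mu$-strongly convex.} \]

For instance, with the least-squares term \(h(x)=\frac{1}{2}\|Mx-c\|^2\) (hypothetical data \(M\), \(c\)), the Hessian of \(f\) is \(M^TM+\mu I \succeq \mu I\), which agrees with the second-order condition above even when \(M^TM\) is singular.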

Distance to solution

Lemma 1 (Lipschitz smooth) If \(f\) is \(L\)-smooth, then for all \(x\) and all minimizers \(x^*\) with \(f^*=f(x^*)\), \[ \frac{1}{2L}\|\nabla f(x)\|^2 \le f(x) - f^* \le \frac{L}{2}\|x-x^*\|^2 \]

  • under smoothness alone, a small gradient norm does not guarantee that \(x\) is close to a solution

Lemma 2 (Strongly convex) If \(f\) is \(\mu\)-strongly convex, then for all \(x\) and all minimizers \(x^*\) with \(f^*=f(x^*)\), \[ \frac{\mu}{2}\|x-x^*\|^2 \le f(x) - f^* \le \frac{1}{2\mu}\|\nabla f(x)\|^2 \]
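The two lemmas can be checked numerically on a simple test function. The sketch below assumes the diagonal quadratic \(f(x)=\frac{1}{2}(\mu x_1^2 + L x_2^2)\) with \(\mu=1\), \(L=10\), whose unique minimizer is \(x^*=0\) with \(f^*=0\); the function, constants, and helper names are illustrative assumptions.

```python
import random

MU, L = 1.0, 10.0  # assumed strong convexity and smoothness constants

def f(x):
    """Diagonal quadratic with Hessian diag(MU, L); minimizer x* = 0, f* = 0."""
    return 0.5 * (MU * x[0] ** 2 + L * x[1] ** 2)

def grad(x):
    return [MU * x[0], L * x[1]]

def sq_norm(v):
    return sum(vi * vi for vi in v)

def check_bounds(x, tol=1e-9):
    """Return (Lemma 1 holds, Lemma 2 holds) at x, up to a rounding tolerance."""
    gap = f(x)               # f(x) - f*, since f* = 0
    dist2 = sq_norm(x)       # ||x - x*||^2, since x* = 0
    g2 = sq_norm(grad(x))    # ||grad f(x)||^2
    lemma1 = g2 / (2 * L) <= gap + tol and gap <= 0.5 * L * dist2 + tol
    lemma2 = 0.5 * MU * dist2 <= gap + tol and gap <= g2 / (2 * MU) + tol
    return lemma1, lemma2

rng = random.Random(1)
results = [check_bounds([rng.uniform(-3, 3), rng.uniform(-3, 3)])
           for _ in range(1000)]
all_hold = all(l1 and l2 for l1, l2 in results)
```

On this function, points along the first axis make the Lemma 2 lower bound tight, while points along the second axis make the Lemma 1 upper bound tight, so neither constant can be improved here.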

Smoothness and strong convexity

  • \(L\)-smoothness implies

\[ f(y) \le f(x) + \nabla f(x)^T(y-x) + \frac{L}{2}\|y-x\|^2 \]

  • \(\mu\)-strong convexity implies

\[ f(y) \ge f(x) + \nabla f(x)^T(y-x) + \frac{\mu}{2}\|y-x\|^2 \]

  • together, for all \(x,y\)

\[ \frac{\mu}{2}\|y-x\|^2 \le f(y) - f(x) - \nabla f(x)^T(y-x) \le \frac{L}{2}\|y-x\|^2 \]

  • if \(f\) is twice continuously differentiable, this implies the Hessian eigenvalues are bounded above and below:

\[ \mu I \preceq \nabla^2 f(x) \preceq L I \quad \forall x \]
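To illustrate the two-sided bound on a non-quadratic function, the sketch below uses a one-dimensional regularized softplus, \(f(x)=\log(1+e^x)+\frac{\mu}{2}x^2\). This example is an assumption made for illustration: its second derivative is \(\sigma(x)(1-\sigma(x))+\mu\), where \(\sigma\) is the logistic sigmoid, and this lies in \((\mu,\ \mu+\frac{1}{4}]\), so the function is \(\mu\)-strongly convex and \(L\)-smooth with \(L=\mu+\frac{1}{4}\).

```python
import math
import random

MU = 0.1        # assumed regularization weight (illustrative)
L = MU + 0.25   # since 0 < sigmoid(x) * (1 - sigmoid(x)) <= 1/4

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def f(x):
    """Softplus plus a quadratic penalty."""
    return math.log1p(math.exp(x)) + 0.5 * MU * x * x

def df(x):
    """First derivative of f."""
    return sigmoid(x) + MU * x

def d2f(x):
    """Second derivative of f (the 1-D Hessian)."""
    s = sigmoid(x)
    return s * (1.0 - s) + MU

def linearization_gap(x, y):
    """f(y) - f(x) - f'(x) (y - x), the middle term of the sandwich."""
    return f(y) - f(x) - df(x) * (y - x)

def sandwich_holds(x, y, tol=1e-9):
    """Check mu/2 |y-x|^2 <= gap <= L/2 |y-x|^2 up to a rounding tolerance."""
    d2 = (y - x) ** 2
    return 0.5 * MU * d2 - tol <= linearization_gap(x, y) <= 0.5 * L * d2 + tol

rng = random.Random(2)
pairs = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(1000)]
all_sandwiched = all(sandwich_holds(x, y) for x, y in pairs)
hessian_bounded = all(MU <= d2f(x) <= L for x, _ in pairs)
```

Both flags should come out true on the sampled points. The check is only evidence on a finite sample, not a proof of the bounds.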

Linear convergence

Linear convergence with strong convexity

  • Assume \(L\)-smoothness and \(\mu\)-strong convexity. Earlier we deduced

\[ f_{k+1} \le f_k - \frac{1}{2L} \|\nabla f_k\|^2 \]

  • under strong convexity, \(\|\nabla f_k\|^2 \ge 2\mu(f_k - f^*)\), hence

\[ f_{k+1} \le f_k - \frac{\mu}{L}(f_k - f^*) \quad\iff\quad f_{k+1} - f^* \le (1-\frac{\mu}{L})(f_k - f^*) \]

  • applying this bound recursively for \(k=T-1, T-2, \ldots, 0\) gives

\[ f_T - f^* \le (1-\frac{\mu}{L})^T(f_0 - f^*) \le \exp\left(-\frac{\mu}{L}T\right)(f_0 - f^*) \]

  • to guarantee \(f_T - f^* \le \epsilon\), it suffices to run \(T\) iterations with

\[ T \ge \frac{L}{\mu}\log\left(\frac{f_0 - f^*}{\epsilon}\right) \quad \text{where $\frac{L}{\mu}$ is the condition number} \]
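The rate and the iteration count above can be observed on a small example. The sketch below assumes gradient descent with the constant step size \(1/L\) (the step for which the descent inequality \(f_{k+1} \le f_k - \frac{1}{2L}\|\nabla f_k\|^2\) is usually derived) applied to the diagonal quadratic \(f(x)=\frac{1}{2}(\mu x_1^2 + L x_2^2)\) with \(f^*=0\). The test function, starting point, tolerance, and helper names are illustrative assumptions.

```python
import math

MU, L = 1.0, 10.0  # assumed constants; condition number L / MU = 10

def f(x):
    """Diagonal quadratic with minimizer x* = 0 and f* = 0."""
    return 0.5 * (MU * x[0] ** 2 + L * x[1] ** 2)

def grad(x):
    return [MU * x[0], L * x[1]]

def gradient_descent(x0, num_iters):
    """Iterate x_{k+1} = x_k - (1/L) grad f(x_k); return [f(x_k) - f*] for k = 0..num_iters."""
    x = list(x0)
    gaps = [f(x)]
    for _ in range(num_iters):
        g = grad(x)
        x = [xi - gi / L for xi, gi in zip(x, g)]
        gaps.append(f(x))
    return gaps

x0 = [5.0, 5.0]
eps = 1e-6
gap0 = f(x0)

# Iteration count suggested by T >= (L / mu) * log((f_0 - f*) / eps).
T = math.ceil((L / MU) * math.log(gap0 / eps))
gaps = gradient_descent(x0, T)

# Compare each optimality gap with the bound (1 - mu/L)^k (f_0 - f*).
rate = 1.0 - MU / L
bound_holds = all(gaps[k] <= rate ** k * gap0 + 1e-12 for k in range(T + 1))
reached_tolerance = gaps[-1] <= eps
```

On this particular problem the gap shrinks faster than the bound predicts, which is consistent with the bound being a worst-case guarantee rather than an exact rate.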
