The Multiplicative Weights Update Method

CPSC 406 – Computational Optimization

Why Multiplicative Weights Update?


An introduction to the Multiplicative Weights Update method from an optimization perspective


We feel that this meta-algorithm and its analysis are simple and useful enough that they should be viewed as a basic tool taught to all algorithms students together with divide-and-conquer, dynamic programming, random sampling, and the like. — Arora, Hazan, Kale, 2012


(…) so hard to believe that it has been discovered five times and forgotten. — Papadimitriou


Although usually taught from a purely algorithmic perspective, I (Victor) think the optimization perspective is insightful.

Prediction with Expert Advice

A player and an adversary play a game through \(T\) days/rounds. On each round \(t\):

  • The player picks a convex combination of \(n\) stocks (experts) to invest in
  • Then, the adversary chooses how bad each stock is
  • At the end of the day, the player earns/loses money according to their investment choices

Q: Can the player do well even against an evil adversary?

Formalizing the Problem

On round \(t\)

  • Player picks weights \(p_t\) in the simplex \[ p_t \in \Delta_n = \Big\{ p \in [0,1]^n \colon \sum_{i = 1}^n p_i = 1\Big\} \]
  • Adversary picks losses \(\ell_t \in [-1,1]^n\) for the experts


  • By the end of the round: player loses \(p_t(i) \cdot \ell_t(i)\) for stock/expert \(i\). \[ \text{Loss on round}~t = \sum_{i = 1}^n \ell_t(i)\cdot p_t(i) = \ell_t \T p_t. \]

How to Evaluate the Player?


Attempt 1 - Total Loss

Player tries to minimize \(\displaystyle \sum_{t = 1}^T \ell_t \T p_t\)

BAD: the adversary can set \(\ell_t = \begin{pmatrix} 1 &1 &\dotsm& 1 \end{pmatrix} \T\) every round, so the player loses \(T\) in total no matter what they play




Attempt 2 - Compare with the best of each round

Player tries to minimize \(\displaystyle \sum_{t = 1}^T \ell_t \T p_t - \sum_{t = 1}^T \min_{i \in [n]} \ell_t(i)\)

BAD: the adversary can make the loss \(\ell_t\) equal to \(-1\) everywhere except at the expert the player weights the most, where it is \(+1\); each round's minimum is then \(-1\) while the player's loss is at least \(2/n - 1\), so this benchmark forces a gap that grows linearly in \(T\)

Evaluating the player through Regret

Compare with the best expert of the game

Player tries to minimize the regret \[ \Regret(T) = \sum_{t = 1}^T \ell_t \T p_t - \min_{i \in [n]}\sum_{t = 1}^T \ell_t(i). \] Intuition: Player’s regret of not picking the best expert in hindsight every round.

Regret may always grow, but we want \[ \frac{\Regret(T)}{T} \to 0~\text{as}~T \to \infty \] In words, average regret goes to 0.

Example

\[ \ell_1 = \bmat{0 \\ 1 \\ -1}, \quad \ell_2 = \bmat{0 \\ -0.5 \\ 1}, \quad \ell_3 = \bmat{-1 \\ 1 \\ 1}. \]
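For instance, if the player plays the uniform weights \(p_t = \frac{1}{3}e\) every round, the experts' cumulative losses are \(-1\), \(1.5\), and \(1\), so the best expert in hindsight is expert 1 and
\[ \Regret(3) = \Big(\underbrace{0 + \tfrac16 + \tfrac13}_{\text{player's loss}}\Big) - \underbrace{(-1)}_{\text{best expert}} = \tfrac32. \]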

Follow The Leader

The Follow the Leader algorithm picks \[ p_{t+1} \in \argmin_{p \in \Delta_n} \Big\{\sum_{s = 1}^t \ell_s \T p \Big\}. \]

This is a very intuitive algorithm, but it can fail terribly

Key problem: player changes their decision too abruptly
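A minimal sketch of this failure mode in Python, using an illustrative two-expert loss sequence chosen so that the current leader is always the expert about to incur a loss:

```python
import numpy as np

# Two experts. After a small first loss, the losses alternate so that the
# current "leader" (smallest cumulative loss so far) is always the expert
# that is about to be charged a loss of 1.
T = 20
losses = [np.array([0.5, 0.0])] + [
    np.array([0.0, 1.0]) if t % 2 == 1 else np.array([1.0, 0.0])
    for t in range(1, T)
]

cum = np.zeros(2)        # cumulative losses seen so far
ftl_loss = 0.0
for ell in losses:
    leader = np.argmin(cum)      # FTL: put all weight on the best expert so far
    ftl_loss += ell[leader]
    cum += ell

best_in_hindsight = cum.min()
print(f"FTL loss: {ftl_loss:.1f}, best expert: {best_in_hindsight:.1f}")
# FTL pays ~1 per round while the best expert pays ~T/2, so regret grows linearly in T.
```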

(Online) Gradient Descent

Pick \(p_1 = \frac{1}{n}e\), a step-size \(\alpha > 0\), and \[ p_{t+1} \in \argmin_{p \in \Delta_n} \Big\{\ell_t \T p + \frac{1}{2\alpha}\norm{p - p_{t}}_2^2 \Big\}. \]

Via optimality conditions, we have

\[ -\ell_t - \frac{1}{\alpha} (p_{t+1} - p_t) \in \cN_{\Delta_n}(p_{t+1}) \iff p_t - \alpha \ell_t - p_{t+1} \in \cN_{\Delta_n}(p_{t+1}) \]

Reminder: \(z = \proj_C(x) \iff x - z \in \cN_{C}(z)\).

\[ \implies p_{t+1} = \proj_{\Delta_n}(p_t - \alpha \ell_t) \] (Gradient descent step)

Theorem If \(\alpha = 1/\sqrt{nT}\), then \(\Regret(T) \leq \sqrt{nT}\).
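A minimal sketch of the projected step in Python; the sorting-based simplex projection and the random loss sequence are illustrative choices:

```python
import numpy as np

def project_simplex(x):
    """Euclidean projection of x onto the probability simplex (sort-based)."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(x) + 1) > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(x - tau, 0.0)

def online_gradient_descent(losses, alpha):
    """Play p_{t+1} = proj_simplex(p_t - alpha * l_t) against a loss sequence."""
    n = losses[0].shape[0]
    p = np.ones(n) / n                 # p_1 = e / n
    total = 0.0
    for ell in losses:
        total += ell @ p               # loss suffered on this round
        p = project_simplex(p - alpha * ell)
    return total

# Illustrative run: random losses in [-1, 1], step size alpha = 1 / sqrt(n T)
rng = np.random.default_rng(0)
T, n = 1000, 5
losses = [rng.uniform(-1, 1, n) for _ in range(T)]
player_loss = online_gradient_descent(losses, alpha=1 / np.sqrt(n * T))
best_expert = np.sum(losses, axis=0).min()
print(f"regret = {player_loss - best_expert:.2f}  (bound ~ sqrt(nT) = {np.sqrt(n * T):.1f})")
```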

Multiplicative Weights Update

Pick \(p_1 = \frac{1}{n}e\), step size \(\alpha > 0\), and update weights as \[ p_{t+1} \in \argmin_{p \in \Delta_n} \Big\{\sum_{s = 1}^t \ell_s \T p + \frac{1}{\alpha}\sum_{i = 1}^n p_i \ln p_i \Big\}. \]
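One way to see why this entropy-regularized problem gives a multiplicative update: write \(L_t(i) = \sum_{s \le t} \ell_s(i)\) and introduce a multiplier \(\lambda\) for the constraint \(\sum_i p_i = 1\) (the nonnegativity constraints are inactive). Stationarity gives
\[ L_t(i) + \frac{1}{\alpha}\big(\ln p_{t+1}(i) + 1\big) + \lambda = 0 \iff p_{t+1}(i) \propto \exp(-\alpha L_t(i)), \]
which is exactly the normalized multiplicative update below.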


Alternative view: Start with \(w_1 = e\). Then

  • Pick \(p_t = w_t/\norm{w_t}_1\)

  • Multiplicative update \(w_{t+1}(i) = w_t(i)\cdot\exp(-\alpha \ell_t(i))\)
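A minimal sketch of this update in Python; the random loss sequence is an illustrative stand-in for the adversary, and the step size is the one from the regret theorem below:

```python
import numpy as np

def multiplicative_weights(losses, alpha):
    """Exponential weights / MWU: w_{t+1}(i) = w_t(i) * exp(-alpha * l_t(i))."""
    n = losses[0].shape[0]
    w = np.ones(n)                     # w_1 = e
    total = 0.0
    for ell in losses:
        p = w / w.sum()                # p_t = w_t / ||w_t||_1
        total += ell @ p               # loss suffered on round t
        w *= np.exp(-alpha * ell)      # multiplicative update
    return total

# Illustrative run with alpha = sqrt(2 ln(n) / T)
rng = np.random.default_rng(1)
T, n = 1000, 5
losses = [rng.uniform(-1, 1, n) for _ in range(T)]
player_loss = multiplicative_weights(losses, alpha=np.sqrt(2 * np.log(n) / T))
best_expert = np.sum(losses, axis=0).min()
print(f"regret = {player_loss - best_expert:.2f}  (bound = {np.sqrt(2 * T * np.log(n)):.1f})")
```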

Entropy vs \(\ell_2\)-norm


Theorem If \(\alpha = \sqrt{2 \ln(n)/T}\), then \(\Regret(T) \leq \sqrt{2T\ln n}\). Compared with the \(\sqrt{nT}\) bound for online gradient descent, the dependence on the number of experts \(n\) improves from polynomial to logarithmic.

Application to Zero-Sum Games

The game is defined by a payoff matrix \(A \in [0,1]^{m \times n}\)

We have two players: the column player and the row player. In each round

  • Row player selects a row \(i\)
  • Column player selects a column \(j\)
  • Row player gets \(A(i,j)\) points, column player gets \(-A(i,j)\) points
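For instance, with \(m = n = 2\) and the matching-pennies-style payoffs \(A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\), the row player earns a point exactly when the two choices match; no deterministic choice is safe for either player, which motivates the mixed strategies below.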

Zero-Sum Games

The game is defined by a payoff matrix \(A \in \R^{m \times n}\)

We have two players: the row player and the column player. In each round

  • Row player selects a distribution \(q \in \Delta_m\) over rows
  • Column player selects a distribution \(p \in \Delta_n\) over columns
  • The row player (expected) payoff is

\[ \sum_{i = 1}^m \sum_{j = 1}^n A(i,j) q_i p_j = q \T A p. \]

Question: Does it matter who plays first?

Von Neumann’s Minmax Theorem \[ \min_{p\in \Delta_n} \max_{q \in \Delta_m} q\T A p = \max_{q \in \Delta_m} \min_{p\in \Delta_n} q\T A p = \OPT \]

Finding an Approximate Equilibrium with MWU

Let’s play many rounds with the column player going first.

  • column player plays its current strategy \(p_t\) (start with \(p_1 = \frac{1}{n} e\))
  • row player picks \(q_t = e_j\), where row \(j\) maximizes \(e_j \T A p_t\)
  • column player updates its strategy with MWU: \[ p_{t+1}(i) \propto p_t(i) \exp(-\alpha \ell_t(i)) \quad \text{with} \quad \ell_t = A \T e_j \quad\text{($j$th row of $A$)} \]

Define \(\displaystyle \bar{p} = \frac{1}{T} \sum_{t = 1}^T p_t\) and \(\displaystyle \bar{q} = \frac{1}{T} \sum_{t = 1}^T q_t\). Then:

\[\displaystyle \OPT - \frac{\Regret(T)}{T} \leq \bar{q} \T A \bar{p} \leq \OPT + \frac{\Regret(T)}{T}\]

If \(T >2 \ln(n)/\eps^2\), then \(\displaystyle \OPT - \eps \leq \bar{q} \T A \bar{p} \leq \OPT + \eps\)

Proof (in class if we have time)
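A minimal Python sketch of this procedure; the payoff matrix is an illustrative random matrix with entries in \([0,1]\), as on the earlier slide:

```python
import numpy as np

def mwu_zero_sum(A, T):
    """Approximately solve the zero-sum game with payoff matrix A:
    the row player (maximizer) best-responds, the column player (minimizer) runs MWU."""
    m, n = A.shape
    alpha = np.sqrt(2 * np.log(n) / T)
    w = np.ones(n)
    p_bar, q_bar = np.zeros(n), np.zeros(m)
    for _ in range(T):
        p = w / w.sum()              # column player's current mixed strategy
        j = np.argmax(A @ p)         # row player's best response (a pure row)
        ell = A[j]                   # column player's losses: j-th row of A
        w *= np.exp(-alpha * ell)    # MWU update
        p_bar += p / T
        q_bar[j] += 1.0 / T
    return p_bar, q_bar, q_bar @ A @ p_bar

# Illustrative 3x3 game with entries in [0, 1]
rng = np.random.default_rng(2)
A = rng.uniform(0, 1, (3, 3))
p_bar, q_bar, value = mwu_zero_sum(A, T=20000)
print(f"approximate game value: {value:.3f}")
```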