The Multiplicative Weights Update Method

CPSC 406 – Computational Optimization

Why Multiplicative Weights Update?


An introduction to the Multiplicative Weights Update method from an optimization perspective


We feel that this meta-algorithm and its analysis are simple and useful enough that they should be viewed as a basic tool taught to all algorithms students together with divide-and-conquer, dynamic programming, random sampling, and the like. — Arora, Hazan, Kale, 2012


(…) so hard to believe that it has been discovered five times and forgotten. — Papadimitriou


Although usually taught from a purely algorithmic perspective, I (Victor) think the optimization perspective is insightful.

Prediction with Expert Advice

A player and an adversary play a game through \(T\) days/rounds. On each round \(t\):

  • The player picks a convex combination of \(n\) stocks (experts) to invest in
  • Then, the adversary chooses how bad each stock is
  • At the end of the day, the player earns/loses money according to their investment choices

Q: Can the player do well even against an evil adversary?

Formalizing the Problem

On round \(t\)

  • Player picks weights \(p_t\) in the simplex \[ p_t \in \Delta_n = \Big\{ p \in [0,1]^n \colon \sum_{i = 1}^n p_i = 1\Big\} \]
  • Adversary picks losses \(\ell_t \in [-1,1]^n\) for the experts


  • By the end of the round: player loses \(p_t(i) \cdot \ell_t(i)\) for stock/expert \(i\). \[ \text{Loss on round}~t = \sum_{i = 1}^n \ell_t(i)\cdot p_t(i) = \ell_t \T p_t. \]

How to Evaluate the Player?


Attempt 1 - Total Loss

Player tries to minimize \(\displaystyle \sum_{t = 1}^T \ell_t \T p_t\)

BAD: the adversary can set \(\ell_t = \begin{pmatrix} 1 &1 &\dotsm& 1 \end{pmatrix} \T\) every round, so the player loses \(T\) in total no matter what they play




Attempt 2 - Compare with the best of each round

Player tries to minimize \(\displaystyle \sum_{t = 1}^T \ell_t \T p_t - \sum_{t = 1}^T \min_{i \in [n]} \ell_t(i)\)

BAD: the adversary can make the loss \(\ell_t\) equal to \(-1\) everywhere except at the expert the player weights the most, where it is \(+1\); each round's minimum is then \(-1\) while the player's loss is at least \(2/n - 1\), so this benchmark forces a gap that grows linearly in \(T\)

Evaluating the player through Regret

Compare with the best expert of the game

Player tries to minimize the regret \[ \Regret(T) = \sum_{t = 1}^T \ell_t \T p_t - \min_{i \in [n]}\sum_{t = 1}^T \ell_t(i). \] Intuition: Player’s regret of not picking the best expert in hindsight every round.

Regret may always grow, but we want \[ \frac{\Regret(T)}{T} \to 0~\text{as}~T \to \infty \] In words, average regret goes to 0.

Example

\[ \ell_1 = \bmat{0 \\ 1 \\ -1}, \quad \ell_2 = \bmat{0 \\ -0.5 \\ 1}, \quad \ell_3 = \bmat{-1 \\ 1 \\ 1}. \]
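For instance, if the player plays the uniform weights \(p_t = \frac{1}{3}e\) every round, the experts' cumulative losses are \(-1\), \(1.5\), and \(1\), so the best expert in hindsight is expert 1 and
\[ \Regret(3) = \Big(\underbrace{0 + \tfrac16 + \tfrac13}_{\text{player's loss}}\Big) - \underbrace{(-1)}_{\text{best expert}} = \tfrac32. \]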

Follow The Leader

The Follow the Leader algorithm picks \[ p_{t+1} \in \argmin_{p \in \Delta_n} \Big\{\sum_{s = 1}^t \ell_s \T p \Big\}. \]

This is a very intuitive algorithm, but it can fail terribly

Key problem: player changes their decision too abruptly
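A minimal sketch of this failure mode in Python, using an illustrative two-expert loss sequence chosen so that the current leader is always the expert about to incur a loss:

```python
import numpy as np

# Two experts. After a small first loss, the losses alternate so that the
# current "leader" (smallest cumulative loss so far) is always the expert
# that is about to be charged a loss of 1.
T = 20
losses = [np.array([0.5, 0.0])] + [
    np.array([0.0, 1.0]) if t % 2 == 1 else np.array([1.0, 0.0])
    for t in range(1, T)
]

cum = np.zeros(2)        # cumulative losses seen so far
ftl_loss = 0.0
for ell in losses:
    leader = np.argmin(cum)      # FTL: put all weight on the best expert so far
    ftl_loss += ell[leader]
    cum += ell

best_in_hindsight = cum.min()
print(f"FTL loss: {ftl_loss:.1f}, best expert: {best_in_hindsight:.1f}")
# FTL pays ~1 per round while the best expert pays ~T/2, so regret grows linearly in T.
```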

(Online) Gradient Descent

Pick \(p_1 = \frac{1}{n}e\), a step-size \(\alpha > 0\), and \[ p_{t+1} \in \argmin_{p \in \Delta_n} \Big\{\ell_t \T p + \frac{1}{2\alpha}\norm{p - p_{t}}_2^2 \Big\}. \]

Via optimality conditions, we have

\[ -\ell_t - \frac{1}{\alpha} (p_{t+1} - p_t) \in \cN_{\Delta_n}(p_{t+1}) \iff p_t - \alpha \ell_t - p_{t+1} \in \cN_{\Delta_n}(p_{t+1}) \]

Reminder: \(z = \proj_C(x) \iff x - z \in \cN_{C}(z)\).

\[ \implies p_{t+1} = \proj_{\Delta_n}(p_t - \alpha \ell_t) \] (Gradient descent step)

Theorem If \(\alpha = 1/\sqrt{nT}\), then \(\Regret(T) \leq \sqrt{nT}\).
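A minimal sketch of the projected step in Python; the sorting-based simplex projection and the random loss sequence are illustrative choices:

```python
import numpy as np

def project_simplex(x):
    """Euclidean projection of x onto the probability simplex (sort-based)."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(x) + 1) > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(x - tau, 0.0)

def online_gradient_descent(losses, alpha):
    """Play p_{t+1} = proj_simplex(p_t - alpha * l_t) against a loss sequence."""
    n = losses[0].shape[0]
    p = np.ones(n) / n                 # p_1 = e / n
    total = 0.0
    for ell in losses:
        total += ell @ p               # loss suffered on this round
        p = project_simplex(p - alpha * ell)
    return total

# Illustrative run: random losses in [-1, 1], step size alpha = 1 / sqrt(n T)
rng = np.random.default_rng(0)
T, n = 1000, 5
losses = [rng.uniform(-1, 1, n) for _ in range(T)]
player_loss = online_gradient_descent(losses, alpha=1 / np.sqrt(n * T))
best_expert = np.sum(losses, axis=0).min()
print(f"regret = {player_loss - best_expert:.2f}  (bound ~ sqrt(nT) = {np.sqrt(n * T):.1f})")
```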

Multiplicative Weights Update

Pick \(p_1 = \frac{1}{n}e\), step size \(\alpha > 0\), and update weights as \[ p_{t+1} \in \argmin_{p \in \Delta_n} \Big\{\sum_{s = 1}^t \ell_s \T p + \frac{1}{\alpha}\sum_{i = 1}^n p_i \ln p_i \Big\}. \]
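One way to see why this entropy-regularized problem gives a multiplicative update: write \(L_t(i) = \sum_{s \le t} \ell_s(i)\) and introduce a multiplier \(\lambda\) for the constraint \(\sum_i p_i = 1\) (the nonnegativity constraints are inactive). Stationarity gives
\[ L_t(i) + \frac{1}{\alpha}\big(\ln p_{t+1}(i) + 1\big) + \lambda = 0 \iff p_{t+1}(i) \propto \exp(-\alpha L_t(i)), \]
which is exactly the normalized multiplicative update below.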


Alternative view: Start with \(w_1 = e\). Then

  • Pick \(p_t = w_t/\norm{w_t}_1\)

  • Multiplicative update \(w_{t+1}(i) = w_t(i)\cdot\exp(-\alpha \ell_t(i))\)
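A minimal sketch of this update in Python; the random loss sequence is an illustrative stand-in for the adversary, and the step size is the one from the regret theorem below:

```python
import numpy as np

def multiplicative_weights(losses, alpha):
    """Exponential weights / MWU: w_{t+1}(i) = w_t(i) * exp(-alpha * l_t(i))."""
    n = losses[0].shape[0]
    w = np.ones(n)                     # w_1 = e
    total = 0.0
    for ell in losses:
        p = w / w.sum()                # p_t = w_t / ||w_t||_1
        total += ell @ p               # loss suffered on round t
        w *= np.exp(-alpha * ell)      # multiplicative update
    return total

# Illustrative run with alpha = sqrt(2 ln(n) / T)
rng = np.random.default_rng(1)
T, n = 1000, 5
losses = [rng.uniform(-1, 1, n) for _ in range(T)]
player_loss = multiplicative_weights(losses, alpha=np.sqrt(2 * np.log(n) / T))
best_expert = np.sum(losses, axis=0).min()
print(f"regret = {player_loss - best_expert:.2f}  (bound = {np.sqrt(2 * T * np.log(n)):.1f})")
```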

Entropy vs \(\ell_2\)-norm


Theorem If \(\alpha = \sqrt{2 \ln(n)/T}\), then \(\Regret(T) \leq \sqrt{2T\ln n}\). Compared with the \(\sqrt{nT}\) bound for online gradient descent, the dependence on the number of experts \(n\) improves from polynomial to logarithmic.

Application to Zero-Sum Games

The game is defined by a payoff matrix \(A \in [0,1]^{m \times n}\)

We have two players: the column player and the row player. In each round

  • Row player selects a row \(i\)
  • Column player selects a column \(j\)
  • Row player gets \(A(i,j)\) points, column player gets \(-A(i,j)\) points
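For instance, with \(m = n = 2\) and the matching-pennies-style payoffs \(A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\), the row player earns a point exactly when the two choices match; no deterministic choice is safe for either player, which motivates the mixed strategies below.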

Zero-Sum Games

The game is defined by a payoff matrix \(A \in \R^{m \times n}\)

We have two players: the row player and the column player. In each round

  • Row player selects a distribution \(q \in \Delta_m\) over rows
  • Column player selects a distribution \(p \in \Delta_n\) over columns
  • The row player (expected) payoff is

\[ \sum_{i = 1}^m \sum_{j = 1}^n A(i,j) q_i p_j = q \T A p. \]

Question: Does it matter who plays first?

Von Neumann’s Minmax Theorem \[ \min_{p\in \Delta_n} \max_{q \in \Delta_m} q\T A p = \max_{q \in \Delta_m} \min_{p\in \Delta_n} q\T A p = \OPT \]

Finding an Approximate Equilibrium with MWU

Let’s play many rounds with the column player going first.

  • column player plays its current strategy \(p_t\) (start with \(p_1 = \frac{1}{n} e\))
  • row player picks \(q_t = e_j\), where row \(j\) maximizes \(e_j \T A p_t\)
  • column player updates its strategy with MWU: \[ p_{t+1}(i) \propto p_t(i) \exp(-\alpha \ell_t(i)) \quad \text{with} \quad \ell_t = A \T e_j \quad\text{($j$th row of $A$)} \]

Define \(\displaystyle \bar{p} = \frac{1}{T} \sum_{t = 1}^T p_t\) and \(\displaystyle \bar{q} = \frac{1}{T} \sum_{t = 1}^T q_t\). Then:

\[\displaystyle \OPT - \frac{\Regret(T)}{T} \leq \bar{q} \T A \bar{p} \leq \OPT + \frac{\Regret(T)}{T}\]

If \(T >2 \ln(n)/\eps^2\), then \(\displaystyle \OPT - \eps \leq \bar{q} \T A \bar{p} \leq \OPT + \eps\)

Proof (in class if we have time)
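A minimal Python sketch of this procedure; the payoff matrix is an illustrative random matrix with entries in \([0,1]\), as on the earlier slide:

```python
import numpy as np

def mwu_zero_sum(A, T):
    """Approximately solve the zero-sum game with payoff matrix A:
    the row player (maximizer) best-responds, the column player (minimizer) runs MWU."""
    m, n = A.shape
    alpha = np.sqrt(2 * np.log(n) / T)
    w = np.ones(n)
    p_bar, q_bar = np.zeros(n), np.zeros(m)
    for _ in range(T):
        p = w / w.sum()              # column player's current mixed strategy
        j = np.argmax(A @ p)         # row player's best response (a pure row)
        ell = A[j]                   # column player's losses: j-th row of A
        w *= np.exp(-alpha * ell)    # MWU update
        p_bar += p / T
        q_bar[j] += 1.0 / T
    return p_bar, q_bar, q_bar @ A @ p_bar

# Illustrative 3x3 game with entries in [0, 1]
rng = np.random.default_rng(2)
A = rng.uniform(0, 1, (3, 3))
p_bar, q_bar, value = mwu_zero_sum(A, T=20000)
print(f"approximate game value: {value:.3f}")
```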