Homework 8
Linear programming and multiplicative updates
1 Multiplicative weights update
In class we discussed one formulation of the Multiplicative Weights Update (MWU) method. Let us now see how to formulate MWU in an equivalent way that is closely related to the online gradient descent formulation we saw. Let \(\ell_1, \dotsc, \ell_T \in \mathbb{R}^n\) be arbitrary loss vectors. Define \(p_1 = \frac{1}{n} \mathbb{1}\), fix \(\alpha > 0\), and define \[ p_{t+1} \in \argmin_{p \in \Delta_n} \Big\{ \ell_t^T p + \frac{1}{\alpha}\mathrm{KL}(p, p_t)\Big\} \qquad \forall t \in \{1, \dotsc, T-1\}, \] where \(\mathrm{KL}(p,q)\) is the KL divergence between \(p \in \Delta_n\) and \(q \in \Delta_n\) (with \(q > 0\)), given by \[ \mathrm{KL}(p,q) = \sum_{i = 1}^n p_i \ln\Big(\frac{p_i}{q_i}\Big). \] Show that \[ p_{t+1} \in \argmin_{p \in \Delta_n}\Big\{ \sum_{s = 1}^t \ell_s^T p + \frac{1}{\alpha} \sum_{i = 1}^n p_i \ln p_i \Big\} \qquad \forall t \in \{0, \dotsc, T-1\}. \]
Hint: If \(h(p) = \sum_i p_i \ln p_i\) and \(f(p) = \mathrm{KL}(p,q)\), show that \(\nabla f(p) = \nabla h(p) - \nabla h(q) + \mathbb{1}\); the additive multiple of \(\mathbb{1}\) does not affect the minimizer over \(\Delta_n\). Moreover, feel free to use the following claim without proving it:
Claim. Let \(p \in \Delta_n\) with \(p > 0\). If \(v \in \mathcal{N}_{\Delta_{n}}(p)\), then \(-v \in \mathcal{N}_{\Delta_n}(p)\).
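Before writing the proof, it can help to see the equivalence numerically. The sketch below assumes the well-known exponential-weights closed form for the KL-regularized step (which is essentially what the problem asks you to derive) and checks that iterating it agrees with directly minimizing the cumulative losses plus entropy; the loss vectors and parameters here are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, alpha = 5, 10, 0.1
losses = rng.standard_normal((T, n))  # arbitrary loss vectors (my choice of data)

# One-step KL form: the minimizer has the closed form
#   p_{t+1,i} proportional to p_{t,i} * exp(-alpha * ell_{t,i})
# (exponential weights), assumed here without proof.
p = np.full(n, 1.0 / n)
for t in range(T):
    p = p * np.exp(-alpha * losses[t])
    p /= p.sum()

# Cumulative form: the minimizer of sum_s ell_s^T p + (1/alpha) sum_i p_i ln p_i
# over the simplex is the softmax of -alpha * (sum of losses).
L = losses.sum(axis=0)
q = np.exp(-alpha * L)
q /= q.sum()

print(np.allclose(p, q))  # the two formulations produce the same iterate
```

Seeing both formulas reduce to a softmax of the accumulated losses is a good guide for the formal argument.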
2 Simplex
- Convert this problem into standard form (equality constraints with nonnegative variables) and construct a basis that corresponds to the point \(x=(0,0)\):
\[ \begin{aligned} \min_{x_1,x_2} &\quad {-}2x_1-x_2\\ \text{subject to} &\quad x_1+x_2 \le 6,\quad x\ge0. \end{aligned} \]
Carry out the steps of the simplex method, starting with the basic feasible solution above. At each iteration, show the basic index set, the reduced costs, and which variables enter and leave the basis.
Draw a graphical representation of the problem in terms of the original variables \(x_1,x_2\), and indicate the path taken by the simplex algorithm.
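To sanity-check your hand iterations, the sketch below solves one possible standard-form version with SciPy; adding a single slack variable \(s\) (so \(x_1 + x_2 + s = 6\)) is an assumption about how you convert the problem, and the basis \(\{s\}\) then corresponds to \(x=(0,0)\):

```python
import numpy as np
from scipy.optimize import linprog

# Standard form: variables z = (x1, x2, s), all nonnegative, with
# x1 + x2 + s = 6. The slack s is basic at the starting point x = (0, 0).
c = np.array([-2.0, -1.0, 0.0])
A_eq = np.array([[1.0, 1.0, 1.0]])
b_eq = np.array([6.0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3)
print(res.x, res.fun)  # optimum at x = (6, 0) with objective value -12
```

Your simplex iterations should terminate at the same vertex and objective value.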
3 Robust optimization
You are the owner of a factory that makes widgets and gadgets. Each widget and gadget has a set of feature measurements, such as weight, dimensions, and material composition, represented by a vector \(x\). For maximum profit, you want the combinations of these features to exactly match a set of standard specifications, i.e., \(Ax = b\), where \(A\) dictates how the features are combined and \(b\) is the standard. Realistically, however, it is too expensive to satisfy \(Ax=b\) exactly, so you have to make some concessions.
Instead, you can solve one of three optimization problems:
\[\begin{equation} \min_{x}\;\|Ax-b\|_1 = \sum_{i=1,\ldots,m}|a_i^T x-b_i| \end{equation}\]
\[\begin{equation} \min_{x}\;\|Ax-b\|_\infty = \max_{i=1,\ldots,m}|a_i^T x-b_i| \end{equation}\]
\[\begin{equation} \min_{x}\;\|Ax-b\|_2 =\sqrt{ \sum_{i=1,\ldots,m}|a_i^T x-b_i|^2} \end{equation}\]
3.1 One-norm reformulation
Write the one-norm problem as a linear program of the form
\[\begin{equation}\label{eq:linprog-form} \min_{z}\ c^T z \text{ subject to } \hat{A} z \leq \hat{b}, \end{equation}\]
for some \(c\), \(\hat{A}\), and \(\hat{b}\).
3.2 Inf-norm reformulation
Similarly, write the infinity-norm problem as a linear program.
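If you want to check your reformulations numerically, the sketch below uses one common auxiliary-variable construction (an assumption on my part about the shape of \(\hat{A}\) and \(\hat{b}\); your derivation may arrange the blocks differently) and verifies that each LP optimum equals the corresponding norm of the residual at the returned \(x\):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical random instance; replace with any data you like.
rng = np.random.default_rng(1)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# One-norm: z = (x, t), minimize 1^T t subject to -t <= Ax - b <= t,
# i.e. [A, -I; -A, -I] z <= [b; -b].
c1 = np.concatenate([np.zeros(n), np.ones(m)])
A1 = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b1 = np.concatenate([b, -b])
r1 = linprog(c1, A_ub=A1, b_ub=b1, bounds=[(None, None)] * (n + m))
x1 = r1.x[:n]
print(abs(r1.fun - np.abs(A @ x1 - b).sum()) < 1e-6)

# Inf-norm: z = (x, s), minimize s subject to -s*1 <= Ax - b <= s*1.
cinf = np.concatenate([np.zeros(n), [1.0]])
Ainf = np.block([[A, -np.ones((m, 1))], [-A, -np.ones((m, 1))]])
binf = np.concatenate([b, -b])
rinf = linprog(cinf, A_ub=Ainf, b_ub=binf, bounds=[(None, None)] * (n + 1))
xinf = rinf.x[:n]
print(abs(rinf.fun - np.abs(A @ xinf - b).max()) < 1e-6)
```

At an optimum the auxiliary variables are tight against the residuals, which is why the LP value matches the norm; confirming this numerically is a quick way to catch a sign or block-ordering mistake in your \(\hat{A}\).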
3.3 Histograms
For some randomly generated data (below), plot histograms of the residuals obtained by each of the three formulations.
Here’s the histogram for the residuals of the two-norm formulation:
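A sketch like the following can reproduce that two-norm histogram; the random data here is a hypothetical stand-in (the dimensions and seed are my assumptions), so substitute the data generated in the assignment:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this if plotting to screen
import matplotlib.pyplot as plt

# Hypothetical stand-in data: replace with the assignment's generated data.
rng = np.random.default_rng(2)
m, n = 200, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Two-norm solution via least squares; its residuals give the histogram.
x2, *_ = np.linalg.lstsq(A, b, rcond=None)
resid = A @ x2 - b

plt.hist(resid, bins=30)
plt.xlabel(r"residual $a_i^T x - b_i$")
plt.title("Two-norm residuals")
plt.savefig("resid_2norm.png")
```

The one- and inf-norm histograms follow the same pattern once you have the minimizers from your LP reformulations.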
Produce the histograms for the one- and inf-norm formulations, using the same data as above.