Publications
Convex and Variational Analysis
- [abs]
[bib]
Average-case thresholds for exact regularization of linear programs
M. P. Friedlander, S. Kubal, Y. Plan, M. S. Scott. arXiv:2510.13083,
2025.
Small regularizers can preserve linear programming solutions exactly. This paper provides the first average-case analysis of exact regularization: with a standard Gaussian cost vector and fixed constraint set, bounds are established for the probability that exact regularization succeeds as a function of regularization strength. Failure is characterized via the Gaussian measure of inner cones, controlled by novel two-sided bounds on the measure of shifted cones. Results reveal dimension-dependent scaling laws and connect exact regularization of linear programs to their polyhedral geometry via the normal fan and the Gaussian (solid-angle) measure of its cones. Computable bounds are obtained in several canonical settings, including regularized optimal transport. Numerical experiments corroborate the predicted scalings and thresholds.
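As a back-of-the-envelope illustration of the average-case question, the sketch below uses simplifying assumptions not taken from the paper: the feasible set is the probability simplex, the regularizer is \(\tfrac12\|x\|_2^2\), and success means the regularized solution also attains the LP optimal value. Over the simplex the regularized problem reduces to a Euclidean projection, so no LP solver is needed.

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / idx > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def exact_reg_success(n, delta, trials=2000, rng=np.random.default_rng(0)):
    """Fraction of Gaussian costs c for which the delta-regularized problem
    min_{x in simplex} c'x + (delta/2)||x||^2 still solves the LP
    min_{x in simplex} c'x exactly (up to numerical tolerance)."""
    hits = 0
    for _ in range(trials):
        c = rng.standard_normal(n)
        x_reg = proj_simplex(-c / delta)   # closed-form regularized solution
        hits += np.isclose(c @ x_reg, c.min(), atol=1e-10)
    return hits / trials

for delta in [0.01, 0.1, 0.5, 1.0]:
    print(f"delta = {delta:4.2f}: success rate = {exact_reg_success(50, delta):.3f}")
```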
@article{friedlander2025average, title={Average-case thresholds for exact regularization of linear programs}, author={Friedlander, Michael P and Kubal, Sharvaj and Plan, Yaniv and Scott, Matthew S}, journal={arXiv preprint arXiv:2510.13083}, year={2025} } - [abs]
[bib]
From perspective maps to epigraphical projection
M. P. Friedlander, A. Goodwin, T. Hoheisel. Math. Oper. Res., 48(3):1711-1740,
2023.
[Journal]The projection onto the epigraph or a level set of a closed proper convex function can be achieved by finding a root of a scalar equation that involves the proximal operator as a function of the proximal parameter. This paper develops the variational analysis of this scalar equation. The approach is based on a study of the variational-analytic properties of general convex optimization problems that are (partial) infimal projections of the sum of the function in question and the perspective map of a convex kernel. When the kernel is the Euclidean norm squared, the solution map corresponds to the proximal map, and thus the variational properties derived for the general case apply to the proximal case. Properties of the value function and the corresponding solution map—including local Lipschitz continuity, directional differentiability, and semismoothness—are derived. An \(SC^1\) optimization framework for computing epigraphical and level-set projections is thus established. Numerical experiments on 1-norm projection illustrate the effectiveness of the approach as compared with specialized algorithms.
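The scalar-equation idea can be made concrete for \(f=\|\cdot\|_1\), whose proximal operator is soft-thresholding. The sketch below assumes the standard characterization that, when \((x_0,t_0)\) lies outside the epigraph, its projection is \((\mathrm{prox}_{\lambda f}(x_0),\,t_0+\lambda)\) with \(\lambda>0\) the root of \(f(\mathrm{prox}_{\lambda f}(x_0))-t_0-\lambda\); the bracketing and the use of a generic root-finder are illustrative choices, not the \(SC^1\) machinery developed in the paper.

```python
import numpy as np
from scipy.optimize import brentq

def soft_threshold(z, lam):
    """Proximal map of lam*||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def project_epi_l1(x0, t0):
    """Project (x0, t0) onto the epigraph {(x, t) : ||x||_1 <= t}.

    If the point is already in the epigraph it is its own projection;
    otherwise the projection is (prox_{lam*f}(x0), t0 + lam), where
    lam > 0 solves the scalar equation f(prox_{lam*f}(x0)) = t0 + lam."""
    if np.abs(x0).sum() <= t0:
        return x0.copy(), t0
    def psi(lam):
        return np.abs(soft_threshold(x0, lam)).sum() - t0 - lam
    # psi is decreasing: positive at lam = 0, negative once x0 is fully thresholded.
    lam_hi = max(np.abs(x0).max(), -t0) + 1.0
    lam = brentq(psi, 0.0, lam_hi)
    return soft_threshold(x0, lam), t0 + lam

x0, t0 = np.array([3.0, -1.5, 0.2]), 1.0
x, t = project_epi_l1(x0, t0)
print(x, t, np.abs(x).sum())   # feasibility check: ||x||_1 == t up to tolerance
```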
@article{FriedlanderGoodwinHoheisel2023, author = {Friedlander, Michael P. and Goodwin, Ariel and Hoheisel, Tim}, title = {From Perspective Maps to Epigraphical Projections}, journal = {Mathematics of Operations Research}, volume = {48}, number = {3}, pages = {1711--1740}, year = {2023}, doi = {10.1287/moor.2022.1317} } - [abs]
[bib]
A perturbation view of level-set methods for convex optimization
R. Estrin, M. P. Friedlander. Optimization Letters,
2020.
[Journal]
[DOI]Level-set methods for convex optimization are predicated on the idea that certain problems can be parameterized so that their solutions can be recovered as the limiting process of a root-finding procedure. This idea emerges time and again across a range of algorithms for convex problems. Here we demonstrate that strong duality is a necessary condition for the level-set approach to succeed. In the absence of strong duality, the level-set method identifies ε-infeasible points that do not converge to a feasible point as ε tends to zero. The level-set approach is also used as a proof technique for establishing sufficient conditions for strong duality that are different from Slater’s constraint qualification.
@article{Estrin2020Perturbation, Author = {R. Estrin and M. P. Friedlander}, Year = {2020}, Month = {June}, Journal = {Optimization Letters}, Number = {8}, Volume = {14}, Pages = {1989-2006}, Doi = {10.1007/s11590-020-01609-9}, Title = {A perturbation view of level-set methods for convex optimization} } - [abs]
[bib]
Polar convolution
M. P. Friedlander, I. Macêdo, T.K. Pong. SIAM J. Optimization, 29(2):1366–1391,
2019.
[DOI]The Moreau envelope is one of the key convexity-preserving functional operations in convex analysis, and it is central to the development and analysis of many approaches for solving convex optimization problems. This paper develops the theory for a parallel convolution operation, called the polar envelope, specialized to gauge functions. We show that many important properties of the Moreau envelope and the proximal map are mirrored by the polar envelope and its corresponding proximal map. These properties include smoothness of the envelope function, uniqueness and continuity of the proximal map, a role in duality and in the construction of algorithms for gauge optimization. We thus establish a suite of tools with which to build algorithms for this family of optimization problems.
@article{Friedlander2019Polar, Author = {M. P. Friedlander and I. Macêdo and T.K. Pong}, Year = {2019}, Month = {February}, Journal = {SIAM Journal on Optimization}, Number = {2}, Volume = {29}, Pages = {1366-1391}, Doi = {10.1137/18m1209088}, Title = {Polar convolution} } - [abs]
[bib]
Foundations of gauge and perspective duality
A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, K. MacPhee. SIAM J. Optimization, 28(3):2406–2434,
2018.
[DOI]We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel-Rockafellar duality on equal footing, including explaining gauge dual variables as sensitivity measures, and showing how to recover primal solutions from those of the gauge dual. This vantage point allows a direct proof that optimal solutions of the Fenchel-Rockafellar dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. We extend the gauge duality framework to the setting in which the functional components are general nonnegative convex functions, including problems with piecewise linear quadratic functions and constraints that arise from generalized linear models used in regression.
@article{Aravkin2018Foundations, Author = {A. Y. Aravkin and J. V. Burke and D. Drusvyatskiy and M. P. Friedlander and K. MacPhee}, Year = {2018}, Month = {August}, Journal = {SIAM Journal on Optimization}, Number = {3}, Volume = {28}, Pages = {2406-2434}, Doi = {10.1137/17m1119020}, Title = {Foundations of gauge and perspective duality} } - [abs]
[bib]
Level-set methods for convex optimization
A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, S. Roy. Mathematical Programming, 174(1-2):359–390,
2018.
[Journal]
[DOI]Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective and constraint functions, and instead approximately solves a sequence of parametric level-set problems. A zero-finding procedure, based on inexact function evaluations and possibly inexact derivative information, leads to an efficient solution scheme for the original problem. We describe the theoretical and practical properties of this approach for a broad range of problems, including low-rank semidefinite optimization, sparse optimization, and generalized linear models for inference.
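A schematic instance of the role exchange for the pair \(\min\|x\|_1\) subject to \(\|Ax-b\|_2\le\sigma\): the flipped problem \(\min_{\|x\|_1\le\tau}\|Ax-b\|_2\) is solved approximately by projected gradient, and the root of the value function is located by bisection. The solver choices and synthetic data below are illustrative simplifications, not the inexact Newton-type root-finding analyzed in the paper.

```python
import numpy as np

def proj_l1_ball(v, tau):
    """Euclidean projection onto the l1 ball of radius tau."""
    if np.abs(v).sum() <= tau:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - tau) / idx > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def flipped_value(A, b, tau, iters=2000):
    """phi(tau) = min ||Ax - b||_2 subject to ||x||_1 <= tau,
    approximated by projected gradient on the squared residual."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = proj_l1_ball(x - A.T @ (A @ x - b) / L, tau)
    return np.linalg.norm(A @ x - b)

def level_set_bpdn(A, b, sigma, tol=1e-3):
    """Find tau with phi(tau) = sigma by bisection on the value function."""
    lo, hi = 0.0, np.abs(np.linalg.lstsq(A, b, rcond=None)[0]).sum()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flipped_value(A, b, mid) > sigma:
            lo = mid
        else:
            hi = mid
    return hi

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 120))
x_true = np.zeros(120); x_true[:5] = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(40)
tau = level_set_bpdn(A, b, sigma=0.1)
print("recovered tau:", tau, " true ||x||_1:", np.abs(x_true).sum())
```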
@article{Aravkin2018Level, Author = {A. Y. Aravkin and J. V. Burke and D. Drusvyatskiy and M. P. Friedlander and S. Roy}, Year = {2018}, Month = {April}, Journal = {Mathematical Programming}, Number = {1-2}, Volume = {174}, Pages = {359-390}, Doi = {10.1007/s10107-018-1351-8}, Title = {Level-set methods for convex optimization} } - [abs]
[bib]
Gauge optimization and duality
M. P. Friedlander, I. Macêdo, T. K. Pong. SIAM J. Optimization, 24(4):1999–2022,
2014.
Gauge functions significantly generalize the notion of a norm, and gauge optimization, as defined by Freund (1987), seeks the element of a convex set that is minimal with respect to a gauge function. This conceptually simple problem can be used to model a remarkable array of useful problems, including a special case of conic optimization, and related problems that arise in machine learning and signal processing. The gauge structure of these problems allows for a special kind of duality framework. This paper explores the duality framework proposed by Freund, and proposes a particular form of the problem that exposes some useful properties of the gauge optimization framework (such as the variational properties of its value function), and yet maintains most of the generality of the abstract form of gauge optimization.
@Article{FriedlanderMacedoPong:2014, author = {M. P. Friedlander and I. Mac\^edo and T. K. Pong}, title = {Gauge optimization and duality}, year = 2014, journal = {SIAM Journal on Optimization}, number = 4, volume = 24, pages = {1999-2022}, doi = {10.1137/130940785}, url = {http://epubs.siam.org/doi/abs/10.1137/130940785} } - [abs]
[bib]
Variational properties of value functions
A. Aravkin, J. V. Burke, M. P. Friedlander. SIAM J. Optimization, 23(3):1689–1717,
2013.
[DOI]Regularization plays a key role in a variety of optimization formulations of inverse problems. A recurring theme in regularization approaches is the selection of regularization parameters, and their effect on the solution and on the optimal value of the optimization problem. The sensitivity of the value function to the regularization parameter can be linked directly to the Lagrange multipliers. This paper characterizes the variational properties of the value functions for a broad class of convex formulations, which are not all covered by standard Lagrange multiplier theory. An inverse function theorem is given that links the value functions of different regularization formulations (not necessarily convex). These results have implications for the selection of regularization parameters, and the development of specialized algorithms. Numerical examples illustrate the theoretical results.
@article{Aravkin2013Variational, Author = {A. Aravkin and J. V. Burke and M. P. Friedlander}, Year = {2013}, Month = {August}, Journal = {SIAM Journal on Optimization}, Number = {3}, Volume = {23}, Pages = {1689-1717}, Doi = {10.1137/120899157}, Title = {Variational properties of value functions} } - [abs]
[bib]
Exact regularization of convex programs
M. P. Friedlander, P. Tseng. SIAM J. Optimization, 18(4):1326–1350,
2007.
[Data & Code]
[DOI]
[more]The regularization of a convex program is exact if all solutions of the regularized problem are also solutions of the original problem for all values of the regularization parameter below some positive threshold. For a general convex program, we show that the regularization is exact if and only if a certain selection problem has a Lagrange multiplier. Moreover, the regularization parameter threshold is inversely related to the Lagrange multiplier. We use this result to generalize an exact regularization result of Ferris and Mangasarian [Appl. Math. Optim., 23 (1991), pp. 266-273] involving a linearized selection problem. We also use it to derive necessary and sufficient conditions for exact penalization, similar to those obtained by Bertsekas [Math. Programming, 9 (1975), pp. 87-99] and by Bertsekas, Nedić, and Ozdaglar [Convex Analysis and Optimization, Athena Scientific, Belmont, MA, 2003]. When the regularization is not exact, we derive error bounds on the distance from the regularized solution to the original solution set. We also show that existence of a "weak sharp minimum" is in some sense close to being necessary for exact regularization. We illustrate the main result with numerical experiments on the \(\ell_1\) regularization of benchmark (degenerate) linear programs and semidefinite/second-order cone programs. The experiments demonstrate the usefulness of \(\ell_1\) regularization in finding sparse solutions.
@article{Friedlander2007Exact, Author = {M. P. Friedlander and P. Tseng}, Year = {2007}, Month = {November}, Journal = {SIAM Journal on Optimization}, Number = {4}, Volume = {18}, Pages = {1326-1350}, Doi = {10.1137/060675320}, Title = {Exact regularization of convex programs} } - [abs]
[bib]
Exact regularization of linear programs
M. P. Friedlander. Tech. Rep. TR-2005-31, Dept. of Computer Science, UBC,
2005.
[Data]
[more]We show that linear programs (LPs) admit regularizations that either contract the original (primal) solution set or leave it unchanged. Any regularization function that is convex and has compact level sets is allowed; differentiability is not required. This is an extension of the result first described by Mangasarian and Meyer (SIAM J. Control Optim., 17(6), pp. 745-752, 1979). We show that there always exist positive values of the regularization parameter such that a solution of the regularized problem simultaneously minimizes the original LP and minimizes the regularization function over the original solution set. We illustrate the main result using the nondifferentiable one-norm regularization function on a set of degenerate LPs. Numerical results demonstrate how such an approach yields sparse solutions from the application of an interior-point method.
@techreport{Frie:2005, Address = {University of British Columbia, Vancouver}, Author = {M. P. Friedlander}, Institution = {Dept. Computer Science}, Month = {December}, Number = {TR-2005-31}, Title = {Exact regularization of linear programs}, Type = {Tech. Rep.}, Year = 2005 }
Scientific and Engineering Applications
- [abs]
[bib]
Tracking Cu-fertile sediment sources via multivariate petrochronological mixture modeling of detrital zircons
J. E. Saylor, N. Richardson, N. Graham, R. G. Lee, M. P. Friedlander. J. Geophys. Res.: Earth Surface, 130(10):e2025JF008406,
2025.
Whereas the ability to acquire petrochronological data from detrital minerals has exploded, development of tools to analyze and interpret the multivariate data sets has not kept pace. Herein, we present a case study, which applies the recently developed non-negative Tucker-1 decomposition (NNT1) method to a multivariate detrital zircon data set from till samples collected above the Cu-bearing Guichon Creek Batholith in southern British Columbia, Canada. Zircon composition variables that we consider include age, Ce anomaly, CeN/NdN, DyN/YbN, ΔFMQ, Eu anomaly, ΣHREE/ΣMREE, Hf, Th/U, Ti temperature, and YbN/GdN. The NNT1 approach successfully deconvolves the multivariate data set into two endmembers, which are consistent with derivation either from non-oxidized and relatively anhydrous (i.e., low Cu-ore potential, Source 1) or oxidized and hydrous (i.e., potential Cu-ore bodies, Source 2) igneous rocks. Furthermore, we attribute each of the zircon grains to either the Source 1 or 2 endmember based on maximization of the likelihood that their measured multivariate geochemistry was drawn from one or the other of the learned multivariate endmembers. Finally, we demonstrate that the proportions of the Source 2 endmember decrease with increasing distance from the ore bodies, as expected due to down-ice or off-axis zircon mixing and dilution. We conclude that the NNT1 approach provides insight into geologically meaningful sediment transport processes and multivariate sediment sources even when those sources are unknown. It thus provides a basis for future petrochronological interpretations with applied and pure geoscience applications.
@article{saylor2025tracking, title={Tracking Cu-fertile sediment sources via multivariate petrochronological mixture modeling of detrital zircons}, author={Saylor, Joel E and Richardson, Nicholas and Graham, Naomi and Lee, Robert G and Friedlander, Michael P}, journal={Journal of Geophysical Research: Earth Surface}, volume={130}, number={10}, pages={e2025JF008406}, year={2025}, publisher={Wiley Online Library} } - [abs]
[bib]
Estimates of the dynamic structure factor for the finite temperature electron liquid via analytic continuation of path integral Monte Carlo data
T. Chuna, N. Barnfield, J. Vorberger, M. P. Friedlander, T. Hoheisel, T. Dornheim. Phys. Rev. B, 112(12):125112,
2025.
Understanding the dynamic properties of uniform electron gas (UEG) is important for numerous applications ranging from semiconductor physics to exotic warm dense matter. In this work, we apply the maximum entropy method (MEM), as implemented by Thomas Chuna et al., J. Phys. A Math. Theor. 58, 335203 (2025), to ab initio path integral Monte Carlo (PIMC) results for the imaginary-time correlation function \(F(q,\tau)\) to estimate the dynamic structure factor \(S(q,\omega)\) over an unprecedented range of densities at the electronic Fermi temperature. To conduct the MEM, we propose to construct the Bayesian prior \(\mu\) from the PIMC data. Constructing the static approximation leads to a drastic improvement in the \(S(q,\omega)\) estimate over using the simpler random phase approximation (RPA) as the Bayesian prior. We present results for the strongly coupled electron liquid regime with \(r_s=50,\dots,200\), which reveal a pronounced roton-type feature and an incipient double peak structure in \(S(q,\omega)\) for intermediate wave numbers at \(r_s=200\). We also find that our dynamic structure factors satisfy known sum rules, even though these sum rules are not enforced explicitly. To verify our results, we show that our MEM estimates converge to the RPA limit at higher densities and weaker coupling \(r_s=2\) and \(5\). Further, we compare with two different existing results at intermediate density and coupling strength \(r_s=10\) and \(20\), and we find good agreement with more conservative estimates. Combining all of our results for \(r_s=2,5,10,20,50,100,200\), we present estimates of a dispersion relation that show a continuous deepening of its minimum value at higher coupling. An advantage of our setup is that it is not specific to the UEG, thereby opening up new avenues to study the dynamics of real warm dense matter systems based on cutting-edge PIMC simulations in future works.
@article{Chuna2025Estimates, Author = {T. Chuna and N. Barnfield and J. Vorberger and M. P. Friedlander and T. Hoheisel and T. Dornheim}, Year = {2025}, Month = {September}, Journal = {Physical Review B}, Number = {12}, Volume = {112}, Doi = {10.1103/4d4b-kgtk}, Title = {Estimates of the dynamic structure factor for the finite temperature electron liquid via analytic continuation of path integral Monte Carlo data} } - [abs]
[bib]
Dual formulation of the maximum entropy method applied to analytic continuation of quantum Monte Carlo data
T. Chuna, N. Barnfield, T. Dornheim, M. P. Friedlander, T. Hoheisel. J. Phys. A: Math. Theor., 58(33):335203,
2025.
[GitHub]
[DOI]Many fields of physics use quantum Monte Carlo techniques, but struggle to estimate dynamic spectra via the analytic continuation of imaginary-time quantum Monte Carlo data. One of the most ubiquitous approaches to analytic continuation is the maximum entropy method (MEM). We supply a dual Newton optimization algorithm to be used within the MEM and provide analytic bounds for the algorithm’s error. The optimization algorithm is freely available on GitHub. The MEM is typically used with Bryan’s controversial algorithm (Rothkopf 2020 Data 5 55). We present new theoretical issues that are not yet in the literature. Our algorithm has all the theoretical benefits of Bryan’s algorithm without these theoretical issues; the implementation of the dual Newton optimizer within the MEM is freely available on GitHub. We compare the MEM with Bryan’s optimization to the MEM with our dual Newton optimization on test problems from lattice quantum chromodynamics and plasma physics. These comparisons show that in the presence of noise the dual Newton algorithm produces better estimates and error bars; this indicates the limits of Bryan’s algorithm’s applicability. We use the MEM to investigate authentic quantum Monte Carlo data for the uniform electron gas at warm dense matter conditions and further substantiate the roton-type feature in the dispersion relation.
@misc{Chuna2025Dual, Author = {T. Chuna and N. Barnfield and T. Dornheim and M. P. Friedlander and T. Hoheisel}, Year = {2025}, Month = {August}, Doi = {10.2139/ssrn.5098440}, Title = {Dual formulation of the maximum entropy method applied to analytic continuation of quantum Monte Carlo data} } - [abs]
[bib]
Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization
N. Graham, N. Richardson, M. P. Friedlander, J. Saylor. Mathematical Geosciences, 57(4):601-628,
2025.
[DOI]We devise a novel statistical method for deconvolving multivariate geochronology and geochemistry datasets to characterize sediment sources. The approach employs a third-order constrained Tucker-1 tensor decomposition to estimate the probability distributions of multiple features in sediment samples. By integrating kernel density estimation with matrix-tensor factorization, the model quantitatively determines the distributions and mixing proportions of sediment sources. The methodology introduces a numerical test for rank estimation to define the number of latent sources, and uses a maximum-likelihood approach to correlate individual detrital grains to these latent sources based on an arbitrary number of features. The method’s efficacy is validated through a numerical experiment with detrital zircon data that captures natural variability associated with temporal changes in crustal thickness in the Andes. The findings hold potential implications for resolving sediment sources, determining sediment mixing, enhancing the understanding of sediment transport processes, characterizing the lithology, tectonic motion, or metallogenic potential of sediment sources. This method is adaptable to other data de-mixing problems and is available in a publicly released software package.
@article{Graham2025Tracing, Author = {N. Graham and N. Richardson and M. P. Friedlander and J. Saylor}, Year = {2025}, Month = {May}, Journal = {Mathematical Geosciences}, Number = {4}, Volume = {57}, Pages = {601-628}, Doi = {10.1007/s11004-024-10175-0}, Title = {Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization} } - [abs]
[bib]
Social resistance
M. P. Friedlander, N. Krislock, T. K. Pong. IEEE Comput. Sci. Eng., 18(2):98-103; reprinted in Computing Edge,
2016.
[DOI]
[Computing Edge]This case study concerns an important question in social networks: How closely related are two people in a network? We measure the strength of the relationship between two people in terms of their resistance distance, a notion of distance that has its origin in the study of electrical networks. In that context, the resistance distance measures the electrical resistance between the two nodes in a network, assuming unit resistance on each edge. Its relationship with the information distance in social networks was recently identified. For large networks, this distance measure can be expensive to compute. As the network changes, it’s useful to have an inexpensive way to make updates. We’ll study how various techniques for updating matrix factorizations can help efficiently compute the resistance distances with small changes in the network. We focus on Cholesky rank-one updates, and also briefly discuss the Sherman-Morrison-Woodbury formula (as one of the last exercises). We expect that you’re familiar with the idea of the Cholesky factorization of positive definite matrices. We’ll guide you through a derivation of the formula for the Cholesky rank-one update, and introduce two aspects that may be new to you. First, you’ll have to figure out the vector used for the rank-one update when one additional connection in the network is identified. Second, when matrices are large, you’ll observe that there can be a big difference in computational effort when the computation involves matrix-matrix multiplications rather than only matrix-vector multiplications.
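A minimal sketch of the exercise, with illustrative choices not taken from the column (a 5-node cycle graph, unit resistances, node 0 as ground): the effective resistance is computed from the Cholesky factor of the reduced Laplacian, and adding an edge triggers a rank-one update of that factor rather than a refactorization.

```python
import numpy as np

def chol_update(R, x):
    """Overwrite upper-triangular R (with A = R.T @ R) so that
    R.T @ R == A + x x.T, using the classical rank-one update."""
    x = x.astype(float).copy()
    n = len(x)
    for k in range(n):
        r = np.hypot(R[k, k], x[k])
        c, s = r / R[k, k], x[k] / R[k, k]
        R[k, k] = r
        if k + 1 < n:
            R[k, k+1:] = (R[k, k+1:] + s * x[k+1:]) / c
            x[k+1:] = c * x[k+1:] - s * R[k, k+1:]
    return R

def reduced_laplacian(n, edges, ground=0):
    """Graph Laplacian with the ground node's row and column removed."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1; L[u, v] -= 1; L[v, u] -= 1
    keep = [i for i in range(n) if i != ground]
    return L[np.ix_(keep, keep)], keep

def resistance(R, keep, i, j):
    """Effective resistance between non-ground nodes i and j,
    given the Cholesky factor R of the reduced Laplacian."""
    e = np.zeros(len(keep))
    e[keep.index(i)], e[keep.index(j)] = 1.0, -1.0
    z = np.linalg.solve(R.T, e)        # forward solve
    y = np.linalg.solve(R, z)          # backward solve
    return e @ y

# A small cycle graph on 5 nodes.
n, edges = 5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
Lr, keep = reduced_laplacian(n, edges)
R = np.linalg.cholesky(Lr).T           # upper-triangular factor
print("r(1,3) before new edge:", resistance(R, keep, 1, 3))

# Add edge (1, 3): L gains (e1 - e3)(e1 - e3)^T, so update R in O(n^2).
x = np.zeros(len(keep)); x[keep.index(1)], x[keep.index(3)] = 1.0, -1.0
chol_update(R, x)
print("r(1,3) after new edge: ", resistance(R, keep, 1, 3))
```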
@article{Friedlander2016Social, Author = {M. P. Friedlander and N. Krislock and T. K. Pong}, Year = {2016}, Month = {August}, Journal = {Computing in Science & Engineering}, Number = {2}, Volume = {18}, Pages = {98-103}, Doi = {10.1109/mcse.2016.30}, Title = {Social resistance} } - [abs]
[bib]
Fighting the curse of dimensionality: compressive sensing in exploration seismology
F. J. Herrmann, M. P. Friedlander, Ö. Yılmaz. IEEE Signal Processing Magazine, 29(3):88–100,
2012.
[DOI]Many seismic exploration techniques rely on the collection of massive data volumes that are mined for information during processing. This approach has been extremely successful, but current efforts toward higher resolution images in increasingly complicated regions of Earth continue to reveal fundamental shortcomings in our typical workflows. The “curse” of dimensionality is the main roadblock and is exemplified by Nyquist’s sampling criterion, which disproportionately strains current acquisition and processing systems as the size and desired resolution of our survey areas continue to increase.
@article{Herrmann2012Fighting, Author = {F. J. Herrmann and M. P. Friedlander and Ö. Yılmaz}, Year = {2012}, Month = {January}, Journal = {IEEE Signal Processing Magazine}, Number = {3}, Volume = {29}, Pages = {88-100}, Doi = {10.1109/msp.2012.2185859}, Title = {Fighting the curse of dimensionality: compressive sensing in exploration seismology} } - [abs]
[bib]
Robust inversion via semistochastic dimensionality reduction
A. Aravkin, M. P. Friedlander, T. van Leeuwen. Intern. Conf. Acoustics, Speech, and Signal Processing (ICASSP),
2012.
[DOI]We consider a class of inverse problems where it is possible to aggregate the results of multiple experiments. This class includes problems where the forward model is the solution operator to linear ODEs or PDEs. The tremendous size of such problems motivates dimensionality reduction techniques based on randomly mixing experiments. These techniques break down, however, when robust data-fitting formulations are used, which are essential in cases of missing data, unusually large errors, and systematic features in the data unexplained by the forward model. We survey robust methods within a statistical framework, and propose a semistochastic optimization approach that allows dimensionality reduction. The efficacy of the methods is demonstrated for a large-scale seismic inverse problem using the robust Student’s t-distribution, where a useful synthetic velocity model is recovered in the extreme scenario of 60% data missing at random. The semistochastic approach achieves this recovery using 20% of the effort required by a direct robust approach.
@misc{Aravkin2012Robust, Author = {A. Aravkin and M. P. Friedlander and T. van Leeuwen}, Year = {2012}, Month = {January}, Doi = {10.48550/arxiv.1110.0895}, Title = {Robust inversion via semistochastic dimensionality reduction} } - [abs]
[bib]
Robust inversion, dimensionality reduction, and randomized sampling
A. Aravkin, M. P. Friedlander, F. Herrmann, T. van Leeuwen. Mathematical Programming, 134(1):101–125,
2012.
[DOI]We consider a class of inverse problems in which the forward model is the solution operator to linear ODEs or PDEs. This class admits several dimensionality-reduction techniques based on data averaging or sampling, which are especially useful for large-scale problems. We survey these approaches and their connection to stochastic optimization. The data-averaging approach is only viable, however, for a least-squares misfit, which is sensitive to outliers in the data and artifacts unexplained by the forward model. This motivates us to propose a robust formulation based on the Student’s t-distribution of the error. We demonstrate how the corresponding penalty function, together with the sampling approach, can obtain good results for a large-scale seismic inverse problem with 50% corrupted data.
@article{Aravkin2012Robust, Author = {A. Aravkin and M. P. Friedlander and F. Herrmann and T. van Leeuwen}, Year = {2012}, Month = {January}, Journal = {Mathematical Programming}, Number = {1}, Volume = {134}, Pages = {101-125}, Doi = {10.1007/s10107-012-0571-6}, Title = {Robust inversion, dimensionality reduction, and randomized sampling} } - [abs]
[bib]
Diffuse optical fluorescence tomography using time-resolved data acquired in transmission
F. Leblond, S. Fortier, M. P. Friedlander. Multimodal Biomedical Imaging II, vol. 6431. Proc. Intern. Society for Optical Engineering (SPIE),
2007.
[Code]
[DOI]
[more]We present an algorithm using data acquired with a time-resolved system with the goal of reconstructing sources of fluorescence emanating from the deep interior of highly scattering biological tissues. A novelty in our tomography algorithm is the integration of a light transport model adapted to rodent geometries. For small volumes, our analysis suggests that neglecting the index of refraction mismatch between diffusive and non-diffusive regions, as well as the curved nature of the boundary, can have a profound impact on fluorescent images and spectroscopic applications relying on diffusion curve fitting. Moreover, we introduce a new least-squares solver with bound constraints adapted for optical problems where a physical non-negative constraint can be imposed. Finally, we find that maximizing the time-related information content of the data in the reconstruction process significantly enhances the quality of fluorescence images. Preliminary noise propagation and detector placement optimization analyses are also presented.
@article{Leblond2007Diffuse, Author = {F. Leblond and S. Fortier and M. P. Friedlander}, Year = {2007}, Month = {February}, Journal = {Proceedings of SPIE, the International Society for Optical Engineering}, Volume = {6431}, Pages = {643106}, Doi = {10.1117/12.700841}, Title = {Diffuse optical fluorescence tomography using time-resolved data acquired in transmission} }
Algorithms for Machine Learning
- [abs]
[bib]
Decentralized Optimization with Topology-Independent Communication
Y. Lin, Y. Kuang, A. Alacaoglu, M. P. Friedlander. arXiv:2509.14488,
2025.
Distributed optimization requires nodes to coordinate, yet full synchronization scales poorly. When $n$ nodes collaborate through $m$ pairwise regularizers, standard methods demand $\mathcal{O}(m)$ communications per iteration. This paper proposes randomized local coordination: each node independently samples one regularizer uniformly and coordinates only with nodes sharing that term. This exploits partial separability, where each regularizer $G_j$ depends on a subset $S_j \subseteq \{1,\ldots,n\}$ of nodes. For graph-guided regularizers where $|S_j|=2$, expected communication drops to exactly 2 messages per iteration. This method achieves $\tilde{\mathcal{O}}(\varepsilon^{-2})$ iterations for convex objectives and, under strong convexity, $\mathcal{O}(\varepsilon^{-1})$ iterations to an $\varepsilon$-solution and $\mathcal{O}(\log(1/\varepsilon))$ iterations to a neighborhood. Replacing the proximal map of the sum $\sum_j G_j$ with the proximal map of a single randomly selected regularizer $G_j$ preserves convergence while eliminating global coordination. Experiments validate both convergence rates and communication efficiency across synthetic and real-world datasets.
@misc{LinKuangAlacaogluFriedlander2025, author = {Lin, Ying and Kuang, Yao and Alacaoglu, Ahmet and Friedlander, Michael P.}, title = {Decentralized Optimization with Topology-Independent Communication}, year = {2025}, eprint = {2509.14488}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, doi = {10.48550/arXiv.2509.14488} } - [abs]
[bib]
Scalable Data-Driven Basis Selection for Linear Machine Learning Interatomic Potentials
T. Torabi, M. Militzer, M. P. Friedlander, C. Ortner. arXiv:2504.16418,
2025.
[arXiv]Machine learning interatomic potentials (MLIPs) provide an effective approach for accurately and efficiently modeling atomic interactions, expanding the capabilities of atomistic simulations to complex systems. However, a priori feature selection leads to high complexity, which can be detrimental to both computational cost and generalization, resulting in a need for hyperparameter tuning. We demonstrate the benefits of active set algorithms for automated data-driven feature selection. The proposed methods are implemented within the Atomic Cluster Expansion (ACE) framework. Computational tests conducted on a variety of benchmark datasets indicate that sparse ACE models consistently enhance computational efficiency, generalization accuracy and interpretability over dense ACE models. An added benefit of the proposed algorithms is that they produce entire paths of models with varying cost/accuracy ratio.
@article{torabi2025scalable, title={Scalable Data-Driven Basis Selection for Linear Machine Learning Interatomic Potentials}, author={Torabi, Tina and Militzer, Matthias and Friedlander, Michael P and Ortner, Christoph}, journal={arXiv preprint arXiv:2504.16418}, year={2025} } - [abs]
[bib]
Knowledge-injected federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, J. Hu, C. Li, Y. Zhang. arXiv:2208.07530,
2022.
[arXiv]
[GitHub]
[DOI]Federated learning is an emerging technique for training models from decentralized data sets. In many applications, data owners participating in the federated learning system hold not only the data but also a set of domain knowledge. Such knowledge includes human know-how and craftsmanship that can be extremely helpful to the federated learning task. In this work, we propose a federated learning framework that allows the injection of participants’ domain knowledge, where the key idea is to refine the global model with knowledge locally. The scenario we consider is motivated by a real industry-level application, and we demonstrate the effectiveness of our approach to this application.
@misc{Fan2022Knowledge, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and J. Hu and C. Li and Y. Zhang}, Year = {2022}, Month = {August}, Doi = {10.48550/arxiv.2208.07530}, Title = {Knowledge-injected federated learning} } - [abs]
[bib]
Online mirror descent and dual averaging: keeping pace in the dynamic case
H. Fang, N. J. A. Harvey, V. S. Portella, M. P. Friedlander. J. Machine Learning Research,
2022.
Online mirror descent (OMD) and dual averaging (DA) are two fundamental algorithms for online convex optimization. They are known to have very similar (or even identical) performance guarantees in most scenarios when a fixed learning rate is used. However, for dynamic learning rates OMD is provably inferior to DA. It is known that, with a dynamic learning rate, OMD can suffer linear regret, even in common settings such as prediction with expert advice. This hints that the relationship between OMD and DA is not fully understood at present. In this paper, we modify the OMD algorithm by a simple technique that we call stabilization. We give essentially the same abstract regret bound for stabilized OMD and DA by modifying the classical OMD convergence analysis in a careful and modular way, yielding proofs that we believe to be clean and flexible. Simple corollaries of these bounds show that OMD with stabilization and DA enjoy the same performance guarantees in many applications even under dynamic learning rates. We also shed some light on the similarities between OMD and DA and show simple conditions under which stabilized OMD and DA generate the same iterates.
@article{fang2022online, author = {Huang Fang and Nicholas J. A. Harvey and Victor S. Portella and Michael P. Friedlander}, title = {Online Mirror Descent and Dual Averaging: Keeping Pace in the Dynamic Case}, journal = {Journal of Machine Learning Research}, year = {2022}, volume = {23}, number = {121}, pages = {1-38}, url = {http://jmlr.org/papers/v23/21-1027.html} } - [abs]
[bib]
A dual approach for federated learning
Z. Fan, H. Fang, M. P. Friedlander. arXiv 2201.11183,
2022.
[arXiv]
[GitHub]
[DOI]We study the federated optimization problem from a dual perspective and propose a new algorithm termed federated dual coordinate descent (FedDCD), which is based on a type of coordinate descent method developed by Necoara et al. [Journal of Optimization Theory and Applications, 2017]. Additionally, we enhance the FedDCD method with inexact gradient oracles and Nesterov’s acceleration. We demonstrate theoretically that our proposed approach achieves better convergence rates than the state-of-the-art primal federated optimization algorithms under mild conditions. Numerical experiments on real-world datasets support our analysis.
@misc{Fan2022Dual, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.11183}, Title = {A dual approach for federated learning} } - [abs]
[bib]
Improving fairness for data valuation in horizontal federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, C. Liu, Y. Zhang. Intern Conf Data Engineering (ICDE),
2022.
[IEEE]
[arXiv]
[DOI]Federated learning is an emerging decentralized machine learning scheme that allows multiple data owners to work collaboratively while ensuring data privacy. The success of federated learning depends largely on the participation of data owners. To sustain and encourage data owners’ participation, it is crucial to fairly evaluate the quality of the data provided by the data owners and reward them correspondingly. Federated Shapley value, recently proposed by Wang et al. [Federated Learning, 2020], is a measure for data value under the framework of federated learning that satisfies many desired properties for data valuation. However, there are still factors of potential unfairness in the design of federated Shapley value because two data owners with the same local data may not receive the same evaluation. We propose a new measure called completed federated Shapley value to improve the fairness of federated Shapley value. The design depends on completing a matrix consisting of all the possible contributions by different subsets of the data owners. It is shown under mild conditions that this matrix is approximately low-rank by leveraging concepts and tools from optimization. Both theoretical analysis and empirical evaluation verify that the proposed measure does improve fairness in many circumstances.
@article{Fan2022Improving, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and C. Liu and Y. Zhang}, Year = {2022}, Month = {January}, Journal = {2022 IEEE 38th International Conference on Data Engineering (ICDE)}, Pages = {2440-2453}, Doi = {10.1109/icde53745.2022.00228}, Title = {Improving fairness for data valuation in horizontal federated learning} } - [abs]
[bib]
Fair and efficient contribution valuation for vertical federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, Y. Zhang. arXiv:2201.02658,
2022.
[arXiv]
[DOI]Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data. Vertical federated learning or feature-based federated learning applies to the cases that different data sources share the same sample ID space but differ in feature space. To ensure the data owners’ long-term engagement, it is critical to objectively assess the contribution from each data source and recompense them accordingly. The Shapley value (SV) is a provably fair contribution valuation metric originated from cooperative game theory. However, computing the SV requires extensively retraining the model on each subset of data sources, which causes prohibitively high communication costs in federated learning. We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV. We show that VerFedSV not only satisfies many desirable properties for fairness but is also efficient to compute, and can be adapted to both synchronous and asynchronous vertical federated learning algorithms. Both theoretical analysis and extensive experimental results verify the fairness, efficiency, and adaptability of VerFedSV.
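For intuition about why exact Shapley values are expensive, the toy sketch below enumerates every subset of data owners for a made-up utility function; the cost grows as \(2^n\) utility evaluations, which in federated learning would mean \(2^n\) retrainings. This is the classical definition only, not the VerFedSV construction.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley(owners, utility):
    """Exact Shapley values by enumerating all subsets: for n owners this
    requires on the order of 2^n utility evaluations, i.e., 2^n retrainings."""
    n = len(owners)
    phi = {i: 0.0 for i in owners}
    for i in owners:
        rest = [j for j in owners if j != i]
        for k in range(n):
            for S in combinations(rest, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (utility(set(S) | {i}) - utility(set(S)))
    return phi

# Hypothetical utility: accuracy proxy from pooling owners' sample counts.
samples = {"A": 100, "B": 100, "C": 400}
utility = lambda S: np.log1p(sum(samples[i] for i in S))

print(shapley(list(samples), utility))
```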
@misc{Fan2022Fair, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and Y. Zhang}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.02658}, Title = {Fair and efficient contribution valuation for vertical federated learning} } - [abs]
Fast convergence of the stochastic subgradient method under interpolation
H. Fang, Z. Fan, M. P. Friedlander. Intern. Conf. Learning Representations (ICLR),
2021.
This paper studies the behaviour of the stochastic subgradient descent (SSGD) method applied to over-parameterized nonsmooth optimization problems that satisfy an interpolation condition. By leveraging the composite structure of the empirical risk minimization problems, we prove that when interpolation holds, SSGD converges at the same rate as the stochastic gradient descent (SGD) method applied to smooth problems. Our analysis provides a partial explanation for the empirical observation that sometimes SGD and SSGD behave similarly for training smooth and nonsmooth machine learning models. We also prove that the rates we derive are optimal for the subgradient method in the convex and interpolation setting.
- [abs]
[bib]
Online mirror descent and dual averaging: keeping pace in the dynamic case
H. Fang, N. J. A. Harvey, V. S. Portella, M. P. Friedlander. Intern. Conf. Machine Learning (ICML),
2020.
[arXiv]
[video]
[DOI]Online mirror descent (OMD) and dual averaging (DA) are two fundamental algorithms for online convex optimization. They are known to have very similar (or even identical) performance guarantees in most scenarios when a fixed learning rate is used. However, for dynamic learning rates OMD is provably inferior to DA. It is known that, with a dynamic learning rate, OMD can suffer linear regret, even in common settings such as prediction with expert advice. This hints that the relationship between OMD and DA is not fully understood at present. In this paper, we modify the OMD algorithm by a simple technique that we call stabilization. We give essentially the same abstract regret bound for stabilized OMD and DA by modifying the classical OMD convergence analysis in a careful and modular way, yielding proofs that we believe to be clean and flexible. Simple corollaries of these bounds show that OMD with stabilization and DA enjoy the same performance guarantees in many applications even under dynamic learning rates. We also shed some light on the similarities between OMD and DA and show simple conditions under which stabilized OMD and DA generate the same iterates.
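To fix notation, the sketch below implements the two baseline updates on the probability simplex with a decreasing learning rate: entropic OMD (multiplicative weights) and DA. The loss sequence and learning-rate schedule are arbitrary illustrative choices, and the stabilization modification analyzed in the paper is not included.

```python
import numpy as np

def omd_entropic(losses, etas):
    """Online mirror descent with the entropic mirror map (multiplicative
    weights): x_{t+1} proportional to x_t * exp(-eta_t * g_t)."""
    T, n = losses.shape
    x = np.full(n, 1.0 / n)
    total = 0.0
    for t in range(T):
        total += losses[t] @ x
        x = x * np.exp(-etas[t] * losses[t])
        x /= x.sum()
    return total

def dual_averaging(losses, etas):
    """Dual averaging: x_{t+1} proportional to exp(-eta_t * sum_{s<=t} g_s)."""
    T, n = losses.shape
    g_sum = np.zeros(n)
    total = 0.0
    for t in range(T):
        x = np.exp(-etas[t] * g_sum)
        x /= x.sum()
        total += losses[t] @ x
        g_sum += losses[t]
    return total

rng = np.random.default_rng(0)
T, n = 2000, 10
losses = rng.uniform(0, 1, size=(T, n))
etas = np.sqrt(np.log(n) / np.arange(1, T + 1))   # dynamic learning rate
best = losses.sum(axis=0).min()
print("OMD regret:", omd_entropic(losses, etas) - best)
print("DA  regret:", dual_averaging(losses, etas) - best)
```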
@article{Fang2020Online, Author = {H. Fang and N. J. A. Harvey and V. S. Portella and M. P. Friedlander}, Year = {2020}, Month = {June}, Journal = {International Conference on Machine Learning}, Volume = {1}, Pages = {3008-3017}, Title = {Online mirror descent and dual averaging: keeping pace in the dynamic case} } - [abs]
[bib]
Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization
H. Fang, Z. Fan, Y. Sun, M. P. Friedlander. Intern. Conf. Artificial Intelligence and Statistics (AISTATS),
2020.
We consider greedy coordinate descent (GCD) for composite problems with sparsity inducing regularizers, including 1-norm regularization and non-negative constraints. Empirical evidence strongly suggests that GCD, when initialized with the zero vector, has an implicit screening ability that usually selects at each iteration coordinates that are nonzero at the solution. Thus, for problems with sparse solutions, GCD can converge significantly faster than randomized coordinate descent. We present an improved convergence analysis of GCD for sparse optimization, and a formal analysis of its screening properties. We also propose and analyze an improved selection rule with stronger ability to produce sparse iterates. Numerical experiments on both synthetic and real-world data support our analysis and the effectiveness of the proposed selection rule.
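A small sketch of greedy coordinate descent on the lasso, selecting the coordinate with the largest smooth-gradient magnitude (one Gauss-Southwell-type rule) and starting from zero so the sparsity of the iterates is visible; the data, regularization level, and selection-rule variant are illustrative choices rather than the exact rules analyzed in the paper.

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def greedy_cd_lasso(A, b, lam, iters=200):
    """Proximal coordinate descent for 0.5*||Ax-b||^2 + lam*||x||_1,
    selecting at each step the coordinate with the largest smooth-gradient
    magnitude (a Gauss-Southwell-type rule).  Started at x = 0."""
    n = A.shape[1]
    L = (A * A).sum(axis=0)               # coordinate-wise Lipschitz constants
    x = np.zeros(n)
    r = A @ x - b                          # residual
    for _ in range(iters):
        g = A.T @ r                        # gradient of the smooth part
        j = np.argmax(np.abs(g))           # greedy (Gauss-Southwell) selection
        new_xj = soft(x[j] - g[j] / L[j], lam / L[j])
        r += A[:, j] * (new_xj - x[j])     # update residual in O(m)
        x[j] = new_xj
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 500))
x_true = np.zeros(500); x_true[:5] = 3.0
b = A @ x_true + 0.1 * rng.standard_normal(100)
x = greedy_cd_lasso(A, b, lam=5.0)
print("nonzeros in solution:", np.count_nonzero(x))
print("first nonzero indices:", sorted(np.nonzero(x)[0][:10]))
```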
@article{Fang2020Greed, Author = {H. Fang and Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {February}, Journal = {International Conference on Artificial Intelligence and Statistics}, Pages = {434-444}, Title = {Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization} } - [abs]
[bib]
Fast training for large-scale one-versus-all linear classifiers using tree-structured initialization
H. Fang, M. Cheng, C.-J. Hsieh, M. P. Friedlander. SIAM Intern. Conf. Data Mining (SDM),
2019.
[DOI]We consider the problem of training one-versus-all (OVA) linear classifiers for multiclass or multilabel classification when the number of labels is large. A naive extension of OVA to this problem, even with hundreds of cores, usually requires hours for training on large real-world datasets. We propose a novel algorithm called OVA-Primal++ that speeds up the training of OVA by using a tree-structured training order, where each classifier is trained using its parent’s classifier as initialization. OVA-Primal++ is both theoretically and empirically faster than the naive OVA algorithm, and yet still enjoys the same high parallelizability and small memory footprint. Extensive experiments on multiclass and multilabel classification datasets validate the effectiveness of our method.
@incollection{Fang2019Fast, Author = {H. Fang and M. Cheng and C.-J. Hsieh and M. P. Friedlander}, Year = {2019}, Month = {February}, Pages = {280-288}, Doi = {10.1137/1.9781611975673.32}, Title = {Fast training for large-scale one-versus-all linear classifiers using tree-structured initialization} } - [abs]
[bib]
Satisfying real-world goals with dataset constraints
G. Goh, A. Cotter, M. Gupta, M. P. Friedlander. Advances in Neural Information Processing Systems 29 (NIPS),
2016.
[DOI]The goal of minimizing misclassification error on a training set is often just one of several real-world goals that might be defined on different datasets. For example, one may require a classifier to also make positive predictions at some specified rate for some subpopulation (fairness), or to achieve a specified empirical recall. Other real-world goals include reducing churn with respect to a previously deployed model, or stabilizing online training. In this paper we propose handling multiple goals on multiple datasets by training with dataset constraints, using the ramp penalty to accurately quantify costs, and present an efficient algorithm to approximately optimize the resulting non-convex constrained optimization problem. Experiments on both benchmark and real-world industry datasets demonstrate the effectiveness of our approach.
@article{Goh2016Satisfying, Author = {G. Goh and A. Cotter and M. Gupta and M. P. Friedlander}, Year = {2016}, Month = {December}, Journal = {Neural Information Processing Systems}, Volume = {29}, Pages = {2423-2431}, Title = {Satisfying real-world goals with dataset constraints} } - [abs]
[bib]
Coordinate descent converges faster with the Gauss-Southwell rule than random selection
J. Nutini, M. Schmidt, I. H. Laradji, M. P. Friedlander, H. Koepke. Intern. Conf. Machine Learning (ICML),
2015.
[Preprint]
[NIPS PDF]
[ICML PDF]
[ICML Supplementary]
[DOI]There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random-coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically much more expensive than random selection. However, the empirical behaviours of these algorithms contradict this theoretical result: in applications where the computational costs of the selection rules are comparable, the Gauss-Southwell selection rule tends to perform substantially better than random coordinate selection. We give a simple analysis of the Gauss-Southwell rule showing that–except in extreme cases–its convergence rate is faster than choosing random coordinates. Further, in this work we (i) show that exact coordinate optimization improves the convergence rate for certain sparse problems, (ii) propose a Gauss-Southwell-Lipschitz rule that gives an even faster convergence rate given knowledge of the Lipschitz constants of the partial derivatives, (iii) analyze the effect of approximate Gauss-Southwell rules, and (iv) analyze proximal-gradient variants of the Gauss-Southwell rule.
@article{Nutini2015Coordinate, Author = {J. Nutini and M. Schmidt and I. H. Laradji and M. P. Friedlander and H. Koepke}, Year = {2015}, Month = {July}, Pages = {1632-1641}, Title = {Coordinate descent converges faster with the Gauss-Southwell rule than random selection} } - [abs]
[bib]
Fast dual variational inference for non-conjugate latent Gaussian models
M. E. Khan, A. Y. Aravkin, M. P. Friedlander, M. Seeger. Intern. Conf. Machine Learning (ICML),
2013.
Latent Gaussian models (LGMs) are widely used in statistics and machine learning. Bayesian inference in non-conjugate LGMs is difficult due to intractable integrals involving the Gaussian prior and non-conjugate likelihoods. Algorithms based on variational Gaussian (VG) approximations are widely employed since they strike a favorable balance between accuracy, generality, speed, and ease of use. However, the structure of the optimization problems associated with these approximations remains poorly understood, and standard solvers take too long to converge. We derive a novel dual variational inference approach that exploits the convexity property of the VG approximations. We obtain an algorithm that solves a convex optimization problem, reduces the number of variational parameters, and converges much faster than previous methods. Using real-world data, we demonstrate these advantages on a variety of LGMs, including Gaussian process classification, and latent Gaussian Markov random fields.
@article{Khan2013Fast, Author = {M. E. Khan and A. Y. Aravkin and M. P. Friedlander and M. Seeger}, Year = {2013}, Month = {May}, Pages = {951-959}, Title = {Fast dual variational inference for non-conjugate latent Gaussian models} } - [abs]
[bib]
Tail bounds for stochastic approximation
M. P. Friedlander, G. Goh. arXiv:1304.5586,
2013.
[arXiv]
[DOI]Stochastic-approximation gradient methods are attractive for large-scale convex optimization because they offer inexpensive iterations. They are especially popular in data-fitting and machine-learning applications where the data arrives in a continuous stream, or it is necessary to minimize large sums of functions. It is known that by appropriately decreasing the variance of the error at each iteration, the expected rate of convergence matches that of the underlying deterministic gradient method. Conditions are given under which this happens with overwhelming probability.
@misc{Friedlander2013Tail, Author = {M. P. Friedlander and G. Goh}, Year = {2013}, Month = {April}, Doi = {10.48550/arxiv.1304.5586}, Title = {Tail bounds for stochastic approximation} } - [abs]
[bib]
Hybrid deterministic-stochastic methods for data fitting
M. P. Friedlander, M. Schmidt. SIAM J. Scientific Computing, 34(3):A1380–A1405,
2012.
[more]Many structured data-fitting applications require the solution of an optimization problem involving a sum over a potentially large number of measurements. Incremental gradient algorithms offer inexpensive iterations by sampling a subset of the terms in the sum; these methods can make great progress initially, but often slow as they approach a solution. In contrast, full-gradient methods achieve steady convergence at the expense of evaluating the full objective and gradient on each iteration. We explore hybrid methods that exhibit the benefits of both approaches. Rate-of-convergence analysis shows that by controlling the sample size in an incremental-gradient algorithm, it is possible to maintain the steady convergence rates of full-gradient methods. We detail a practical quasi-Newton implementation based on this approach. Numerical experiments illustrate its potential benefits.
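A toy sketch of the sample-size control idea on a finite-sum least-squares problem: a gradient method whose batch grows geometrically, so early iterations are cheap and later iterations approach the full gradient. The growth factor and step-size rule are illustrative assumptions, and the quasi-Newton component of the paper is omitted.

```python
import numpy as np

def growing_batch_descent(A, b, grow=1.1, iters=300, seed=0):
    """Gradient descent on f(x) = 1/(2m)*||Ax - b||^2 that samples a subset of
    the rows at each iteration and grows the sample geometrically, so the
    gradient error shrinks as the iterates approach a solution."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x, batch = np.zeros(n), 10.0
    for _ in range(iters):
        size = min(m, int(batch))
        idx = rng.choice(m, size=size, replace=False)
        As, bs = A[idx], b[idx]
        g = As.T @ (As @ x - bs) / size              # sampled gradient
        L = np.linalg.norm(As, 2) ** 2 / size        # its Lipschitz constant
        x -= g / L
        batch *= grow                                # controlled sample growth
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 50))
x_true = rng.standard_normal(50)
b = A @ x_true + 0.01 * rng.standard_normal(2000)
x = growing_batch_descent(A, b)
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```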
@article{FriedlanderSchmidt2012, author = {M. P. Friedlander and M. Schmidt}, title = {Hybrid deterministic-stochastic methods for data fitting}, journal = {SIAM J. Scientific Computing}, volume = {34}, number = {3}, pages = {A1380–A1405}, year = {2012} } - [abs]
[bib]
Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm
M. Schmidt, E. van den Berg, M. P. Friedlander, K. Murphy. Intern. Conf. Artificial Intelligence and Statistics (AISTATS),
2009.
[Slides]
[Software]An optimization algorithm for minimizing a smooth function over a convex set is described. Each iteration of the method computes a descent direction by minimizing, over the original constraints, a diagonal plus low-rank quadratic approximation to the function. The quadratic approximation is constructed using a limited-memory quasi-Newton update. The method is suitable for large-scale problems where evaluation of the function is substantially more expensive than projection onto the constraint set. Numerical experiments on one-norm regularized test problems indicate that the proposed method is competitive with state-of-the-art methods such as bound-constrained L-BFGS and orthant-wise descent. We further show that the method generalizes to a wide class of problems, and substantially improves on state-of-the-art methods for problems such as learning the structure of Gaussian graphical models and Markov random fields.
@article{Schmidt2009Optimizing, Author = {M. Schmidt and E. van den Berg and M. P. Friedlander and K. Murphy}, Year = {2009}, Month = {April}, Pages = {456-463}, Title = {Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm} }
Statistical Signal Processing
- [abs]
[bib]
Endmember modelling of detrital zircon petrochronology data via multivariate Tucker-1 tensor decomposition
J. E. Saylor, N. Richardson, N. Graham, R. G. Lee, M. P. Friedlander. EGU General Assembly 2025,
2025.
[EGU Abstract]
[Web App]
[GitHub]
Detrital petrochronology is a powerful method of characterizing sediment and potentially sediment sources. The recently developed Tucker-1 decomposition method holds promise of using detrital petrochronology to identify both sediment-source characteristics and the proportions in which sources are present in sink samples even when sediment sources are unknown or unavailable for sampling. However, the correlation between endmember characteristics and lithological sources or proportions and sedimentary processes has not been established. Herein we present a case study applying the recently developed Tucker-1 decomposition method to a multivariate geochemical data set from detrital zircons in till samples collected above the Cu-bearing Guichon Creek Batholith (GCB) in southern British Columbia, Canada. Data include a suite of eleven variables, including age, Ce anomaly, CeN/NdN, DyN/YbN, ΔFMQ, Eu anomaly, ΣHREE/ΣMREE, Hf, Th/U, Ti temperature, and YbN/GdN, from 12 samples collected at a range of distances in the down ice-flow direction from the GCB.
We demonstrate that endmember modelling using the Tucker-1 decomposition method successfully deconvolves the multivariate data sets into two endmembers in which the geochemical distributions are consistent with derivation from either non-oxidized and relatively anhydrous (i.e., low ore potential, Source 1) or oxidized and hydrous (i.e., potential ore bodies, Source 2) igneous rocks. Furthermore, we demonstrate that the proportions of the Source 2 endmember decrease with increasing distance from the ore bodies, as expected due to downstream zircon mixing and dilution.
Finally, we attribute each of the zircon grains to either the Source 1 or 2 endmember based on maximization of the likelihood that their measured multivariate geochemistry was drawn from one or the other of the learned multivariate endmembers. We compared these grain attributions to the results of an independent Classification and Regression Tree (CART) analysis designed to characterize zircon grains as either “fertile” or “barren” with respect to copper based on their geochemistry. We find that there is ~80% overlap between the source attributions based on the CART analysis and the grain-source identification based on the Tucker-1 decomposition.
We conclude that the novel Tucker-1 decomposition approach provides a flexible, precise, and accurate method of characterizing multivariate sediment sources even when those sources are unknown. It thus provides a basis for future petrochronological interpretations with applied and pure geoscience applications. All of the analyses presented herein can be freely accessed through a web application (https://dzgrainalyzer.eoas.ubc.ca/) or open-source Julia code (https://github.com/MPF-Optimization-Laboratory/MatrixTensorFactor.jl).
@inproceedings{saylor2025endmember, title={Endmember modelling of detrital zircon petrochronology data via multivariate Tucker-1 tensor decomposition}, author={Saylor, Joel E and Richardson, Nicholas and Graham, Naomi and Lee, Robert G and Friedlander, Michael P}, booktitle={EGU General Assembly Conference Abstracts}, pages={EGU25--4081}, year={2025} } - [abs]
[bib]
NBIHT: An efficient algorithm for 1-bit compressed sensing with optimal error decay rate
M. P. Friedlander, H. Jeong, Y. Plan, Ö. Yılmaz. IEEE Trans. Info. Theory,
2021.
[DOI]The Binary Iterative Hard Thresholding (BIHT) algorithm is a popular reconstruction method for one-bit compressed sensing due to its simplicity and fast empirical convergence. Several works have studied BIHT, but a theoretical understanding of the corresponding approximation error and convergence rate has remained open. This paper shows that the normalized version of BIHT (NBIHT) achieves an approximation error rate optimal up to logarithmic factors. More precisely, using m one-bit measurements of an s-sparse vector x, we prove that the approximation error of NBIHT is of order O(1/m) up to logarithmic factors, which matches the information-theoretic lower bound Ω(1/m) proved by Jacques, Laska, Boufounos, and Baraniuk in 2013. To our knowledge, this is the first theoretical analysis of a BIHT-type algorithm that explains the optimal rate of error decay empirically observed in the literature. This also makes NBIHT the first provably computationally efficient one-bit compressed sensing algorithm that breaks the inverse-square-root error decay rate O(1/√m).
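To convey the flavor of the iteration analyzed here, the following numpy sketch implements a generic BIHT-style update (a sign-mismatch gradient step followed by hard thresholding); it is a simplified stand-in rather than the normalized NBIHT variant studied in the paper, and the step size and function name are illustrative.

```python
import numpy as np

def biht_sketch(A, y, s, iters=100, tau=1.0):
    """Generic BIHT-style iteration for one-bit compressed sensing.
    A: (m, n) measurement matrix; y: (m,) vector of +/-1 sign measurements;
    s: target sparsity. Returns a unit-norm, s-sparse estimate."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        g = A.T @ (y - np.sign(A @ x))       # (sub)gradient of the sign-mismatch loss
        x = x + (tau / m) * g                # gradient step
        small = np.argsort(np.abs(x))[:-s]   # all but the s largest entries
        x[small] = 0.0                       # hard threshold to s-sparse
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 0 else x
```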
@article{Friedlander2021Nbiht, Author = {M. P. Friedlander and H. Jeong and Y. Plan and Ö. Yılmaz}, Year = {2021}, Month = {November}, Journal = {IEEE Transactions on Information Theory}, Number = {2}, Volume = {68}, Pages = {1157-1177}, Doi = {10.1109/tit.2021.3124598}, Title = {NBIHT: An efficient algorithm for 1-bit compressed sensing with optimal error decay rate} } - [abs]
[bib]
Computing nonnegative tensor factorizations
M. P. Friedlander, K. Hatz. Optimization Methods and Software, 23(4):631–647,
2008.
[Journal Link]
[DOI]
[more]Nonnegative tensor factorization (NTF) is a technique for computing a parts-based representation of high-dimensional data. NTF excels at exposing latent structures in datasets, and at finding good low-rank approximations to the data. We describe an approach for computing the NTF of a dataset that relies only on iterative linear-algebra techniques and that is comparable in cost to the nonnegative matrix factorization. (The better-known nonnegative matrix factorization is a special case of NTF and is also handled by our implementation.) Some important features of our implementation include mechanisms for encouraging sparse factors and for ensuring that they are equilibrated in norm. The complete Matlab software package is available under the GPL license.
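For context on the matrix special case mentioned above, here is a compact sketch of the classical multiplicative-update rules for nonnegative matrix factorization, X ≈ WH with W, H ≥ 0; this is not the paper's iterative linear-algebra algorithm, only an illustration of the factorization model it generalizes.

```python
import numpy as np

def nmf_multiplicative(X, r, iters=200, eps=1e-10, seed=0):
    """Classical multiplicative-update NMF (Frobenius-norm objective)."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update right factor
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update left factor
    return W, H
```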
@article{Friedlander2008Computing, Author = {M. P. Friedlander and K. Hatz}, Year = {2008}, Month = {August}, Journal = {Optimization methods & software}, Number = {4}, Volume = {23}, Pages = {631-647}, Doi = {10.1080/10556780801996244}, Title = {Computing nonnegative tensor factorizations} } - [abs]
[bib]
On minimizing distortion and relative entropy
M. P. Friedlander, M. R. Gupta. IEEE Trans. Info. Theory, 52(1):238–245,
2006.
[DOI]A common approach for estimating a probability mass function \(w\) when given a prior \(q\) and moment constraints given by \(Aw\le b\) is to minimize the relative entropy between \(w\) and \(q\) subject to the set of linear constraints. In such cases, the solution \(w\) is known to have exponential form. We consider the case in which the linear constraints are noisy, uncertain, infeasible, or otherwise “soft.” A solution can then be obtained by minimizing both the relative entropy and violation of the constraints \(Aw\le b\). A penalty parameter \(\sigma\) weights the relative importance of these two objectives. We show that this penalty formulation also yields a solution \(w\) with exponential form. If the distortion is based on an \(\ell_p\) norm, then the exponential form of \(w\) is shown to have exponential decay parameters that are bounded as a function of \(\sigma\). We also state conditions under which the solution \(w\) to the penalty formulation will result in zero distortion, so that the moment constraints hold exactly. These properties are useful in choosing penalty parameters, evaluating the impact of chosen penalty parameters, and proving properties about methods that use such penalty formulations.
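In symbols, the penalty formulation described in this abstract can be written roughly as follows (the precise distortion measure used in the paper may differ in detail):
\[
\min_{w \ge 0,\; \mathbf{1}^T w = 1} \;\; \sum_i w_i \log\frac{w_i}{q_i} \;+\; \sigma\,\bigl\|(Aw - b)_+\bigr\|_p,
\]
where \((\cdot)_+\) denotes the componentwise positive part, so the second term penalizes violation of the moment constraints \(Aw \le b\).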
@article{Friedlander2006Minimizing, Author = {M. P. Friedlander and M. R. Gupta}, Year = {2006}, Month = {January}, Journal = {IEEE Transactions on Information Theory}, Number = {1}, Volume = {52}, Pages = {238-245}, Doi = {10.1109/tit.2005.860448}, Title = {On minimizing distortion and relative entropy} } - [abs]
[bib]
Maximum entropy classification applied to speech
M. R. Gupta, M. P. Friedlander, R. M. Gray. Asilomar Conf. Signals, Systems, Computers (ACSSC), vol. 2, 1480–1483,
2000.
[DOI]We present a new method for classification using the maximum entropy principle, allowing full use of relevant training data and smoothing the data space. To classify a test point we compute a maximum entropy weight distribution over a subset of training data and constrain the weights to exactly reconstruct the test point. The classification problem is formulated as a linearly constrained optimization problem and solved using a primal-dual logarithmic barrier method, well suited for high-dimensional data. We discuss theoretical advantages and present experimental results on vowel data which demonstrate that the method performs competitively for speech classification tasks.
@article{Gupta2000Maximum, Author = {M. R. Gupta and M. P. Friedlander and R. M. Gray}, Year = {2000}, Month = {October}, Volume = {2}, Pages = {1480-1483}, Doi = {10.1109/acssc.2000.911236}, Title = {Maximum entropy classification applied to speech} }
Structured Optimization
- [abs]
[bib]
Cardinality-constrained structured data-fitting problems
Z. Fan, H. Fang, M. P. Friedlander. Open J. Math. Optim.,
2023.
[Journal]
[DOI]A memory-efficient solution framework is proposed for the cardinality-constrained structured data-fitting problem. Dual-based atom-identification rules reveal the structure of the optimal primal solution from near-optimal dual solutions, which allows for a simple and computationally efficient algorithm that translates any feasible dual solution into a primal solution satisfying the cardinality constraint. Rigorous guarantees bound the quality of a near-optimal primal solution given any dual-based method that generates dual iterates converging to an optimal dual solution. Numerical experiments on real-world datasets support the analysis and demonstrate the efficiency of the proposed approach.
@misc{Fan2023Cardinality, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2023}, Month = {July}, Doi = {10.48550/arxiv.2107.11373}, Title = {Cardinality-constrained structured data-fitting problems} } - [abs]
[bib]
Polar deconvolution of mixed signals
Z. Fan, H. Jeong, B. Joshi, M. P. Friedlander. IEEE Trans. Signal Processing,
2022.
[Journal]
[GitHub]The signal demixing problem seeks to separate a superposition of multiple signals into its constituent components. This paper describes a two-stage approach that first decompresses and subsequently deconvolves the noisy and undersampled observations of the superposition. Probabilistic error bounds are given on the accuracy with which this process approximates the individual signals. The theory of polar convolution of convex sets and gauge functions plays a central role in the analysis and solution process. If the measurements are random and the noise is bounded, this approach stably recovers low-complexity and mutually incoherent signals, with high probability and with optimal sample complexity. We develop an efficient algorithm, based on level-set and conditional-gradient methods, that solves the convex optimization problems with sublinear iteration complexity and linear space requirements. Numerical experiments on both real and synthetic data confirm the theory and the efficiency of the approach.
@article{fan2022deconvolution, author = {Z. Fan and H. Jeong and B. Joshi and M. P. Friedlander}, title = {Polar deconvolution of mixed signals}, journal = {IEEE Transactions on Signal Processing}, volume = {70}, pages = {2713--2727}, year = {2022}, month = {May}, doi = {10.1109/TSP.2022.3178191} } - [abs]
[bib]
Atomic decomposition via polar alignment: the geometry of structured optimization
Z. Fan, H. Jeong, Y. Sun, M. P. Friedlander. Foundations and Trends in Optimization, 3(4):280–366,
2020.
[Journal]
[DOI]Structured optimization uses a prescribed set of atoms to assemble a solution that fits a model to data. Polarity, which extends the familiar notion of orthogonality from linear sets to general convex sets, plays a special role in a simple and geometric form of convex duality. This duality correspondence yields a general notion of alignment that leads to an intuitive and complete description of how atoms participate in the final decomposition of the solution. The resulting geometric perspective leads to variations of existing algorithms effective for large-scale problems. We illustrate these ideas with many examples, including applications in matrix completion and morphological component analysis for the separation of mixtures of signals.
@article{Fan2020Atomic, Author = {Z. Fan and H. Jeong and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {November}, Journal = {Foundations and Trends® in Optimization}, Number = {4}, Volume = {3}, Pages = {280-366}, Doi = {10.1561/2400000028}, Title = {Atomic decomposition via polar alignment: the geometry of structured optimization} } - [abs]
[bib]
Bundle methods for dual atomic pursuit
Z. Fan, Y. Sun, M. P. Friedlander. Asilomar Conf. Signals, Systems, Computers (ACSSC),
2019.
[arXiv]
[DOI]The aim of structured optimization is to assemble a solution, using a given (possibly uncountably infinite) set of atoms, to fit a model to data. A two-stage algorithm based on gauge duality and a bundle method is proposed. The first stage discovers the optimal atomic support for the primal problem by solving a sequence of approximations of the dual problem using a bundle-type method. The second stage recovers the approximate primal solution using the atoms discovered in the first stage. The overall approach leads to implementable and efficient algorithms for large problems.
@article{Fan2019Bundle, Author = {Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2019}, Month = {November}, Volume = {59}, Pages = {264-270}, Doi = {10.1109/ieeeconf44664.2019.9048711}, Title = {Bundle methods for dual atomic pursuit} } - [abs]
[bib]
One-shot atomic detection
Y. Sun, M. P. Friedlander. IEEE Intern. Workshop Comput. Adv. Multi-Sensor Adaptive Proc. (CAMSAP),
2019.
[DOI]Feature selection in data science involves identifying the most prominent and uncorrelated features in the data, which can be useful for compression and interpretability. If these features can be easily extracted, then a model can be trained over a reduced set of weights, which leads to more efficient training and possibly more robust classifiers. There are many approaches to feature selection; in this work, we propose screening the “atoms” of a gradient of a loss function taken at a random point. We illustrate this approach on sparse and low-rank optimization problems. Despite the simplicity of the approach, we are often able to select the dominant features easily, and greatly improve the runtime and robustness in training overparametrized models.
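A minimal sketch of the screening idea in the sparse (coordinate-atom) case: evaluate the gradient of the loss at a random point and keep the coordinates of largest magnitude. The function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def one_shot_screen(grad_fn, n, k, seed=0):
    """Return the indices of the k dominant coordinate "atoms" of the
    gradient of a loss evaluated at a random point."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n)          # random evaluation point
    g = grad_fn(x0)                      # gradient of the loss at x0
    return np.argsort(np.abs(g))[::-1][:k]
```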
@article{Sun2019One, Author = {Y. Sun and M. P. Friedlander}, Year = {2019}, Month = {September}, Volume = {16}, Pages = {1-5}, Doi = {10.1109/camsap45676.2019.9022441}, Title = {One-shot atomic detection} } - [abs]
[bib]
Efficient evaluation of scaled proximal operators
M. P. Friedlander, G. Goh. Electronic Trans. Numerical Analysis, 46:1–22,
2017.
[DOI]Quadratic-support functions, cf. Aravkin, Burke, and Pillonetto [J. Mach. Learn. Res., 14 (2013)], constitute a parametric family of convex functions that includes a range of useful regularization terms found in applications of convex optimization. We show how an interior method can be used to efficiently compute the proximal operator of a quadratic-support function under different metrics. When the metric and the function have the right structure, the proximal map can be computed with costs nearly linear in the input size. We describe how to use this approach to implement quasi-Newton methods for a rich class of nonsmooth problems that arise, for example, in sparse optimization, image denoising, and sparse logistic regression.
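For reference, the scaled (metric) proximal operator evaluated here is, for a convex function \(f\) and a positive definite matrix \(M\),
\[
\operatorname{prox}^{M}_{f}(z) \;=\; \operatorname*{arg\,min}_{x}\; f(x) + \tfrac{1}{2}\|x - z\|_{M}^{2},
\qquad \|u\|_{M}^{2} := u^{T} M u,
\]
which reduces to the ordinary proximal map when \(M\) is the identity; the paper exploits structure in \(f\) and \(M\) to evaluate this map at near-linear cost.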
@misc{Friedlander2017Efficient, Author = {M. P. Friedlander and G. Goh}, Year = {2017}, Month = {February}, Doi = {10.48550/arxiv.1603.05719}, Title = {Efficient evaluation of scaled proximal operators} } - [abs]
[bib]
Low-rank spectral optimization via gauge duality
M. P. Friedlander, I. Macêdo. SIAM J. Scientific Computing, 38(3):A1616–A1638,
2016.
[DOI]
[more]Various applications in signal processing and machine learning give rise to highly structured spectral optimization problems characterized by low-rank solutions. Two important examples that motivate this work are optimization problems from phase retrieval and from blind deconvolution, which are designed to yield rank-1 solutions. An algorithm is described that is based on solving a certain constrained eigenvalue optimization problem that corresponds to the gauge dual which, unlike the more typical Lagrange dual, has an especially simple constraint. The dominant cost at each iteration is the computation of rightmost eigenpairs of a Hermitian operator. A range of numerical examples illustrate the scalability of the approach.
@article{Friedlander2016Low, Author = {M. P. Friedlander and I. Macêdo}, Year = {2016}, Month = {June}, Journal = {SIAM Journal on Scientific Computing}, Number = {3}, Volume = {38}, Pages = {A1616-A1638}, Doi = {10.1137/15m1034283}, Title = {Low-rank spectral optimization via gauge duality} } - [abs]
[bib]
Recovering compressively sampled signals using partial support information
M. P. Friedlander, H. Mansour, R. Saab, Ö. Yılmaz. IEEE Trans. Info. Theory, 58(2):1122–1134,
2012.
[Journal Link]
[DOI]In this paper we study recovery conditions of weighted \(\ell_1\) minimization for signal reconstruction from compressed sensing measurements when partial support information is available. We show that if at least 50% of the (partial) support information is accurate, then weighted \(\ell_1\) minimization is stable and robust under weaker conditions than the analogous conditions for standard \(\ell_1\) minimization. Moreover, weighted \(\ell_1\) minimization provides better bounds on the reconstruction error in terms of the measurement noise and the compressibility of the signal to be recovered. We illustrate our results with extensive numerical experiments on synthetic data and real audio and video signals.
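The weighted \(\ell_1\) program studied here has the standard form used for recovery with partial support information (notation is generic rather than verbatim from the paper): given a support estimate \(\widetilde{T}\) and a weight \(\omega \in [0,1]\),
\[
\min_{x}\; \sum_i w_i |x_i|
\quad \text{subject to} \quad \|Ax - b\|_2 \le \epsilon,
\qquad
w_i = \begin{cases} \omega, & i \in \widetilde{T},\\ 1, & i \notin \widetilde{T}. \end{cases}
\]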
@article{Friedlander2012Recovering, Author = {M. P. Friedlander and H. Mansour and R. Saab and Ö. Yılmaz}, Year = {2012}, Month = {February}, Journal = {IEEE Transactions on Information Theory}, Number = {2}, Volume = {58}, Pages = {1122-1134}, Doi = {10.1109/tit.2011.2167214}, Title = {Recovering compressively sampled signals using partial support information} } - [abs]
[bib]
Sparse optimization with least-squares constraints
E. van den Berg, M. P. Friedlander. SIAM J. Optimization, 21(4):1201–1229,
2011.
[DOI]
[more]The use of convex optimization for the recovery of sparse signals from incomplete or compressed data is now common practice. Motivated by the success of basis pursuit in recovering sparse vectors, new formulations have been proposed that take advantage of different types of sparsity. In this paper we propose an efficient algorithm for solving a general class of sparsifying formulations. For several common types of sparsity we provide applications, along with details on how to apply the algorithm, and experimental results.
@article{Berg2011Sparse, Author = {E. van den Berg and M. P. Friedlander}, Year = {2011}, Month = {October}, Journal = {SIAM Journal on Optimization}, Number = {4}, Volume = {21}, Pages = {1201-1229}, Doi = {10.1137/100785028}, Title = {Sparse optimization with least-squares constraints} } - [abs]
[bib]
Theoretical and empirical results for recovery from multiple measurements
E. van den Berg, M. P. Friedlander. IEEE Trans. Info. Theory, 56(5):2516–2527,
2010.
[Code & Data]
[DOI]
[more]The joint-sparse recovery problem aims to recover, from sets of compressed measurements, unknown sparse matrices with nonzero entries restricted to a subset of rows. This is an extension of the single-measurement-vector (SMV) problem widely studied in compressed sensing. We analyze the recovery properties for two types of recovery algorithms. First, we show that recovery using sum-of-norm minimization cannot exceed the uniform recovery rate of sequential SMV using L1 minimization, and that there are problems that can be solved with one approach but not with the other. Second, we analyze the performance of the ReMBo algorithm [M. Mishali and Y. Eldar, IEEE Trans. Sig. Proc., 56 (2008)] in combination with L1 minimization, and show how recovery improves as more measurements are taken. From this analysis it follows that having more measurements than the number of nonzero rows does not improve the potential theoretical recovery rate.
@article{Berg2010Theoretical, Author = {E. van den Berg and M. P. Friedlander}, Year = {2010}, Month = {May}, Journal = {IEEE Transactions on Information Theory}, Number = {5}, Volume = {56}, Pages = {2516-2527}, Doi = {10.1109/tit.2010.2043876}, Title = {Theoretical and empirical results for recovery from multiple measurements} } - [abs]
[bib]
Sparco: a testing framework for sparse reconstruction
E. van den Berg, M. P. Friedlander, G. Hennenfent, F. Herrmann, R. Saab, Ö. Yılmaz. ACM Trans. Math. Software, 35(4):1–16,
2009.
[Code]
[more]Sparco is a framework for testing and benchmarking algorithms for sparse reconstruction. It includes a large collection of sparse reconstruction problems drawn from the imaging, compressed sensing, and geophysics literature. Sparco is also a framework for implementing new test problems and can be used as a tool for reproducible research. Sparco is implemented entirely in Matlab, and is released as open-source software under the GNU Public License.
@article{Berg2009Sparco, Author = {E. van den Berg and M. P. Friedlander and G. Hennenfent and F. Herrmann and R. Saab and Ö. Yılmaz}, Year = {2009}, Month = {February}, Journal = {ACM Transactions on Mathematical Software}, Number = {4}, Volume = {35}, Pages = {1-16}, Title = {Sparco: a testing framework for sparse reconstruction} } - [abs]
[bib]
Probing the Pareto frontier for basis pursuit solutions
E. van den Berg, M. P. Friedlander. SIAM J. Scientific Computing, 31(2):890–912,
2008.
[SPGL1 Software]
[DOI]
[more]The basis pursuit problem seeks a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise (BPDN) fits the least-squares problem only approximately, and a single parameter determines a curve that traces the optimal trade-off between the least-squares fit and the one-norm of the solution. We prove that this curve is convex and continuously differentiable over all points of interest, and show that it gives an explicit relationship to two other optimization problems closely related to BPDN. We describe a root-finding algorithm for finding arbitrary points on this curve; the algorithm is suitable for problems that are large scale and for those that are in the complex domain. At each iteration, a spectral gradient-projection method approximately minimizes a least-squares problem with an explicit one-norm constraint. Only matrix-vector operations are required. The primal-dual solution of this problem gives function and derivative information needed for the root-finding method. Numerical experiments on a comprehensive set of test problems demonstrate that the method scales well to large problems.
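The root-finding view can be summarized in one display: writing \(\phi(\tau)\) for the optimal value of the one-norm-constrained least-squares subproblem, the method applies Newton's iteration to the equation \(\phi(\tau) = \sigma\),
\[
\phi(\tau) := \min_{\|x\|_1 \le \tau} \|Ax - b\|_2,
\qquad
\tau_{k+1} = \tau_k + \frac{\sigma - \phi(\tau_k)}{\phi'(\tau_k)},
\qquad
\phi'(\tau_k) = -\frac{\|A^{T} r_k\|_\infty}{\|r_k\|_2},
\]
where \(r_k = b - A x_k\) is the optimal residual of the subproblem at \(\tau_k\); each subproblem is solved approximately by the spectral projected-gradient method mentioned above.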
@article{Berg2008Probing, Author = {E. van den Berg and M. P. Friedlander}, Year = {2008}, Month = {November}, Journal = {SIAM Journal on Scientific Computing}, Number = {2}, Volume = {31}, Pages = {890-912}, Doi = {10.1137/080714488}, Title = {Probing the Pareto frontier for basis pursuit solutions} } - [abs]
[bib]
New insights into one-norm solvers from the Pareto curve
G. Hennenfent, E. van den Berg, M. P. Friedlander, F. Herrmann. Geophysics, 73(4):A23–A26,
2008.
[DOI]Geophysical inverse problems typically involve a trade-off between data misfit and some prior. Pareto curves trace the optimal trade-off between these two competing aims. These curves are commonly used in problems with two-norm priors where they are plotted on a log-log scale and are known as L-curves. For other priors, such as the sparsity-promoting one norm, Pareto curves remain relatively unexplored. We show how these curves lead to new insights into one-norm regularization. First, we confirm the theoretical properties of smoothness and convexity of these curves from a stylized and a geophysical example. Second, we exploit these crucial properties to approximate the Pareto curve for a large-scale problem. Third, we show how Pareto curves provide an objective criterion to gauge how different one-norm solvers advance towards the solution.
@article{Hennenfent2008New, Author = {G. Hennenfent and E. van den Berg and M. P. Friedlander and F. Herrmann}, Year = {2008}, Month = {July}, Journal = {Geophysics}, Number = {4}, Volume = {73}, Pages = {A23-A26}, Doi = {10.1190/1.2944169}, Title = {New insights into one-norm solvers from the Pareto curve} } - [abs]
[bib]
Group sparsity via linear-time projection
E. van den Berg, M. Schmidt, M. P. Friedlander, K. Murphy. Tech. Rep. TR-2008-09, Dept of Computer Science, UBC,
2008.We present an efficient spectral projected-gradient algorithm for optimization subject to a group l1-norm constraint. Our approach is based on a novel linear-time algorithm for Euclidean projection onto the l1- and group l1-norm constraints. Numerical experiments on large data sets suggest that the proposed method is substantially more efficient and scalable than existing methods.
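As a point of reference for the projection step, here is the classical sort-based Euclidean projection onto the \(\ell_1\) ball (an \(O(n\log n)\) routine); the paper's contribution is an expected linear-time algorithm and its group extension, so this sketch only illustrates the operation being computed.

```python
import numpy as np

def project_l1_ball(v, tau):
    """Euclidean projection of v onto {x : ||x||_1 <= tau}, tau > 0.
    Classical sort-based O(n log n) method (not the paper's linear-time one)."""
    u = np.abs(v)
    if u.sum() <= tau:
        return v.copy()                       # already feasible
    s = np.sort(u)[::-1]                      # sorted magnitudes, descending
    cssv = np.cumsum(s)
    rho = np.nonzero(s * np.arange(1, len(s) + 1) > (cssv - tau))[0][-1]
    theta = (cssv[rho] - tau) / (rho + 1.0)   # soft-threshold level
    return np.sign(v) * np.maximum(u - theta, 0.0)
```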
@article{Berg2008Group, Author = {E. van den Berg and M. Schmidt and M. P. Friedlander and K. Murphy}, Year = {2008}, Month = {June}, Title = {Group sparsity via linear-time projection} } - [bib]
Discussion: The Dantzig selector: Statistical estimation when p is much larger than n
M. P. Friedlander, M. A. Saunders. Annals of Statistics, 35(6):2385–2391,
2007.
[Journal]
[Slides]
[DOI]
@article{Friedlander2007Discussion, Author = {M. P. Friedlander and M. A. Saunders}, Year = {2007}, Month = {December}, Journal = {The Annals of Statistics}, Number = {6}, Volume = {35}, Doi = {10.1214/009053607000000479}, Title = {Discussion: The Dantzig selector: Statistical estimation when p is much larger than n} } - [abs]
[bib]
In pursuit of a root
E. van den Berg, M. P. Friedlander. Tech. Rep. TR-2007-19, Dept. of Computer Science, UBC,
2007.
[more]The basis pursuit technique is used to find a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise fits the least-squares problem only approximately, and a single parameter determines a curve that traces the trade-off between the least-squares fit and the one-norm of the solution. We show that the function that describes this curve is convex and continuously differentiable over all points of interest. The dual solution of a least-squares problem with an explicit one-norm constraint gives function and derivative information needed for a root-finding method. As a result, we can compute arbitrary points on this curve. Numerical experiments demonstrate that our method, which relies on only matrix-vector operations, scales well to large problems.
@article{Berg2007Pursuit, Author = {E. van den Berg and M. P. Friedlander}, Year = {2007}, Month = {June}, Title = {In pursuit of a root} }
Quantum Algorithms
- [abs]
[bib]
Quantum algorithms for structured prediction
B. Sephehry, E. Iranmanesh, M. P. Friedlander, P. Ronagh. Quantum Machine Intelligence,
2022.
[Journal]
[DOI]We introduce two quantum algorithms for solving structured prediction problems. We first show that a stochastic gradient descent that uses the quantum minimum finding algorithm and takes its probabilistic failure into account solves the structured prediction problem with a runtime that scales with the square root of the size of the label space, and in \(O(1/\epsilon)\) with respect to the precision, \(\epsilon\), of the solution. Motivated by robust inference techniques in machine learning, we then introduce another quantum algorithm that solves a smooth approximation of the structured prediction problem with a similar quantum speedup in the size of the label space and a similar scaling in the precision parameter. In doing so, we analyze a variant of stochastic gradient descent for convex optimization in the presence of an additive error in the calculation of the gradients, and show that its convergence rate does not deteriorate if the additive errors are of the order \(O(\sqrt{\epsilon})\). This algorithm uses quantum Gibbs sampling at temperature \(\Omega(\epsilon)\) as a subroutine. Based on these theoretical observations, we propose a method for using quantum Gibbs samplers to combine feedforward neural networks with probabilistic graphical models for quantum machine learning. Our numerical results using Monte Carlo simulations on an image tagging task demonstrate the benefit of the approach.
@article{Sephehry2022Quantum, Author = {B. Sephehry and E. Iranmanesh and M. P. Friedlander and P. Ronagh}, Year = {2022}, Month = {June}, Journal = {Quantum Machine Intelligence}, Number = {2}, Volume = {4}, Doi = {10.1007/s42484-022-00078-w}, Title = {Quantum algorithms for structured prediction} } - [abs]
[bib]
Smooth structured prediction using quantum and classical Gibbs samplers
B. Sephehry, E. Iranmanesh, M. P. Friedlander, P. Ronagh. Adiabatic Quantum Computing Conference (AQC), arXiv:1809.04091,
2019.We introduce a quantum algorithm for solving structured prediction problems with a runtime that scales with the square root of the size of the label space, but scales in \(\tilde{O}(\epsilon^{-3.5})\) with respect to the precision, \(\epsilon\), of the solution. In doing so, we analyze a stochastic gradient algorithm for convex optimization in the presence of an additive error in the calculation of the gradients, and show that its convergence rate does not deteriorate if the additive errors are of the order \(O(\sqrt{\epsilon})\). Our algorithm uses quantum Gibbs sampling at temperature \(\Omega(\epsilon)\) as a subroutine. Based on these theoretical observations, we propose a method for using quantum Gibbs samplers to combine feedforward neural networks with probabilistic graphical models for quantum machine learning. Numerical results using Monte Carlo simulations on an image tagging task demonstrate the benefit of the approach.
@article{Sephehry2019Smooth, Author = {B. Sephehry and E. Iranmanesh and M. P. Friedlander and P. Ronagh}, Year = {2019}, Month = {January}, Journal = {arXiv (Cornell University)}, Title = {Smooth structured prediction using quantum and classical Gibbs samplers} }
Nonlinear Programming
- [abs]
[bib]
Implementing a smooth exact penalty function for equality-constrained nonlinear optimization
R. Estrin, M. P. Friedlander, D. Orban, M. A. Saunders. SIAM J. Sci. Comput., 42(3), A1809–A1835,
2020.
[Journal]
[DOI]
[more]We develop a general equality-constrained nonlinear optimization algorithm based on a smooth penalty function proposed by Fletcher in 1970. Although it was historically considered to be computationally prohibitive in practice, we demonstrate that the computational kernels required are no more expensive than other widely accepted methods for nonlinear optimization. The main kernel required to evaluate the penalty function and its derivatives is solving a structured linear system. We show how to solve this system efficiently by storing a single factorization at each iteration when the matrices are available explicitly. We further show how to adapt the penalty function to the class of factorization-free algorithms by solving the linear system iteratively. The penalty function therefore has promise when the linear system can be solved efficiently, e.g., for PDE-constrained optimization problems where efficient preconditioners exist. We discuss extensions including handling simple constraints explicitly, regularizing the penalty function, and inexact evaluation of the penalty function and its gradients. We demonstrate the merits of the approach and its various features on some nonlinear programs from a standard test set, and some PDE-constrained optimization problems.
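In its simplest (unshifted and unregularized) form, the penalty in question replaces the multipliers in the Lagrangian by a least-squares estimate computed at the current point; the variant implemented in the paper adds further terms, so the display below is only a sketch of the basic construction:
\[
\phi(x) \;=\; f(x) \;-\; c(x)^{T} y(x),
\qquad
y(x) \;\in\; \operatorname*{arg\,min}_{y}\; \bigl\|\nabla f(x) - \nabla c(x)^{T} y\bigr\|_2,
\]
where \(c\) collects the equality constraints and \(\nabla c\) is its Jacobian; computing \(y(x)\) and the derivatives of \(\phi\) is the structured linear-system solve referred to above.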
@article{Estrin2020Implementing, Author = {R. Estrin and M. P. Friedlander and D. Orban and M. A. Saunders}, Year = {2020}, Month = {June}, Journal = {SIAM Journal on Scientific Computing}, Number = {3}, Volume = {42}, Pages = {A1809-A1835}, Doi = {10.1137/19m1238265}, Title = {Implementing a smooth exact penalty function for equality-constrained nonlinear optimization} } - [abs]
[bib]
Implementing a smooth exact penalty function for general constrained nonlinear optimization
R. Estrin, M. P. Friedlander, D. Orban, M. A. Saunders. SIAM J. Sci. Comput., 42(3), A1836–A1859,
2020.
[Journal]
[DOI]
[more]We build upon Estrin et al. (2019) to develop a general constrained nonlinear optimization algorithm based on a smooth penalty function proposed by Fletcher (1970, 1973b). Although Fletcher’s approach has historically been considered impractical, we show that the computational kernels required are no more expensive than those in other widely accepted methods for nonlinear optimization. The main kernel for evaluating the penalty function and its derivatives solves structured linear systems. When the matrices are available explicitly, we store a single factorization each iteration. Otherwise, we obtain a factorization-free optimization algorithm by solving each linear system iteratively. The penalty function shows promise in cases where the linear systems can be solved efficiently, e.g., PDE-constrained optimization problems when efficient preconditioners exist. We demonstrate the merits of the approach, and give numerical results on several PDE-constrained and standard test problems.
@article{Estrin2020Implementing, Author = {R. Estrin and M. P. Friedlander and D. Orban and M. A. Saunders}, Year = {2020}, Month = {June}, Journal = {SIAM Journal on Scientific Computing}, Number = {3}, Volume = {42}, Pages = {A1836-A1859}, Doi = {10.1137/19m1255069}, Title = {Implementing a smooth exact penalty function for general constrained nonlinear optimization} } - [abs]
[bib]
A primal-dual regularized interior-point method for convex quadratic programs
M. P. Friedlander, D. Orban. Mathematical Programming Computation, 4(1):71–107,
2012.
[Preprint]
[Code]
[DOI]
[more]Interior-point methods in augmented form for linear and convex quadratic programming require the solution of a sequence of symmetric indefinite linear systems which are used to derive search directions. Safeguards are typically required in order to handle free variables or rank-deficient Jacobians. We propose a consistent framework and accompanying theoretical justification for regularizing these linear systems. Our approach can be interpreted as a simultaneous proximal-point regularization of the primal and dual problems. The regularization is termed exact to emphasize that, although the problems are regularized, the algorithm recovers a solution of the original problem, for appropriate values of the regularization parameters.
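Schematically (signs and right-hand side simplified), the regularization modifies the augmented system solved at each interior-point iteration to
\[
\begin{bmatrix} H + D_k + \rho_k I & A^{T} \\ A & -\delta_k I \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
=
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix},
\]
where \(H\) is the Hessian of the quadratic objective, \(A\) the constraint matrix, \(D_k\) the usual barrier term, and \(\rho_k, \delta_k > 0\) the primal and dual regularization parameters; the added \(\rho_k I\) and \(-\delta_k I\) blocks make the matrix symmetric quasi-definite regardless of free variables or rank deficiency in \(A\).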
@article{Friedlander2012Primal, Author = {M. P. Friedlander and D. Orban}, Year = {2012}, Month = {February}, Journal = {Mathematical Programming Computation}, Number = {1}, Volume = {4}, Pages = {71-107}, Doi = {10.1007/s12532-012-0035-2}, Title = {A primal-dual regularized interior-point method for convex quadratic programs} } - [abs]
[bib]
Global and finite termination of a two-phase augmented Lagrangian filter method for general quadratic programs
M. P. Friedlander, S. Leyffer. SIAM J. Scientific Computing, 30(4):1706–1726,
2008.
[DOI]We present a two-phase algorithm for solving large-scale quadratic programs (QPs). In the first phase, gradient-projection iterations approximately minimize an augmented Lagrangian function and provide an estimate of the optimal active set. In the second phase, an equality-constrained QP defined by the current inactive variables is approximately minimized in order to generate a second-order search direction. A filter determines the required accuracy of the subproblem solutions and provides an acceptance criterion for the search directions. The resulting algorithm is globally and finitely convergent. The algorithm is suitable for large-scale problems with many degrees of freedom, and provides an alternative to interior-point methods when iterative methods must be used to solve the underlying linear systems. Numerical experiments on a subset of the CUTEr QP test problems demonstrate the effectiveness of the approach.
@article{Friedlander2008Global, Author = {M. P. Friedlander and S. Leyffer}, Year = {2008}, Month = {April}, Journal = {SIAM Journal on Scientific Computing}, Number = {4}, Volume = {30}, Pages = {1706-1729}, Doi = {10.1137/060669930}, Title = {Global and finite termination of a two-phase augmented Lagrangian filter method for general quadratic programs} } - [abs]
[bib]
A filter active-set trust-region method
M. P. Friedlander, N. I. M. Gould, S. Leyffer, T. S. Munson. Preprint ANL/MCS-P1456-0907, Argonne National Laboratory,
2007.
[Preprint]We propose a two-phase active-set method for nonlinearly constrained optimization. The first phase solves a regularized linear program (LP) that serves to estimate an optimal active set. The second phase uses this active-set estimate to determine an equality-constrained quadratic program which provides for fast local convergence. A filter promotes global convergence. The regularization term in the first phase bounds the LP solution and plays a role similar to an explicit ellipsoid trust-region constraint. We prove that the resulting method is globally convergent, and that an optimal active set is identified near a solution. We discuss alternative regularization functions that incorporate curvature information into the active-set identification phase. Preliminary numerical experiments on a subset of the CUTEr test problems illustrate the effectiveness of the approach.
@article{Friedlander2007Filter, Author = {M. P. Friedlander and N. I. M. Gould and S. Leyffer and T. S. Munson}, Year = {2007}, Month = {September}, Title = {A filter active-set trust-region method} } - [abs]
[bib]
A two-sided relaxation scheme for mathematical programs with equilibrium constraints
V. Demiguel, M. P. Friedlander, F. J. Nogales, S. Scholtes. SIAM J. Optimization, 16(1):587–609,
2005.
[DOI]We propose a relaxation scheme for mathematical programs with equilibrium constraints (MPECs). In contrast to previous approaches, our relaxation is two-sided: both the complementarity and the nonnegativity constraints are relaxed. The proposed relaxation update rule guarantees (under certain conditions) that the sequence of relaxed subproblems will maintain a strictly feasible interior, even in the limit. We show how the relaxation scheme can be used in combination with a standard interior-point method to achieve superlinear convergence. Numerical results on the MacMPEC test problem set demonstrate the fast local convergence properties of the approach.
@article{Demiguel2005Two, Author = {V. Demiguel and M. P. Friedlander and F. J. Nogales and S. Scholtes}, Year = {2005}, Month = {December}, Journal = {SIAM Journal on Optimization}, Number = {2}, Volume = {16}, Pages = {587-609}, Doi = {10.1137/04060754x}, Title = {A two-sided relaxation scheme for mathematical programs with equilibrium constraints} } - [abs]
[bib]
A globally convergent linearly constrained Lagrangian method for nonlinear optimization
M. P. Friedlander, M. A. Saunders. SIAM J. Optimization 15(3):863–897,
2005.
[DOI]For optimization problems with nonlinear constraints, linearly constrained Lagrangian (LCL) methods solve a sequence of subproblems of the form "minimize an augmented Lagrangian function subject to linearized constraints." Such methods converge rapidly near a solution but may not be reliable from arbitrary starting points. Nevertheless, the well-known software package MINOS has proved effective on many large problems. Its success motivates us to derive a related LCL algorithm that possesses three important properties: it is globally convergent, the subproblem constraints are always feasible, and the subproblems may be solved inexactly. The new algorithm has been implemented in Matlab, with an option to use either MINOS or SNOPT (Fortran codes) to solve the linearly constrained subproblems. Only first derivatives are required. We present numerical results on a subset of the COPS, HS, and CUTE test problems, which include many large examples. The results demonstrate the robustness and efficiency of the stabilized LCL procedure.
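The LCL subproblem referred to above has the following generic shape (a sketch; the stabilized method in the paper allows the linearized constraints to be relaxed elastically):
\[
\min_{x}\;\; f(x) - y_k^{T} c(x) + \tfrac{\rho_k}{2}\,\|c(x)\|_2^2
\quad \text{subject to} \quad
c(x_k) + \nabla c(x_k)\,(x - x_k) = 0,
\]
where \(y_k\) is the current multiplier estimate and \(\rho_k \ge 0\) a penalty parameter.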
@article{Friedlander2005Globally, Author = {M. P. Friedlander and M. A. Saunders}, Year = {2005}, Month = {April}, Journal = {SIAM Journal on Optimization}, Number = {3}, Volume = {15}, Pages = {863-897}, Doi = {10.1137/s1052623402419789}, Title = {A globally convergent linearly constrained Lagrangian method for nonlinear optimization} }
Selected Publications
- [abs]
[bib]
Polar deconvolution of mixed signals
Z. Fan, H. Jeong, B. Joshi, M. P. Friedlander. IEEE Trans. Signal Processing,
2022.
[Journal]
[GitHub]The signal demixing problem seeks to separate a superposition of multiple signals into its constituent components. This paper describes a two-stage approach that first decompresses and subsequently deconvolves the noisy and undersampled observations of the superposition. Probabilistic error bounds are given on the accuracy with which this process approximates the individual signals. The theory of polar convolution of convex sets and gauge functions plays a central role in the analysis and solution process. If the measurements are random and the noise is bounded, this approach stably recovers low-complexity and mutually incoherent signals, with high probability and with optimal sample complexity. We develop an efficient algorithm, based on level-set and conditional-gradient methods, that solves the convex optimization problems with sublinear iteration complexity and linear space requirements. Numerical experiments on both real and synthetic data confirm the theory and the efficiency of the approach.
@article{fan2022deconvolution, author = {Z. Fan and H. Jeong and B. Joshi and M. P. Friedlander}, title = {Polar deconvolution of mixed signals}, journal = {IEEE Transactions on Signal Processing}, volume = {70}, pages = {2713--2727}, year = {2022}, month = {May}, doi = {10.1109/TSP.2022.3178191} } - [abs]
[bib]
Online mirror descent and dual averaging: keeping pace in the dynamic case
H. Fang, N. J. A. Harvey, V. S. Portella, M. P. Friedlander. J. Machine Learning Research,
2022.Online mirror descent (OMD) and dual averaging (DA) are two fundamental algorithms for online convex optimization. They are known to have very similar (or even identical) performance guarantees in most scenarios when a fixed learning rate is used. However, for dynamic learning rates OMD is provably inferior to DA. It is known that, with a dynamic learning rate, OMD can suffer linear regret, even in common settings such as prediction with expert advice. This hints that the relationship between OMD and DA is not fully understood at present. In this paper, we modify the OMD algorithm by a simple technique that we call stabilization. We give essentially the same abstract regret bound for stabilized OMD and DA by modifying the classical OMD convergence analysis in a careful and modular way, yielding proofs that we believe to be clean and flexible. Simple corollaries of these bounds show that OMD with stabilization and DA enjoy the same performance guarantees in many applications even under dynamic learning rates. We also shed some light on the similarities between OMD and DA and show simple conditions under which stabilized OMD and DA generate the same iterates.
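To make the OMD/DA contrast concrete, here is a minimal sketch of the two update rules in the unconstrained Euclidean case; with a fixed learning rate the two produce identical iterates, while with time-varying rates they can drift apart, which is the gap that the stabilization technique addresses. Function names are illustrative.

```python
import numpy as np

def omd_iterates(grads, etas, x0):
    """Online mirror descent (Euclidean, unconstrained): each update steps
    from the current iterate using the current learning rate."""
    x = np.asarray(x0, dtype=float).copy()
    out = [x.copy()]
    for g, eta in zip(grads, etas):
        x = x - eta * np.asarray(g, dtype=float)
        out.append(x.copy())
    return out

def da_iterates(grads, etas, x0):
    """Dual averaging: each update steps from the initial point using the
    running sum of all past gradients, scaled by the current learning rate."""
    x0 = np.asarray(x0, dtype=float)
    gsum = np.zeros_like(x0)
    out = [x0.copy()]
    for g, eta in zip(grads, etas):
        gsum = gsum + np.asarray(g, dtype=float)
        out.append(x0 - eta * gsum)
    return out
```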
@article{fang2022online, author = {Huang Fang and Nicholas J. A. Harvey and Victor S. Portella and Michael P. Friedlander}, title = {Online Mirror Descent and Dual Averaging: Keeping Pace in the Dynamic Case}, journal = {Journal of Machine Learning Research}, year = {2022}, volume = {23}, number = {121}, pages = {1-38}, url = {http://jmlr.org/papers/v23/21-1027.html} } - [abs]
[bib]
Atomic decomposition via polar alignment: the geometry of structured optimization
Z. Fan, H. Jeong, Y. Sun, M. P. Friedlander. Foundations and Trends in Optimization, 3(4):280–366,
2020.
[Journal]
[DOI]Structured optimization uses a prescribed set of atoms to assemble a solution that fits a model to data. Polarity, which extends the familiar notion of orthogonality from linear sets to general convex sets, plays a special role in a simple and geometric form of convex duality. This duality correspondence yields a general notion of alignment that leads to an intuitive and complete description of how atoms participate in the final decomposition of the solution. The resulting geometric perspective leads to variations of existing algorithms effective for large-scale problems. We illustrate these ideas with many examples, including applications in matrix completion and morphological component analysis for the separation of mixtures of signals.
@article{Fan2020Atomic, Author = {Z. Fan and H. Jeong and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {November}, Journal = {Foundations and Trends® in Optimization}, Number = {4}, Volume = {3}, Pages = {280-366}, Doi = {10.1561/2400000028}, Title = {Atomic decomposition via polar alignment: the geometry of structured optimization} } - [abs]
[bib]
Polar convolution
M. P. Friedlander, I. Macêdo, T.K. Pong. SIAM J. Optimization, 29(2):1366–1391,
2019.
[DOI]The Moreau envelope is one of the key convexity-preserving functional operations in convex analysis, and it is central to the development and analysis of many approaches for solving convex optimization problems. This paper develops the theory for a parallel convolution operation, called the polar envelope, specialized to gauge functions. We show that many important properties of the Moreau envelope and the proximal map are mirrored by the polar envelope and its corresponding proximal map. These properties include smoothness of the envelope function, uniqueness and continuity of the proximal map, a role in duality and in the construction of algorithms for gauge optimization. We thus establish a suite of tools with which to build algorithms for this family of optimization problems.
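For gauges \(\kappa_1\) and \(\kappa_2\), the polar convolution studied here replaces the sum in the usual infimal convolution with a maximum (stated here as a sketch of the construction):
\[
(\kappa_1 \,\diamond\, \kappa_2)(x) \;=\; \inf_{x_1 + x_2 = x}\; \max\bigl\{\kappa_1(x_1),\, \kappa_2(x_2)\bigr\},
\]
and the polar envelope described in the abstract arises, in analogy with the Moreau envelope, by taking one of the gauges to be a positive multiple of the Euclidean norm.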
@article{Friedlander2019Polar, Author = {M. P. Friedlander and I. Macêdo and T.K. Pong}, Year = {2019}, Month = {February}, Journal = {SIAM Journal on Optimization}, Number = {2}, Volume = {29}, Pages = {1366-1391}, Doi = {10.1137/18m1209088}, Title = {Polar convolution} } - [abs]
[bib]
Gauge optimization and duality
M. P. Friedlander, I. Macêdo, T. K. Pong. SIAM J. Optimization, 24(4):1999–2022,
2014.Gauge functions significantly generalize the notion of a norm, and gauge optimization, as defined by Freund (1987), seeks the element of a convex set that is minimal with respect to a gauge function. This conceptually simple problem can be used to model a remarkable array of useful problems, including a special case of conic optimization, and related problems that arise in machine learning and signal processing. The gauge structure of these problems allows for a special kind of duality framework. This paper explores the duality framework proposed by Freund, and proposes a particular form of the problem that exposes some useful properties of the gauge optimization framework (such as the variational properties of its value function), and yet maintains most of the generality of the abstract form of gauge optimization.
@Article{FriedlanderMacedoPong:2014, author = {M. P. Friedlander and I. Mac\^edo and T. K. Pong}, title = {Gauge optimization and duality}, year = 2014, journal = {SIAM Journal on Optimization}, number = 4, volume = 24, pages = {1999-2022}, doi = {10.1137/130940785}, url = {http://epubs.siam.org/doi/abs/10.1137/130940785} } - [abs]
[bib]
Probing the Pareto frontier for basis pursuit solutions
E. van den Berg, M. P. Friedlander. SIAM J. Scientific Computing, 31(2):890–912,
2008.
[SPGL1 Software]
[DOI]
[more]The basis pursuit problem seeks a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise (BPDN) fits the least-squares problem only approximately, and a single parameter determines a curve that traces the optimal trade-off between the least-squares fit and the one-norm of the solution. We prove that this curve is convex and continuously differentiable over all points of interest, and show that it gives an explicit relationship to two other optimization problems closely related to BPDN. We describe a root-finding algorithm for finding arbitrary points on this curve; the algorithm is suitable for problems that are large scale and for those that are in the complex domain. At each iteration, a spectral gradient-projection method approximately minimizes a least-squares problem with an explicit one-norm constraint. Only matrix-vector operations are required. The primal-dual solution of this problem gives function and derivative information needed for the root-finding method. Numerical experiments on a comprehensive set of test problems demonstrate that the method scales well to large problems.
@article{Berg2008Probing, Author = {E. van den Berg and M. P. Friedlander}, Year = {2008}, Month = {November}, Journal = {SIAM Journal on Scientific Computing}, Number = {2}, Volume = {31}, Pages = {890-912}, Doi = {10.1137/080714488}, Title = {Probing the Pareto frontier for basis pursuit solutions} } - [abs]
[bib]
Exact regularization of convex programs
M. P. Friedlander, P. Tseng. SIAM J. Optimization, 18(4):1326–1350,
2007.
[Data & Code]
[DOI]
[more]The regularization of a convex program is exact if all solutions of the regularized problem are also solutions of the original problem for all values of the regularization parameter below some positive threshold. For a general convex program, we show that the regularization is exact if and only if a certain selection problem has a Lagrange multiplier. Moreover, the regularization parameter threshold is inversely related to the Lagrange multiplier. We use this result to generalize an exact regularization result of Ferris and Mangasarian [Appl. Math. Optim., 23 (1991), pp. 266-273] involving a linearized selection problem. We also use it to derive necessary and sufficient conditions for exact penalization, similar to those obtained by Bertsekas [Math. Programming, 9 (1975), pp. 87-99] and by Bertsekas, Nedić, and Ozdaglar [Convex Analysis and Optimization, Athena Scientific, Belmont, MA, 2003]. When the regularization is not exact, we derive error bounds on the distance from the regularized solution to the original solution set. We also show that existence of a "weak sharp minimum" is in some sense close to being necessary for exact regularization. We illustrate the main result with numerical experiments on the \(\ell_1\) regularization of benchmark (degenerate) linear programs and semidefinite/second-order cone programs. The experiments demonstrate the usefulness of \(\ell_1\) regularization in finding sparse solutions.
@article{Friedlander2007Exact, Author = {M. P. Friedlander and P. Tseng}, Year = {2007}, Month = {November}, Journal = {SIAM Journal on Optimization}, Number = {4}, Volume = {18}, Pages = {1326-1350}, Doi = {10.1137/060675320}, Title = {Exact regularization of convex programs} } - [abs]
[bib]
A globally convergent linearly constrained Lagrangian method for nonlinear optimization
M. P. Friedlander, M. A. Saunders. SIAM J. Optimization 15(3):863–897,
2005.
[DOI]For optimization problems with nonlinear constraints, linearly constrained Lagrangian (LCL) methods solve a sequence of subproblems of the form "minimize an augmented Lagrangian function subject to linearized constraints." Such methods converge rapidly near a solution but may not be reliable from arbitrary starting points. Nevertheless, the well-known software package MINOS has proved effective on many large problems. Its success motivates us to derive a related LCL algorithm that possesses three important properties: it is globally convergent, the subproblem constraints are always feasible, and the subproblems may be solved inexactly. The new algorithm has been implemented in Matlab, with an option to use either MINOS or SNOPT (Fortran codes) to solve the linearly constrained subproblems. Only first derivatives are required. We present numerical results on a subset of the COPS, HS, and CUTE test problems, which include many large examples. The results demonstrate the robustness and efficiency of the stabilized LCL procedure.
@article{Friedlander2005Globally, Author = {M. P. Friedlander and M. A. Saunders}, Year = {2005}, Month = {April}, Journal = {SIAM Journal on Optimization}, Number = {3}, Volume = {15}, Pages = {863-897}, Doi = {10.1137/s1052623402419789}, Title = {A globally convergent linearly constrained Lagrangian method for nonlinear optimization} }
2025
- [abs]
[bib]
Average-case thresholds for exact regularization of linear programs
M. P. Friedlander, S. Kubal, Y. Plan, M. S. Scott. arXiv:2510.13083,
2025.Small regularizers can preserve linear programming solutions exactly. This paper provides the first average-case analysis of exact regularization: with a standard Gaussian cost vector and fixed constraint set, bounds are established for the probability that exact regularization succeeds as a function of regularization strength. Failure is characterized via the Gaussian measure of inner cones, controlled by novel two-sided bounds on the measure of shifted cones. Results reveal dimension-dependent scaling laws and connect exact regularization of linear programs to their polyhedral geometry via the normal fan and the Gaussian (solid-angle) measure of its cones. Computable bounds are obtained in several canonical settings, including regularized optimal transport. Numerical experiments corroborate the predicted scalings and thresholds.
@article{friedlander2025average, title={Average-case thresholds for exact regularization of linear programs}, author={Friedlander, Michael P and Kubal, Sharvaj and Plan, Yaniv and Scott, Matthew S}, journal={arXiv preprint arXiv:2510.13083}, year={2025} } - [abs]
[bib]
Tracking Cu-fertile sediment sources via multivariate petrochronological mixture modeling of detrital zircons
J. E. Saylor, N. Richardson, N. Graham, R. G. Lee, M. P. Friedlander. J. Geophys. Res.: Earth Surface, 130(10):e2025JF008406,
2025.Whereas the ability to acquire petrochronological data from detrital minerals has exploded, development of tools to analyze and interpret the multivariate data sets has not kept pace. Herein, we present a case study, which applies the recently developed non-negative Tucker-1 decomposition (NNT1) method to a multivariate detrital zircon data set from till samples collected above the Cu-bearing Guichon Creek Batholith in southern British Columbia, Canada. Zircon composition variables that we consider include age, Ce anomaly, CeN/NdN, DyN/YbN, ΔFMQ, Eu anomaly, ΣHREE/ΣMREE, Hf, Th/U, Ti temperature, and YbN/GdN. The NNT1 approach successfully deconvolves the multivariate data set into two endmembers, which are consistent with derivation either from non-oxidized and relatively anhydrous (i.e., low Cu-ore potential, Source 1) or oxidized and hydrous (i.e., potential Cu-ore bodies, Source 2) igneous rocks. Furthermore, we attribute each of the zircon grains to either the Source 1 or 2 endmember based on maximization of the likelihood that their measured multivariate geochemistry was drawn from one or the other of the learned multivariate endmembers. Finally, we demonstrate that the proportions of the Source 2 endmember decrease with increasing distance from the ore bodies, as expected due to down-ice or off-axis zircon mixing and dilution. We conclude that the NNT1 approach provides insight into geologically meaningful sediment transport processes and multivariate sediment sources even when those sources are unknown. It thus provides a basis for future petrochronological interpretations with applied and pure geoscience applications.
@article{saylor2025tracking, title={Tracking Cu-fertile sediment sources via multivariate petrochronological mixture modeling of detrital zircons}, author={Saylor, Joel E and Richardson, Nicholas and Graham, Naomi and Lee, Robert G and Friedlander, Michael P}, journal={Journal of Geophysical Research: Earth Surface}, volume={130}, number={10}, pages={e2025JF008406}, year={2025}, publisher={Wiley Online Library} } - [abs]
[bib]
Decentralized Optimization with Topology-Independent Communication
Y. Lin, Y. Kuang, A. Alacaoglu, M. P. Friedlander. arXiv:2509.14488,
2025.Distributed optimization requires nodes to coordinate, yet full synchronization scales poorly. When $n$ nodes collaborate through $m$ pairwise regularizers, standard methods demand $\mathcal{O}(m)$ communications per iteration. This paper proposes randomized local coordination: each node independently samples one regularizer uniformly and coordinates only with nodes sharing that term. This exploits partial separability, where each regularizer $G_j$ depends on a subset $S_j \subseteq \{1,\ldots,n\}$ of nodes. For graph-guided regularizers where $|S_j|=2$, expected communication drops to exactly 2 messages per iteration. This method achieves $\tilde{\mathcal{O}}(\varepsilon^{-2})$ iterations for convex objectives and, under strong convexity, $\mathcal{O}(\varepsilon^{-1})$ iterations to an $\varepsilon$-solution and $\mathcal{O}(\log(1/\varepsilon))$ to a neighborhood. Replacing the proximal map of the sum $\sum_j G_j$ with the proximal map of a single randomly selected regularizer $G_j$ preserves convergence while eliminating global coordination. Experiments validate both convergence rates and communication efficiency across synthetic and real-world datasets.
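A structural sketch of the randomized local coordination step in the graph-guided case (each regularizer couples a pair of nodes): every iteration takes local gradient steps and then applies the proximal map of a single uniformly sampled regularizer, touching only the two nodes it couples. The interfaces, names, and step-size handling are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np

def local_coordination_step(X, local_grads, prox_pair, edges, step, rng):
    """One sketch iteration of randomized local coordination.
    X: (n, d) array of node variables; local_grads(X): per-node gradients of
    the local losses; prox_pair[j](block, step): prox of edge-regularizer G_j
    applied to the 2 x d block of the two nodes it couples; edges[j] = (u, v)."""
    X = X - step * local_grads(X)                 # independent local gradient steps
    j = rng.integers(len(edges))                  # sample one regularizer uniformly
    u, v = edges[j]
    X[[u, v]] = prox_pair[j](X[[u, v]], step)     # only nodes u and v communicate
    return X
```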
@misc{LinKuangAlacaogluFriedlander2025, author = {Lin, Ying and Kuang, Yao and Alacaoglu, Ahmet and Friedlander, Michael P.}, title = {Decentralized Optimization with Topology-Independent Communication}, year = {2025}, eprint = {2509.14488}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, doi = {10.48550/arXiv.2509.14488} } - [abs]
[bib]
Estimates of the dynamic structure factor for the finite temperature electron liquid via analytic continuation of path integral Monte Carlo data
T. Chuna, N. Barnfield, J. Vorberger, M. P. Friedlander, T. Hoheisel, T. Dornheim. Phys. Rev. B, 112(12):125112,
2025.Understanding the dynamic properties of the uniform electron gas (UEG) is important for numerous applications ranging from semiconductor physics to exotic warm dense matter. In this work, we apply the maximum entropy method (MEM), as implemented by Thomas Chuna et al., J. Phys. A Math. Theor. 58, 335203 (2025), to ab initio path integral Monte Carlo (PIMC) results for the imaginary-time correlation function \(F(q,\tau)\) to estimate the dynamic structure factor \(S(q,\omega)\) over an unprecedented range of densities at the electronic Fermi temperature. To conduct the MEM, we propose to construct the Bayesian prior \(\mu\) from the PIMC data. Constructing the static approximation leads to a drastic improvement in the \(S(q,\omega)\) estimate over using the simpler random phase approximation (RPA) as the Bayesian prior. We present results for the strongly coupled electron liquid regime with \(r_s=50,\dots,200\), which reveal a pronounced roton-type feature and an incipient double-peak structure in \(S(q,\omega)\) for intermediate wave numbers at \(r_s=200\). We also find that our dynamic structure factors satisfy known sum rules, even though these sum rules are not enforced explicitly. To verify our results, we show that our MEM estimates converge to the RPA limit at higher densities and weaker coupling, \(r_s=2\) and \(5\). Further, we compare with two different existing results at intermediate density and coupling strength, \(r_s=10\) and \(20\), and we find good agreement with more conservative estimates. Combining all of our results for \(r_s=2,5,10,20,50,100,200\), we present estimates of a dispersion relation that show a continuous deepening of its minimum value at higher coupling. An advantage of our setup is that it is not specific to the UEG, thereby opening up new avenues to study the dynamics of real warm dense matter systems based on cutting-edge PIMC simulations in future works.
@article{Chuna2025Estimates, Author = {T. Chuna and N. Barnfield and J. Vorberger and M. P. Friedlander and T. Hoheisel and T. Dornheim}, Year = {2025}, Month = {September}, Journal = {Physical Review B}, Number = {12}, Volume = {112}, Doi = {10.1103/4d4b-kgtk}, Title = {Estimates of the dynamic structure factor for the finite temperature electron liquid via analytic continuation of path integral Monte Carlo data} } - [abs]
[bib]
Dual formulation of the maximum entropy method applied to analytic continuation of quantum Monte Carlo data
T. Chuna, N. Barnfield, T. Dornheim, M. P. Friedlander, T. Hoheisel. J. Phys. A: Math. Theor., 58(33):335203,
2025.
[GitHub]
[DOI]Many fields of physics use quantum Monte Carlo techniques, but struggle to estimate dynamic spectra via the analytic continuation of imaginary-time quantum Monte Carlo data. One of the most ubiquitous approaches to analytic continuation is the maximum entropy method (MEM). We supply a dual Newton optimization algorithm to be used within the MEM and provide analytic bounds for the algorithm’s error. The optimization algorithm is freely available on GitHub. The MEM is typically used with Bryan’s controversial algorithm (Rothkopf 2020 Data 5 55), and we present new theoretical issues with it that are not yet in the literature. Our algorithm has all the theoretical benefits of Bryan’s algorithm without these issues. We compare the MEM with Bryan’s optimization to the MEM with our dual Newton optimization on test problems from lattice quantum chromodynamics and plasma physics. These comparisons show that in the presence of noise the dual Newton algorithm produces better estimates and error bars; this indicates the limits of Bryan’s algorithm’s applicability. We use the MEM to investigate authentic quantum Monte Carlo data for the uniform electron gas at warm dense matter conditions and further substantiate the roton-type feature in the dispersion relation.
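For orientation, the classical MEM variational problem that such dual formulations start from can be written in the standard textbook form below (this is background, not a statement of the paper's specific dual); here G denotes the imaginary-time data, K the analytic-continuation kernel, A the spectrum, μ the default model (prior), and α the entropy weight:

\[
\max_{A \ge 0}\;\; \alpha\, S[A] - \tfrac{1}{2}\chi^2[A],
\qquad
S[A] = \int d\omega \Big[ A(\omega) - \mu(\omega) - A(\omega)\ln\frac{A(\omega)}{\mu(\omega)} \Big],
\qquad
\chi^2[A] = \sum_i \frac{\big(G_i - \int d\omega\, K_i(\omega)\, A(\omega)\big)^2}{\sigma_i^2}.
\]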
@misc{Chuna2025Dual, Author = {T. Chuna and N. Barnfield and T. Dornheim and M. P. Friedlander and T. Hoheisel}, Year = {2025}, Month = {August}, Doi = {10.2139/ssrn.5098440}, Title = {Dual formulation of the maximum entropy method applied to analytic continuation of quantum Monte Carlo data} } - [abs]
[bib]
Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization
N. Graham, N. Richardson, M. P. Friedlander, J. Saylor. Mathematical Geosciences, 57(4):601-628,
2025.
[DOI]We devise a novel statistical method for deconvolving multivariate geochronology and geochemistry datasets to characterize sediment sources. The approach employs a third-order constrained Tucker-1 tensor decomposition to estimate the probability distributions of multiple features in sediment samples. By integrating kernel density estimation with matrix-tensor factorization, the model quantitatively determines the distributions and mixing proportions of sediment sources. The methodology introduces a numerical test for rank estimation to define the number of latent sources, and uses a maximum-likelihood approach to correlate individual detrital grains to these latent sources based on an arbitrary number of features. The method’s efficacy is validated through a numerical experiment with detrital zircon data that captures natural variability associated with temporal changes in crustal thickness in the Andes. The findings hold potential implications for resolving sediment sources, determining sediment mixing, enhancing the understanding of sediment transport processes, characterizing the lithology, tectonic motion, or metallogenic potential of sediment sources. This method is adaptable to other data de-mixing problems and is available in a publicly released software package.
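As a point of reference, a nonnegative Tucker-1 model of the kind described above factors a data array \(Y \in \mathbb{R}^{I\times J\times K}\) (samples × features × density bins) as a core tensor multiplied along the first mode by a mixing matrix, with a simplex constraint encoding mixing proportions. The notation here is illustrative rather than taken from the paper:

\[
Y_{ijk} \;\approx\; \sum_{r=1}^{R} A_{ir}\, C_{rjk},
\qquad A \ge 0,\qquad \sum_{r=1}^{R} A_{ir} = 1,
\]

where row \(i\) of \(A\) gives the source proportions in sample \(i\), and each slice \(C_{rj\cdot}\) is a discretized probability density of feature \(j\) for latent source \(r\).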
@article{Graham2025Tracing, Author = {N. Graham and N. Richardson and M. P. Friedlander and J. Saylor}, Year = {2025}, Month = {May}, Journal = {Mathematical Geosciences}, Number = {4}, Volume = {57}, Pages = {601-628}, Doi = {10.1007/s11004-024-10175-0}, Title = {Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization} } - [abs]
[bib]
Endmember modelling of detrital zircon petrochronology data via multivariate Tucker-1 tensor decomposition
J. E. Saylor, N. Richardson, N. Graham, R. G. Lee, M. P. Friedlander. EGU General Assembly 2025,
2025.
[EGU Abstract]
[Web App]
[GitHub]
Detrital petrochronology is a powerful method of characterizing sediment and potentially sediment sources. The recently developed Tucker-1 decomposition method holds promise of using detrital petrochronology to identify both sediment-source characteristics and the proportions in which sources are present in sink samples even when sediment sources are unknown or unavailable for sampling. However, the correlation between endmember characteristics and lithological sources or proportions and sedimentary processes has not been established. Herein we present a case study applying the recently developed Tucker-1 decomposition method to a multivariate geochemical data set from detrital zircons in till samples collected above the Cu-bearing Guichon Creek Batholith (GCB) in southern British Columbia, Canada. Data include a suite of eleven variables, including age, Ce anomaly, Ce_N/Nd_N, Dy_N/Yb_N, ΔFMQ, Eu anomaly, ΣHREE/ΣMREE, Hf, Th/U, Ti temperature, and Yb_N/Gd_N, from 12 samples collected at a range of distances in the down ice-flow direction from the GCB.
We demonstrate that endmember modelling using the Tucker-1 decomposition method successfully deconvolves the multivariate data sets into two endmembers in which the geochemical distributions are consistent with derivation from either non-oxidized and relatively anhydrous (i.e., low ore potential, Source 1) or oxidized and hydrous (i.e., potential ore bodies, Source 2) igneous rocks. Furthermore, we demonstrate that the proportions of the Source 2 endmember decrease with increasing distance from the ore bodies, as expected due to downstream zircon mixing and dilution.
Finally, we attribute each of the zircon grains to either the Source 1 or 2 endmember based on maximization of the likelihood that their measured multivariate geochemistry was drawn from one or the other of the learned multivariate endmembers. We compared these grain attributions to the results of an independent Classification and Regression Tree (CART) analysis designed to characterize zircon grains as either “fertile” or “barren” with respect to copper based on their geochemistry. We find that there is ~80% overlap between the source attributions based on the CART analysis and the grain-source identification based on the Tucker-1 decomposition.
We conclude that the novel Tucker-1 decomposition approach provides a flexible, precise, and accurate method of characterizing multivariate sediment sources even when those sources are unknown. It thus provides a basis for future petrochronological interpretations with applied and pure geoscience applications. All of the analyses presented herein can be freely accessed through a web application (https://dzgrainalyzer.eoas.ubc.ca/) or open-source Julia code (https://github.com/MPF-Optimization-Laboratory/MatrixTensorFactor.jl).
@inproceedings{saylor2025endmember, title={Endmember modelling of detrital zircon petrochronology data via multivariate Tucker-1 tensor decomposition}, author={Saylor, Joel E and Richardson, Nicholas and Graham, Naomi and Lee, Robert G and Friedlander, Michael P}, booktitle={EGU General Assembly Conference Abstracts}, pages={EGU25--4081}, year={2025} } - [abs]
[bib]
Scalable Data-Driven Basis Selection for Linear Machine Learning Interatomic Potentials
T. Torabi, M. Militzer, M. P. Friedlander, C. Ortner. arXiv:2504.16418,
2025.
[arXiv]Machine learning interatomic potentials (MLIPs) provide an effective approach for accurately and efficiently modeling atomic interactions, expanding the capabilities of atomistic simulations to complex systems. However, a priori feature selection leads to high complexity, which can be detrimental to both computational cost and generalization, resulting in a need for hyperparameter tuning. We demonstrate the benefits of active set algorithms for automated data-driven feature selection. The proposed methods are implemented within the Atomic Cluster Expansion (ACE) framework. Computational tests conducted on a variety of benchmark datasets indicate that sparse ACE models consistently enhance computational efficiency, generalization accuracy, and interpretability over dense ACE models. An added benefit of the proposed algorithms is that they produce entire paths of models with varying cost/accuracy ratio.
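A generic illustration of the path-of-models idea (not the paper's ACE implementation): a lasso path over a linear design matrix yields a family of increasingly sparse fits, each trading basis size against accuracy. The synthetic data and parameters below are assumptions for demonstration only.

```python
import numpy as np
from sklearn.linear_model import lars_path

# Columns of X stand in for basis functions (e.g., ACE features); y for target values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
coef_true = np.zeros(50)
coef_true[:5] = rng.normal(size=5)            # only 5 "useful" basis functions
y = X @ coef_true + 0.01 * rng.normal(size=200)

# lars_path returns the full lasso regularization path: each column of `coefs`
# is a model with a different sparsity/accuracy trade-off.
alphas, active, coefs = lars_path(X, y, method="lasso")
for a, c in zip(alphas[::10], coefs.T[::10]):
    print(f"alpha={a:.4f}  nonzeros={np.count_nonzero(c)}")
```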
@article{torabi2025scalable, title={Scalable Data-Driven Basis Selection for Linear Machine Learning Interatomic Potentials}, author={Torabi, Tina and Militzer, Matthias and Friedlander, Michael P and Ortner, Christoph}, journal={arXiv preprint arXiv:2504.16418}, year={2025} }
2023
- [abs]
[bib]
Cardinality-constrained structured data-fitting problems
Z. Fan, H. Fang, M. P. Friedlander. Open J. Math. Optim.,
2023.
[Journal]
[DOI]A memory-efficient solution framework is proposed for the cardinality-constrained structured data-fitting problem. Dual-based atom-identification rules reveal the structure of the optimal primal solution from near-optimal dual solutions, which allows for a simple and computationally efficient algorithm that translates any feasible dual solution into a primal solution satisfying the cardinality constraint. Rigorous guarantees bound the quality of a near-optimal primal solution given any dual-based method that generates dual iterates converging to an optimal dual solution. Numerical experiments on real-world datasets support the analysis and demonstrate the efficiency of the proposed approach.
@misc{Fan2023Cardinality, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2023}, Month = {July}, Doi = {10.48550/arxiv.2107.11373}, Title = {Cardinality-constrained structured data-fitting problems} } - [abs]
[bib]
From perspective maps to epigraphical projection
M. P. Friedlander, A. Goodwin, T. Hoheisel. Math. Oper. Res. 48(3): 1711-1740, 2023,
2023.
[Journal]The projection onto the epigraph or a level set of a closed proper convex function can be achieved by finding a root of a scalar equation that involves the proximal operator as a function of the proximal parameter. This paper develops the variational analysis of this scalar equation. The approach is based on a study of the variational-analytic properties of general convex optimization problems that are (partial) infimal projections of the sum of the function in question and the perspective map of a convex kernel. When the kernel is the Euclidean norm squared, the solution map corresponds to the proximal map, and thus the variational properties derived for the general case apply to the proximal case. Properties of the value function and the corresponding solution map—including local Lipschitz continuity, directional differentiability, and semismoothness—are derived. An SC¹ optimization framework for computing epigraphical and level-set projections is thus established. Numerical experiments on 1-norm projection illustrate the effectiveness of the approach as compared with specialized algorithms.
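A concrete instance of the scalar root-finding idea: projecting a point \((z, s)\) onto the epigraph of the 1-norm amounts to finding \(\lambda \ge 0\) with \(\|\mathrm{prox}_{\lambda\|\cdot\|_1}(z)\|_1 = s + \lambda\). The sketch below uses plain bisection on that decreasing scalar function rather than the paper's SC¹ (semismooth Newton-type) machinery; it is illustrative, not the paper's code.

```python
import numpy as np

def soft(z, t):                                   # prox of t*||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def project_epi_l1(z, s, tol=1e-10):
    """Project (z, s) onto epi(||.||_1): find the root of
       phi(lam) = ||soft(z, lam)||_1 - s - lam, which is decreasing in lam."""
    if np.abs(z).sum() <= s:                      # already inside the epigraph
        return z.copy(), s
    lo, hi = 0.0, np.abs(z).max() + max(0.0, -s)  # bracket: phi(lo) > 0 >= phi(hi)
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if np.abs(soft(z, lam)).sum() - s - lam > 0:
            lo = lam
        else:
            hi = lam
    return soft(z, hi), s + hi

x, t = project_epi_l1(np.array([3.0, -2.0, 0.5]), 1.0)
print(x, t, np.abs(x).sum() <= t + 1e-8)          # the projection lies in the epigraph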
@article{FriedlanderGoodwinHoheisel2023, author = {Friedlander, Michael P. and Goodwin, Ariel and Hoheisel, Tim}, title = {From Perspective Maps to Epigraphical Projections}, journal = {Mathematics of Operations Research}, volume = {48}, number = {3}, pages = {1711--1740}, year = {2023}, doi = {10.1287/moor.2022.1317} }
2022
- [abs]
[bib]
Knowledge-injected federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, J. Hu, C. Li, Y. Zhang. arXiv:2208.07530,
2022.
[arXiv]
[GitHub]
[DOI]Federated learning is an emerging technique for training models from decentralized data sets. In many applications, data owners participating in the federated learning system hold not only the data but also a set of domain knowledge. Such knowledge includes human know-how and craftsmanship that can be extremely helpful to the federated learning task. In this work, we propose a federated learning framework that allows the injection of participants’ domain knowledge, where the key idea is to refine the global model with knowledge locally. The scenario we consider is motivated by a real industry-level application, and we demonstrate the effectiveness of our approach to this application.
@misc{Fan2022Knowledge, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and J. Hu and C. Li and Y. Zhang}, Year = {2022}, Month = {August}, Doi = {10.48550/arxiv.2208.07530}, Title = {Knowledge-injected federated learning} } - [abs]
[bib]
Quantum algorithms for structured prediction
B. Sephehry, E. Iranmanesh, M. P. Friedlander, P. Ronagh. Quantum Machine Intelligence,
2022.
[Journal]
[DOI]We introduce two quantum algorithms for solving structured prediction problems. We first show that a stochastic gradient descent that uses the quantum minimum finding algorithm and takes its probabilistic failure into account solves the structured prediction problem with a runtime that scales with the square root of the size of the label space, and in O(1/𝜖) with respect to the precision, 𝜖, of the solution. Motivated by robust inference techniques in machine learning, we then introduce another quantum algorithm that solves a smooth approximation of the structured prediction problem with a similar quantum speedup in the size of the label space and a similar scaling in the precision parameter. In doing so, we analyze a variant of stochastic gradient descent for convex optimization in the presence of an additive error in the calculation of the gradients, and show that its convergence rate does not deteriorate if the additive errors are of the order 𝑂(√ϵ). This algorithm uses quantum Gibbs sampling at temperature Ω(𝜖) as a subroutine. Based on these theoretical observations, we propose a method for using quantum Gibbs samplers to combine feedforward neural networks with probabilistic graphical models for quantum machine learning. Our numerical results using Monte Carlo simulations on an image tagging task demonstrate the benefit of the approach.
@article{Sephehry2022Quantum, Author = {B. Sephehry and E. Iranmanesh and M. P. Friedlander and P. Ronagh}, Year = {2022}, Month = {June}, Journal = {Quantum Machine Intelligence}, Number = {2}, Volume = {4}, Doi = {10.1007/s42484-022-00078-w}, Title = {Quantum algorithms for structured prediction} } - [abs]
[bib]
Polar deconvolution of mixed signals
Z. Fan, H. Jeong, B. Joshi, M. P. Friedlander. IEEE Trans. Signal Processing,
2022.
[Journal]
[GitHub]The signal demixing problem seeks to separate a superposition of multiple signals into its constituent components. This paper describes a two-stage approach that first decompresses and subsequently deconvolves the noisy and undersampled observations of the superposition. Probabilistic error bounds are given on the accuracy with which this process approximates the individual signals. The theory of polar convolution of convex sets and gauge functions plays a central role in the analysis and solution process. If the measurements are random and the noise is bounded, this approach stably recovers low-complexity and mutually incoherent signals, with high probability and with optimal sample complexity. We develop an efficient algorithm, based on level-set and conditional-gradient methods, that solves the convex optimization problems with sublinear iteration complexity and linear space requirements. Numerical experiments on both real and synthetic data confirm the theory and the efficiency of the approach.
@article{fan2022deconvolution, author = {Z. Fan and H. Jeong and B. Joshi and M. P. Friedlander}, title = {Polar deconvolution of mixed signals}, journal = {IEEE Transactions on Signal Processing}, volume = {70}, pages = {2713--2727}, year = {2022}, month = {May}, doi = {10.1109/TSP.2022.3178191} } - [abs]
[bib]
Online mirror descent and dual averaging: keeping pace in the dynamic case
H. Fang, N. J. A. Harvey, V. S. Portella, M. P. Friedlander. J. Machine Learning Research,
2022.Online mirror descent (OMD) and dual averaging (DA) are two fundamental algorithms for online convex optimization. They are known to have very similar (or even identical) performance guarantees in most scenarios when a fixed learning rate is used. However, for dynamic learning rates OMD is provably inferior to DA. It is known that, with a dynamic learning rate, OMD can suffer linear regret, even in common settings such as prediction with expert advice. This hints that the relationship between OMD and DA is not fully understood at present. In this paper, we modify the OMD algorithm by a simple technique that we call stabilization. We give essentially the same abstract regret bound for stabilized OMD and DA by modifying the classical OMD convergence analysis in a careful and modular way, yielding proofs that we believe to be clean and flexible. Simple corollaries of these bounds show that OMD with stabilization and DA enjoy the same performance guarantees in many applications even under dynamic learning rates. We also shed some light on the similarities between OMD and DA and show simple conditions under which stabilized OMD and DA generate the same iterates.
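In the notation commonly used for these algorithms (mirror map or distance-generating function \(\Phi\), Bregman divergence \(D_\Phi\), subgradients \(g_t\), learning rates \(\eta_t\); conventions vary across papers, so the symbols here are illustrative), the two updates being compared are roughly

\[
x_{t+1} = \operatorname*{argmin}_{x\in\mathcal{X}} \Big\{ \eta_t \langle g_t, x\rangle + D_\Phi(x, x_t) \Big\}
\quad\text{(OMD)},
\qquad
x_{t+1} = \operatorname*{argmin}_{x\in\mathcal{X}} \Big\{ \Big\langle \textstyle\sum_{s=1}^{t} g_s,\, x\Big\rangle + \tfrac{1}{\eta_t}\,\Phi(x) \Big\}
\quad\text{(DA)};
\]

the stabilization studied in the paper modifies the OMD update so that its regret bound matches DA's when \(\eta_t\) varies over time.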
@article{fang2022online, author = {Huang Fang and Nicholas J. A. Harvey and Victor S. Portella and Michael P. Friedlander}, title = {Online Mirror Descent and Dual Averaging: Keeping Pace in the Dynamic Case}, journal = {Journal of Machine Learning Research}, year = {2022}, volume = {23}, number = {121}, pages = {1-38}, url = {http://jmlr.org/papers/v23/21-1027.html} } - [abs]
[bib]
A dual approach for federated learning
Z. Fan, H. Fang, M. P. Friedlander. arXiv 2201.11183,
2022.
[arXiv]
[GitHub]
[DOI]We study the federated optimization problem from a dual perspective and propose a new algorithm termed federated dual coordinate descent (FedDCD), which is based on a type of coordinate descent method developed by Necoara et al. [Journal of Optimization Theory and Applications, 2017]. Additionally, we enhance the FedDCD method with inexact gradient oracles and Nesterov’s acceleration. We demonstrate theoretically that our proposed approach achieves better convergence rates than the state-of-the-art primal federated optimization algorithms under mild conditions. Numerical experiments on real-world datasets support our analysis.
@misc{Fan2022Dual, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.11183}, Title = {A dual approach for federated learning} } - [abs]
[bib]
Improving fairness for data valuation in horizontal federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, C. Liu, Y. Zhang. Intern Conf Data Engineering (ICDE),
2022.
[IEEE]
[arXiv]
[DOI]Federated learning is an emerging decentralized machine learning scheme that allows multiple data owners to work collaboratively while ensuring data privacy. The success of federated learning depends largely on the participation of data owners. To sustain and encourage data owners’ participation, it is crucial to fairly evaluate the quality of the data provided by the data owners and reward them correspondingly. Federated Shapley value, recently proposed by Wang et al. [Federated Learning, 2020], is a measure for data value under the framework of federated learning that satisfies many desired properties for data valuation. However, there are still factors of potential unfairness in the design of federated Shapley value because two data owners with the same local data may not receive the same evaluation. We propose a new measure called completed federated Shapley value to improve the fairness of federated Shapley value. The design depends on completing a matrix consisting of all the possible contributions by different subsets of the data owners. It is shown under mild conditions that this matrix is approximately low-rank by leveraging concepts and tools from optimization. Both theoretical analysis and empirical evaluation verify that the proposed measure does improve fairness in many circumstances.
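For context, the classical Shapley value that these federated variants adapt assigns to data owner \(i\) (out of \(n\) owners, with utility function \(v\)) the average marginal contribution over all subsets:

\[
\phi_i(v) \;=\; \sum_{S \subseteq N\setminus\{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\big(v(S\cup\{i\}) - v(S)\big).
\]

Evaluating \(v\) on exponentially many subsets is what makes the exact computation impractical, which motivates the matrix-completion approximation of the contribution matrix described above.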
@article{Fan2022Improving, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and C. Liu and Y. Zhang}, Year = {2022}, Month = {January}, Journal = {2022 IEEE 38th International Conference on Data Engineering (ICDE)}, Pages = {2440-2453}, Doi = {10.1109/icde53745.2022.00228}, Title = {Improving fairness for data valuation in horizontal federated learning} } - [abs]
[bib]
Fair and efficient contribution valuation for vertical federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, Y. Zhang. arXiv:2201.02658,
2022.
[arXiv]
[DOI]Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data. Vertical federated learning or feature-based federated learning applies to the cases that different data sources share the same sample ID space but differ in feature space. To ensure the data owners’ long-term engagement, it is critical to objectively assess the contribution from each data source and recompense them accordingly. The Shapley value (SV) is a provably fair contribution valuation metric originated from cooperative game theory. However, computing the SV requires extensively retraining the model on each subset of data sources, which causes prohibitively high communication costs in federated learning. We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV. We show that VerFedSV not only satisfies many desirable properties for fairness but is also efficient to compute, and can be adapted to both synchronous and asynchronous vertical federated learning algorithms. Both theoretical analysis and extensive experimental results verify the fairness, efficiency, and adaptability of VerFedSV.
@misc{Fan2022Fair, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and Y. Zhang}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.02658}, Title = {Fair and efficient contribution valuation for vertical federated learning} }
2021
- [abs]
[bib]
NBIHT: An efficient algorithm for 1-bit compressed sensing with optimal error decay rate
M. P. Friedlander, H. Jeong, Y. Plan, Ö. Yılmaz. IEEE Trans. Info. Theory,
2021.
[DOI]The Binary Iterative Hard Thresholding (BIHT) algorithm is a popular reconstruction method for one-bit compressed sensing due to its simplicity and fast empirical convergence. There have been several works about BIHT but a theoretical understanding of the corresponding approximation error and convergence rate still remains open. This paper shows that the normalized version of BIHT (NBIHT) achieves an approximation error rate optimal up to logarithmic factors. More precisely, using m one-bit measurements of an s-sparse vector x, we prove that the approximation error of NBIHT is of order O(1/m) up to logarithmic factors, which matches the information-theoretic lower bound Ω(1/m) proved by Jacques, Laska, Boufounos, and Baraniuk in 2013. To our knowledge, this is the first theoretical analysis of a BIHT-type algorithm that explains the optimal rate of error decay empirically observed in the literature. This also makes NBIHT the first provable computationally-efficient one-bit compressed sensing algorithm that breaks the inverse square root error decay rate O(1/√m).
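A bare-bones sketch of the underlying BIHT iteration (the paper analyzes a normalized variant, NBIHT, whose specific step normalization this sketch omits); the step size, per-iteration rescaling, and problem sizes below are illustrative assumptions:

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

def biht(y, A, s, iters=200, tau=1.0):
    """Plain BIHT from one-bit measurements y = sign(A @ x_true)."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        x = hard_threshold(x + (tau / m) * A.T @ (y - np.sign(A @ x)), s)
        nrm = np.linalg.norm(x)
        if nrm > 0:
            x /= nrm                 # one-bit data determines x only up to scale
    return x

rng = np.random.default_rng(0)
n, m, s = 100, 500, 5
x_true = np.zeros(n); x_true[:s] = rng.normal(size=s)
x_true /= np.linalg.norm(x_true)
A = rng.normal(size=(m, n))
x_hat = biht(np.sign(A @ x_true), A, s)
print(np.linalg.norm(x_hat - x_true))   # small reconstruction error on this toy example
```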
@misc{Friedlander2021Nbiht, Author = {M. P. Friedlander and H. Jeong and Y. Plan and Ö. Yılmaz}, Year = {2021}, Month = {November}, Number = {2}, Volume = {68}, Pages = {1157-1177}, Doi = {10.1109/tit.2021.3124598}, Title = {NBIHT: An efficient algorithm for 1-bit compressed sensing with optimal error decay rate} } - [abs]
Fast convergence of the stochastic subgradient method under interpolation
H. Fang, Z. Fan, M. P. Friedlander. Intern. Conf. Learning Representations (ICLR),
2021.This paper studies the behaviour of the stochastic subgradient descent (SSGD) method applied to over-parameterized nonsmooth optimization problems that satisfy an interpolation condition. By leveraging the composite structure of the empirical risk minimization problems, we prove that when interpolation holds, SSGD converges at the same rate as the stochastic gradient descent (SGD) method applied to smooth problems. Our analysis provides a partial explanation for the empirical observation that sometimes SGD and SSGD behave similarly for training smooth and nonsmooth machine learning models. We also prove that the rates we derive are optimal for the subgradient method in the convex and interpolation setting.
2020
- [abs]
[bib]
Atomic decomposition via polar alignment: the geometry of structured optimization
Z. Fan, H. Jeong, Y. Sun, M. P. Friedlander. Foundations and Trends in Optimization, 3(4):280–366,
2020.
[Journal]
[DOI]Structured optimization uses a prescribed set of atoms to assemble a solution that fits a model to data. Polarity, which extends the familiar notion of orthogonality from linear sets to general convex sets, plays a special role in a simple and geometric form of convex duality. This duality correspondence yields a general notion of alignment that leads to an intuitive and complete description of how atoms participate in the final decomposition of the solution. The resulting geometric perspective leads to variations of existing algorithms effective for large-scale problems. We illustrate these ideas with many examples, including applications in matrix completion and morphological component analysis for the separation of mixtures of signals.
@article{Fan2020Atomic, Author = {Z. Fan and H. Jeong and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {November}, Journal = {Foundations and Trends® in Optimization}, Number = {4}, Volume = {3}, Pages = {280-366}, Doi = {10.1561/2400000028}, Title = {Atomic decomposition via polar alignment: the geometry of structured optimization} } - [abs]
[bib]
Implementing a smooth exact penalty function for equality-constrained nonlinear optimization
R. Estrin, M. P. Friedlander, D. Orban, M. A. Saunders. SIAM J. Sci. Comput., 42(3), A1809–A1835,
2020.
[Journal]
[DOI]
[more]We develop a general equality-constrained nonlinear optimization algorithm based on a smooth penalty function proposed by Fletcher in 1970. Although it was historically considered to be computationally prohibitive in practice, we demonstrate that the computational kernels required are no more expensive than other widely accepted methods for nonlinear optimization. The main kernel required to evaluate the penalty function and its derivatives is solving a structured linear system. We show how to solve this system efficiently by storing a single factorization at each iteration when the matrices are available explicitly. We further show how to adapt the penalty function to the class of factorization-free algorithms by solving the linear system iteratively. The penalty function therefore has promise when the linear system can be solved efficiently, e.g., for PDE-constrained optimization problems where efficient preconditioners exist. We discuss extensions including handling simple constraints explicitly, regularizing the penalty function, and inexact evaluation of the penalty function and its gradients. We demonstrate the merits of the approach and its various features on some nonlinear programs from a standard test set, and some PDE-constrained optimization problems.
@article{Estrin2020Implementing, Author = {R. Estrin and M. P. Friedlander and D. Orban and M. A. Saunders}, Year = {2020}, Month = {June}, Journal = {arXiv (Cornell University)}, Doi = {10.1137/19m1238265}, Title = {Implementing a smooth exact penalty function for equality-constrained nonlinear optimization} } - [abs]
[bib]
Implementing a smooth exact penalty function for general constrained nonlinear optimization
R. Estrin, M. P. Friedlander, D. Orban, M. A. Saunders. SIAM J. Sci. Comput., 42(3), A1836–A1859,
2020.
[Journal]
[DOI]
[more]We build upon Estrin et al. (2019) to develop a general constrained nonlinear optimization algorithm based on a smooth penalty function proposed by Fletcher (1970, 1973b). Although Fletcher’s approach has historically been considered impractical, we show that the computational kernels required are no more expensive than those in other widely accepted methods for nonlinear optimization. The main kernel for evaluating the penalty function and its derivatives solves structured linear systems. When the matrices are available explicitly, we store a single factorization each iteration. Otherwise, we obtain a factorization-free optimization algorithm by solving each linear system iteratively. The penalty function shows promise in cases where the linear systems can be solved efficiently, e.g., PDE-constrained optimization problems when efficient preconditioners exist. We demonstrate the merits of the approach, and give numerical results on several PDE-constrained and standard test problems.
@article{Estrin2020Implementing, Author = {R. Estrin and M. P. Friedlander and D. Orban and M. A. Saunders}, Year = {2020}, Month = {June}, Journal = {SIAM Journal on Scientific Computing}, Number = {3}, Volume = {42}, Pages = {A1836-A1859}, Doi = {10.1137/19m1255069}, Title = {Implementing a smooth exact penalty function for general constrained nonlinear optimization} } - [abs]
[bib]
A perturbation view of level-set methods for convex optimization
R. Estrin, M. P. Friedlander. Optimization Letters,
2020.
[Journal]
[DOI]Level-set methods for convex optimization are predicated on the idea that certain problems can be parameterized so that their solutions can be recovered as the limiting process of a root-finding procedure. This idea emerges time and again across a range of algorithms for convex problems. Here we demonstrate that strong duality is a necessary condition for the level-set approach to succeed. In the absence of strong duality, the level-set method identifies ε-infeasible points that do not converge to a feasible point as ε tends to zero. The level-set approach is also used as a proof technique for establishing sufficient conditions for strong duality that are different from Slater’s constraint qualification.
@misc{Estrin2020Perturbation, Author = {R. Estrin and M. P. Friedlander}, Year = {2020}, Month = {June}, Number = {8}, Volume = {14}, Pages = {1989-2006}, Doi = {10.1007/s11590-020-01609-9}, Title = {A perturbation view of level-set methods for convex optimization} } - [abs]
[bib]
Online mirror descent and dual averaging: keeping pace in the dynamic case
H. Fang, N. J. A. Harvey, V. S. Portella, M. P. Friedlander. Intern. Conf. Machine Learning (ICML),
2020.
[arXiv]
[video]
[DOI]Online mirror descent (OMD) and dual averaging (DA) are two fundamental algorithms for online convex optimization. They are known to have very similar (or even identical) performance guarantees in most scenarios when a fixed learning rate is used. However, for dynamic learning rates OMD is provably inferior to DA. It is known that, with a dynamic learning rate, OMD can suffer linear regret, even in common settings such as prediction with expert advice. This hints that the relationship between OMD and DA is not fully understood at present. In this paper, we modify the OMD algorithm by a simple technique that we call stabilization. We give essentially the same abstract regret bound for stabilized OMD and DA by modifying the classical OMD convergence analysis in a careful and modular way, yielding proofs that we believe to be clean and flexible. Simple corollaries of these bounds show that OMD with stabilization and DA enjoy the same performance guarantees in many applications even under dynamic learning rates. We also shed some light on the similarities between OMD and DA and show simple conditions under which stabilized OMD and DA generate the same iterates.
@article{Fang2020Online, Author = {H. Fang and N J. A. Harvey and V. S. Portella and M. P. Friedlander}, Year = {2020}, Month = {June}, Journal = {International Conference on Machine Learning}, Volume = {1}, Pages = {3008-3017}, Title = {Online mirror descent and dual averaging: keeping pace in the dynamic case} } - [abs]
[bib]
Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization
H. Fang, Z. Fan, Y. Sun, M. P. Friedlander. Intern. Conf. Artificial Intelligence and Statistics (AISTATS),
2020.We consider greedy coordinate descent (GCD) for composite problems with sparsity-inducing regularizers, including 1-norm regularization and non-negative constraints. Empirical evidence strongly suggests that GCD, when initialized with the zero vector, has an implicit screening ability that usually selects at each iteration coordinates that are nonzero at the solution. Thus, for problems with sparse solutions, GCD can converge significantly faster than randomized coordinate descent. We present an improved convergence analysis of GCD for sparse optimization, and a formal analysis of its screening properties. We also propose and analyze an improved selection rule with stronger ability to produce sparse iterates. Numerical experiments on both synthetic and real-world data support our analysis and the effectiveness of the proposed selection rule.
@article{Fang2020Greed, Author = {H. Fang and Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {February}, Journal = {International Conference on Artificial Intelligence and Statistics}, Pages = {434-444}, Title = {Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization} }
2019
- [abs]
[bib]
Bundle methods for dual atomic pursuit
Z. Fan, Y. Sun, M. P. Friedlander. Asilomar Conf. Signals, Systems, Computers (ACSSC),
2019.
[arXiv]
[DOI]The aim of structured optimization is to assemble a solution, using a given set of (possibly uncountably infinite) atoms, to fit a model to data. A two-stage algorithm based on gauge duality and a bundle method is proposed. The first stage discovers the optimal atomic support for the primal problem by solving a sequence of approximations of the dual problem using a bundle-type method. The second stage recovers the approximate primal solution using the atoms discovered in the first stage. The overall approach leads to implementable and efficient algorithms for large problems.
@article{Fan2019Bundle, Author = {Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2019}, Month = {November}, Volume = {59}, Pages = {264-270}, Doi = {10.1109/ieeeconf44664.2019.9048711}, Title = {Bundle methods for dual atomic pursuit} } - [abs]
[bib]
One-shot atomic detection
Y. Sun, M. P. Friedlander. IEEE Intern. Workshop Comput. Adv. Multi-Sensor Adaptive Proc. (CAMSAP),
2019.
[DOI]Feature selection in data science involves identifying the most prominent and uncorrelated features in the data, which can be useful for compression and interpretability. If these features can be easily extracted, then a model can be trained over a reduced set of weights, which leads to more efficient training and possibly more robust classifiers. There are many approaches to feature selection; in this work, we propose screening the “atoms” of a gradient of a loss function taken at a random point. We illustrate this approach on sparse and low-rank optimization problems. Despite the simplicity of the approach, we are often able to select the dominant features easily, and greatly improve the runtime and robustness in training overparametrized models.
@article{Sun2019One, Author = {Y. Sun and M. P. Friedlander}, Year = {2019}, Month = {September}, Volume = {16}, Pages = {1-5}, Doi = {10.1109/camsap45676.2019.9022441}, Title = {One-shot atomic detection} } - [abs]
[bib]
Fast training for large-scale one-versus-all linear classifiers using tree-structured initialization
H. Fang, M. Cheng, C.-J. Hsieh, M. P. Friedlander. SIAM Intern. Conf. Data Mining (SDM),
2019.
[DOI]We consider the problem of training one-versus-all (OVA) linear classifiers for multiclass or multilabel classification when the number of labels is large. A naive extension of OVA to this problem, even with hundreds of cores, usually requires hours for training on large real-world datasets. We propose a novel algorithm called OVA-Primal++ that speeds up the training of OVA by using a tree-structured training order, where each classifier is trained using its parent’s classifier as initialization. OVA-Primal++ is both theoretically and empirically faster than the naive OVA algorithm, and yet still enjoys the same high parallelizability and small memory footprint. Extensive experiments on multiclass and multilabel classification datasets validate the effectiveness of our method.
@incollection{Fang2019Fast, Author = {H. Fang and M. Cheng and C.-J. Hsieh and M. P. Friedlander}, Year = {2019}, Month = {February}, Pages = {280-288}, Doi = {10.1137/1.9781611975673.32}, Title = {Fast training for large-scale one-versus-all linear classifiers using tree-structured initialization} } - [abs]
[bib]
Polar convolution
M. P. Friedlander, I. Macêdo, T.K. Pong. SIAM J. Optimization, 29(2):1366–1391,
2019.
[DOI]The Moreau envelope is one of the key convexity-preserving functional operations in convex analysis, and it is central to the development and analysis of many approaches for solving convex optimization problems. This paper develops the theory for a parallel convolution operation, called the polar envelope, specialized to gauge functions. We show that many important properties of the Moreau envelope and the proximal map are mirrored by the polar envelope and its corresponding proximal map. These properties include smoothness of the envelope function, uniqueness and continuity of the proximal map, a role in duality and in the construction of algorithms for gauge optimization. We thus establish a suite of tools with which to build algorithms for this family of optimization problems.
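The classical operations that the polar envelope parallels are the Moreau envelope and the proximal map, which for a parameter \(\lambda > 0\) are

\[
e_{\lambda f}(x) = \inf_{z}\Big\{ f(z) + \tfrac{1}{2\lambda}\|x - z\|^2 \Big\},
\qquad
\operatorname{prox}_{\lambda f}(x) = \operatorname*{argmin}_{z}\Big\{ f(z) + \tfrac{1}{2\lambda}\|x - z\|^2 \Big\};
\]

the paper develops the analogous envelope and proximal-map constructions specialized to gauge functions.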
@article{Friedlander2019Polar, Author = {M. P. Friedlander and I. Macêdo and T.K. Pong}, Year = {2019}, Month = {February}, Journal = {SIAM Journal on Optimization}, Number = {2}, Volume = {29}, Pages = {1366-1391}, Doi = {10.1137/18m1209088}, Title = {Polar convolution} } - [abs]
[bib]
Smooth structured prediction using quantum and classical Gibbs samplers
B. Sephehry, E. Iranmanesh, M. P. Friedlander, P. Ronagh. Adiabatic Quantum Computing Conference (AQC), arXiv:1809.04091,
2019.We introduce a quantum algorithm for solving structured prediction problems with a runtime that scales with the square root of the size of the label space, but scales in \(\tilde{O}(\epsilon^{-3.5})\) with respect to the precision, \(\epsilon\), of the solution. In doing so, we analyze a stochastic gradient algorithm for convex optimization in the presence of an additive error in the calculation of the gradients, and show that its convergence rate does not deteriorate if the additive errors are of the order \(O(\sqrt{\epsilon})\). Our algorithm uses quantum Gibbs sampling at temperature \(\Omega(\epsilon)\) as a subroutine. Based on these theoretical observations, we propose a method for using quantum Gibbs samplers to combine feedforward neural networks with probabilistic graphical models for quantum machine learning. Numerical results using Monte Carlo simulations on an image tagging task demonstrate the benefit of the approach.
@article{Sephehry2019Smooth, Author = {B. Sephehry and E. Iranmanesh and M. P. Friedlander and P. Ronagh}, Year = {2019}, Month = {January}, Journal = {arXiv (Cornell University)}, Title = {Smooth structured prediction using quantum and classical Gibbs samplers} }
2018
- [abs]
[bib]
Foundations of gauge and perspective duality
A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, K. MacPhee. SIAM J. Optimization, 28(3):2406–2434,
2018.
[DOI]We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel-Rockafellar duality on equal footing, including explaining gauge dual variables as sensitivity measures, and showing how to recover primal solutions from those of the gauge dual. This vantage point allows a direct proof that optimal solutions of the Fenchel-Rockafellar dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. We extend the gauge duality framework to the setting in which the functional components are general nonnegative convex functions, including problems with piecewise linear quadratic functions and constraints that arise from generalized linear models used in regression.
@misc{Aravkin2018Foundations, Author = {A. Y. Aravkin and J. V. Burke and D. Drusvyatskiy and M. P. Friedlander and K. MacPhee}, Year = {2018}, Month = {August}, Number = {3}, Volume = {28}, Pages = {2406-2434}, Doi = {10.1137/17m1119020}, Title = {Foundations of gauge and perspective duality} } - [abs]
[bib]
Level-set methods for convex optimization
A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, S. Roy. Mathematical Programming, 174(1-2):359–390,
2018.
[Journal]
[DOI]Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective and constraint functions, and instead approximately solves a sequence of parametric level-set problems. A zero-finding procedure, based on inexact function evaluations and possibly inexact derivative information, leads to an efficient solution scheme for the original problem. We describe the theoretical and practical properties of this approach for a broad range of problems, including low-rank semidefinite optimization, sparse optimization, and generalized linear models for inference.
@misc{Aravkin2018Level, Author = {A. Y. Aravkin and J. V. Burke and D. Drusvyatskiy and M. P. Friedlander and S. Roy}, Year = {2018}, Month = {April}, Number = {1-2}, Volume = {174}, Pages = {359-390}, Doi = {10.1007/s10107-018-1351-8}, Title = {Level-set methods for convex optimization} }
2017
- [abs]
[bib]
Efficient evaluation of scaled proximal operators
M. P. Friedlander, G. Goh. Electronic Trans. Numerical Analysis, 46:1–22,
2017.
[DOI]Quadratic-support functions, cf. Aravkin, Burke, and Pillonetto [J. Mach. Learn. Res., 14 (2013)], constitute a parametric family of convex functions that includes a range of useful regularization terms found in applications of convex optimization. We show how an interior method can be used to efficiently compute the proximal operator of a quadratic-support function under different metrics. When the metric and the function have the right structure, the proximal map can be computed with costs nearly linear in the input size. We describe how to use this approach to implement quasi-Newton methods for a rich class of nonsmooth problems that arise, for example, in sparse optimization, image denoising, and sparse logistic regression.
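The object being computed is the proximal operator under the metric induced by a positive definite matrix \(H\) (notation illustrative):

\[
\operatorname{prox}^H_{f}(x) \;=\; \operatorname*{argmin}_{z}\;\Big\{ f(z) + \tfrac{1}{2}\,\|z - x\|_H^2 \Big\},
\qquad \|u\|_H^2 = u^\top H u;
\]

the paper shows how an interior method exploits the structure of quadratic-support functions to evaluate this map cheaply, which is what makes it usable inside quasi-Newton methods where the metric \(H\) changes at every iteration.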
@misc{Friedlander2017Efficient, Author = {M. P. Friedlander and G. Goh}, Year = {2017}, Month = {February}, Doi = {10.48550/arxiv.1603.05719}, Title = {Efficient evaluation of scaled proximal operators} }
2016
- [abs]
[bib]
Satisfying real-world goals with dataset constraints
G. Goh, A. Cotter, M. Gupta, M. P. Friedlander. Advances in Neural Information Processing Systems 29 (NIPS),
2016.
[DOI]The goal of minimizing misclassification error on a training set is often just one of several real-world goals that might be defined on different datasets. For example, one may require a classifier to also make positive predictions at some specified rate for some subpopulation (fairness), or to achieve a specified empirical recall. Other real-world goals include reducing churn with respect to a previously deployed model, or stabilizing online training. In this paper we propose handling multiple goals on multiple datasets by training with dataset constraints, using the ramp penalty to accurately quantify costs, and present an efficient algorithm to approximately optimize the resulting non-convex constrained optimization problem. Experiments on both benchmark and real-world industry datasets demonstrate the effectiveness of our approach.
@article{Goh2016Satisfying, Author = {G. Goh and A. Cotter and M. Gupta and M. P. Friedlander}, Year = {2016}, Month = {December}, Journal = {Neural Information Processing Systems}, Volume = {29}, Pages = {2423-2431}, Title = {Satisfying real-world goals with dataset constraints} } - [abs]
[bib]
Social resistance
M. P. Friedlander, N. Krislock, T. K. Pong. IEEE Comput. Sci. Eng., 18(2):98-103; reprinted in Computing Edge,
2016.
[DOI]
[Computing Edge]This case study concerns an important question in social networks: How closely related are two people in a network? We measure the strength of the relationship between two people in terms of their resistance distance, a notion of distance that has its origin in the study of electrical networks. In that context, the resistance distance measures the electrical resistance between the two nodes in a network, assuming unit resistance on each edge. Its relationship with the information distance in social networks was recently identified. For large networks, this distance measure can be expensive to compute. As the network changes, it’s useful to have an inexpensive way to make updates. We’ll study how various techniques for updating matrix factorizations can help efficiently compute the resistance distances with small changes in the network. We focus on Cholesky rank-one updates, and also briefly discuss the Sherman-Morrison-Woodbury formula (as one of the last exercises). We expect that you’re familiar with the idea of the Cholesky factorization of positive definite matrices. We’ll guide you through a derivation of the formula for the Cholesky rank-one update, and introduce two aspects that maybe new for you. First, you’ll have to figure out the vector used for the rank-one update when one additional connection in the network is identified. Second, when matrices are large, you’ll observe that there can be a big difference in computational effort when the computation involves matrix-matrix multiplications rather than only matrix-vector multiplications.
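A small illustration of the resistance-distance computation the case study builds on, using the Laplacian pseudoinverse directly; the graph and helper names are illustrative. Adding an edge \((u,v)\) changes the Laplacian by the rank-one term \((e_u - e_v)(e_u - e_v)^\top\), which is exactly the kind of change a Cholesky rank-one update of a reduced (grounded) Laplacian handles without refactorizing from scratch.

```python
import numpy as np

def resistance(L, u, v):
    """Resistance distance between nodes u and v: (e_u - e_v)^T L^+ (e_u - e_v)."""
    Lp = np.linalg.pinv(L)                 # pseudoinverse of the (singular) Laplacian
    e = np.zeros(L.shape[0]); e[u], e[v] = 1.0, -1.0
    return e @ Lp @ e

# 4-node path graph 0-1-2-3 with unit resistances on each edge
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
print(resistance(L, 0, 3))                 # 3.0: three unit resistors in series
```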
@article{Friedlander2016Social, Author = {M. P. Friedlander and N. Krislock and T. K. Pong}, Year = {2016}, Month = {August}, Journal = {Computing in Science & Engineering}, Number = {2}, Volume = {18}, Pages = {98-103}, Doi = {10.1109/mcse.2016.30}, Title = {Social resistance} } - [abs]
[bib]
Low-rank spectral optimization via gauge duality
M. P. Friedlander, I. Macêdo. SIAM J. Scientific Computing, 38(3):A1616–A1638,
2016.
[DOI]
[more]Various applications in signal processing and machine learning give rise to highly structured spectral optimization problems characterized by low-rank solutions. Two important examples that motivate this work are optimization problems from phase retrieval and from blind deconvolution, which are designed to yield rank-1 solutions. An algorithm is described that is based on solving a certain constrained eigenvalue optimization problem that corresponds to the gauge dual which, unlike the more typical Lagrange dual, has an especially simple constraint. The dominant cost at each iteration is the computation of rightmost eigenpairs of a Hermitian operator. A range of numerical examples illustrate the scalability of the approach.
@article{Friedlander2016Low, Author = {M. P. Friedlander and I. Macêdo}, Year = {2016}, Month = {June}, Journal = {SIAM Journal on Scientific Computing}, Number = {3}, Volume = {38}, Pages = {A1616-A1638}, Doi = {10.1137/15m1034283}, Title = {Low-rank spectral optimization via gauge duality} }
2015
- [abs]
[bib]
Coordinate descent converges faster with the Gauss-Southwell rule than random selection
J. Nutini, M. Schmidt, I. H. Laradji, M. P. Friedlander, H. Koepke. Intern. Conf. Machine Learning (ICML),
2015.
[Preprint]
[NIPS PDF]
[ICML PDF]
[ICML Supplementary]
[DOI]There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random-coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically much more expensive than random selection. However, the empirical behaviours of these algorithms contradict this theoretical result: in applications where the computational costs of the selection rules are comparable, the Gauss-Southwell selection rule tends to perform substantially better than random coordinate selection. We give a simple analysis of the Gauss-Southwell rule showing that–except in extreme cases–its convergence rate is faster than choosing random coordinates. Further, in this work we (i) show that exact coordinate optimization improves the convergence rate for certain sparse problems, (ii) propose a Gauss-Southwell-Lipschitz rule that gives an even faster convergence rate given knowledge of the Lipschitz constants of the partial derivatives, (iii) analyze the effect of approximate Gauss-Southwell rules, and (iv) analyze proximal-gradient variants of the Gauss-Southwell rule.
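A minimal sketch of the selection rule being analyzed, for a plain least-squares objective (the paper also treats exact coordinate optimization, Lipschitz-weighted, approximate, and proximal-gradient variants, which this toy example omits); the problem data are illustrative:

```python
import numpy as np

# Greedy coordinate descent with the Gauss-Southwell rule for 0.5*||A x - b||^2:
# pick the coordinate with the largest gradient magnitude, then minimize exactly
# along that coordinate.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 20)); b = rng.normal(size=50)
x = np.zeros(20)
col_norms = (A ** 2).sum(axis=0)            # per-coordinate curvature a_i^T a_i
for _ in range(300):
    g = A.T @ (A @ x - b)                   # full gradient (cheap to maintain in practice)
    i = int(np.argmax(np.abs(g)))           # Gauss-Southwell selection
    x[i] -= g[i] / col_norms[i]             # exact minimization along coordinate i
print(0.5 * np.linalg.norm(A @ x - b) ** 2)
```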
@article{Nutini2015Coordinate, Author = {J. Nutini and M. Schmidt and I. H. Laradji and M. P. Friedlander and H. Koepke}, Year = {2015}, Month = {July}, Pages = {1632-1641}, Title = {Coordinate descent converges faster with the Gauss-Southwell rule than random selection} }
2014
- [abs]
[bib]
Gauge optimization and duality
M. P. Friedlander, I. Macêdo, T. K. Pong. SIAM J. Optimization, 24(4):1999–2022,
2014.Gauge functions significantly generalize the notion of a norm, and gauge optimization, as defined by Freund (1987), seeks the element of a convex set that is minimal with respect to a gauge function. This conceptually simple problem can be used to model a remarkable array of useful problems, including a special case of conic optimization, and related problems that arise in machine learning and signal processing. The gauge structure of these problems allows for a special kind of duality framework. This paper explores the duality framework proposed by Freund, and proposes a particular form of the problem that exposes some useful properties of the gauge optimization framework (such as the variational properties of its value function), and yet maintains most of the generality of the abstract form of gauge optimization.
@Article{FriedlanderMacedoPong:2014, author = {M. P. Friedlander and I. Mac\^edo and T. K. Pong}, title = {Gauge optimization and duality}, year = 2014, journal = {SIAM Journal on Optimization}, number = 4, volume = 24, pages = {1999-2022}, doi = {10.1137/130940785}, url = {http://epubs.siam.org/doi/abs/10.1137/130940785} }
2013
- [abs]
[bib]
Variational properties of value functions
A. Aravkin, J. V. Burke, M. P. Friedlander. SIAM J. Optimization, 23(3):1689–1717,
2013.
[DOI]Regularization plays a key role in a variety of optimization formulations of inverse problems. A recurring theme in regularization approaches is the selection of regularization parameters, and their effect on the solution and on the optimal value of the optimization problem. The sensitivity of the value function to the regularization parameter can be linked directly to the Lagrange multipliers. This paper characterizes the variational properties of the value functions for a broad class of convex formulations, which are not all covered by standard Lagrange multiplier theory. An inverse function theorem is given that links the value functions of different regularization formulations (not necessarily convex). These results have implications for the selection of regularization parameters, and the development of specialized algorithms. Numerical examples illustrate the theoretical results.
@article{Aravkin2013Variational, Author = {A. Aravkin and J. V. Burke and M. P. Friedlander}, Year = {2013}, Month = {August}, Journal = {SIAM Journal on Optimization}, Number = {3}, Volume = {23}, Pages = {1689-1717}, Doi = {10.1137/120899157}, Title = {Variational properties of value functions} } - [abs]
[bib]
Fast dual variational inference for non-conjugate latent gaussian models
M. E. Khan, A. Y. Aravkin, M. P. Friedlander, M. Seeger. Intern. Conf. Machine Learning (ICML),
2013.Latent Gaussian models (LGMs) are widely used in statistics and machine learning. Bayesian inference in non-conjugate LGMs is difficult due to intractable integrals involving the Gaussian prior and non-conjugate likelihoods. Algorithms based on variational Gaussian (VG) approximations are widely employed since they strike a favorable balance between accuracy, generality, speed, and ease of use. However, the structure of the optimization problems associated with these approximations remains poorly understood, and standard solvers take too long to converge. We derive a novel dual variational inference approach that exploits the convexity property of the VG approximations. We obtain an algorithm that solves a convex optimization problem, reduces the number of variational parameters, and converges much faster than previous methods. Using real-world data, we demonstrate these advantages on a variety of LGMs, including Gaussian process classification and latent Gaussian Markov random fields.
@article{Khan2013Fast, Author = {M. E. Khan and A. Y. Aravkin and M. P. Friedlander and M. Seeger}, Year = {2013}, Month = {May}, Pages = {951-959}, Title = {Fast dual variational inference for non-conjugate latent gaussian models} } - [abs]
[bib]
Tail bounds for stochastic approximation
M. P. Friedlander, G. Goh. arXiv:1304.5586,
2013.
[arXiv]
[DOI]Stochastic-approximation gradient methods are attractive for large-scale convex optimization because they offer inexpensive iterations. They are especially popular in data-fitting and machine-learning applications where the data arrives in a continuous stream, or it is necessary to minimize large sums of functions. It is known that by appropriately decreasing the variance of the error at each iteration, the expected rate of convergence matches that of the underlying deterministic gradient method. Conditions are given under which this happens with overwhelming probability.
@misc{Friedlander2013Tail, Author = {M. P. Friedlander and G. Goh}, Year = {2013}, Month = {April}, Doi = {10.48550/arxiv.1304.5586}, Title = {Tail bounds for stochastic approximation} }
2012
- [abs]
[bib]
Hybrid deterministic-stochastic methods for data fitting
M. P. Friedlander, M. Schmidt. SIAM J. Scientific Computing, 34(3):A1380–A1405,
2012.
[more]Many structured data-fitting applications require the solution of an optimization problem involving a sum over a potentially large number of measurements. Incremental gradient algorithms offer inexpensive iterations by sampling a subset of the terms in the sum; these methods can make great progress initially, but often slow as they approach a solution. In contrast, full-gradient methods achieve steady convergence at the expense of evaluating the full objective and gradient on each iteration. We explore hybrid methods that exhibit the benefits of both approaches. Rate-of-convergence analysis shows that by controlling the sample size in an incremental-gradient algorithm, it is possible to maintain the steady convergence rates of full-gradient methods. We detail a practical quasi-Newton implementation based on this approach. Numerical experiments illustrate its potential benefits.
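A toy sketch of the sample-size control described above (a least-squares objective; the growth factor and step size are arbitrary illustrative choices, not the paper's quasi-Newton implementation): small batches give cheap early iterations, and geometric growth of the batch shrinks the gradient error so the steady behavior of a full-gradient method is recovered.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 10_000, 20
A = rng.normal(size=(N, d)); b = rng.normal(size=N)
x, step, batch = np.zeros(d), 1e-3, 8
for k in range(200):
    idx = rng.choice(N, size=min(batch, N), replace=False)
    g = A[idx].T @ (A[idx] @ x - b) / len(idx)     # subsampled gradient
    x -= step * g
    batch = int(np.ceil(batch * 1.1))              # geometric growth of the sample size
print(np.linalg.norm(A.T @ (A @ x - b)) / N)       # full-gradient norm as a progress check
```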
@article{FriedlanderSchmidt2012, author = {M. P. Friedlander and M. Schmidt}, title = {Hybrid deterministic-stochastic methods for data fitting}, journal = {SIAM J. Scientific Computing}, volume = {34}, number = {3}, pages = {A1380–A1405}, year = {2012} } - [abs]
[bib]
A primal-dual regularized interior-point method for convex quadratic programs
M. P. Friedlander, D. Orban. Mathematical Programming Computation, 4(1):71–107,
2012.
[Preprint]
[Code]
[DOI]
[more]Interior-point methods in augmented form for linear and convex quadratic programming require the solution of a sequence of symmetric indefinite linear systems which are used to derive search directions. Safeguards are typically required in order to handle free variables or rank-deficient Jacobians. We propose a consistent framework and accompanying theoretical justification for regularizing these linear systems. Our approach can be interpreted as a simultaneous proximal-point regularization of the primal and dual problems. The regularization is termed exact to emphasize that, although the problems are regularized, the algorithm recovers a solution of the original problem, for appropriate values of the regularization parameters.
@article{Friedlander2012Primal, Author = {M. P. Friedlander and D. Orban}, Year = {2012}, Month = {February}, Journal = {Mathematical Programming Computation}, Number = {1}, Volume = {4}, Pages = {71-107}, Doi = {10.1007/s12532-012-0035-2}, Title = {A primal-dual regularized interior-point method for convex quadratic programs} } - [abs]
[bib]
Recovering compressively sampled signals using partial support information
M. P. Friedlander, H. Mansour, R. Saab, Ö. Yılmaz. IEEE Trans. Info. Theory, 58(2):1122–1134,
2012.
[Journal Link]
[DOI]In this paper we study recovery conditions of weighted \(\ell_1\) minimization for signal reconstruction from compressed sensing measurements when partial support information is available. We show that if at least 50% of the (partial) support information is accurate, then weighted \(\ell_1\) minimization is stable and robust under weaker conditions than the analogous conditions for standard \(\ell_1\) minimization. Moreover, weighted \(\ell_1\) minimization provides better bounds on the reconstruction error in terms of the measurement noise and the compressibility of the signal to be recovered. We illustrate our results with extensive numerical experiments on synthetic data and real audio and video signals.
@article{Friedlander2012Recovering, Author = {M. P. Friedlander and H. Mansour and R. Saab and Ö. Yılmaz}, Year = {2012}, Month = {February}, Journal = {IEEE Transactions on Information Theory}, Number = {2}, Volume = {58}, Pages = {1122-1134}, Doi = {10.1109/tit.2011.2167214}, Title = {Recovering compressively sampled signals using partial support information} } - [abs]
[bib]
Fighting the curse of dimensionality: compressive sensing in exploration seismology
F. J. Herrmann, M. P. Friedlander, Ö. Yılmaz. IEEE Signal Processing Magazine, 29(3):88–100,
2012.
[DOI]Many seismic exploration techniques rely on the collection of massive data volumes that are mined for information during processing. This approach has been extremely successful, but current efforts toward higher resolution images in increasingly complicated regions of Earth continue to reveal fundamental shortcomings in our typical workflows. The “curse” of dimensionality is the main roadblock and is exemplified by Nyquist’s sampling criterion, which disproportionately strains current acquisition and processing systems as the size and desired resolution of our survey areas continue to increase.
@article{Herrmann2012Fighting, Author = {F. J. Herrmann and M. P. Friedlander and Ö. Yılmaz}, Year = {2012}, Month = {January}, Journal = {IEEE Signal Processing Magazine}, Number = {3}, Volume = {29}, Pages = {88-100}, Doi = {10.1109/msp.2012.2185859}, Title = {Fighting the curse of dimensionality: compressive sensing in exploration seismology} } - [abs]
[bib]
Robust inversion via semistochastic dimensionality reduction
A. Aravkin, M. P. Friedlander, T. van Leeuwen. Intern. Conf. Acoustics, Speech, and Signal Processing (ICASSP),
2012.
[DOI]We consider a class of inverse problems where it is possible to aggregate the results of multiple experiments. This class includes problems where the forward model is the solution operator to linear ODEs or PDEs. The tremendous size of such problems motivates dimensionality reduction techniques based on randomly mixing experiments. These techniques break down, however, when robust data-fitting formulations are used, which are essential in cases of missing data, unusually large errors, and systematic features in the data unexplained by the forward model. We survey robust methods within a statistical framework, and propose a semistochastic optimization approach that allows dimensionality reduction. The efficacy of the methods is demonstrated for a large-scale seismic inverse problem using the robust Student’s t-distribution, where a useful synthetic velocity model is recovered in the extreme scenario of 60% data missing at random. The semistochastic approach achieves this recovery using 20% of the effort required by a direct robust approach.
@inproceedings{Aravkin2012RobustICASSP, Author = {A. Aravkin and M. P. Friedlander and T. van Leeuwen}, Year = {2012}, Month = {January}, Booktitle = {Intern. Conf. Acoustics, Speech, and Signal Processing (ICASSP)}, Doi = {10.48550/arxiv.1110.0895}, Title = {Robust inversion via semistochastic dimensionality reduction} } - [abs]
[bib]
Robust inversion, dimensionality reduction, and randomized sampling
A. Aravkin, M. P. Friedlander, F. Herrmann, T. van Leeuwen. Mathematical Programming, 134(1):101–125, 2012,
2012.
[DOI]We consider a class of inverse problems in which the forward model is the solution operator to linear ODEs or PDEs. This class admits several dimensionality-reduction techniques based on data averaging or sampling, which are especially useful for large-scale problems. We survey these approaches and their connection to stochastic optimization. The data-averaging approach is only viable, however, for a least-squares misfit, which is sensitive to outliers in the data and artifacts unexplained by the forward model. This motivates us to propose a robust formulation based on the Student’s t-distribution of the error. We demonstrate how the corresponding penalty function, together with the sampling approach, can obtain good results for a large-scale seismic inverse problem with 50% corrupted data.
@article{Aravkin2012Robust, Author = {A. Aravkin and M. P. Friedlander and F. Herrmann and T. van Leeuwen}, Year = {2012}, Month = {January}, Journal = {Mathematical Programming}, Number = {1}, Volume = {134}, Pages = {101-125}, Doi = {10.1007/s10107-012-0571-6}, Title = {Robust inversion, dimensionality reduction, and randomized sampling} }
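Both robust-inversion entries above replace the least-squares misfit with a heavy-tailed penalty derived from the Student's t distribution. A schematic version of the comparison, with notation chosen here rather than taken from the papers, is

\[
\min_{x}\;\sum_{i}\rho\bigl(d_i - F_i(x)\bigr),\qquad
\rho_{\mathrm{LS}}(r)=\tfrac12 r^2,\qquad
\rho_{t}(r)=\tfrac{\nu+1}{2}\,\log\!\Bigl(1+\tfrac{r^2}{\nu\sigma^2}\Bigr),
\]

where \(d_i\) are the observations, \(F_i\) the forward model, and \(\nu,\sigma\) the degrees of freedom and scale of the Student's t distribution. The logarithmic growth of \(\rho_t\) is what blunts the influence of outliers and missing data compared with the quadratic least-squares penalty.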
2011
- [abs]
[bib]
Sparse optimization with least-squares constraints
E. van den Berg, M. P. Friedlander. SIAM J. Optimization, 21(4):1201–1229,
2011.
[DOI]
[more]The use of convex optimization for the recovery of sparse signals from incomplete or compressed data is now common practice. Motivated by the success of basis pursuit in recovering sparse vectors, new formulations have been proposed that take advantage of different types of sparsity. In this paper we propose an efficient algorithm for solving a general class of sparsifying formulations. For several common types of sparsity we provide applications, along with details on how to apply the algorithm, and experimental results.
@article{Berg2011Sparse, Author = {E. van den Berg and M. P. Friedlander}, Year = {2011}, Month = {October}, Journal = {SIAM Journal on Optimization}, Number = {4}, Volume = {21}, Pages = {1201-1229}, Doi = {10.1137/100785028}, Title = {Sparse optimization with least-squares constraints} }
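The root-finding viewpoint behind this family of solvers can be summarized schematically as follows (illustrative notation; the paper gives the precise statements and the derivative formula used by the Newton iteration):

\[
\phi(\tau)\;=\;\min_{x}\;\bigl\{\,\|Ax-b\|_2 \;:\; \kappa(x)\le\tau\,\bigr\},
\qquad\text{find }\tau^\star\text{ such that }\phi(\tau^\star)=\sigma,
\]

where \(\kappa\) is a sparsity-inducing gauge such as the one-norm. Each evaluation of \(\phi\) is a constrained least-squares subproblem solved by spectral projected gradients, and the outer iteration is an inexact Newton method on the scalar equation \(\phi(\tau)=\sigma\).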
2010
- [abs]
[bib]
Theoretical and empirical results for recovery from multiple measurements
E. van den Berg, M. P. Friedlander. IEEE Trans. Info. Theory, 56(5):2516–2527,
2010.
[Code & Data]
[DOI]
[more]The joint-sparse recovery problem aims to recover, from sets of compressed measurements, unknown sparse matrices with nonzero entries restricted to a subset of rows. This is an extension of the single-measurement-vector (SMV) problem widely studied in compressed sensing. We analyze the recovery properties for two types of recovery algorithms. First, we show that recovery using sum-of-norm minimization cannot exceed the uniform recovery rate of sequential SMV using L1 minimization, and that there are problems that can be solved with one approach but not with the other. Second, we analyze the performance of the ReMBo algorithm [M. Mishali and Y. Eldar, IEEE Trans. Sig. Proc., 56 (2008)] in combination with L1 minimization, and show how recovery improves as more measurements are taken. From this analysis it follows that having more measurements than the number of nonzero rows does not improve the potential theoretical recovery rate.
@article{Berg2010Theoretical, Author = {E. van den Berg and M. P. Friedlander}, Year = {2010}, Month = {May}, Journal = {IEEE Transactions on Information Theory}, Number = {5}, Volume = {56}, Pages = {2516-2527}, Doi = {10.1109/tit.2010.2043876}, Title = {Theoretical and empirical results for recovery from multiple measurements} }
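For reference, the sum-of-norms recovery program analyzed above can be written schematically as (notation assumed here, not quoted from the paper)

\[
\min_{X\in\mathbb{R}^{n\times k}}\;\sum_{i=1}^{n}\|X_{i,:}\|_2
\quad\text{subject to}\quad AX=B,
\]

where each of the \(k\) columns of \(B\) is a measurement vector. The sequential SMV alternative instead solves \(\min_x\{\|x\|_1 : Ax=b_j\}\) separately for each column \(b_j\).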
2009
- [abs]
[bib]
Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm
M. Schmidt, E. van den Berg, M. P. Friedlander, K. Murphy. Intern. Conf. Artificial Intelligence and Statistics (AISTATS),
2009.
[Slides]
[Software]An optimization algorithm for minimizing a smooth function over a convex set is described. Each iteration of the method computes a descent direction by minimizing, over the original constraints, a diagonal plus low-rank quadratic approximation to the function. The quadratic approximation is constructed using a limited-memory quasi-Newton update. The method is suitable for large-scale problems where evaluation of the function is substantially more expensive than projection onto the constraint set. Numerical experiments on one-norm regularized test problems indicate that the proposed method is competitive with state-of-the-art methods such as bound-constrained L-BFGS and orthant-wise descent. We further show that the method generalizes to a wide class of problems, and substantially improves on state-of-the-art methods for problems such as learning the structure of Gaussian graphical models and Markov random fields.
@article{Schmidt2009Optimizing, Author = {M. Schmidt and E. van den Berg and M. P. Friedlander and K. Murphy}, Year = {2009}, Month = {April}, Pages = {456-463}, Title = {Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm} } - [abs]
[bib]
Sparco: a testing framework for sparse reconstruction
E. van den Berg, M. P. Friedlander, G. Hennenfent, F. Herrmann, R. Saab, Ö. Yılmaz. ACM Trans. Math. Software, 35(4):1–16,
2009.
[Code]
[more]Sparco is a framework for testing and benchmarking algorithms for sparse reconstruction. It includes a large collection of sparse reconstruction problems drawn from the imaging, compressed sensing, and geophysics literature. Sparco is also a framework for implementing new test problems and can be used as a tool for reproducible research. Sparco is implemented entirely in Matlab, and is released as open-source software under the GNU Public License.
@article{Berg2009Sparco, Author = {E. van den Berg and M. P. Friedlander and G. Hennenfent and F. Herrmann and R. Saab and Ö. Yılmaz}, Year = {2009}, Month = {February}, Title = {Sparco: a testing framework for sparse reconstruction} }
2008
- [abs]
[bib]
Probing the Pareto frontier for basis pursuit solutions
E. van den Berg, M. P. Friedlander. SIAM J. Scientific Computing, 31(2):890–912,
2008.
[SPGL1 Software]
[DOI]
[more]The basis pursuit problem seeks a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise (BPDN) fits the least-squares problem only approximately, and a single parameter determines a curve that traces the optimal trade-off between the least-squares fit and the one-norm of the solution. We prove that this curve is convex and continuously differentiable over all points of interest, and show that it gives an explicit relationship to two other optimization problems closely related to BPDN. We describe a root-finding algorithm for finding arbitrary points on this curve; the algorithm is suitable for problems that are large scale and for those that are in the complex domain. At each iteration, a spectral gradient-projection method approximately minimizes a least-squares problem with an explicit one-norm constraint. Only matrix-vector operations are required. The primal-dual solution of this problem gives function and derivative information needed for the root-finding method. Numerical experiments on a comprehensive set of test problems demonstrate that the method scales well to large problems.
@article{Berg2008Probing, Author = {E. van den Berg and M. P. Friedlander}, Year = {2008}, Month = {November}, Journal = {SIAM Journal on Scientific Computing}, Number = {2}, Volume = {31}, Pages = {890-912}, Doi = {10.1137/080714488}, Title = {Probing the Pareto frontier for basis pursuit solutions} } - [abs]
[bib]
Computing nonnegative tensor factorizations
M. P. Friedlander, K. Hatz. Optimization Methods and Software, 23(4):631–647,
2008.
[Journal Link]
[DOI]
[more]Nonnegative tensor factorization (NTF) is a technique for computing a parts-based representation of high-dimensional data. NTF excels at exposing latent structures in datasets, and at finding good low-rank approximations to the data. We describe an approach for computing the NTF of a dataset that relies only on iterative linear-algebra techniques and that is comparable in cost to the nonnegative matrix factorization. (The better-known nonnegative matrix factorization is a special case of NTF and is also handled by our implementation.) Some important features of our implementation include mechanisms for encouraging sparse factors and for ensuring that they are equilibrated in norm. The complete Matlab software package is available under the GPL license.
@article{Friedlander2008Computing, Author = {M. P. Friedlander and K. Hatz}, Year = {2008}, Month = {August}, Journal = {Optimization methods & software}, Number = {4}, Volume = {23}, Pages = {631-647}, Doi = {10.1080/10556780801996244}, Title = {Computing nonnegative tensor factorizations} } - [abs]
[bib]
New insights into one-norm solvers from the Pareto curve
G. Hennenfent, E. van den Berg, M. P. Friedlander, F. Herrmann. Geophysics, 73(4):A23–A26,
2008.
[DOI]Geophysical inverse problems typically involve a trade-off between data misfit and some prior. Pareto curves trace the optimal trade-off between these two competing aims. These curves are commonly used in problems with two-norm priors where they are plotted on a log-log scale and are known as L-curves. For other priors, such as the sparsity-promoting one norm, Pareto curves remain relatively unexplored. We show how these curves lead to new insights in one-norm regularization. First, we confirm the theoretical properties of smoothness and convexity of these curves from a stylized and a geophysical example. Second, we exploit these crucial properties to approximate the Pareto curve for a large-scale problem. Third, we show how Pareto curves provide an objective criterion to gauge how different one-norm solvers advance towards the solution.
@article{Hennenfent2008New, Author = {G. Hennenfent and E. van den Berg and M. P. Friedlander and F. Herrmann}, Year = {2008}, Month = {July}, Journal = {Geophysics}, Number = {4}, Volume = {73}, Pages = {A23-A26}, Doi = {10.1190/1.2944169}, Title = {New insights into one-norm solvers from the Pareto curve} } - [abs]
[bib]
Group sparsity via linear-time projection
E. van den Berg, M. Schmidt, M. P. Friedlander, K. Murphy. Tech. Rep. TR-2008-09, Dept of Computer Science, UBC,
2008.We present an efficient spectral projected-gradient algorithm for optimization subject to a group l1-norm constraint. Our approach is based on a novel linear-time algorithm for Euclidean projection onto the l1- and group l1-norm constraints. Numerical experiments on large data sets suggest that the proposed method is substantially more efficient and scalable than existing methods.
@article{Berg2008Group, Author = {E. van den Berg and M. Schmidt and M. P. Friedlander and K. Murphy}, Year = {2008}, Month = {June}, Title = {Group sparsity via linear-time projection} } - [abs]
[bib]
Global and finite termination of a two-phase augmented Lagrangian filter method for general quadratic programs
M. P. Friedlander, S. Leyffer. SIAM J. Scientific Computing, 30(4):1706–1726,
2008.
[DOI]We present a two-phase algorithm for solving large-scale quadratic programs (QPs). In the first phase, gradient-projection iterations approximately minimize an augmented Lagrangian function and provide an estimate of the optimal active set. In the second phase, an equality-constrained QP defined by the current inactive variables is approximately minimized in order to generate a second-order search direction. A filter determines the required accuracy of the subproblem solutions and provides an acceptance criterion for the search directions. The resulting algorithm is globally and finitely convergent. The algorithm is suitable for large-scale problems with many degrees of freedom, and provides an alternative to interior-point methods when iterative methods must be used to solve the underlying linear systems. Numerical experiments on a subset of the CUTEr QP test problems demonstrate the effectiveness of the approach.
@article{Friedlander2008Global, Author = {M. P. Friedlander and S. Leyffer}, Year = {2008}, Month = {April}, Journal = {SIAM Journal on Scientific Computing}, Number = {4}, Volume = {30}, Pages = {1706-1729}, Doi = {10.1137/060669930}, Title = {Global and finite termination of a two-phase augmented Lagrangian filter method for general quadratic programs} }
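A schematic of the first phase, assuming the QP is posed with equality constraints and nonnegativity bounds (the paper's exact formulation may differ): the method works with an augmented Lagrangian of the form

\[
\mathcal{L}_{\rho}(x;y)\;=\;c^{\mathsf T}x+\tfrac12 x^{\mathsf T}Hx-y^{\mathsf T}(Ax-b)+\tfrac{\rho}{2}\|Ax-b\|_2^2,
\]

which is approximately minimized over \(x\ge 0\) by gradient projection. The bounds active at the approximate minimizer supply the active-set estimate used by the second, equality-constrained phase.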
2007
- [bib]
Discussion: The Dantzig selector: Statistical estimation when p is much larger than n
M. P. Friedlander, M. A. Saunders. Annals of Statistics, 35(6):2385–2391,
2007.
[Journal]
[Slides]
[DOI]
@article{Friedlander2007Discussion, Author = {M. P. Friedlander and M. A. Saunders}, Year = {2007}, Month = {December}, Journal = {The Annals of Statistics}, Number = {6}, Volume = {35}, Doi = {10.1214/009053607000000479}, Title = {Discussion: The Dantzig selector: Statistical estimation when p is much larger than n} } - [abs]
[bib]
Exact regularization of convex programs
M. P. Friedlander, P. Tseng. SIAM J. Optimization, 18(4):1326–1350,
2007.
[Data & Code]
[DOI]
[more]The regularization of a convex program is exact if all solutions of the regularized problem are also solutions of the original problem for all values of the regularization parameter below some positive threshold. For a general convex program, we show that the regularization is exact if and only if a certain selection problem has a Lagrange multiplier. Moreover, the regularization parameter threshold is inversely related to the Lagrange multiplier. We use this result to generalize an exact regularization result of Ferris and Mangasarian [Appl. Math. Optim., 23 (1991), pp. 266-273] involving a linearized selection problem. We also use it to derive necessary and sufficient conditions for exact penalization, similar to those obtained by Bertsekas [Math. Programming, 9 (1975), pp. 87-99] and by Bertsekas, Nedić, and Ozdaglar [Convex Analysis and Optimization, Athena Scientific, Belmont, MA, 2003]. When the regularization is not exact, we derive error bounds on the distance from the regularized solution to the original solution set. We also show that existence of a "weak sharp minimum" is in some sense close to being necessary for exact regularization. We illustrate the main result with numerical experiments on the \(\ell_1\) regularization of benchmark (degenerate) linear programs and semidefinite/second-order cone programs. The experiments demonstrate the usefulness of \(\ell_1\) regularization in finding sparse solutions.
@article{Friedlander2007Exact, Author = {M. P. Friedlander and P. Tseng}, Year = {2007}, Month = {November}, Journal = {SIAM Journal on Optimization}, Number = {4}, Volume = {18}, Pages = {1326-1350}, Doi = {10.1137/060675320}, Title = {Exact regularization of convex programs} } - [abs]
[bib]
A filter active-set trust-region method
M. P. Friedlander, N. I. M. Gould, S. Leyffer, T. S. Munson. Preprint ANL/MCS-P1456-0907, Argonne National Laboratory,
2007.
[Preprint]We propose a two-phase active-set method for nonlinearly constrained optimization. The first phase solves a regularized linear program (LP) that serves to estimate an optimal active set. The second phase uses this active-set estimate to determine an equality-constrained quadratic program which provides for fast local convergence. A filter promotes global convergence. The regularization term in the first phase bounds the LP solution and plays a role similar to an explicit ellipsoid trust-region constraint. We prove that the resulting method is globally convergent, and that an optimal active set is identified near a solution. We discuss alternative regularization functions that incorporate curvature information into the active-set identification phase. Preliminary numerical experiments on a subset of the CUTEr test problems illustrate the effectiveness of the approach.
@article{Friedlander2007Filter, Author = {M. P. Friedlander and N. I. M. Gould and S. Leyffer and T. S. Munson}, Year = {2007}, Month = {September}, Title = {A filter active-set trust-region method} } - [abs]
[bib]
In pursuit of a root
E. van den Berg, M. P. Friedlander. Tech. Rep. TR-2007-19, Dept. of Computer Science, UBC,
2007.
[more]The basis pursuit technique is used to find a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise fits the least-squares problem only approximately, and a single parameter determines a curve that traces the trade-off between the least-squares fit and the one-norm of the solution. We show that the function that describes this curve is convex and continuously differentiable over all points of interest. The dual solution of a least-squares problem with an explicit one-norm constraint gives function and derivative information needed for a root-finding method. As a result, we can compute arbitrary points on this curve. Numerical experiments demonstrate that our method, which relies on only matrix-vector operations, scales well to large problems.
@article{Berg2007Pursuit, Author = {E. van den Berg and M. P. Friedlander}, Year = {2007}, Month = {June}, Title = {In pursuit of a root} } - [abs]
[bib]
Diffuse optical fluorescence tomography using time-resolved data acquired in transmission
F. Leblond, S. Fortier, M. P. Friedlander. Multimodal Biomedical Imaging II, vol. 6431. Proc. Intern. Society for Optical Engineering (SPIE),
2007.
[Code]
[DOI]
[more]We present an algorithm using data acquired with a time-resolved system with the goal of reconstructing sources of fluorescence emanating from the deep interior of highly scattering biological tissues. A novelty in our tomography algorithm is the integration of a light transport model adapted to rodent geometries. For small volumes, our analysis suggests that neglecting the index of refraction mismatch between diffusive and non-diffusive regions, as well as the curved nature of the boundary, can have a profound impact on fluorescent images and spectroscopic applications relying on diffusion curve fitting. Moreover, we introduce a new least-squares solver with bound constraints adapted for optical problems where a physical non-negative constraint can be imposed. Finally, we find that maximizing the time-related information content of the data in the reconstruction process significantly enhances the quality of fluorescence images. Preliminary noise propagation and detector placement optimization analysis are also presented.
@article{Leblond2007Diffuse, Author = {F. Leblond and S. Fortier and M. P. Friedlander}, Year = {2007}, Month = {February}, Journal = {Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE}, Volume = {6431}, Pages = {643106}, Doi = {10.1117/12.700841}, Title = {Diffuse optical fluorescence tomography using time-resolved data acquired in transmission} }
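The nonnegativity-constrained least-squares fit at the core of the reconstruction can be prototyped in a few lines. The sketch below uses SciPy's generic NNLS solver as a stand-in, not the specialized bound-constrained solver introduced in the paper, and the problem data are synthetic.

```python
import numpy as np
from scipy.optimize import nnls

# Synthetic stand-in for the reconstruction step: recover a nonnegative source
# vector x from linear measurements b = A x + noise.
rng = np.random.default_rng(0)
m, n = 300, 100
A = np.abs(rng.standard_normal((m, n)))        # nonnegative sensitivity matrix
x_true = np.zeros(n)
x_true[rng.choice(n, size=5, replace=False)] = rng.uniform(1.0, 2.0, size=5)
b = A @ x_true + 0.01 * rng.standard_normal(m)

x_hat, residual_norm = nnls(A, b)              # min ||Ax - b||_2  s.t.  x >= 0
print("residual:", residual_norm, "max error:", np.max(np.abs(x_hat - x_true)))
```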
2006
- [abs]
[bib]
On minimizing distortion and relative entropy
M. P. Friedlander, M. R. Gupta. IEEE Trans. Info. Theory, 52(1):238–245,
2006.
[DOI]A common approach for estimating a probability mass function \(w\) when given a prior \(q\) and moment constraints given by \(Aw\le b\) is to minimize the relative entropy between \(w\) and \(q\) subject to the set of linear constraints. In such cases, the solution \(w\) is known to have exponential form. We consider the case in which the linear constraints are noisy, uncertain, infeasible, or otherwise “soft.” A solution can then be obtained by minimizing both the relative entropy and violation of the constraints \(Aw\le b\). A penalty parameter \(\sigma\) weights the relative importance of these two objectives. We show that this penalty formulation also yields a solution \(w\) with exponential form. If the distortion is based on an \(\ell_p\) norm, then the exponential form of \(w\) is shown to have exponential decay parameters that are bounded as a function of \(\sigma\). We also state conditions under which the solution \(w\) to the penalty formulation will result in zero distortion, so that the moment constraints hold exactly. These properties are useful in choosing penalty parameters, evaluating the impact of chosen penalty parameters, and proving properties about methods that use such penalty formulations.
@article{Friedlander2006Minimizing, Author = {M. P. Friedlander and M. R. Gupta}, Year = {2006}, Month = {January}, Journal = {IEEE Transactions on Information Theory}, Number = {1}, Volume = {52}, Pages = {238-245}, Doi = {10.1109/tit.2005.860448}, Title = {On minimizing distortion and relative entropy} }
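Schematically, the penalty formulation studied above has the shape (the precise distortion measure is specified in the paper)

\[
\min_{w\ge 0,\;\mathbf{1}^{\mathsf T}w=1}\;\;\sum_i w_i\log\frac{w_i}{q_i}\;+\;\sigma\,\bigl\|(Aw-b)_+\bigr\|_p,
\]

where \((\cdot)_+\) is the componentwise positive part, so only violations of \(Aw\le b\) are penalized, and \(\sigma\) trades off fidelity to the prior \(q\) against constraint violation.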
2005
- [abs]
[bib]
A two-sided relaxation scheme for mathematical programs with equilibrium constraints
V. Demiguel, M. P. Friedlander, F. J. Nogales, S. Scholtes. SIAM J. Optimization, 16(2):587–609,
2005.
[DOI]We propose a relaxation scheme for mathematical programs with equilibrium constraints (MPECs). In contrast to previous approaches, our relaxation is two-sided: both the complementarity and the nonnegativity constraints are relaxed. The proposed relaxation update rule guarantees (under certain conditions) that the sequence of relaxed subproblems will maintain a strictly feasible interior, even in the limit. We show how the relaxation scheme can be used in combination with a standard interior-point method to achieve superlinear convergence. Numerical results on the MacMPEC test problem set demonstrate the fast local convergence properties of the approach.
@article{Demiguel2005Two, Author = {V. Demiguel and M. P. Friedlander and F. J. Nogales and S. Scholtes}, Year = {2005}, Month = {December}, Journal = {SIAM Journal on Optimization}, Number = {2}, Volume = {16}, Pages = {587-609}, Doi = {10.1137/04060754x}, Title = {A two-sided relaxation scheme for mathematical programs with equilibrium constraints} } - [abs]
[bib]
Exact regularization of linear programs
M. P. Friedlander. Tech. Rep. TR-2005-31, Dept. of Computer Science, UBC,
2005.
[Data]
[more]We show that linear programs (LPs) admit regularizations that either contract the original (primal) solution set or leave it unchanged. Any regularization function that is convex and has compact level sets is allowed; differentiability is not required. This is an extension of the result first described by Mangasarian and Meyer (SIAM J. Control Optim., 17(6), pp. 745-752, 1979). We show that there always exist positive values of the regularization parameter such that a solution of the regularized problem simultaneously minimizes the original LP and minimizes the regularization function over the original solution set. We illustrate the main result using the nondifferentiable one-norm regularization function on a set of degenerate LPs. Numerical results demonstrate how such an approach yields sparse solutions from the application of an interior-point method.
@techreport{Frie:2005, Address = {University of British Columbia, Vancouver}, Author = {M. P. Friedlander}, Institution = {Dept. Computer Science}, Month = {December}, Number = {TR-2005-31}, Title = {Exact regularization of linear programs}, Type = {Tech. Rep.}, Year = 2005 } - [abs]
[bib]
A globally convergent linearly constrained Lagrangian method for nonlinear optimization
M. P. Friedlander, M. A. Saunders. SIAM J. Optimization, 15(3):863–897,
2005.
[DOI]For optimization problems with nonlinear constraints, linearly constrained Lagrangian (LCL) methods solve a sequence of subproblems of the form "minimize an augmented Lagrangian function subject to linearized constraints." Such methods converge rapidly near a solution but may not be reliable from arbitrary starting points. Nevertheless, the well-known software package MINOS has proved effective on many large problems. Its success motivates us to derive a related LCL algorithm that possesses three important properties: it is globally convergent, the subproblem constraints are always feasible, and the subproblems may be solved inexactly. The new algorithm has been implemented in Matlab, with an option to use either MINOS or SNOPT (Fortran codes) to solve the linearly constrained subproblems. Only first derivatives are required. We present numerical results on a subset of the COPS, HS, and CUTE test problems, which include many large examples. The results demonstrate the robustness and efficiency of the stabilized LCL procedure.
@article{Friedlander2005Globally, Author = {M. P. Friedlander and M. A. Saunders}, Year = {2005}, Month = {April}, Journal = {SIAM Journal on Optimization}, Number = {3}, Volume = {15}, Pages = {863-897}, Doi = {10.1137/s1052623402419789}, Title = {A globally convergent linearly constrained Lagrangian method for nonlinear optimization} }
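For a problem \(\min_x\{f(x) : c(x)=0,\ \ell\le x\le u\}\), each LCL subproblem has roughly the following form (a schematic; the paper states the stabilized version precisely):

\[
\min_{x}\;\;f(x)-y_k^{\mathsf T}c(x)+\tfrac{\rho_k}{2}\|c(x)\|_2^2
\quad\text{subject to}\quad c(x_k)+J(x_k)(x-x_k)=0,\;\;\ell\le x\le u,
\]

where \(y_k\) are the current multiplier estimates, \(\rho_k\) a penalty parameter, and \(J\) the constraint Jacobian. Only first derivatives appear, which is why MINOS or SNOPT can be used for the linearly constrained subproblems.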
2000
- [abs]
[bib]
Maximum entropy classification applied to speech
M. R. Gupta, M. P. Friedlander, R. M. Gray. Asilomar Conf. Signals, Systems, Computers (ACSSC), vol. 2, 1480–1483,
2000.
[DOI]We present a new method for classification using the maximum entropy principle, allowing full use of relevant training data and smoothing the data space. To classify a test point we compute a maximum entropy weight distribution over a subset of training data and constrain the weights to exactly reconstruct the test point. The classification problem is formulated as a linearly constrained optimization problem and solved using a primal-dual logarithmic barrier method, well suited for high-dimensional data. We discuss theoretical advantages and present experimental results on vowel data which demonstrate that the method performs competitively for speech classification tasks.
@article{Gupta2000Maximum, Author = {M. R. Gupta and M. P. Friedlander and R. M. Gray}, Year = {2000}, Month = {October}, Volume = {2}, Pages = {1480-1483}, Doi = {10.1109/acssc.2000.911236}, Title = {Maximum entropy classification applied to speech} }
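Reading the abstract above as an optimization problem gives, schematically (the decision rule stated at the end is an illustrative assumption, not quoted from the paper),

\[
\max_{w\ge 0}\;\;-\sum_{i=1}^{m}w_i\log w_i
\quad\text{subject to}\quad \sum_{i=1}^{m}w_i x_i = z,\qquad \sum_{i=1}^{m}w_i=1,
\]

where \(z\) is the test point and \(x_1,\dots,x_m\) are nearby training points; the test point can then be assigned to the class whose training points carry the most total weight.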
- [abs]
[bib]
Average-case thresholds for exact regularization of linear programs
M. P. Friedlander, S. Kubal, Y. Plan, M. S. Scott. arXiv:2510.13083,
2025.Small regularizers can preserve linear programming solutions exactly. This paper provides the first average-case analysis of exact regularization: with a standard Gaussian cost vector and fixed constraint set, bounds are established for the probability that exact regularization succeeds as a function of regularization strength. Failure is characterized via the Gaussian measure of inner cones, controlled by novel two-sided bounds on the measure of shifted cones. Results reveal dimension-dependent scaling laws and connect exact regularization of linear programs to their polyhedral geometry via the normal fan and the Gaussian (solid-angle) measure of its cones. Computable bounds are obtained in several canonical settings, including regularized optimal transport. Numerical experiments corroborate the predicted scalings and thresholds.
@article{friedlander2025average, title={Average-case thresholds for exact regularization of linear programs}, author={Friedlander, Michael P and Kubal, Sharvaj and Plan, Yaniv and Scott, Matthew S}, journal={arXiv preprint arXiv:2510.13083}, year={2025} } - [abs]
[bib]
Tracking Cu-fertile sediment sources via multivariate petrochronological mixture modeling of detrital zircons
J. E. Saylor, N. Richardson, N. Graham, R. G. Lee, M. P. Friedlander. J. Geophys. Res.: Earth Surface, 130(10):e2025JF008406,
2025.Whereas the ability to acquire petrochronological data from detrital minerals has exploded, development of tools to analyze and interpret the multivariate data sets has not kept pace. Herein, we present a case study, which applies the recently developed non-negative Tucker-1 decomposition (NNT1) method to a multivariate detrital zircon data set from till samples collected above the Cu-bearing Guichon Creek Batholith in southern British Columbia, Canada. Zircon composition variables that we consider include age, Ce anomaly, CeN/NdN, DyN/YbN, ΔFMQ, Eu anomaly, ΣHREE/ΣMREE, Hf, Th/U, Ti temperature, and YbN/GdN. The NNT1 approach successfully deconvolves the multivariate data set into two endmembers, which are consistent with derivation either from non-oxidized and relatively anhydrous (i.e., low Cu-ore potential, Source 1) or oxidized and hydrous (i.e., potential Cu-ore bodies, Source 2) igneous rocks. Furthermore, we attribute each of the zircon grains to either the Source 1 or 2 endmember based on maximization of the likelihood that their measured multivariate geochemistry was drawn from one or the other of the learned multivariate endmembers. Finally, we demonstrate that the proportions of the Source 2 endmember decrease with increasing distance from the ore bodies, as expected due to down-ice or off-axis zircon mixing and dilution. We conclude that the NNT1 approach provides insight into geologically meaningful sediment transport processes and multivariate sediment sources even when those sources are unknown. It thus provides a basis for future petrochronological interpretations with applied and pure geoscience applications.
@article{saylor2025tracking, title={Tracking Cu-fertile sediment sources via multivariate petrochronological mixture modeling of detrital zircons}, author={Saylor, Joel E and Richardson, Nicholas and Graham, Naomi and Lee, Robert G and Friedlander, Michael P}, journal={Journal of Geophysical Research: Earth Surface}, volume={130}, number={10}, pages={e2025JF008406}, year={2025}, publisher={Wiley Online Library} } - [abs]
[bib]
Decentralized Optimization with Topology-Independent Communication
Y. Lin, Y. Kuang, A. Alacaoglu, M. P. Friedlander. arXiv:2509.14488,
2025.Distributed optimization requires nodes to coordinate, yet full synchronization scales poorly. When $n$ nodes collaborate through $m$ pairwise regularizers, standard methods demand $\mathcal{O}(m)$ communications per iteration. This paper proposes randomized local coordination: each node independently samples one regularizer uniformly and coordinates only with nodes sharing that term. This exploits partial separability, where each regularizer $G_j$ depends on a subset $S_j \subseteq \{1,\ldots,n\}$ of nodes. For graph-guided regularizers where $|S_j|=2$, expected communication drops to exactly 2 messages per iteration. This method achieves $\tilde{\mathcal{O}}(\varepsilon^{-2})$ iterations for convex objectives and under strong convexity, $\mathcal{O}(\varepsilon^{-1})$ to an $\varepsilon$-solution and $\mathcal{O}(\log(1/\varepsilon))$ to a neighborhood. Replacing the proximal map of the sum $\sum_j G_j$ with the proximal map of a single randomly selected regularizer $G_j$ preserves convergence while eliminating global coordination. Experiments validate both convergence rates and communication efficiency across synthetic and real-world datasets.
@misc{LinKuangAlacaogluFriedlander2025, author = {Lin, Ying and Kuang, Yao and Alacaoglu, Ahmet and Friedlander, Michael P.}, title = {Decentralized Optimization with Topology-Independent Communication}, year = {2025}, eprint = {2509.14488}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, doi = {10.48550/arXiv.2509.14488} } - [abs]
[bib]
Estimates of the dynamic structure factor for the finite temperature electron liquid via analytic continuation of path integral Monte Carlo data
T. Chuna, N. Barnfield, J. Vorberger, M. P. Friedlander, T. Hoheisel, T. Dornheim. Phys. Rev. B, 112(12):125112,
2025.Understanding the dynamic properties of uniform electron gas (UEG) is important for numerous applications ranging from semiconductor physics to exotic warm dense matter. In this work, we apply the maximum entropy method (MEM), as implemented by Thomas Chuna et al., J. Phys. A Math. Theor. 58, 335203 (2025), to ab initio path integral Monte Carlo (PIMC) results for the imaginary-time correlation function \(F(q,\tau)\) to estimate the dynamic structure factor \(S(q,\omega)\) over an unprecedented range of densities at the electronic Fermi temperature. To conduct the MEM, we propose to construct the Bayesian prior \(\mu\) from the PIMC data. Constructing the static approximation leads to a drastic improvement in the \(S(q,\omega)\) estimate over using the simpler random phase approximation (RPA) as the Bayesian prior. We present results for the strongly coupled electron liquid regime with \(r_s = 50,\dots,200\), which reveal a pronounced roton-type feature and an incipient double-peak structure in \(S(q,\omega)\) for intermediate wave numbers at \(r_s = 200\). We also find that our dynamic structure factors satisfy known sum rules, even though these sum rules are not enforced explicitly. To verify our results, we show that our MEM estimates converge to the RPA limit at higher densities and weaker coupling \(r_s = 2\) and \(5\). Further, we compare with two different existing results at intermediate density and coupling strength \(r_s = 10\) and \(20\), and we find good agreement with more conservative estimates. Combining all of our results for \(r_s = 2, 5, 10, 20, 50, 100, 200\), we present estimates of a dispersion relation that show a continuous deepening of its minimum value at higher coupling. An advantage of our setup is that it is not specific to the UEG, thereby opening up new avenues to study the dynamics of real warm dense matter systems based on cutting-edge PIMC simulations in future works.
@article{Chuna2025Estimates, Author = {T. Chuna and N. Barnfield and J. Vorberger and M. P. Friedlander and T. Hoheisel and T. Dornheim}, Year = {2025}, Month = {September}, Journal = {Physical review. B./Physical review. B}, Number = {12}, Volume = {112}, Doi = {10.1103/4d4b-kgtk}, Title = {Estimates of the dynamic structure factor for the finite temperature electron liquid via analytic continuation of path integral Monte Carlo data} } - [abs]
[bib]
Dual formulation of the maximum entropy method applied to analytic continuation of quantum Monte Carlo data
T. Chuna, N. Barnfield, T. Dornheim, M. P. Friedlander, T. Hoheisel. J. Phys. A: Math. Theor., 58(33):335203,
2025.
[GitHub]
[DOI]Many fields of physics use quantum Monte Carlo techniques, but struggle to estimate dynamic spectra via the analytic continuation of imaginary-time quantum Monte Carlo data. One of the most ubiquitous approaches to analytic continuation is the maximum entropy method (MEM). We supply a dual Newton optimization algorithm to be used within the MEM and provide analytic bounds for the algorithm’s error; the implementation of the dual Newton optimizer within the MEM is freely available on GitHub. The MEM is typically used with Bryan’s controversial algorithm (Rothkopf 2020 Data 5 55). We present new theoretical issues with Bryan’s algorithm that are not yet in the literature; our algorithm has all of its theoretical benefits without these issues. We compare the MEM with Bryan’s optimization to the MEM with our dual Newton optimization on test problems from lattice quantum chromodynamics and plasma physics. These comparisons show that in the presence of noise the dual Newton algorithm produces better estimates and error bars; this indicates the limits of Bryan’s algorithm’s applicability. We use the MEM to investigate authentic quantum Monte Carlo data for the uniform electron gas at warm dense matter conditions and further substantiate the roton-type feature in the dispersion relation.
@misc{Chuna2025Dual, Author = {T. Chuna and N. Barnfield and T. Dornheim and M. P. Friedlander and T. Hoheisel}, Year = {2025}, Month = {August}, Doi = {10.2139/ssrn.5098440}, Title = {Dual formulation of the maximum entropy method applied to analytic continuation of quantum Monte Carlo data} } - [abs]
[bib]
Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization
N. Graham, N. Richardson, M. P. Friedlander, J. Saylor. Mathematical Geosciences, 57(4):601-628,
2025.
[DOI]We devise a novel statistical method for deconvolving multivariate geochronology and geochemistry datasets to characterize sediment sources. The approach employs a third-order constrained Tucker-1 tensor decomposition to estimate the probability distributions of multiple features in sediment samples. By integrating kernel density estimation with matrix-tensor factorization, the model quantitatively determines the distributions and mixing proportions of sediment sources. The methodology introduces a numerical test for rank estimation to define the number of latent sources, and uses a maximum-likelihood approach to correlate individual detrital grains to these latent sources based on an arbitrary number of features. The method’s efficacy is validated through a numerical experiment with detrital zircon data that captures natural variability associated with temporal changes in crustal thickness in the Andes. The findings hold potential implications for resolving sediment sources, determining sediment mixing, enhancing the understanding of sediment transport processes, characterizing the lithology, tectonic motion, or metallogenic potential of sediment sources. This method is adaptable to other data de-mixing problems and is available in a publicly released software package.
@article{Graham2025Tracing, Author = {N. Graham and N. Richardson and M. P. Friedlander and J. Saylor}, Year = {2025}, Month = {May}, Journal = {Mathematical Geosciences}, Number = {4}, Volume = {57}, Pages = {601-628}, Doi = {10.1007/s11004-024-10175-0}, Title = {Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization} } - [abs]
[bib]
Endmember modelling of detrital zircon petrochronology data via multivariate Tucker-1 tensor decomposition
J. E. Saylor, N. Richardson, N. Graham, R. G. Lee, M. P. Friedlander. EGU General Assembly 2025,
2025.
[EGU Abstract]
[Web App]
[GitHub]
Detrital petrochronology is a powerful method of characterizing sediment and potentially sediment sources. The recently developed Tucker-1 decomposition method holds promise of using detrital petrochronology to identify both sediment-source characteristics and the proportions in which sources are present in sink samples even when sediment sources are unknown or unavailable for sampling. However, the correlation between endmember characteristics and lithological sources or proportions and sedimentary processes has not been established. Herein we present a case study applying the recently developed Tucker-1 decomposition method to a multivariate geochemical data set from detrital zircons in till samples collected above the Cu-bearing Guichon Creek Batholith (GCB) in southern British Columbia, Canada. Data include a suite of eleven variables, including age, Ce anomaly, CeN/NdN, DyN/YbN, ΔFMQ, Eu anomaly, ΣHREE/ΣMREE, Hf, Th/U, Ti temperature, and YbN/GdN, from 12 samples collected at a range of distances in the down ice-flow direction from the GCB.
We demonstrate that endmember modelling using the Tucker-1 decomposition method successfully deconvolves the multivariate data sets into two endmembers in which the geochemical distributions are consistent with derivation from either non-oxidized and relatively anhydrous (i.e., low ore potential, Source 1) or oxidized and hydrous (i.e., potential ore bodies, Source 2) igneous rocks. Furthermore, we demonstrate that the proportions of the Source 2 endmember decrease with increasing distance from the ore bodies, as expected due to downstream zircon mixing and dilution.
Finally, we attribute each of the zircon grains to either the Source 1 or 2 endmember based on maximization of the likelihood that their measured multivariate geochemistry was drawn from one or the other of the learned multivariate endmembers. We compared these grain attributions to the results of an independent Classification and Regression Tree (CART) analysis designed to characterize zircon grains as either “fertile” or “barren” with respect to copper based on their geochemistry. We find that there is ~80% overlap between the source attributions based on the CART analysis and the grain-source identification based on the Tucker-1 decomposition.
We conclude that the novel Tucker-1 decomposition approach provides a flexible, precise, and accurate method of characterizing multivariate sediment sources even when those sources are unknown. It thus provides a basis for future petrochronological interpretations with applied and pure geoscience applications. All of the analyses presented herein can be freely accessed through a web application (https://dzgrainalyzer.eoas.ubc.ca/) or open-source Julia code (https://github.com/MPF-Optimization-Laboratory/MatrixTensorFactor.jl).
@inproceedings{saylor2025endmember, title={Endmember modelling of detrital zircon petrochronology data via multivariate Tucker-1 tensor decomposition}, author={Saylor, Joel E and Richardson, Nicholas and Graham, Naomi and Lee, Robert G and Friedlander, Michael P}, booktitle={EGU General Assembly Conference Abstracts}, pages={EGU25--4081}, year={2025} } - [abs]
[bib]
Scalable Data-Driven Basis Selection for Linear Machine Learning Interatomic Potentials
T. Torabi, M. Militzer, M. P. Friedlander, C. Ortner. arXiv:2504.16418,
2025.
[arXiv]Machine learning interatomic potentials (MLIPs) provide an effective approach for accurately and efficiently modeling atomic interactions, expanding the capabilities of atomistic simulations to complex systems. However, a priori feature selection leads to high complexity, which can be detrimental to both computational cost and generalization, resulting in a need for hyperparameter tuning. We demonstrate the benefits of active set algorithms for automated data-driven feature selection. The proposed methods are implemented within the Atomic Cluster Expansion (ACE) framework. Computational tests conducted on a variety of benchmark datasets indicate that sparse ACE models consistently enhance computational efficiency, generalization accuracy and interpretability over dense ACE models. An added benefit of the proposed algorithms is that they produce entire paths of models with varying cost/accuracy ratio.
@article{torabi2025scalable, title={Scalable Data-Driven Basis Selection for Linear Machine Learning Interatomic Potentials}, author={Torabi, Tina and Militzer, Matthias and Friedlander, Michael P and Ortner, Christoph}, journal={arXiv preprint arXiv:2504.16418}, year={2025} } - [abs]
[bib]
Cardinality-constrained structured data-fitting problems
Z. Fan, H. Fang, M. P. Friedlander. Open J. Math. Optim.,
2023.
[Journal]
[DOI]A memory-efficient solution framework is proposed for the cardinality-constrained structured data-fitting problem. Dual-based atom-identification rules reveal the structure of the optimal primal solution from near-optimal dual solutions, which allows for a simple and computationally efficient algorithm that translates any feasible dual solution into a primal solution satisfying the cardinality constraint. Rigorous guarantees bound the quality of a near-optimal primal solution given any dual-based method that generates dual iterates converging to an optimal dual solution. Numerical experiments on real-world datasets support the analysis and demonstrate the efficiency of the proposed approach.
@misc{Fan2023Cardinality, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2023}, Month = {July}, Doi = {10.48550/arxiv.2107.11373}, Title = {Cardinality-constrained structured data-fitting problems} } - [abs]
[bib]
From perspective maps to epigraphical projection
M. P. Friedlander, A. Goodwin, T. Hoheisel. Math. Oper. Res. 48(3): 1711-1740, 2023,
2023.
[Journal]The projection onto the epigraph or a level set of a closed proper convex function can be achieved by finding a root of a scalar equation that involves the proximal operator as a function of the proximal parameter. This paper develops the variational analysis of this scalar equation. The approach is based on a study of the variational-analytic properties of general convex optimization problems that are (partial) infimal projections of the sum of the function in question and the perspective map of a convex kernel. When the kernel is the Euclidean norm squared, the solution map corresponds to the proximal map, and thus the variational properties derived for the general case apply to the proximal case. Properties of the value function and the corresponding solution map—including local Lipschitz continuity, directional differentiability, and semismoothness—are derived. An SC 1 optimization framework for computing epigraphical and level-set projections is thus established. Numerical experiments on 1-norm projection illustrate the effectiveness of the approach as compared with specialized algorithms.
@article{FriedlanderGoodwinHoheisel2023, author = {Friedlander, Michael P. and Goodwin, Ariel and Hoheisel, Tim}, title = {From Perspective Maps to Epigraphical Projections}, journal = {Mathematics of Operations Research}, volume = {48}, number = {3}, pages = {1711--1740}, year = {2023}, doi = {10.1287/moor.2022.1317} } - [abs]
[bib]
Knowledge-injected federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, J. Hu, C. Li, Y. Zhang. arXiv:2208.07530,
2022.
[arXiv]
[GitHub]
[DOI]Federated learning is an emerging technique for training models from decentralized data sets. In many applications, data owners participating in the federated learning system hold not only the data but also a set of domain knowledge. Such knowledge includes human know-how and craftsmanship that can be extremely helpful to the federated learning task. In this work, we propose a federated learning framework that allows the injection of participants’ domain knowledge, where the key idea is to refine the global model with knowledge locally. The scenario we consider is motivated by a real industry-level application, and we demonstrate the effectiveness of our approach to this application.
@misc{Fan2022Knowledge, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and J. Hu and C. Li and Y. Zhang}, Year = {2022}, Month = {August}, Doi = {10.48550/arxiv.2208.07530}, Title = {Knowledge-injected federated learning} } - [abs]
[bib]
Quantum algorithms for structured prediction
B. Sephehry, E. Iranmanesh, M. P. Friedlander, P. Ronagh. Quantum Machine Intelligence,
2022.
[Journal]
[DOI]We introduce two quantum algorithms for solving structured prediction problems. We first show that a stochastic gradient descent that uses the quantum minimum finding algorithm and takes its probabilistic failure into account solves the structured prediction problem with a runtime that scales with the square root of the size of the label space, and in \(O(1/\epsilon)\) with respect to the precision, \(\epsilon\), of the solution. Motivated by robust inference techniques in machine learning, we then introduce another quantum algorithm that solves a smooth approximation of the structured prediction problem with a similar quantum speedup in the size of the label space and a similar scaling in the precision parameter. In doing so, we analyze a variant of stochastic gradient descent for convex optimization in the presence of an additive error in the calculation of the gradients, and show that its convergence rate does not deteriorate if the additive errors are of the order \(O(\sqrt{\epsilon})\). This algorithm uses quantum Gibbs sampling at temperature \(\Omega(\epsilon)\) as a subroutine. Based on these theoretical observations, we propose a method for using quantum Gibbs samplers to combine feedforward neural networks with probabilistic graphical models for quantum machine learning. Our numerical results using Monte Carlo simulations on an image tagging task demonstrate the benefit of the approach.
@article{Sephehry2022Quantum, Author = {B. Sephehry and E. Iranmanesh and M. P. Friedlander and P. Ronagh}, Year = {2022}, Month = {June}, Journal = {Quantum Machine Intelligence}, Number = {2}, Volume = {4}, Doi = {10.1007/s42484-022-00078-w}, Title = {Quantum algorithms for structured prediction} } - [abs]
[bib]
Polar deconvolution of mixed signals
Z. Fan, H. Jeong, B. Joshi, M. P. Friedlander. IEEE Trans. Signal Processing,
2022.
[Journal]
[GitHub]The signal demixing problem seeks to separate a superposition of multiple signals into its constituent components. This paper describes a two-stage approach that first decompresses and subsequently deconvolves the noisy and undersampled observations of the superposition. Probabilistic error bounds are given on the accuracy with which this process approximates the individual signals. The theory of polar convolution of convex sets and gauge functions plays a central role in the analysis and solution process. If the measurements are random and the noise is bounded, this approach stably recovers low-complexity and mutually incoherent signals, with high probability and with optimal sample complexity. We develop an efficient algorithm, based on level-set and conditional-gradient methods, that solves the convex optimization problems with sublinear iteration complexity and linear space requirements. Numerical experiments on both real and synthetic data confirm the theory and the efficiency of the approach.
@article{fan2022deconvolution, author = {Z. Fan and H. Jeong and B. Joshi and M. P. Friedlander}, title = {Polar deconvolution of mixed signals}, journal = {IEEE Transactions on Signal Processing}, volume = {70}, pages = {2713--2727}, year = {2022}, month = {May}, doi = {10.1109/TSP.2022.3178191} } - [abs]
[bib]
Online mirror descent and dual averaging: keeping pace in the dynamic case
H. Fang, N J. A. Harvey, V. S. Portella, M. P. Friedlander. J. Machine Learning Research,
2022.Online mirror descent (OMD) and dual averaging (DA) are two fundamental algorithms for online convex optimization. They are known to have very similar (or even identical) performance guarantees in most scenarios when a fixed learning rate is used. However, for dynamic learning rates OMD is provably inferior to DA. It is known that, with a dynamic learning rate, OMD can suffer linear regret, even in common settings such as prediction with expert advice. This hints that the relationship between OMD and DA is not fully understood at present. In this paper, we modify the OMD algorithm by a simple technique that we call stabilization. We give essentially the same abstract regret bound for stabilized OMD and DA by modifying the classical OMD convergence analysis in a careful and modular way, yielding proofs that we believe to be clean and flexible. Simple corollaries of these bounds show that OMD with stabilization and DA enjoy the same performance guarantees in many applications even under dynamic learning rates. We also shed some light on the similarities between OMD and DA and show simple conditions under which stabilized OMD and DA generate the same iterates.
@article{fang2022online, author = {Huang Fang and Nicholas J. A. Harvey and Victor S. Portella and Michael P. Friedlander}, title = {Online Mirror Descent and Dual Averaging: Keeping Pace in the Dynamic Case}, journal = {Journal of Machine Learning Research}, year = {2022}, volume = {23}, number = {121}, pages = {1-38}, url = {http://jmlr.org/papers/v23/21-1027.html} } - [abs]
[bib]
A dual approach for federated learning
Z. Fan, H. Fang, M. P. Friedlander. arXiv 2201.11183,
2022.
[arXiv]
[GitHub]
[DOI]We study the federated optimization problem from a dual perspective and propose a new algorithm termed federated dual coordinate descent (FedDCD), which is based on a type of coordinate descent method developed by Necoara et al. [Journal of Optimization Theory and Applications, 2017]. Additionally, we enhance the FedDCD method with inexact gradient oracles and Nesterov’s acceleration. We demonstrate theoretically that our proposed approach achieves better convergence rates than the state-of-the-art primal federated optimization algorithms under mild conditions. Numerical experiments on real-world datasets support our analysis.
@misc{Fan2022Dual, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.11183}, Title = {A dual approach for federated learning} } - [abs]
[bib]
Improving fairness for data valuation in horizontal federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, C. Liu, Y. Zhang. Intern Conf Data Engineering (ICDE),
2022.
[IEEE]
[arXiv]
[DOI]Federated learning is an emerging decentralized machine learning scheme that allows multiple data owners to work collaboratively while ensuring data privacy. The success of federated learning depends largely on the participation of data owners. To sustain and encourage data owners’ participation, it is crucial to fairly evaluate the quality of the data provided by the data owners and reward them correspondingly. Federated Shapley value, recently proposed by Wang et al. [Federated Learning, 2020], is a measure for data value under the framework of federated learning that satisfies many desired properties for data valuation. However, there are still factors of potential unfairness in the design of federated Shapley value because two data owners with the same local data may not receive the same evaluation. We propose a new measure called completed federated Shapley value to improve the fairness of federated Shapley value. The design depends on completing a matrix consisting of all the possible contributions by different subsets of the data owners. It is shown under mild conditions that this matrix is approximately low-rank by leveraging concepts and tools from optimization. Both theoretical analysis and empirical evaluation verify that the proposed measure does improve fairness in many circumstances.
@article{Fan2022Improving, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and C. Liu and Y. Zhang}, Year = {2022}, Month = {January}, Journal = {2022 IEEE 38th International Conference on Data Engineering (ICDE)}, Pages = {2440-2453}, Doi = {10.1109/icde53745.2022.00228}, Title = {Improving fairness for data valuation in horizontal federated learning} } - [abs]
[bib]
Fair and efficient contribution valuation for vertical federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, Y. Zhang. arXiv:2201.02658,
2022.
[arXiv]
[DOI]Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data. Vertical federated learning or feature-based federated learning applies to cases in which different data sources share the same sample ID space but differ in feature space. To ensure the data owners’ long-term engagement, it is critical to objectively assess the contribution from each data source and recompense them accordingly. The Shapley value (SV) is a provably fair contribution valuation metric originating from cooperative game theory. However, computing the SV requires extensively retraining the model on each subset of data sources, which causes prohibitively high communication costs in federated learning. We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV. We show that VerFedSV not only satisfies many desirable properties for fairness but is also efficient to compute, and can be adapted to both synchronous and asynchronous vertical federated learning algorithms. Both theoretical analysis and extensive experimental results verify the fairness, efficiency, and adaptability of VerFedSV.
@misc{Fan2022Fair, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and Y. Zhang}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.02658}, Title = {Fair and efficient contribution valuation for vertical federated learning} } - [abs]
[bib]
NBIHT: An efficient algorithm for 1-bit compressed sensing with optimal error decay rate
M. P. Friedlander, H. Jeong, Y. Plan, Ö. Yılmaz. IEEE Trans. Info. Theory,
2021.
[DOI]The Binary Iterative Hard Thresholding (BIHT) algorithm is a popular reconstruction method for one-bit compressed sensing due to its simplicity and fast empirical convergence. There have been several works on BIHT, but a theoretical understanding of the corresponding approximation error and convergence rate remains open. This paper shows that the normalized version of BIHT (NBIHT) achieves an approximation error rate optimal up to logarithmic factors. More precisely, using m one-bit measurements of an s-sparse vector x, we prove that the approximation error of NBIHT is of order O(1/m) up to logarithmic factors, which matches the information-theoretic lower bound Ω(1/m) proved by Jacques, Laska, Boufounos, and Baraniuk in 2013. To our knowledge, this is the first theoretical analysis of a BIHT-type algorithm that explains the optimal rate of error decay empirically observed in the literature. This also makes NBIHT the first provably computationally efficient one-bit compressed sensing algorithm that breaks the inverse square-root error decay rate O(1/√m).
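For intuition only, here is a minimal sketch of a BIHT-type iteration with per-iteration normalization, assuming a Gaussian measurement matrix and one-bit data y = sign(Ax). The step size, iteration count, and helper names are illustrative; this is not the exact NBIHT variant or step-size schedule analyzed in the paper.

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero out the rest."""
    idx = np.argsort(np.abs(x))[:-s]
    out = x.copy()
    out[idx] = 0.0
    return out

def biht_normalized(A, y, s, iters=200, tau=None):
    """Illustrative BIHT-type iteration with per-iteration renormalization.

    A : (m, n) measurement matrix; y = sign(A @ x_true) are the one-bit measurements.
    Sketch for intuition only, not the NBIHT algorithm from the paper."""
    m, n = A.shape
    tau = 1.0 / m if tau is None else tau
    x = np.zeros(n)
    for _ in range(iters):
        g = A.T @ (y - np.sign(A @ x))      # subgradient of the one-bit consistency loss
        x = hard_threshold(x + tau * g, s)  # gradient step followed by hard thresholding
        nrm = np.linalg.norm(x)
        if nrm > 0:
            x /= nrm                        # one-bit data carry no magnitude information
    return x
```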
@article{Friedlander2021Nbiht, Author = {M. P. Friedlander and H. Jeong and Y. Plan and Ö. Yılmaz}, Year = {2021}, Month = {November}, Journal = {IEEE Transactions on Information Theory}, Number = {2}, Volume = {68}, Pages = {1157-1177}, Doi = {10.1109/tit.2021.3124598}, Title = {NBIHT: An efficient algorithm for 1-bit compressed sensing with optimal error decay rate} } - [abs]
Fast convergence of the stochastic subgradient method under interpolation
H. Fang, Z. Fan, M. P. Friedlander. Intern. Conf. Learning Representations (ICLR),
2021.This paper studies the behaviour of the stochastic subgradient descent (SSGD) method applied to over-parameterized nonsmooth optimization problems that satisfy an interpolation condition. By leveraging the composite structure of the empirical risk minimization problems, we prove that when interpolation holds, SSGD converges at the same rate as the stochastic gradient descent (SGD) method applied to smooth problems. Our analysis provides a partial explanation for the empirical observation that sometimes SGD and SSGD behave similarly for training smooth and nonsmooth machine learning models. We also prove that the rates we derive are optimal for the subgradient method in the convex and interpolation setting.
- [abs]
[bib]
Atomic decomposition via polar alignment: the geometry of structured optimization
Z. Fan, H. Jeong, Y. Sun, M. P. Friedlander. Foundations and Trends in Optimization, 3(4):280–366,
2020.
[Journal]
[DOI]Structured optimization uses a prescribed set of atoms to assemble a solution that fits a model to data. Polarity, which extends the familiar notion of orthogonality from linear sets to general convex sets, plays a special role in a simple and geometric form of convex duality. This duality correspondence yields a general notion of alignment that leads to an intuitive and complete description of how atoms participate in the final decomposition of the solution. The resulting geometric perspective leads to variations of existing algorithms effective for large-scale problems. We illustrate these ideas with many examples, including applications in matrix completion and morphological component analysis for the separation of mixtures of signals.
@article{Fan2020Atomic, Author = {Z. Fan and H. Jeong and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {November}, Journal = {Foundations and Trends® in Optimization}, Number = {4}, Volume = {3}, Pages = {280-366}, Doi = {10.1561/2400000028}, Title = {Atomic decomposition via polar alignment: the geometry of structured optimization} } - [abs]
[bib]
Implementing a smooth exact penalty function for equality-constrained nonlinear optimization
R. Estrin, M. P. Friedlander, D. Orban, M. A. Saunders. SIAM J. Sci. Comput., 42(3), A1809–A1835,
2020.
[Journal]
[DOI]
[more]We develop a general equality-constrained nonlinear optimization algorithm based on a smooth penalty function proposed by Fletcher in 1970. Although it was historically considered to be computationally prohibitive in practice, we demonstrate that the computational kernels required are no more expensive than other widely accepted methods for nonlinear optimization. The main kernel required to evaluate the penalty function and its derivatives is solving a structured linear system. We show how to solve this system efficiently by storing a single factorization at each iteration when the matrices are available explicitly. We further show how to adapt the penalty function to the class of factorization-free algorithms by solving the linear system iteratively. The penalty function therefore has promise when the linear system can be solved efficiently, e.g., for PDE-constrained optimization problems where efficient preconditioners exist. We discuss extensions including handling simple constraints explicitly, regularizing the penalty function, and inexact evaluation of the penalty function and its gradients. We demonstrate the merits of the approach and its various features on some nonlinear programs from a standard test set, and some PDE-constrained optimization problems.
@article{Estrin2020Implementing, Author = {R. Estrin and M. P. Friedlander and D. Orban and M. A. Saunders}, Year = {2020}, Month = {June}, Journal = {SIAM Journal on Scientific Computing}, Number = {3}, Volume = {42}, Pages = {A1809-A1835}, Doi = {10.1137/19m1238265}, Title = {Implementing a smooth exact penalty function for equality-constrained nonlinear optimization} } - [abs]
[bib]
Implementing a smooth exact penalty function for general constrained nonlinear optimization
R. Estrin, M. P. Friedlander, D. Orban, M. A. Saunders. SIAM J. Sci. Comput., 42(3), A1836–A1859,
2020.
[Journal]
[DOI]
[more]We build upon Estrin et al. (2019) to develop a general constrained nonlinear optimization algorithm based on a smooth penalty function proposed by Fletcher (1970, 1973b). Although Fletcher’s approach has historically been considered impractical, we show that the computational kernels required are no more expensive than those in other widely accepted methods for nonlinear optimization. The main kernel for evaluating the penalty function and its derivatives solves structured linear systems. When the matrices are available explicitly, we store a single factorization each iteration. Otherwise, we obtain a factorization-free optimization algorithm by solving each linear system iteratively. The penalty function shows promise in cases where the linear systems can be solved efficiently, e.g., PDE-constrained optimization problems when efficient preconditioners exist. We demonstrate the merits of the approach, and give numerical results on several PDE-constrained and standard test problems.
@article{Estrin2020Implementing, Author = {R. Estrin and M. P. Friedlander and D. Orban and M. A. Saunders}, Year = {2020}, Month = {June}, Journal = {SIAM Journal on Scientific Computing}, Number = {3}, Volume = {42}, Pages = {A1836-A1859}, Doi = {10.1137/19m1255069}, Title = {Implementing a smooth exact penalty function for general constrained nonlinear optimization} } - [abs]
[bib]
A perturbation view of level-set methods for convex optimization
R. Estrin, M. P. Friedlander. Optimization Letters,
2020.
[Journal]
[DOI]Level-set methods for convex optimization are predicated on the idea that certain problems can be parameterized so that their solutions can be recovered as the limiting process of a root-finding procedure. This idea emerges time and again across a range of algorithms for convex problems. Here we demonstrate that strong duality is a necessary condition for the level-set approach to succeed. In the absence of strong duality, the level-set method identifies ε-infeasible points that do not converge to a feasible point as ε tends to zero. The level-set approach is also used as a proof technique for establishing sufficient conditions for strong duality that are different from Slater’s constraint qualification.
@misc{Estrin2020Perturbation, Author = {R. Estrin and M. P. Friedlander}, Year = {2020}, Month = {June}, Number = {8}, Volume = {14}, Pages = {1989-2006}, Doi = {10.1007/s11590-020-01609-9}, Title = {A perturbation view of level-set methods for convex optimization} } - [abs]
[bib]
Online mirror descent and dual averaging: keeping pace in the dynamic case
H. Fang, N. J. A. Harvey, V. S. Portella, M. P. Friedlander. Intern. Conf. Machine Learning (ICML),
2020.
[arXiv]
[video]
[DOI]Online mirror descent (OMD) and dual averaging (DA) are two fundamental algorithms for online convex optimization. They are known to have very similar (or even identical) performance guarantees in most scenarios when a fixed learning rate is used. However, for dynamic learning rates OMD is provably inferior to DA. It is known that, with a dynamic learning rate, OMD can suffer linear regret, even in common settings such as prediction with expert advice. This hints that the relationship between OMD and DA is not fully understood at present. In this paper, we modify the OMD algorithm by a simple technique that we call stabilization. We give essentially the same abstract regret bound for stabilized OMD and DA by modifying the classical OMD convergence analysis in a careful and modular way, yielding proofs that we believe to be clean and flexible. Simple corollaries of these bounds show that OMD with stabilization and DA enjoy the same performance guarantees in many applications even under dynamic learning rates. We also shed some light on the similarities between OMD and DA and show simple conditions under which stabilized OMD and DA generate the same iterates.
@article{Fang2020Online, Author = {H. Fang and N. J. A. Harvey and V. S. Portella and M. P. Friedlander}, Year = {2020}, Month = {June}, Journal = {International Conference on Machine Learning}, Volume = {1}, Pages = {3008-3017}, Title = {Online mirror descent and dual averaging: keeping pace in the dynamic case} } - [abs]
[bib]
Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization
H. Fang, Z. Fan, Y. Sun, M. P. Friedlander. Intern. Conf. Artificial Intelligence and Statistics (AISTATS),
2020.We consider greedy coordinate descent (GCD) for composite problems with sparsity-inducing regularizers, including 1-norm regularization and non-negative constraints. Empirical evidence strongly suggests that GCD, when initialized with the zero vector, has an implicit screening ability that usually selects at each iteration coordinates that are nonzero at the solution. Thus, for problems with sparse solutions, GCD can converge significantly faster than randomized coordinate descent. We present an improved convergence analysis of GCD for sparse optimization, and a formal analysis of its screening properties. We also propose and analyze an improved selection rule with stronger ability to produce sparse iterates. Numerical experiments on both synthetic and real-world data support our analysis and the effectiveness of the proposed selection rule.
@article{Fang2020Greed, Author = {H. Fang and Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {February}, Journal = {International Conference on Artificial Intelligence and Statistics}, Pages = {434-444}, Title = {Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization} } - [abs]
[bib]
Bundle methods for dual atomic pursuit
Z. Fan, Y. Sun, M. P. Friedlander. Asilomar Conf. Signals, Systems, Computers (ACSSC),
2019.
[arXiv]
[DOI]The aim of structured optimization is to assemble a solution, using a given set of (possibly uncountably infinite) atoms, to fit a model to data. A two-stage algorithm based on gauge duality and a bundle method is proposed. The first stage discovers the optimal atomic support for the primal problem by solving a sequence of approximations of the dual problem using a bundle-type method. The second stage recovers the approximate primal solution using the atoms discovered in the first stage. The overall approach leads to implementable and efficient algorithms for large problems.
@article{Fan2019Bundle, Author = {Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2019}, Month = {November}, Volume = {59}, Pages = {264-270}, Doi = {10.1109/ieeeconf44664.2019.9048711}, Title = {Bundle methods for dual atomic pursuit} } - [abs]
[bib]
One-shot atomic detection
Y. Sun, M. P. Friedlander. IEEE Intern. Workshop Comput. Adv. Multi-Sensor Adaptive Proc. (CAMSAP),
2019.
[DOI]Feature selection in data science involves identifying the most prominent and uncorrelated features in the data, which can be useful for compression and interpretability. If these features can be easily extracted, then a model can be trained over a reduced set of weights, which leads to more efficient training and possibly more robust classifiers. There are many approaches to feature selection; in this work, we propose screening the “atoms” of a gradient of a loss function taken at a random point. We illustrate this approach on sparse and low-rank optimization problems. Despite the simplicity of the approach, we are often able to select the dominant features easily, and greatly improve the runtime and robustness in training overparametrized models.
@article{Sun2019One, Author = {Y. Sun and M. P. Friedlander}, Year = {2019}, Month = {September}, Volume = {16}, Pages = {1-5}, Doi = {10.1109/camsap45676.2019.9022441}, Title = {One-shot atomic detection} } - [abs]
[bib]
Fast training for large-scale one-versus-all linear classifiers using tree-structured initialization
H. Fang, M. Cheng, C.-J. Hsieh, M. P. Friedlander. SIAM Intern. Conf. Data Mining (SDM),
2019.
[DOI]We consider the problem of training one-versus-all (OVA) linear classifiers for multiclass or multilabel classification when the number of labels is large. A naive extension of OVA to this problem, even with hundreds of cores, usually requires hours for training on large real-world datasets. We propose a novel algorithm called OVA-Primal++ that speeds up the training of OVA by using a tree-structured training order, where each classifier is trained using its parent’s classifier as initialization. OVA-Primal++ is both theoretically and empirically faster than the naive OVA algorithm, and yet still enjoys the same high parallelizability and small memory footprint. Extensive experiments on multiclass and multilabel classification datasets validate the effectiveness of our method.
@incollection{Fang2019Fast, Author = {H. Fang and M. Cheng and C.-J. Hsieh and M. P. Friedlander}, Year = {2019}, Month = {February}, Pages = {280-288}, Doi = {10.1137/1.9781611975673.32}, Title = {Fast training for large-scale one-versus-all linear classifiers using tree-structured initialization} } - [abs]
[bib]
Polar convolution
M. P. Friedlander, I. Macêdo, T.K. Pong. SIAM J. Optimization, 29(2):1366–1391,
2019.
[DOI]The Moreau envelope is one of the key convexity-preserving functional operations in convex analysis, and it is central to the development and analysis of many approaches for solving convex optimization problems. This paper develops the theory for a parallel convolution operation, called the polar envelope, specialized to gauge functions. We show that many important properties of the Moreau envelope and the proximal map are mirrored by the polar envelope and its corresponding proximal map. These properties include smoothness of the envelope function, uniqueness and continuity of the proximal map, a role in duality and in the construction of algorithms for gauge optimization. We thus establish a suite of tools with which to build algorithms for this family of optimization problems.
@article{Friedlander2019Polar, Author = {M. P. Friedlander and I. Macêdo and T.K. Pong}, Year = {2019}, Month = {February}, Journal = {SIAM Journal on Optimization}, Number = {2}, Volume = {29}, Pages = {1366-1391}, Doi = {10.1137/18m1209088}, Title = {Polar convolution} } - [abs]
[bib]
Smooth structured prediction using quantum and classical Gibbs samplers
B. Sepehry, E. Iranmanesh, M. P. Friedlander, P. Ronagh. Adiabatic Quantum Computing Conference (AQC), arXiv:1809.04091,
2019.We introduce a quantum algorithm for solving structured prediction problems with a runtime that scales with the square root of the size of the label space, but scales in \(\tilde{O}(\epsilon^{-3.5})\) with respect to the precision, \(\epsilon\), of the solution. In doing so, we analyze a stochastic gradient algorithm for convex optimization in the presence of an additive error in the calculation of the gradients, and show that its convergence rate does not deteriorate if the additive errors are of the order \(O(\sqrt{\epsilon})\). Our algorithm uses quantum Gibbs sampling at temperature \(\Omega(\epsilon)\) as a subroutine. Based on these theoretical observations, we propose a method for using quantum Gibbs samplers to combine feedforward neural networks with probabilistic graphical models for quantum machine learning. Numerical results using Monte Carlo simulations on an image tagging task demonstrate the benefit of the approach.
@article{Sephehry2019Smooth, Author = {B. Sepehry and E. Iranmanesh and M. P. Friedlander and P. Ronagh}, Year = {2019}, Month = {January}, Journal = {arXiv preprint arXiv:1809.04091}, Title = {Smooth structured prediction using quantum and classical Gibbs samplers} } - [abs]
[bib]
Foundations of gauge and perspective duality
A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, K. MacPhee. SIAM J. Optimization, 28(3):2406–2434,
2018.
[DOI]We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel-Rockafellar duality on equal footing, including explaining gauge dual variables as sensitivity measures, and showing how to recover primal solutions from those of the gauge dual. This vantage point allows a direct proof that optimal solutions of the Fenchel-Rockafellar dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. We extend the gauge duality framework to the setting in which the functional components are general nonnegative convex functions, including problems with piecewise linear quadratic functions and constraints that arise from generalized linear models used in regression.
@misc{Aravkin2018Foundations, Author = {A. Y. Aravkin and J. V. Burke and D. Drusvyatskiy and M. P. Friedlander and K. MacPhee}, Year = {2018}, Month = {August}, Number = {3}, Volume = {28}, Pages = {2406-2434}, Doi = {10.1137/17m1119020}, Title = {Foundations of gauge and perspective duality} } - [abs]
[bib]
Level-set methods for convex optimization
A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, S. Roy. Mathematical Programming, 174(1-2):359–390,
2018.
[Journal]
[DOI]Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective and constraint functions, and instead approximately solves a sequence of parametric level-set problems. A zero-finding procedure, based on inexact function evaluations and possibly inexact derivative information, leads to an efficient solution scheme for the original problem. We describe the theoretical and practical properties of this approach for a broad range of problems, including low-rank semidefinite optimization, sparse optimization, and generalized linear models for inference.
@misc{Aravkin2018Level, Author = {A. Y. Aravkin and J. V. Burke and D. Drusvyatskiy and M. P. Friedlander and S. Roy}, Year = {2018}, Month = {April}, Number = {1-2}, Volume = {174}, Pages = {359-390}, Doi = {10.1007/s10107-018-1351-8}, Title = {Level-set methods for convex optimization} } - [abs]
[bib]
Efficient evaluation of scaled proximal operators
M. P. Friedlander, G. Goh. Electronic Trans. Numerical Analysis, 46:1–22,
2017.
[DOI]Quadratic-support functions, cf. Aravkin, Burke, and Pillonetto [J. Mach. Learn. Res., 14 (2013)], constitute a parametric family of convex functions that includes a range of useful regularization terms found in applications of convex optimization. We show how an interior method can be used to efficiently compute the proximal operator of a quadratic-support function under different metrics. When the metric and the function have the right structure, the proximal map can be computed with costs nearly linear in the input size. We describe how to use this approach to implement quasi-Newton methods for a rich class of nonsmooth problems that arise, for example, in sparse optimization, image denoising, and sparse logistic regression.
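As a toy instance of the scaled proximal operators discussed above, the prox of the 1-norm under a diagonal metric H = diag(d) separates into per-coordinate soft-thresholding. The sketch below assumes that special case only; it is far simpler than the interior-method approach the paper develops for general quadratic-support functions and non-diagonal metrics.

```python
import numpy as np

def prox_l1_diag_metric(v, lam, d):
    """Scaled proximal operator of lam*||x||_1 under the metric H = diag(d):
    argmin_x lam*||x||_1 + 0.5*(x - v)^T H (x - v).
    Separates coordinate-wise into soft-thresholding with threshold lam/d_i."""
    v, d = np.asarray(v, float), np.asarray(d, float)
    return np.sign(v) * np.maximum(np.abs(v) - lam / d, 0.0)
```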
@misc{Friedlander2017Efficient, Author = {M. P. Friedlander and G. Goh}, Year = {2017}, Month = {February}, Doi = {10.48550/arxiv.1603.05719}, Title = {Efficient evaluation of scaled proximal operators} } - [abs]
[bib]
Satisfying real-world goals with dataset constraints
G. Goh, A. Cotter, M. Gupta, M. P. Friedlander. Advances in Neural Information Processing Systems 29 (NIPS),
2016.
[DOI]The goal of minimizing misclassification error on a training set is often just one of several real-world goals that might be defined on different datasets. For example, one may require a classifier to also make positive predictions at some specified rate for some subpopulation (fairness), or to achieve a specified empirical recall. Other real-world goals include reducing churn with respect to a previously deployed model, or stabilizing online training. In this paper we propose handling multiple goals on multiple datasets by training with dataset constraints, using the ramp penalty to accurately quantify costs, and present an efficient algorithm to approximately optimize the resulting non-convex constrained optimization problem. Experiments on both benchmark and real-world industry datasets demonstrate the effectiveness of our approach.
@article{Goh2016Satisfying, Author = {G. Goh and A. Cotter and M. Gupta and M. P. Friedlander}, Year = {2016}, Month = {December}, Journal = {Neural Information Processing Systems}, Volume = {29}, Pages = {2423-2431}, Title = {Satisfying real-world goals with dataset constraints} } - [abs]
[bib]
Social resistance
M. P. Friedlander, N. Krislock, T. K. Pong. IEEE Comput. Sci. Eng., 18(2):98-103; reprinted in Computing Edge,
2016.
[DOI]
[Computing Edge]This case study concerns an important question in social networks: How closely related are two people in a network? We measure the strength of the relationship between two people in terms of their resistance distance, a notion of distance that has its origin in the study of electrical networks. In that context, the resistance distance measures the electrical resistance between the two nodes in a network, assuming unit resistance on each edge. Its relationship with the information distance in social networks was recently identified. For large networks, this distance measure can be expensive to compute. As the network changes, it’s useful to have an inexpensive way to make updates. We’ll study how various techniques for updating matrix factorizations can help efficiently compute the resistance distances with small changes in the network. We focus on Cholesky rank-one updates, and also briefly discuss the Sherman-Morrison-Woodbury formula (as one of the last exercises). We expect that you’re familiar with the idea of the Cholesky factorization of positive definite matrices. We’ll guide you through a derivation of the formula for the Cholesky rank-one update, and introduce two aspects that may be new for you. First, you’ll have to figure out the vector used for the rank-one update when one additional connection in the network is identified. Second, when matrices are large, you’ll observe that there can be a big difference in computational effort when the computation involves matrix-matrix multiplications rather than only matrix-vector multiplications.
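A compact sketch of the two ingredients the case study walks through: resistance distance computed from a grounded graph Laplacian, and the standard O(n²) Cholesky rank-one update that can absorb a new edge without refactoring. The helper names and toy graph interface are illustrative, not taken from the article; pass L_chol = np.linalg.cholesky(Lg) to the resistance routine.

```python
import numpy as np
from scipy.linalg import solve_triangular

def grounded_laplacian(edges, n, ground=0):
    """Unit-resistance graph Laplacian with one node grounded (removed);
    positive definite whenever the graph is connected."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    keep = [k for k in range(n) if k != ground]
    return L[np.ix_(keep, keep)], keep

def resistance(L_chol, keep, i, j):
    """r(i, j) = (e_i - e_j)^T Lg^{-1} (e_i - e_j), dropping the grounded node's entry."""
    b = np.zeros(len(keep))
    if i in keep: b[keep.index(i)] += 1.0
    if j in keep: b[keep.index(j)] -= 1.0
    y = solve_triangular(L_chol, b, lower=True)
    return float(y @ y)   # b^T (L L^T)^{-1} b = ||L^{-1} b||^2

def chol_rank1_update(L, x):
    """Given lower-triangular L with A = L L^T, return the factor of A + x x^T in O(n^2)."""
    L, x = L.copy(), x.copy()
    for k in range(len(x)):
        r = np.hypot(L[k, k], x[k])
        c, s = r / L[k, k], x[k] / L[k, k]
        L[k, k] = r
        L[k+1:, k] = (L[k+1:, k] + s * x[k+1:]) / c
        x[k+1:] = c * x[k+1:] - s * L[k+1:, k]
    return L

# Adding an edge (u, v) perturbs the Laplacian by (e_u - e_v)(e_u - e_v)^T, a rank-one
# change, so chol_rank1_update refreshes the factor at O(n^2) cost instead of the
# O(n^3) cost of refactoring from scratch.
```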
@article{Friedlander2016Social, Author = {M. P. Friedlander and N. Krislock and T. K. Pong}, Year = {2016}, Month = {August}, Journal = {Computing in Science \& Engineering}, Number = {2}, Volume = {18}, Pages = {98-103}, Doi = {10.1109/mcse.2016.30}, Title = {Social resistance} } - [abs]
[bib]
Low-rank spectral optimization via gauge duality
M. P. Friedlander, I. Macêdo. SIAM J. Scientific Computing, 38(3):A1616–A1638,
2016.
[DOI]
[more]Various applications in signal processing and machine learning give rise to highly structured spectral optimization problems characterized by low-rank solutions. Two important examples that motivate this work are optimization problems from phase retrieval and from blind deconvolution, which are designed to yield rank-1 solutions. An algorithm is described that is based on solving a certain constrained eigenvalue optimization problem that corresponds to the gauge dual which, unlike the more typical Lagrange dual, has an especially simple constraint. The dominant cost at each iteration is the computation of rightmost eigenpairs of a Hermitian operator. A range of numerical examples illustrate the scalability of the approach.
@article{Friedlander2016Low, Author = {M. P. Friedlander and I. Macêdo}, Year = {2016}, Month = {June}, Journal = {SIAM Journal on Scientific Computing}, Number = {3}, Volume = {38}, Pages = {A1616-A1638}, Doi = {10.1137/15m1034283}, Title = {Low-rank spectral optimization via gauge duality} } - [abs]
[bib]
Coordinate descent converges faster with the Gauss-Southwell rule than random selection
J. Nutini, M. Schmidt, I. H. Laradji, M. P. Friedlander, H. Koepke. Intern. Conf. Machine Learning (ICML),
2015.
[Preprint]
[NIPS PDF]
[ICML PDF]
[ICML Supplementary]
[DOI]There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random-coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically much more expensive than random selection. However, the empirical behaviours of these algorithms contradict this theoretical result: in applications where the computational costs of the selection rules are comparable, the Gauss-Southwell selection rule tends to perform substantially better than random coordinate selection. We give a simple analysis of the Gauss-Southwell rule showing that–except in extreme cases–its convergence rate is faster than choosing random coordinates. Further, in this work we (i) show that exact coordinate optimization improves the convergence rate for certain sparse problems, (ii) propose a Gauss-Southwell-Lipschitz rule that gives an even faster convergence rate given knowledge of the Lipschitz constants of the partial derivatives, (iii) analyze the effect of approximate Gauss-Southwell rules, and (iv) analyze proximal-gradient variants of the Gauss-Southwell rule.
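To make the comparison concrete, here is a toy coordinate-descent loop on a least-squares objective in which the selection rule can be switched between uniform-random and Gauss-Southwell (largest-magnitude partial derivative), with exact minimization along the chosen coordinate. This is a sketch for illustration only; it does not reproduce the paper's analysis or its Gauss-Southwell-Lipschitz and proximal variants.

```python
import numpy as np

def coordinate_descent(A, b, rule="gs", iters=500, seed=0):
    """Coordinate descent on f(x) = 0.5*||Ax - b||^2 with exact coordinate minimization.

    rule = "gs" picks the coordinate with the largest-magnitude partial derivative
    (Gauss-Southwell); rule = "random" picks a coordinate uniformly at random."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    col_norms = np.sum(A * A, axis=0)          # coordinate-wise curvature (assumed nonzero)
    x = np.zeros(n)
    r = A @ x - b                              # residual maintained incrementally
    for _ in range(iters):
        g = A.T @ r                            # full gradient (fine for a small demo)
        i = int(np.argmax(np.abs(g))) if rule == "gs" else int(rng.integers(n))
        step = -g[i] / col_norms[i]            # exact minimization along coordinate i
        x[i] += step
        r += step * A[:, i]
    return x
```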
@article{Nutini2015Coordinate, Author = {J. Nutini and M. Schmidt and I. H. Laradji and M. P. Friedlander and H. Koepke}, Year = {2015}, Month = {July}, Pages = {1632-1641}, Title = {Coordinate descent converges faster with the Gauss-Southwell rule than random selection} } - [abs]
[bib]
Gauge optimization and duality
M. P. Friedlander, I. Macêdo, T. K. Pong. SIAM J. Optimization, 24(4):1999–2022,
2014.Gauge functions significantly generalize the notion of a norm, and gauge optimization, as defined by Freund (1987), seeks the element of a convex set that is minimal with respect to a gauge function. This conceptually simple problem can be used to model a remarkable array of useful problems, including a special case of conic optimization, and related problems that arise in machine learning and signal processing. The gauge structure of these problems allows for a special kind of duality framework. This paper explores the duality framework proposed by Freund, and proposes a particular form of the problem that exposes some useful properties of the gauge optimization framework (such as the variational properties of its value function), and yet maintains most of the generality of the abstract form of gauge optimization.
@Article{FriedlanderMacedoPong:2014, author = {M. P. Friedlander and I. Mac\^edo and T. K. Pong}, title = {Gauge optimization and duality}, year = 2014, journal = {SIAM Journal on Optimization}, number = 4, volume = 24, pages = {1999-2022}, doi = {10.1137/130940785}, url = {http://epubs.siam.org/doi/abs/10.1137/130940785} } - [abs]
[bib]
Variational properties of value functions
A. Aravkin, J. V. Burke, M. P. Friedlander. SIAM J. Optimization, 23(3):1689–1717,
2013.
[DOI]Regularization plays a key role in a variety of optimization formulations of inverse problems. A recurring theme in regularization approaches is the selection of regularization parameters, and their effect on the solution and on the optimal value of the optimization problem. The sensitivity of the value function to the regularization parameter can be linked directly to the Lagrange multipliers. This paper characterizes the variational properties of the value functions for a broad class of convex formulations, which are not all covered by standard Lagrange multiplier theory. An inverse function theorem is given that links the value functions of different regularization formulations (not necessarily convex). These results have implications for the selection of regularization parameters, and the development of specialized algorithms. Numerical examples illustrate the theoretical results.
@article{Aravkin2013Variational, Author = {A. Aravkin and J. V. Burke and M. P. Friedlander}, Year = {2013}, Month = {August}, Journal = {SIAM Journal on Optimization}, Number = {3}, Volume = {23}, Pages = {1689-1717}, Doi = {10.1137/120899157}, Title = {Variational properties of value functions} } - [abs]
[bib]
Fast dual variational inference for non-conjugate latent Gaussian models
M. E. Khan, A. Y. Aravkin, M. P. Friedlander, M. Seeger. Intern. Conf. Machine Learning (ICML),
2013.Latent Gaussian models (LGMs) are widely used in statistics and machine learning. Bayesian inference in non-conjugate LGMs is difficult due to intractable integrals involving the Gaussian prior and non-conjugate likelihoods. Algorithms based on variational Gaussian (VG) approximations are widely employed since they strike a favorable balance between accuracy, generality, speed, and ease of use. However, the structure of the optimization problems associated with these approximations remains poorly understood, and standard solvers take too long to converge. We derive a novel dual variational inference approach that exploits the convexity property of the VG approximations. We obtain an algorithm that solves a convex optimization problem, reduces the number of variational parameters, and converges much faster than previous methods. Using real-world data, we demonstrate these advantages on a variety of LGMs, including Gaussian process classification and latent Gaussian Markov random fields.
@article{Khan2013Fast, Author = {M. E. Khan and A. Y. Aravkin and M. P. Friedlander and M. Seeger}, Year = {2013}, Month = {May}, Pages = {951-959}, Title = {Fast dual variational inference for non-conjugate latent gaussian models} } - [abs]
[bib]
Tail bounds for stochastic approximation
M. P. Friedlander, G. Goh. arXiv:1304.5586,
2013.
[arXiv]
[DOI]Stochastic-approximation gradient methods are attractive for large-scale convex optimization because they offer inexpensive iterations. They are especially popular in data-fitting and machine-learning applications where the data arrives in a continuous stream, or it is necessary to minimize large sums of functions. It is known that by appropriately decreasing the variance of the error at each iteration, the expected rate of convergence matches that of the underlying deterministic gradient method. Conditions are given under which this happens with overwhelming probability.
@misc{Friedlander2013Tail, Author = {M. P. Friedlander and G. Goh}, Year = {2013}, Month = {April}, Doi = {10.48550/arxiv.1304.5586}, Title = {Tail bounds for stochastic approximation} } - [abs]
[bib]
Hybrid deterministic-stochastic methods for data fitting
M. P. Friedlander, M. Schmidt. SIAM J. Scientific Computing, 34(3):A1380–A1405,
2012.
[more]Many structured data-fitting applications require the solution of an optimization problem involving a sum over a potentially large number of measurements. Incremental gradient algorithms offer inexpensive iterations by sampling a subset of the terms in the sum; these methods can make great progress initially, but often slow as they approach a solution. In contrast, full-gradient methods achieve steady convergence at the expense of evaluating the full objective and gradient on each iteration. We explore hybrid methods that exhibit the benefits of both approaches. Rate-of-convergence analysis shows that by controlling the sample size in an incremental-gradient algorithm, it is possible to maintain the steady convergence rates of full-gradient methods. We detail a practical quasi-Newton implementation based on this approach. Numerical experiments illustrate its potential benefits.
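The sample-size control described above can be mimicked with a geometrically growing batch: averaging more terms per iteration shrinks the variance of the gradient error, which is the mechanism that preserves deterministic-style rates. The sketch below is only a schematic of that idea under an assumed per-term gradient oracle grad_i(x, i); the actual batching schedule and the quasi-Newton implementation from the paper are not reproduced.

```python
import numpy as np

def growing_batch_gradient_descent(grad_i, n, x0, step=0.1, epochs=20, growth=1.6, seed=0):
    """Schematic hybrid scheme: sample-average gradients with a geometrically growing batch.

    grad_i(x, i) returns the gradient of the i-th term of the finite sum; n is the
    number of terms. Growing the batch reduces the gradient-error variance."""
    rng = np.random.default_rng(seed)
    x, batch = np.asarray(x0, dtype=float), 1.0
    for _ in range(epochs):
        idx = rng.choice(n, size=min(int(batch), n), replace=False)
        g = np.mean([grad_i(x, i) for i in idx], axis=0)   # sample-averaged gradient
        x = x - step * g
        batch *= growth                                     # geometric growth of the sample
    return x
```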
@article{FriedlanderSchmidt2012, author = {M. P. Friedlander and M. Schmidt}, title = {Hybrid deterministic-stochastic methods for data fitting}, journal = {SIAM J. Scientific Computing}, volume = {34}, number = {3}, pages = {A1380–A1405}, year = {2012} } - [abs]
[bib]
A primal-dual regularized interior-point method for convex quadratic programs
M. P. Friedlander, D. Orban. Mathematical Programming Computation, 4(1):71–107,
2012.
[Preprint]
[Code]
[DOI]
[more]Interior-point methods in augmented form for linear and convex quadratic programming require the solution of a sequence of symmetric indefinite linear systems which are used to derive search directions. Safeguards are typically required in order to handle free variables or rank-deficient Jacobians. We propose a consistent framework and accompanying theoretical justification for regularizing these linear systems. Our approach can be interpreted as a simultaneous proximal-point regularization of the primal and dual problems. The regularization is termed exact to emphasize that, although the problems are regularized, the algorithm recovers a solution of the original problem, for appropriate values of the regularization parameters.
@article{Friedlander2012Primal, Author = {M. P. Friedlander and D. Orban}, Year = {2012}, Month = {February}, Journal = {Mathematical Programming Computation}, Number = {1}, Volume = {4}, Pages = {71-107}, Doi = {10.1007/s12532-012-0035-2}, Title = {A primal-dual regularized interior-point method for convex quadratic programs} } - [abs]
[bib]
Recovering compressively sampled signals using partial support information
M. P. Friedlander, H. Mansour, R. Saab, Ö. Yılmaz. IEEE Trans. Info. Theory, 58(2):1122–1134,
2012.
[Journal Link]
[DOI]In this paper we study recovery conditions of weighted \(\ell_1\) minimization for signal reconstruction from compressed sensing measurements when partial support information is available. We show that if at least 50% of the (partial) support information is accurate, then weighted \(\ell_1\) minimization is stable and robust under weaker conditions than the analogous conditions for standard \(\ell_1\) minimization. Moreover, weighted \(\ell_1\) minimization provides better bounds on the reconstruction error in terms of the measurement noise and the compressibility of the signal to be recovered. We illustrate our results with extensive numerical experiments on synthetic data and real audio and video signals.
@article{Friedlander2012Recovering, Author = {M. P. Friedlander and H. Mansour and R. Saab and Ö. Yılmaz}, Year = {2012}, Month = {February}, Journal = {IEEE Transactions on Information Theory}, Number = {2}, Volume = {58}, Pages = {1122-1134}, Doi = {10.1109/tit.2011.2167214}, Title = {Recovering compressively sampled signals using partial support information} } - [abs]
[bib]
Fighting the curse of dimensionality: compressive sensing in exploration seismology
F. J. Herrmann, M. P. Friedlander, Ö. Yılmaz. IEEE Signal Processing Magazine, 29(3):88–100,
2012.
[DOI]Many seismic exploration techniques rely on the collection of massive data volumes that are mined for information during processing. This approach has been extremely successful, but current efforts toward higher resolution images in increasingly complicated regions of Earth continue to reveal fundamental shortcomings in our typical workflows. The “curse” of dimensionality is the main roadblock and is exemplified by Nyquist’s sampling criterion, which disproportionately strains current acquisition and processing systems as the size and desired resolution of our survey areas continues to increase.
@article{Herrmann2012Fighting, Author = {F. J. Herrmann and M. P. Friedlander and Ö. Yılmaz}, Year = {2012}, Month = {January}, Journal = {IEEE Signal Processing Magazine}, Number = {3}, Volume = {29}, Pages = {88-100}, Doi = {10.1109/msp.2012.2185859}, Title = {Fighting the curse of dimensionality: compressive sensing in exploration seismology} } - [abs]
[bib]
Robust inversion via semistochastic dimensionality reduction
A. Aravkin, M. P. Friedlander, T. van Leeuwen. Intern. Conf. Acoustics, Speech, and Signal Processing (ICASSP),
2012.
[DOI]We consider a class of inverse problems where it is possible to aggregate the results of multiple experiments. This class includes problems where the forward model is the solution operator to linear ODEs or PDEs. The tremendous size of such problems motivates dimensionality reduction techniques based on randomly mixing experiments. These techniques break down, however, when robust data-fitting formulations are used, which are essential in cases of missing data, unusually large errors, and systematic features in the data unexplained by the forward model. We survey robust methods within a statistical framework, and propose a semistochastic optimization approach that allows dimensionality reduction. The efficacy of the methods is demonstrated for a large-scale seismic inverse problem using the robust Student’s t-distribution, where a useful synthetic velocity model is recovered in the extreme scenario of 60% data missing at random. The semistochastic approach achieves this recovery using 20% of the effort required by a direct robust approach.
@misc{Aravkin2012Robust, Author = {A. Aravkin and M. P. Friedlander and T. van Leeuwen}, Year = {2012}, Month = {January}, Doi = {10.48550/arxiv.1110.0895}, Title = {Robust inversion via semistochastic dimensionality reduction} } - [abs]
[bib]
Robust inversion, dimensionality reduction, and randomized sampling
A. Aravkin, M. P. Friedlander, F. Herrmann, T. van Leeuwen. Mathematical Programming, 134(1):101–125, 2012,
2012.
[DOI]We consider a class of inverse problems in which the forward model is the solution operator to linear ODEs or PDEs. This class admits several dimensionality-reduction techniques based on data averaging or sampling, which are especially useful for large-scale problems. We survey these approaches and their connection to stochastic optimization. The data-averaging approach is only viable, however, for a least-squares misfit, which is sensitive to outliers in the data and artifacts unexplained by the forward model. This motivates us to propose a robust formulation based on the Student’s t-distribution of the error. We demonstrate how the corresponding penalty function, together with the sampling approach, can obtain good results for a large-scale seismic inverse problem with 50% corrupted data.
@article{Aravkin2012Robust, Author = {A. Aravkin and M. P. Friedlander and F. Herrmann and T. van Leeuwen}, Year = {2012}, Month = {January}, Journal = {Mathematical Programming}, Number = {1}, Volume = {134}, Pages = {101-125}, Doi = {10.1007/s10107-012-0571-6}, Title = {Robust inversion, dimensionality reduction, and randomized sampling} } - [abs]
[bib]
Sparse optimization with least-squares constraints
E. van den Berg, M. P. Friedlander. SIAM J. Optimization, 21(4):1201–1229,
2011.
[DOI]
[more]The use of convex optimization for the recovery of sparse signals from incomplete or compressed data is now common practice. Motivated by the success of basis pursuit in recovering sparse vectors, new formulations have been proposed that take advantage of different types of sparsity. In this paper we propose an efficient algorithm for solving a general class of sparsifying formulations. For several common types of sparsity we provide applications, along with details on how to apply the algorithm, and experimental results.
@article{Berg2011Sparse, Author = {E. van den Berg and M. P. Friedlander}, Year = {2011}, Month = {October}, Journal = {SIAM Journal on Optimization}, Number = {4}, Volume = {21}, Pages = {1201-1229}, Doi = {10.1137/100785028}, Title = {Sparse optimization with least-squares constraints} } - [abs]
[bib]
Theoretical and empirical results for recovery from multiple measurements
E. van den Berg, M. P. Friedlander. IEEE Trans. Info. Theory, 56(5):2516–2527,
2010.
[Code & Data]
[DOI]
[more]The joint-sparse recovery problem aims to recover, from sets of compressed measurements, unknown sparse matrices with nonzero entries restricted to a subset of rows. This is an extension of the single-measurement-vector (SMV) problem widely studied in compressed sensing. We analyze the recovery properties for two types of recovery algorithms. First, we show that recovery using sum-of-norm minimization cannot exceed the uniform recovery rate of sequential SMV using L1 minimization, and that there are problems that can be solved with one approach but not with the other. Second, we analyze the performance of the ReMBo algorithm [M. Mishali and Y. Eldar, IEEE Trans. Sig. Proc., 56 (2008)] in combination with L1 minimization, and show how recovery improves as more measurements are taken. From this analysis it follows that having more measurements than the number of nonzero rows does not improve the potential theoretical recovery rate.
@article{Berg2010Theoretical, Author = {E. van den Berg and M. P. Friedlander}, Year = {2010}, Month = {May}, Journal = {IEEE Transactions on Information Theory}, Number = {5}, Volume = {56}, Pages = {2516-2527}, Doi = {10.1109/tit.2010.2043876}, Title = {Theoretical and empirical results for recovery from multiple measurements} } - [abs]
[bib]
Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm
M. Schmidt, E. van den Berg, M. P. Friedlander, K. Murphy. Intern. Conf. Artificial Intelligence and Statistics (AISTATS),
2009.
[Slides]
[Software]An optimization algorithm for minimizing a smooth function over a convex set is described. Each iteration of the method computes a descent direction by minimizing, over the original constraints, a diagonal plus low-rank quadratic approximation to the function. The quadratic approximation is constructed using a limited-memory quasi-Newton update. The method is suitable for large-scale problems where evaluation of the function is substantially more expensive than projection onto the constraint set. Numerical experiments on one-norm regularized test problems indicate that the proposed method is competitive with state-of-the-art methods such as bound-constrained L-BFGS and orthant-wise descent. We further show that the method generalizes to a wide class of problems, and substantially improves on state-of-the-art methods for problems such as learning the structure of Gaussian graphical models and Markov random fields.
@article{Schmidt2009Optimizing, Author = {M. Schmidt and E. van den Berg and M. P. Friedlander and K. Murphy}, Year = {2009}, Month = {April}, Pages = {456-463}, Title = {Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm} } - [abs]
[bib]
Sparco: a testing framework for sparse reconstruction
E. van den Berg, M. P. Friedlander, G. Hennenfent, F. Herrmann, R. Saab, Ö. Yılmaz. ACM Trans. Math. Software, 35(4):1–16,
2009.
[Code]
[more]Sparco is a framework for testing and benchmarking algorithms for sparse reconstruction. It includes a large collection of sparse reconstruction problems drawn from the imaging, compressed sensing, and geophysics literature. Sparco is also a framework for implementing new test problems and can be used as a tool for reproducible research. Sparco is implemented entirely in Matlab, and is released as open-source software under the GNU Public License.
@article{Berg2009Sparco, Author = {E. van den Berg and M. P. Friedlander and G. Hennenfent and F. Herrmann and R. Saab and Ö. Yılmaz}, Year = {2009}, Month = {February}, Title = {Sparco: a testing framework for sparse reconstruction} } - [abs]
[bib]
Probing the Pareto frontier for basis pursuit solutions
E. van den Berg, M. P. Friedlander. SIAM J. Scientific Computing, 31(2):890–912,
2008.
[SPGL1 Software]
[DOI]
[more]The basis pursuit problem seeks a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise (BPDN) fits the least-squares problem only approximately, and a single parameter determines a curve that traces the optimal trade-off between the least-squares fit and the one-norm of the solution. We prove that this curve is convex and continuously differentiable over all points of interest, and show that it gives an explicit relationship to two other optimization problems closely related to BPDN. We describe a root-finding algorithm for finding arbitrary points on this curve; the algorithm is suitable for problems that are large scale and for those that are in the complex domain. At each iteration, a spectral gradient-projection method approximately minimizes a least-squares problem with an explicit one-norm constraint. Only matrix-vector operations are required. The primal-dual solution of this problem gives function and derivative information needed for the root-finding method. Numerical experiments on a comprehensive set of test problems demonstrate that the method scales well to large problems.
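A bare-bones sketch of the outer root-finding loop on the Pareto curve, assuming an inner oracle lasso_solve(A, b, tau) that (approximately) minimizes the least-squares misfit subject to a one-norm constraint; in the paper this role is played by a spectral projected-gradient solver. The oracle name, initialization, and stopping rule are placeholders, and the derivative formula phi'(tau) = -||A^T r||_inf / ||r||_2 follows the duality relationship described above.

```python
import numpy as np

def pareto_root_find(A, b, sigma, lasso_solve, tau0=0.0, iters=10):
    """Newton root-finding on phi(tau) = ||A x_tau - b||_2, seeking phi(tau) = sigma.

    lasso_solve(A, b, tau) is an assumed oracle returning a minimizer of ||Ax - b||_2
    subject to ||x||_1 <= tau."""
    tau = tau0
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = lasso_solve(A, b, tau)
        r = b - A @ x
        phi = np.linalg.norm(r)
        if phi <= sigma or phi == 0.0:
            break
        dphi = -np.linalg.norm(A.T @ r, np.inf) / phi   # slope of the Pareto curve at tau
        if dphi == 0.0:
            break
        tau -= (phi - sigma) / dphi                     # Newton step on phi(tau) - sigma = 0
    return x, tau
```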
@article{Berg2008Probing, Author = {E. van den Berg and M. P. Friedlander}, Year = {2008}, Month = {November}, Journal = {SIAM Journal on Scientific Computing}, Number = {2}, Volume = {31}, Pages = {890-912}, Doi = {10.1137/080714488}, Title = {Probing the Pareto frontier for basis pursuit solutions} } - [abs]
[bib]
Computing nonnegative tensor factorizations
M. P. Friedlander, K. Hatz. Optimization Methods and Software, 23(4):631–647,
2008.
[Journal Link]
[DOI]
[more]Nonnegative tensor factorization (NTF) is a technique for computing a parts-based representation of high-dimensional data. NTF excels at exposing latent structures in datasets, and at finding good low-rank approximations to the data. We describe an approach for computing the NTF of a dataset that relies only on iterative linear-algebra techniques and that is comparable in cost to the nonnegative matrix factorization. (The better-known nonnegative matrix factorization is a special case of NTF and is also handled by our implementation.) Some important features of our implementation include mechanisms for encouraging sparse factors and for ensuring that they are equilibrated in norm. The complete Matlab software package is available under the GPL license.
@article{Friedlander2008Computing, Author = {M. P. Friedlander and K. Hatz}, Year = {2008}, Month = {August}, Journal = {Optimization Methods \& Software}, Number = {4}, Volume = {23}, Pages = {631-647}, Doi = {10.1080/10556780801996244}, Title = {Computing nonnegative tensor factorizations} } - [abs]
[bib]
New insights into one-norm solvers from the Pareto curve
G. Hennenfent, E. van den Berg, M. P. Friedlander, F. Herrmann. Geophysics, 73(4):A23–A26,
2008.
[DOI]Geophysical inverse problems typically involve a trade off between data misfit and some prior. Pareto curves trace the optimal trade off between these two competing aims. These curves are commonly used in problems with two-norm priors where they are plotted on a log-log scale and are known as L-curves. For other priors, such as the sparsity-promoting one norm, Pareto curves remain relatively unexplored. We show how these curves lead to new insights in one-norm regularization. First, we confirm the theoretical properties of smoothness and convexity of these curves from a stylized and a geophysical example. Second, we exploit these crucial properties to approximate the Pareto curve for a large-scale problem. Third, we show how Pareto curves provide an objective criterion to gauge how different one-norm solvers advance towards the solution.
@article{Hennenfent2008New, Author = {G. Hennenfent and E. van den Berg and M. P. Friedlander and F. Herrmann}, Year = {2008}, Month = {July}, Journal = {Geophysics}, Number = {4}, Volume = {73}, Pages = {A23-A26}, Doi = {10.1190/1.2944169}, Title = {New insights into one-norm solvers from the Pareto curve} } - [abs]
[bib]
Group sparsity via linear-time projection
E. van den Berg, M. Schmidt, M. P. Friedlander, K. Murphy. Tech. Rep. TR-2008-09, Dept of Computer Science, UBC,
2008.We present an efficient spectral projected-gradient algorithm for optimization subject to a group l1-norm constraint. Our approach is based on a novel linear-time algorithm for Euclidean projection onto the l1- and group l1-norm constraints. Numerical experiments on large data sets suggest that the proposed method is substantially more efficient and scalable than existing methods.
@article{Berg2008Group, Author = {E. van den Berg and M. Schmidt and M. P. Friedlander and K. Murphy}, Year = {2008}, Month = {June}, Title = {Group sparsity via linear-time projection} } - [abs]
[bib]
Global and finite termination of a two-phase augmented Lagrangian filter method for general quadratic programs
M. P. Friedlander, S. Leyffer. SIAM J. Scientific Computing, 30(4):1706–1726,
2008.
[DOI]We present a two-phase algorithm for solving large-scale quadratic programs (QPs). In the first phase, gradient-projection iterations approximately minimize an augmented Lagrangian function and provide an estimate of the optimal active set. In the second phase, an equality-constrained QP defined by the current inactive variables is approximately minimized in order to generate a second-order search direction. A filter determines the required accuracy of the subproblem solutions and provides an acceptance criterion for the search directions. The resulting algorithm is globally and finitely convergent. The algorithm is suitable for large-scale problems with many degrees of freedom, and provides an alternative to interior-point methods when iterative methods must be used to solve the underlying linear systems. Numerical experiments on a subset of the CUTEr QP test problems demonstrate the effectiveness of the approach.
@article{Friedlander2008Global, Author = {M. P. Friedlander and S. Leyffer}, Year = {2008}, Month = {April}, Journal = {SIAM Journal on Scientific Computing}, Number = {4}, Volume = {30}, Pages = {1706-1729}, Doi = {10.1137/060669930}, Title = {Global and finite termination of a two-phase augmented Lagrangian filter method for general quadratic programs} } - [bib]
Discussion: The Dantzig selector: Statistical estimation when p is much larger than n
M. P. Friedlander, M. A. Saunders. Annals of Statistics, 35(6):2385–2391,
2007.
[Journal]
[Slides]
[DOI]
@article{Friedlander2007Discussion, Author = {M. P. Friedlander and M. A. Saunders}, Year = {2007}, Month = {December}, Journal = {The Annals of Statistics}, Number = {6}, Volume = {35}, Doi = {10.1214/009053607000000479}, Title = {Discussion: The Dantzig selector: Statistical estimation when p is much larger than n} } - [abs]
[bib]
Exact regularization of convex programs
M. P. Friedlander, P. Tseng. SIAM J. Optimization, 18(4):1326–1350,
2007.
[Data & Code]
[DOI]
[more]The regularization of a convex program is exact if all solutions of the regularized problem are also solutions of the original problem for all values of the regularization parameter below some positive threshold. For a general convex program, we show that the regularization is exact if and only if a certain selection problem has a Lagrange multiplier. Moreover, the regularization parameter threshold is inversely related to the Lagrange multiplier. We use this result to generalize an exact regularization result of Ferris and Mangasarian [Appl. Math. Optim., 23 (1991), pp. 266-273] involving a linearized selection problem. We also use it to derive necessary and sufficient conditions for exact penalization, similar to those obtained by Bertsekas [Math. Programming, 9 (1975), pp. 87-99] and by Bertsekas, Nedić, and Ozdaglar [Convex Analysis and Optimization, Athena Scientific, Belmont, MA, 2003]. When the regularization is not exact, we derive error bounds on the distance from the regularized solution to the original solution set. We also show that existence of a "weak sharp minimum" is in some sense close to being necessary for exact regularization. We illustrate the main result with numerical experiments on the \(\ell_1\) regularization of benchmark (degenerate) linear programs and semidefinite/second-order cone programs. The experiments demonstrate the usefulness of \(\ell_1\) regularization in finding sparse solutions.
@article{Friedlander2007Exact, Author = {M. P. Friedlander and P. Tseng}, Year = {2007}, Month = {November}, Journal = {SIAM Journal on Optimization}, Number = {4}, Volume = {18}, Pages = {1326-1350}, Doi = {10.1137/060675320}, Title = {Exact regularization of convex programs} } - [abs]
[bib]
A filter active-set trust-region method
M. P. Friedlander, N. I. M. Gould, S. Leyffer, T. S. Munson. Preprint ANL/MCS-P1456-0907, Argonne National Laboratory,
2007.
[Preprint]We propose a two-phase active-set method for nonlinearly constrained optimization. The first phase solves a regularized linear program (LP) that serves to estimate an optimal active set. The second phase uses this active-set estimate to determine an equality-constrained quadratic program which provides for fast local convergence. A filter promotes global convergence. The regularization term in the first phase bounds the LP solution and plays a role similar to an explicit ellipsoid trust-region constraint. We prove that the resulting method is globally convergent, and that an optimal active set is identified near a solution. We discuss alternative regularization functions that incorporate curvature information into the active-set identification phase. Preliminary numerical experiments on a subset of the CUTEr test problems illustrate the effectiveness of the approach.
@techreport{Friedlander2007Filter, Author = {M. P. Friedlander and N. I. M. Gould and S. Leyffer and T. S. Munson}, Year = {2007}, Month = {September}, Number = {ANL/MCS-P1456-0907}, Institution = {Argonne National Laboratory}, Type = {Preprint}, Title = {A filter active-set trust-region method} } - [abs]
[bib]
In pursuit of a root
E. van den Berg, M. P. Friedlander. Tech. Rep. TR-2007-19, Dept. of Computer Science, UBC,
2007.
[more]The basis pursuit technique is used to find a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise fits the least-squares problem only approximately, and a single parameter determines a curve that traces the trade-off between the least-squares fit and the one-norm of the solution. We show that the function that describes this curve is convex and continuously differentiable over all points of interest. The dual solution of a least-squares problem with an explicit one-norm constraint gives function and derivative information needed for a root-finding method. As a result, we can compute arbitrary points on this curve. Numerical experiments demonstrate that our method, which relies on only matrix-vector operations, scales well to large problems.
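In symbols, the curve and the root-finding step described above can be sketched as follows (illustrative notation):
    % Pareto curve: least-squares fit as a function of the one-norm budget tau.
    \[
      \phi(\tau) := \min_{x}\;\|Ax-b\|_2 \quad\text{s.t.}\quad \|x\|_1 \le \tau.
    \]
    % Basis pursuit denoise corresponds to the root of phi(tau) = sigma; since phi is
    % convex and continuously differentiable, a Newton step can be taken:
    \[
      \tau_{k+1} = \tau_k + \frac{\sigma - \phi(\tau_k)}{\phi'(\tau_k)},
    \]
    with \(\phi(\tau_k)\) and \(\phi'(\tau_k)\) obtained from the (dual) solution of the \(\tau_k\)-constrained least-squares subproblem.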
@techreport{Berg2007Pursuit, Author = {E. van den Berg and M. P. Friedlander}, Year = {2007}, Month = {June}, Number = {TR-2007-19}, Institution = {Dept. of Computer Science, University of British Columbia}, Type = {Tech. Rep.}, Title = {In pursuit of a root} } - [abs]
[bib]
Diffuse optical fluorescence tomography using time-resolved data acquired in transmission
F. Leblond, S. Fortier, M. P. Friedlander. Multimodal Biomedical Imaging II, vol. 6431. Proc. Intern. Society for Optical Engineering,
2007.
[Code]
[DOI]
[more]We present an algorithm using data acquired with a time-resolved system with the goal of reconstructing sources of fluorescence emanating from the deep interior of highly scattering biological tissues. A novelty in our tomography algorithm is the integration of a light transport model adapted to rodent geometries. For small volumes, our analysis suggests that neglecting the index of refraction mismatch between diffusive and non-diffusive regions, as well as the curved nature of the boundary, can have a profound impact on fluorescent images and spectroscopic applications relying on diffusion curve fitting. Moreover, we introduce a new least-squares solver with bound constraints adapted for optical problems where a physical non-negativity constraint can be imposed. Finally, we find that maximizing the time-related information content of the data in the reconstruction process significantly enhances the quality of fluorescence images. Preliminary noise propagation and detector placement optimization analyses are also presented.
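For readers unfamiliar with the bound-constrained least-squares subproblem mentioned above, here is a minimal, generic non-negative least-squares fit in Python; it uses SciPy's stock solver and a toy forward model, not the specialized solver or light-transport model developed in the paper:
    import numpy as np
    from scipy.optimize import nnls

    # Toy forward model: A maps candidate source intensities to detector readings.
    rng = np.random.default_rng(0)
    A = rng.random((50, 10))                          # 50 measurements, 10 sources
    x_true = np.zeros(10)
    x_true[[2, 4, 8]] = [3.0, 1.5, 0.5]               # physically non-negative sources
    b = A @ x_true + 0.01 * rng.standard_normal(50)   # noisy measurements

    # Non-negative least squares: minimize ||A x - b||_2 subject to x >= 0.
    x_hat, residual_norm = nnls(A, b)
    print(np.round(x_hat, 2), residual_norm)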
@article{Leblond2007Diffuse, Author = {F. Leblond and S. Fortier and M. P. Friedlander}, Year = {2007}, Month = {February}, Journal = {Proceedings of SPIE, the International Society for Optical Engineering}, Volume = {6431}, Pages = {643106}, Doi = {10.1117/12.700841}, Title = {Diffuse optical fluorescence tomography using time-resolved data acquired in transmission} } - [abs]
[bib]
On minimizing distortion and relative entropy
M. P. Friedlander, M. R. Gupta. IEEE Trans. Info. Theory, 52(1):238–245,
2006.
[DOI]A common approach for estimating a probability mass function \(w\), given a prior \(q\) and moment constraints \(Aw\le b\), is to minimize the relative entropy between \(w\) and \(q\) subject to the set of linear constraints. In such cases, the solution \(w\) is known to have exponential form. We consider the case in which the linear constraints are noisy, uncertain, infeasible, or otherwise “soft.” A solution can then be obtained by minimizing both the relative entropy and the violation of the constraints \(Aw\le b\). A penalty parameter \(\sigma\) weights the relative importance of these two objectives. We show that this penalty formulation also yields a solution \(w\) with exponential form. If the distortion is based on an \(\ell_p\) norm, then the exponential form of \(w\) is shown to have exponential decay parameters that are bounded as a function of \(\sigma\). We also state conditions under which the solution \(w\) to the penalty formulation results in zero distortion, so that the moment constraints hold exactly. These properties are useful in choosing penalty parameters, evaluating the impact of chosen penalty parameters, and proving properties about methods that use such penalty formulations.
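In symbols, the penalty formulation described above reads roughly as follows (illustrative notation; the paper states the exact assumptions on the distortion term):
    % Relative-entropy estimate with a soft (penalized) moment constraint.
    \[
      \min_{w \,\ge\, 0,\ \mathbf{1}^{T}w = 1}\;
        \sum_i w_i \log\frac{w_i}{q_i}
        \;+\; \sigma\, \bigl\| (Aw - b)_+ \bigr\|_p,
    \]
    where \((\cdot)_+\) denotes the componentwise positive part; the minimizer retains an exponential form, with decay parameters bounded in terms of \(\sigma\).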
@article{Friedlander2006Minimizing, Author = {M. P. Friedlander and M. R. Gupta}, Year = {2006}, Month = {January}, Journal = {IEEE Transactions on Information Theory}, Number = {1}, Volume = {52}, Pages = {238-245}, Doi = {10.1109/tit.2005.860448}, Title = {On minimizing distortion and relative entropy} } - [abs]
[bib]
A two-sided relaxation scheme for mathematical programs with equilibrium constraints
V. Demiguel, M. P. Friedlander, F. J. Nogales, S. Scholtes. SIAM J. Optimization, 16(2):587–609,
2005.
[DOI]We propose a relaxation scheme for mathematical programs with equilibrium constraints (MPECs). In contrast to previous approaches, our relaxation is two-sided: both the complementarity and the nonnegativity constraints are relaxed. The proposed relaxation update rule guarantees (under certain conditions) that the sequence of relaxed subproblems will maintain a strictly feasible interior, even in the limit. We show how the relaxation scheme can be used in combination with a standard interior-point method to achieve superlinear convergence. Numerical results on the MacMPEC test problem set demonstrate the fast local convergence properties of the approach.
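One way to write the relaxation, in illustrative notation (the paper specifies the precise parameters and their update rule):
    % MPEC complementarity constraints and a two-sided relaxation of them.
    \[
      x \ge 0,\quad y \ge 0,\quad x_i\, y_i = 0
      \qquad\leadsto\qquad
      x_i \ge -\delta_k,\quad y_i \ge -\delta_k,\quad x_i\, y_i \le \varepsilon_k,
    \]
    with \(\delta_k, \varepsilon_k \downarrow 0\) chosen so that the relaxed subproblems retain a strictly feasible interior.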
@article{Demiguel2005Two, Author = {V. Demiguel and M. P. Friedlander and F. J. Nogales and S. Scholtes}, Year = {2005}, Month = {December}, Journal = {SIAM Journal on Optimization}, Number = {2}, Volume = {16}, Pages = {587-609}, Doi = {10.1137/04060754x}, Title = {A two-sided relaxation scheme for mathematical programs with equilibrium constraints} } - [abs]
[bib]
Exact regularization of linear programs
M. P. Friedlander. Tech. Rep. TR-2005-31, Dept. of Computer Science, UBC,
2005.
[Data]
[more]We show that linear programs (LPs) admit regularizations that either contract the original (primal) solution set or leave it unchanged. Any regularization function that is convex and has compact level sets is allowed; differentiability is not required. This is an extension of the result first described by Mangasarian and Meyer (SIAM J. Control Optim., 17(6), pp. 745-752, 1979). We show that there always exist positive values of the regularization parameter such that a solution of the regularized problem simultaneously minimizes the original LP and minimizes the regularization function over the original solution set. We illustrate the main result using the nondifferentiable one-norm regularization function on a set of degenerate LPs. Numerical results demonstrate how such an approach yields sparse solutions from the application of an interior-point method.
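As a generic illustration of the phenomenon (not the report's experiments), the following Python snippet, assuming CVXPY is available, solves a small feasible and bounded LP with and without a small one-norm regularizer and compares the LP objective values:
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(1)
    m, n = 5, 12
    A = rng.standard_normal((m, n))
    b = A @ rng.random(n)                 # feasible by construction (a positive x exists)
    c = rng.random(n)                     # c >= 0 keeps the LP bounded below on x >= 0

    x = cp.Variable(n, nonneg=True)
    lp = cp.Problem(cp.Minimize(c @ x), [A @ x == b])
    lp.solve()

    alpha = 1e-4                          # "small" regularization parameter
    xr = cp.Variable(n, nonneg=True)
    reg = cp.Problem(cp.Minimize(c @ xr + alpha * cp.norm1(xr)), [A @ xr == b])
    reg.solve()

    # If alpha lies below the instance's threshold, the regularized solution still
    # solves the LP, so the two objective values agree to solver tolerance.
    print(lp.value, float(c @ xr.value))
Whether exactness holds for this particular alpha depends on the instance; the report characterizes the admissible range.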
@techreport{Frie:2005, Address = {University of British Columbia, Vancouver}, Author = {M. P. Friedlander}, Institution = {Dept. Computer Science}, Month = {December}, Number = {TR-2005-31}, Title = {Exact regularization of linear programs}, Type = {Tech. Rep.}, Year = 2005 } - [abs]
[bib]
A globally convergent linearly constrained Lagrangian method for nonlinear optimization
M. P. Friedlander, M. A. Saunders. SIAM J. Optimization, 15(3):863–897,
2005.
[DOI]For optimization problems with nonlinear constraints, linearly constrained Lagrangian (LCL) methods solve a sequence of subproblems of the form "minimize an augmented Lagrangian function subject to linearized constraints." Such methods converge rapidly near a solution but may not be reliable from arbitrary starting points. Nevertheless, the well-known software package MINOS has proved effective on many large problems. Its success motivates us to derive a related LCL algorithm that possesses three important properties: it is globally convergent, the subproblem constraints are always feasible, and the subproblems may be solved inexactly. The new algorithm has been implemented in Matlab, with an option to use either MINOS or SNOPT (Fortran codes) to solve the linearly constrained subproblems. Only first derivatives are required. We present numerical results on a subset of the COPS, HS, and CUTE test problems, which include many large examples. The results demonstrate the robustness and efficiency of the stabilized LCL procedure.
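In symbols, the LCL subproblem for equality constraints \(c(x)=0\) can be written roughly as follows (illustrative notation; the paper handles inexact subproblem solves and general constraints):
    % Subproblem k: minimize the augmented Lagrangian subject to linearized constraints.
    \[
      \min_{x}\; f(x) - y_k^{T} c(x) + \tfrac{\rho_k}{2}\,\|c(x)\|_2^2
      \quad\text{s.t.}\quad c(x_k) + J(x_k)\,(x - x_k) = 0,
    \]
    where \(y_k\) is the current multiplier estimate, \(\rho_k\) the penalty parameter, and \(J\) the constraint Jacobian.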
@article{Friedlander2005Globally, Author = {M. P. Friedlander and M. A. Saunders}, Year = {2005}, Month = {April}, Journal = {SIAM Journal on Optimization}, Number = {3}, Volume = {15}, Pages = {863-897}, Doi = {10.1137/s1052623402419789}, Title = {A globally convergent linearly constrained Lagrangian method for nonlinear optimization} } - [abs]
[bib]
Maximum entropy classification applied to speech
M. R. Gupta, M. P. Friedlander, R. M. Gray. Asilomar Conf. Signals, Systems, Computers (ACSSC), vol. 2, 1480–1483,
2000.
[DOI]We present a new method for classification using the maximum entropy principle, allowing full use of relevant training data and smoothing the data space. To classify a test point we compute a maximum entropy weight distribution over a subset of training data and constrain the weights to exactly reconstruct the test point. The classification problem is formulated as a linearly constrained optimization problem and solved using a primal-dual logarithmic barrier method, well suited for high-dimensional data. We discuss theoretical advantages and present experimental results on vowel data which demonstrate that the method performs competitively for speech classification tasks.
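In symbols, the per-test-point problem described above can be sketched as follows (illustrative notation; the paper details the choice of training subset, the classification rule, and the barrier method):
    % Maximum-entropy weights over training points x_1, ..., x_N that exactly
    % reconstruct the test point x_test.
    \[
      \max_{w}\; -\sum_{i} w_i \log w_i
      \quad\text{s.t.}\quad \sum_{i} w_i\, x_i = x_{\mathrm{test}}, \quad
      \sum_{i} w_i = 1, \quad w \ge 0,
    \]
    with the resulting weights then used to classify \(x_{\mathrm{test}}\).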
@inproceedings{Gupta2000Maximum, Author = {M. R. Gupta and M. P. Friedlander and R. M. Gray}, Year = {2000}, Month = {October}, Booktitle = {Asilomar Conference on Signals, Systems, and Computers (ACSSC)}, Volume = {2}, Pages = {1480-1483}, Doi = {10.1109/acssc.2000.911236}, Title = {Maximum entropy classification applied to speech} }
Selected Talks
- [bib]
Polar deconvolution of mixed signals
M. P. Friedlander. One-World Opt. Seminar,
2020.
[slides]
[DOI]
@article{Friedlander2020Polar, Author = {M. P. Friedlander}, Year = {2020}, Month = {October}, Journal = {IEEE Transactions on Signal Processing}, Volume = {70}, Pages = {2713-2727}, Doi = {10.1109/tsp.2022.3178191}, Title = {Polar deconvolution of mixed signals} } - Polar duality in three liftings M. P. Friedlander. McGill University, 2017. [slides]
- Algorithms for sparse optimization M. P. Friedlander. Google Research, 2015. [slides]
- Robust inversion and randomized sampling M. P. Friedlander. ISMP, Berlin, 2012. [slides]