Mathematical Optimization Laboratory
The Mathematical Optimization Laboratory develops theory, algorithms, and software for large-scale optimization problems arising in machine learning, signal processing, and scientific computing. Our work bridges foundational mathematics with practical applications.
Current Members
- Simon Ghyselincks, PhD Student. Optimization algorithms and applications.
- Jonas Jager, PhD Student. Mathematical optimization.
- Nicholas Richardson, PhD Student. Optimization methods.
Publications (3)
- Tracking Cu-fertile sediment sources via multivariate petrochronological mixture modeling of detrital zircons
J. E. Saylor, N. Richardson, N. Graham, R. G. Lee, M. P. Friedlander. J. Geophys. Res.: Earth Surface, 130(10):e2025JF008406, 2025.
Abstract: Whereas the ability to acquire petrochronological data from detrital minerals has exploded, development of tools to analyze and interpret the multivariate data sets has not kept pace. Herein, we present a case study, which applies the recently developed non-negative Tucker-1 decomposition (NNT1) method to a multivariate detrital zircon data set from till samples collected above the Cu-bearing Guichon Creek Batholith in southern British Columbia, Canada. Zircon composition variables that we consider include age, Ce anomaly, CeN/NdN, DyN/YbN, ΔFMQ, Eu anomaly, ΣHREE/ΣMREE, Hf, Th/U, Ti temperature, and YbN/GdN. The NNT1 approach successfully deconvolves the multivariate data set into two endmembers, which are consistent with derivation either from non-oxidized and relatively anhydrous (i.e., low Cu-ore potential, Source 1) or oxidized and hydrous (i.e., potential Cu-ore bodies, Source 2) igneous rocks. Furthermore, we attribute each of the zircon grains to either the Source 1 or 2 endmember based on maximization of the likelihood that their measured multivariate geochemistry was drawn from one or the other of the learned multivariate endmembers. Finally, we demonstrate that the proportions of the Source 2 endmember decrease with increasing distance from the ore bodies, as expected due to down-ice or off-axis zircon mixing and dilution. We conclude that the NNT1 approach provides insight into geologically meaningful sediment transport processes and multivariate sediment sources even when those sources are unknown. It thus provides a basis for future petrochronological interpretations with applied and pure geoscience applications.
BibTeX: @article{saylor2025tracking, title={Tracking Cu-fertile sediment sources via multivariate petrochronological mixture modeling of detrital zircons}, author={Saylor, Joel E and Richardson, Nicholas and Graham, Naomi and Lee, Robert G and Friedlander, Michael P}, journal={Journal of Geophysical Research: Earth Surface}, volume={130}, number={10}, pages={e2025JF008406}, year={2025}, publisher={Wiley Online Library} }
- Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization
N. Graham, N. Richardson, M. P. Friedlander, J. Saylor. Mathematical Geosciences, 57(4):601-628, 2025. [DOI]
Abstract: We devise a novel statistical method for deconvolving multivariate geochronology and geochemistry datasets to characterize sediment sources. The approach employs a third-order constrained Tucker-1 tensor decomposition to estimate the probability distributions of multiple features in sediment samples. By integrating kernel density estimation with matrix-tensor factorization, the model quantitatively determines the distributions and mixing proportions of sediment sources. The methodology introduces a numerical test for rank estimation to define the number of latent sources, and uses a maximum-likelihood approach to correlate individual detrital grains to these latent sources based on an arbitrary number of features. The method’s efficacy is validated through a numerical experiment with detrital zircon data that captures natural variability associated with temporal changes in crustal thickness in the Andes. The findings hold potential implications for resolving sediment sources, determining sediment mixing, enhancing the understanding of sediment transport processes, and characterizing the lithology, tectonic motion, or metallogenic potential of sediment sources. This method is adaptable to other data de-mixing problems and is available in a publicly released software package.
BibTeX: @article{Graham2025Tracing, Author = {N. Graham and N. Richardson and M. P. Friedlander and J. Saylor}, Year = {2025}, Month = {May}, Journal = {Mathematical Geosciences}, Number = {4}, Volume = {57}, Pages = {601-628}, Doi = {10.1007/s11004-024-10175-0}, Title = {Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization} }
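For orientation, the model fit here has the shape Y[i,j,k] ≈ Σ_r A[i,r]·B[r,j,k], with sample-by-source mixing proportions A and per-source feature distributions B. Below is a minimal, illustrative alternating projected-gradient sketch of that model only; the function name, step size, and naive simplex projections are our own stand-ins, and the released MatrixTensorFactor.jl package (linked from the EGU entry below) implements the authors' actual algorithm.

```julia
# Illustrative sketch only: alternating projected gradient for a nonnegative
# Tucker-1 model Y[i,j,k] ≈ Σ_r A[i,r]·B[r,j,k], with rows of A (mixing
# proportions) on the simplex and each B[r,j,:] a discrete distribution.
using Random

function nnt1_sketch(Y::Array{Float64,3}, R::Int; iters=500, step=1e-2)
    nI, nJ, nK = size(Y)
    rng = MersenneTwister(0)
    A = rand(rng, nI, R)
    B = rand(rng, R, nJ, nK)
    Ym = reshape(Y, nI, nJ * nK)              # mode-1 unfolding of the data
    for _ in 1:iters
        Bm = reshape(B, R, nJ * nK)
        A .-= step .* ((A * Bm .- Ym) * Bm')  # gradient step on ‖Ym - A·Bm‖²
        A .= max.(A, 0.0)
        A ./= sum(A, dims=2)                  # project rows onto the simplex
        Bm .-= step .* (A' * (A * Bm .- Ym))
        Bm .= max.(Bm, 0.0)
        B = reshape(Bm, R, nJ, nK)
        for r in 1:R, j in 1:nJ               # renormalize each density slice
            s = sum(B[r, j, :])
            s > 0 && (B[r, j, :] ./= s)
        end
    end
    return A, B
end
```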
- Endmember modelling of detrital zircon petrochronology data via multivariate Tucker-1 tensor decomposition
J. E. Saylor, N. Richardson, N. Graham, R. G. Lee, M. P. Friedlander. EGU General Assembly 2025. [EGU Abstract] [Web App] [GitHub]
Abstract: Detrital petrochronology is a powerful method of characterizing sediment and potentially sediment sources. The recently developed Tucker-1 decomposition method holds promise of using detrital petrochronology to identify both sediment-source characteristics and the proportions in which sources are present in sink samples even when sediment sources are unknown or unavailable for sampling. However, the correlation between endmember characteristics and lithological sources, or between proportions and sedimentary processes, has not been established. Herein we present a case study applying the recently developed Tucker-1 decomposition method to a multivariate geochemical data set from detrital zircons in till samples collected above the Cu-bearing Guichon Creek Batholith (GCB) in southern British Columbia, Canada. Data include a suite of eleven variables, including age, Ce anomaly, CeN/NdN, DyN/YbN, ΔFMQ, Eu anomaly, ΣHREE/ΣMREE, Hf, Th/U, Ti temperature, and YbN/GdN, from 12 samples collected at a range of distances in the down ice-flow direction from the GCB.
We demonstrate that endmember modelling using the Tucker-1 decomposition method successfully deconvolves the multivariate data sets into two endmembers in which the geochemical distributions are consistent with derivation from either non-oxidized and relatively anhydrous (i.e., low ore potential, Source 1) or oxidized and hydrous (i.e., potential ore bodies, Source 2) igneous rocks. Furthermore, we demonstrate that the proportions of the Source 2 endmember decrease with increasing distance from the ore bodies, as expected due to downstream zircon mixing and dilution.
Finally, we attribute each of the zircon grains to either the Source 1 or 2 endmember based on maximization of the likelihood that their measured multivariate geochemistry was drawn from one or the other of the learned multivariate endmembers. We compared these grain attributions to the results of an independent Classification and Regression Tree (CART) analysis designed to characterize zircon grains as either “fertile” or “barren” with respect to copper based on their geochemistry. We find that there is ~80% overlap between the source attributions based on the CART analysis and the grain-source identification based on the Tucker-1 decomposition.
We conclude that the novel Tucker-1 decomposition approach provides a flexible, precise, and accurate method of characterizing multivariate sediment sources even when those sources are unknown. It thus provides a basis for future petrochronological interpretations with applied and pure geoscience applications. All of the analyses presented herein can be freely accessed through a web application (https://dzgrainalyzer.eoas.ubc.ca/) or open-source Julia code (https://github.com/MPF-Optimization-Laboratory/MatrixTensorFactor.jl).
BibTeX: @inproceedings{saylor2025endmember, title={Endmember modelling of detrital zircon petrochronology data via multivariate Tucker-1 tensor decomposition}, author={Saylor, Joel E and Richardson, Nicholas and Graham, Naomi and Lee, Robert G and Friedlander, Michael P}, booktitle={EGU General Assembly Conference Abstracts}, pages={EGU25--4081}, year={2025} }
- Kevin Thomas, MSc Student. Optimization and machine learning.
Alumni
Our alumni have gone on to positions at leading research labs and universities worldwide.
2020–Present
- Naomi Graham, PhD (2025). Postdoctoral Researcher, Umeå University.
Publications (5)
- STARK denoises spatial transcriptomics images via adaptive regularization
S. Kubal, N. Graham, M. Heitz, A. Warren, M. P. Friedlander, Y. Plan, G. Schiebinger. arXiv:2512.10994, 2025. [arXiv]
Abstract: We present a denoising method for spatial transcriptomics imagery effective at extremely low sequencing depths, enabling gene expression interpolation. STARK augments kernel ridge regression with an incrementally adaptive graph Laplacian regularizer. Each iteration performs kernel ridge regression and updates the graph based on the enhanced image. The approach optimizes a block-convex objective through alternating minimization with closed-form solutions. Convergence to stationary points is proven, with statistical convergence to ground truth at rate O(R^{-1/2}), where R denotes read count. Experimental validation on real data demonstrates improved denoising performance versus competing methods, measured through label transfer accuracy.
BibTeX: @article{Kubal2025STARK, Author = {S. Kubal and N. Graham and M. Heitz and A. Warren and M. P. Friedlander and Y. Plan and G. Schiebinger}, Year = {2025}, Month = {December}, Journal = {arXiv preprint arXiv:2512.10994}, Title = {STARK denoises spatial transcriptomics images via adaptive regularization} }
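For intuition about the closed-form inner solves mentioned above, here is a hedged sketch of one graph-regularized kernel ridge regression step for a fixed graph Laplacian L. The objective in the comments is our stand-in formulation, not necessarily STARK's exact one.

```julia
# Hedged sketch: one inner solve of graph-Laplacian-regularized kernel ridge
# regression. Assumed objective: ‖y - Kα‖² + λ·αᵀKα + γ·(Kα)ᵀL(Kα).
# Setting the gradient to zero gives K·((K + λI + γLK)α - y) = 0, so for
# K ≻ 0 it suffices to solve (K + λI + γLK)·α = y.
using LinearAlgebra

function graph_krr_step(K::Matrix{Float64}, L::Matrix{Float64},
                        y::Vector{Float64}; λ=0.1, γ=0.1)
    α = (K + λ * I + γ * L * K) \ y       # closed-form inner solution
    return K * α                          # denoised values ẑ = Kα
end
```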
- Factoring over probability simplices: from theory to applications
N. Graham. Ph.D. Thesis, UBC, 2025. [UBC cIRcle]
Abstract: This thesis develops a framework for non-negative matrix and tensor factorization with sum-to-one constraints. A novel rescaled block coordinate descent method is presented that manages simplex constraints while ensuring bounded iterates. The work bridges theoretical analysis of constrained factorization with practical applications in sediment provenance analysis and spatial transcriptomics denoising.
BibTeX: @phdthesis{Graham2025Factoring, Author = {N. Graham}, Year = {2025}, Month = {November}, Title = {Factoring over probability simplices: from theory to applications} }
- Tracking Cu-fertile sediment sources via multivariate petrochronological mixture modeling of detrital zircons
J. E. Saylor, N. Richardson, N. Graham, R. G. Lee, M. P. Friedlander. J. Geophys. Res.: Earth Surface, 130(10):e2025JF008406, 2025.
Abstract: Whereas the ability to acquire petrochronological data from detrital minerals has exploded, development of tools to analyze and interpret the multivariate data sets has not kept pace. Herein, we present a case study, which applies the recently developed non-negative Tucker-1 decomposition (NNT1) method to a multivariate detrital zircon data set from till samples collected above the Cu-bearing Guichon Creek Batholith in southern British Columbia, Canada. Zircon composition variables that we consider include age, Ce anomaly, CeN/NdN, DyN/YbN, ΔFMQ, Eu anomaly, ΣHREE/ΣMREE, Hf, Th/U, Ti temperature, and YbN/GdN. The NNT1 approach successfully deconvolves the multivariate data set into two endmembers, which are consistent with derivation either from non-oxidized and relatively anhydrous (i.e., low Cu-ore potential, Source 1) or oxidized and hydrous (i.e., potential Cu-ore bodies, Source 2) igneous rocks. Furthermore, we attribute each of the zircon grains to either the Source 1 or 2 endmember based on maximization of the likelihood that their measured multivariate geochemistry was drawn from one or the other of the learned multivariate endmembers. Finally, we demonstrate that the proportions of the Source 2 endmember decrease with increasing distance from the ore bodies, as expected due to down-ice or off-axis zircon mixing and dilution. We conclude that the NNT1 approach provides insight into geologically meaningful sediment transport processes and multivariate sediment sources even when those sources are unknown. It thus provides a basis for future petrochronological interpretations with applied and pure geoscience applications.
BibTeX: @article{saylor2025tracking, title={Tracking Cu-fertile sediment sources via multivariate petrochronological mixture modeling of detrital zircons}, author={Saylor, Joel E and Richardson, Nicholas and Graham, Naomi and Lee, Robert G and Friedlander, Michael P}, journal={Journal of Geophysical Research: Earth Surface}, volume={130}, number={10}, pages={e2025JF008406}, year={2025}, publisher={Wiley Online Library} }
- Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization
N. Graham, N. Richardson, M. P. Friedlander, J. Saylor. Mathematical Geosciences, 57(4):601-628, 2025. [DOI]
Abstract: We devise a novel statistical method for deconvolving multivariate geochronology and geochemistry datasets to characterize sediment sources. The approach employs a third-order constrained Tucker-1 tensor decomposition to estimate the probability distributions of multiple features in sediment samples. By integrating kernel density estimation with matrix-tensor factorization, the model quantitatively determines the distributions and mixing proportions of sediment sources. The methodology introduces a numerical test for rank estimation to define the number of latent sources, and uses a maximum-likelihood approach to correlate individual detrital grains to these latent sources based on an arbitrary number of features. The method’s efficacy is validated through a numerical experiment with detrital zircon data that captures natural variability associated with temporal changes in crustal thickness in the Andes. The findings hold potential implications for resolving sediment sources, determining sediment mixing, enhancing the understanding of sediment transport processes, and characterizing the lithology, tectonic motion, or metallogenic potential of sediment sources. This method is adaptable to other data de-mixing problems and is available in a publicly released software package.
BibTeX: @article{Graham2025Tracing, Author = {N. Graham and N. Richardson and M. P. Friedlander and J. Saylor}, Year = {2025}, Month = {May}, Journal = {Mathematical Geosciences}, Number = {4}, Volume = {57}, Pages = {601-628}, Doi = {10.1007/s11004-024-10175-0}, Title = {Tracing Sedimentary Origins in Multivariate Geochronology via Constrained Tensor Factorization} }
- Endmember modelling of detrital zircon petrochronology data via multivariate Tucker-1 tensor decomposition
J. E. Saylor, N. Richardson, N. Graham, R. G. Lee, M. P. Friedlander. EGU General Assembly 2025. [EGU Abstract] [Web App] [GitHub]
Abstract: Detrital petrochronology is a powerful method of characterizing sediment and potentially sediment sources. The recently developed Tucker-1 decomposition method holds promise of using detrital petrochronology to identify both sediment-source characteristics and the proportions in which sources are present in sink samples even when sediment sources are unknown or unavailable for sampling. However, the correlation between endmember characteristics and lithological sources, or between proportions and sedimentary processes, has not been established. Herein we present a case study applying the recently developed Tucker-1 decomposition method to a multivariate geochemical data set from detrital zircons in till samples collected above the Cu-bearing Guichon Creek Batholith (GCB) in southern British Columbia, Canada. Data include a suite of eleven variables, including age, Ce anomaly, CeN/NdN, DyN/YbN, ΔFMQ, Eu anomaly, ΣHREE/ΣMREE, Hf, Th/U, Ti temperature, and YbN/GdN, from 12 samples collected at a range of distances in the down ice-flow direction from the GCB.
We demonstrate that endmember modelling using the Tucker-1 decomposition method successfully deconvolves the multivariate data sets into two endmembers in which the geochemical distributions are consistent with derivation from either non-oxidized and relatively anhydrous (i.e., low ore potential, Source 1) or oxidized and hydrous (i.e., potential ore bodies, Source 2) igneous rocks. Furthermore, we demonstrate that the proportions of the Source 2 endmember decrease with increasing distance from the ore bodies, as expected due to downstream zircon mixing and dilution.
Finally, we attribute each of the zircon grains to either the Source 1 or 2 endmember based on maximization of the likelihood that their measured multivariate geochemistry was drawn from one or the other of the learned multivariate endmembers. We compared these grain attributions to the results of an independent Classification and Regression Tree (CART) analysis designed to characterize zircon grains as either “fertile” or “barren” with respect to copper based on their geochemistry. We find that there is ~80% overlap between the source attributions based on the CART analysis and the grain-source identification based on the Tucker-1 decomposition.
We conclude that the novel Tucker-1 decomposition approach provides a flexible, precise, and accurate method of characterizing multivariate sediment sources even when those sources are unknown. It thus provides a basis for future petrochronological interpretations with applied and pure geoscience applications. All of the analyses presented herein can be freely accessed through a web application (https://dzgrainalyzer.eoas.ubc.ca/) or open-source Julia code (https://github.com/MPF-Optimization-Laboratory/MatrixTensorFactor.jl).
BibTeX: @inproceedings{saylor2025endmember, title={Endmember modelling of detrital zircon petrochronology data via multivariate Tucker-1 tensor decomposition}, author={Saylor, Joel E and Richardson, Nicholas and Graham, Naomi and Lee, Robert G and Friedlander, Michael P}, booktitle={EGU General Assembly Conference Abstracts}, pages={EGU25--4081}, year={2025} }
- Yao Kuang, MSc (2025).
Publications (2)
- Communication-efficient algorithms for decentralized multi-task learning
Y. Kuang. M.Sc. Thesis, UBC, 2025. [UBC cIRcle]
Abstract: This thesis addresses distributed optimization challenges by proposing a randomized local coordination method. Rather than requiring full network synchronization, the approach has each node independently sample one regularizer uniformly and coordinate only with nodes sharing that term. For graph-guided regularizers, the method reduces expected communication to exactly 2 messages per iteration, regardless of network topology. Convergence rates comparable to centralized stochastic gradient descent are demonstrated through theoretical analysis and experimental validation.
BibTeX: @mastersthesis{Kuang2025Communication, Author = {Y. Kuang}, Year = {2025}, Month = {November}, Title = {Communication-efficient algorithms for decentralized multi-task learning} }
- Decentralized Optimization with Topology-Independent Communication
Y. Lin, Y. Kuang, A. Alacaoglu, M. P. Friedlander. arXiv:2509.14488, 2025.
Abstract: Distributed optimization requires nodes to coordinate, yet full synchronization scales poorly. When $n$ nodes collaborate through $m$ pairwise regularizers, standard methods demand $\mathcal{O}(m)$ communications per iteration. This paper proposes randomized local coordination: each node independently samples one regularizer uniformly and coordinates only with nodes sharing that term. This exploits partial separability, where each regularizer $G_j$ depends on a subset $S_j \subseteq \{1,\ldots,n\}$ of nodes. For graph-guided regularizers where $|S_j|=2$, expected communication drops to exactly 2 messages per iteration. This method achieves $\tilde{\mathcal{O}}(\varepsilon^{-2})$ iterations for convex objectives and, under strong convexity, $\mathcal{O}(\varepsilon^{-1})$ to an $\varepsilon$-solution and $\mathcal{O}(\log(1/\varepsilon))$ to a neighborhood. Replacing the proximal map of the sum $\sum_j G_j$ with the proximal map of a single randomly selected regularizer $G_j$ preserves convergence while eliminating global coordination. Experiments validate both convergence rates and communication efficiency across synthetic and real-world datasets.
BibTeX: @misc{LinKuangAlacaogluFriedlander2025, author = {Lin, Ying and Kuang, Yao and Alacaoglu, Ahmet and Friedlander, Michael P.}, title = {Decentralized Optimization with Topology-Independent Communication}, year = {2025}, eprint = {2509.14488}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, doi = {10.48550/arXiv.2509.14488} }
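A toy, single-process simulation of the sampling idea follows (quadratic local losses on a small graph; the prox scaling and step sizes are our own illustrative choices, not the paper's algorithm):

```julia
# Hedged toy simulation of randomized local coordination: nodes hold scalars
# xᵢ with local losses fᵢ(xᵢ) = ½(xᵢ - bᵢ)², coupled by pairwise regularizers
# G_j = λ|x_a - x_c| on edges. Each round, every node takes a local gradient
# step, then ONE edge j is sampled uniformly and only its two endpoints
# exchange messages to apply the prox of (step·m·λ)|x_a - x_c|.
using Random

function local_coordination(b::Vector{Float64}, edges::Vector{Tuple{Int,Int}};
                            λ=0.5, step=0.05, iters=10_000)
    rng = MersenneTwister(1)
    x = zeros(length(b))
    m = length(edges)
    for _ in 1:iters
        x .-= step .* (x .- b)              # local steps, no communication
        (a, c) = edges[rand(rng, 1:m)]      # sample one regularizer uniformly
        δ = x[a] - x[c]                     # prox shrinks the gap between the
        t = sign(δ) * min(abs(δ) / 2, step * m * λ)   # two sampled endpoints
        x[a] -= t                           # exactly 2 messages this round:
        x[c] += t                           # only nodes a and c communicate
    end
    return x
end

# e.g. local_coordination([1.0, 2.0, 4.0], [(1, 2), (2, 3)])
```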
- Zhenan Fan, PhD (2022). Staff Research Engineer, Huawei Technologies Canada.
Publications (12)
- Cardinality-constrained structured data-fitting problems
Z. Fan, H. Fang, M. P. Friedlander. Open J. Math. Optim., 2023. [Journal] [DOI]
Abstract: A memory-efficient solution framework is proposed for the cardinality-constrained structured data-fitting problem. Dual-based atom-identification rules reveal the structure of the optimal primal solution from near-optimal dual solutions, which allows for a simple and computationally efficient algorithm that translates any feasible dual solution into a primal solution satisfying the cardinality constraint. Rigorous guarantees bound the quality of a near-optimal primal solution given any dual-based method that generates dual iterates converging to an optimal dual solution. Numerical experiments on real-world datasets support the analysis and demonstrate the efficiency of the proposed approach.
BibTeX: @misc{Fan2023Cardinality, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2023}, Month = {July}, Doi = {10.48550/arxiv.2107.11373}, Title = {Cardinality-constrained structured data-fitting problems} }
- Duality in structured and federated optimization: theory and applications
Z. Fan. Ph.D. Thesis, UBC, 2022. [UBC cIRcle]
Abstract: This thesis explores dual approaches in mathematical optimization across two problem classes. The work characterizes dual correspondence in structured optimization to develop algorithms and models. It proposes a dual methodology for federated optimization with improved convergence properties compared to primal-based alternatives. Additionally, the thesis addresses contribution valuation in federated learning scenarios, emphasizing the role of structured optimization in fair and efficient metrics for both horizontal and vertical federated settings.
BibTeX: @phdthesis{Fan2022Duality, Author = {Z. Fan}, Year = {2022}, Month = {November}, Title = {Duality in structured and federated optimization: theory and applications} }
- Knowledge-injected federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, J. Hu, C. Li, Y. Zhang. arXiv:2208.07530, 2022. [arXiv] [GitHub] [DOI]
Abstract: Federated learning is an emerging technique for training models from decentralized data sets. In many applications, data owners participating in the federated learning system hold not only the data but also a set of domain knowledge. Such knowledge includes human know-how and craftsmanship that can be extremely helpful to the federated learning task. In this work, we propose a federated learning framework that allows the injection of participants’ domain knowledge, where the key idea is to refine the global model with knowledge locally. The scenario we consider is motivated by a real industry-level application, and we demonstrate the effectiveness of our approach to this application.
BibTeX: @misc{Fan2022Knowledge, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and J. Hu and C. Li and Y. Zhang}, Year = {2022}, Month = {August}, Doi = {10.48550/arxiv.2208.07530}, Title = {Knowledge-injected federated learning} }
- Polar deconvolution of mixed signals
Z. Fan, H. Jeong, B. Joshi, M. P. Friedlander. IEEE Trans. Signal Processing, 70:2713-2727, 2022. [Journal] [GitHub]
Abstract: The signal demixing problem seeks to separate a superposition of multiple signals into its constituent components. This paper describes a two-stage approach that first decompresses and subsequently deconvolves the noisy and undersampled observations of the superposition. Probabilistic error bounds are given on the accuracy with which this process approximates the individual signals. The theory of polar convolution of convex sets and gauge functions plays a central role in the analysis and solution process. If the measurements are random and the noise is bounded, this approach stably recovers low-complexity and mutually incoherent signals, with high probability and with optimal sample complexity. We develop an efficient algorithm, based on level-set and conditional-gradient methods, that solves the convex optimization problems with sublinear iteration complexity and linear space requirements. Numerical experiments on both real and synthetic data confirm the theory and the efficiency of the approach.
BibTeX: @article{fan2022deconvolution, author = {Z. Fan and H. Jeong and B. Joshi and M. P. Friedlander}, title = {Polar deconvolution of mixed signals}, journal = {IEEE Transactions on Signal Processing}, volume = {70}, pages = {2713--2727}, year = {2022}, month = {May}, doi = {10.1109/TSP.2022.3178191} }
- A dual approach for federated learning
Z. Fan, H. Fang, M. P. Friedlander. arXiv:2201.11183, 2022. [arXiv] [GitHub] [DOI]
Abstract: We study the federated optimization problem from a dual perspective and propose a new algorithm termed federated dual coordinate descent (FedDCD), which is based on a type of coordinate descent method developed by Necoara et al. [Journal of Optimization Theory and Applications, 2017]. Additionally, we enhance the FedDCD method with inexact gradient oracles and Nesterov’s acceleration. We demonstrate theoretically that our proposed approach achieves better convergence rates than the state-of-the-art primal federated optimization algorithms under mild conditions. Numerical experiments on real-world datasets support our analysis.
BibTeX: @misc{Fan2022Dual, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.11183}, Title = {A dual approach for federated learning} }
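For background on the dual coordinate descent family that FedDCD federates, a hedged single-machine sketch for ridge regression follows. The closed-form update is the standard SDCA-style step for squared loss; all names and parameters are illustrative, and this is not the federated algorithm itself.

```julia
# Hedged sketch of dual (stochastic) coordinate descent for ridge regression:
# minimize (1/n)Σᵢ ½(aᵢᵀw - bᵢ)² + (λ/2)‖w‖². Dual variables αᵢ are updated
# one at a time in closed form, with the primal iterate w = Aᵀα/(λn).
using LinearAlgebra, Random

function dcd_ridge(A::Matrix{Float64}, b::Vector{Float64};
                   λ=0.1, epochs=50, rng=MersenneTwister(0))
    n, d = size(A)
    α = zeros(n)
    w = zeros(d)
    for _ in 1:(epochs * n)
        i = rand(rng, 1:n)
        ai = @view A[i, :]
        Δ = (b[i] - dot(ai, w) - α[i]) / (1 + dot(ai, ai) / (λ * n))
        α[i] += Δ
        w .+= (Δ / (λ * n)) .* ai           # keep w = Aᵀα/(λn) in sync
    end
    return w
end
```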
- Improving fairness for data valuation in horizontal federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, C. Liu, Y. Zhang. Intern. Conf. Data Engineering (ICDE), 2022. [IEEE] [arXiv] [DOI]
Abstract: Federated learning is an emerging decentralized machine learning scheme that allows multiple data owners to work collaboratively while ensuring data privacy. The success of federated learning depends largely on the participation of data owners. To sustain and encourage data owners’ participation, it is crucial to fairly evaluate the quality of the data provided by the data owners and reward them correspondingly. Federated Shapley value, recently proposed by Wang et al. [Federated Learning, 2020], is a measure for data value under the framework of federated learning that satisfies many desired properties for data valuation. However, there are still factors of potential unfairness in the design of federated Shapley value because two data owners with the same local data may not receive the same evaluation. We propose a new measure called completed federated Shapley value to improve the fairness of federated Shapley value. The design depends on completing a matrix consisting of all the possible contributions by different subsets of the data owners. It is shown under mild conditions that this matrix is approximately low-rank by leveraging concepts and tools from optimization. Both theoretical analysis and empirical evaluation verify that the proposed measure does improve fairness in many circumstances.
BibTeX: @article{Fan2022Improving, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and C. Liu and Y. Zhang}, Year = {2022}, Month = {January}, Journal = {2022 IEEE 38th International Conference on Data Engineering (ICDE)}, Pages = {2440-2453}, Doi = {10.1109/icde53745.2022.00228}, Title = {Improving fairness for data valuation in horizontal federated learning} }
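For readers new to the underlying quantity, the hedged sketch below computes exact Shapley values for a tiny cooperative game. The utility function v is a made-up stand-in for "quality of a model trained on this subset of owners," and the exhaustive enumeration shown is exactly the cost that federated Shapley value methods are designed to avoid.

```julia
# Hedged sketch: exact Shapley values for n data owners by enumerating owner
# subsets as bit masks. v(mask) is a stand-in utility, e.g. the accuracy of a
# model trained on the owners in `mask`. Exponential cost; illustration only.
function shapley_values(v, n::Int)
    ϕ = zeros(n)
    fact = [factorial(k) for k in 0:n]      # fact[k+1] == k!
    for mask in 0:(2^n - 1), i in 1:n
        if mask & (1 << (i - 1)) == 0       # owner i not yet in the coalition
            s = count_ones(mask)
            weight = fact[s + 1] * fact[n - s] / fact[n + 1]  # |S|!(n-|S|-1)!/n!
            ϕ[i] += weight * (v(mask | (1 << (i - 1))) - v(mask))
        end
    end
    return ϕ
end

# e.g. diminishing returns: shapley_values(mask -> sqrt(count_ones(mask)), 4)
```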
- Fair and efficient contribution valuation for vertical federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, Y. Zhang. arXiv:2201.02658, 2022. [arXiv] [DOI]
Abstract: Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data. Vertical federated learning, or feature-based federated learning, applies to cases in which different data sources share the same sample ID space but differ in feature space. To ensure the data owners’ long-term engagement, it is critical to objectively assess the contribution from each data source and recompense them accordingly. The Shapley value (SV) is a provably fair contribution valuation metric originating from cooperative game theory. However, computing the SV requires extensively retraining the model on each subset of data sources, which causes prohibitively high communication costs in federated learning. We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV. We show that VerFedSV not only satisfies many desirable properties for fairness but is also efficient to compute, and can be adapted to both synchronous and asynchronous vertical federated learning algorithms. Both theoretical analysis and extensive experimental results verify the fairness, efficiency, and adaptability of VerFedSV.
BibTeX: @misc{Fan2022Fair, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and Y. Zhang}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.02658}, Title = {Fair and efficient contribution valuation for vertical federated learning} }
- Fast convergence of the stochastic subgradient method under interpolation
H. Fang, Z. Fan, M. P. Friedlander. Intern. Conf. Learning Representations (ICLR), 2021.
Abstract: This paper studies the behaviour of the stochastic subgradient descent (SSGD) method applied to over-parameterized nonsmooth optimization problems that satisfy an interpolation condition. By leveraging the composite structure of the empirical risk minimization problems, we prove that when interpolation holds, SSGD converges at the same rate as the stochastic gradient descent (SGD) method applied to smooth problems. Our analysis provides a partial explanation for the empirical observation that sometimes SGD and SSGD behave similarly for training smooth and nonsmooth machine learning models. We also prove that the rates we derive are optimal for the subgradient method in the convex and interpolation setting.
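A hedged toy instance of the interpolation setting follows: an ℓ1 regression problem constructed so that a linear model fits the data exactly. Dimensions, step sizes, and names are illustrative, not the paper's experiments.

```julia
# Hedged toy experiment: stochastic subgradient descent on the nonsmooth
# interpolation problem f(w) = (1/n)Σᵢ |aᵢᵀw - bᵢ| with b = A·wtrue, so the
# minimum value is zero and the model interpolates.
using LinearAlgebra, Random

function ssgd_interpolation(; n=200, d=10, iters=50_000)
    rng = MersenneTwister(0)
    A = randn(rng, n, d)
    wtrue = randn(rng, d)
    b = A * wtrue                        # interpolation: f(wtrue) = 0
    w = zeros(d)
    for t in 1:iters
        i = rand(rng, 1:n)
        r = dot(A[i, :], w) - b[i]
        w .-= (0.5 / sqrt(t)) .* sign(r) .* A[i, :]   # subgradient step
    end
    return norm(w - wtrue)               # distance to the interpolating model
end
```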
- Atomic decomposition via polar alignment: the geometry of structured optimization
Z. Fan, H. Jeong, Y. Sun, M. P. Friedlander. Foundations and Trends in Optimization, 3(4):280–366, 2020. [Journal] [DOI]
Abstract: Structured optimization uses a prescribed set of atoms to assemble a solution that fits a model to data. Polarity, which extends the familiar notion of orthogonality from linear sets to general convex sets, plays a special role in a simple and geometric form of convex duality. This duality correspondence yields a general notion of alignment that leads to an intuitive and complete description of how atoms participate in the final decomposition of the solution. The resulting geometric perspective leads to variations of existing algorithms effective for large-scale problems. We illustrate these ideas with many examples, including applications in matrix completion and morphological component analysis for the separation of mixtures of signals.
BibTeX: @article{Fan2020Atomic, Author = {Z. Fan and H. Jeong and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {November}, Journal = {Foundations and Trends® in Optimization}, Number = {4}, Volume = {3}, Pages = {280-366}, Doi = {10.1561/2400000028}, Title = {Atomic decomposition via polar alignment: the geometry of structured optimization} }
- Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization
H. Fang, Z. Fan, Y. Sun, M. P. Friedlander. Intern. Conf. Artificial Intelligence and Statistics (AISTATS), 2020.
Abstract: We consider greedy coordinate descent (GCD) for composite problems with sparsity-inducing regularizers, including 1-norm regularization and non-negative constraints. Empirical evidence strongly suggests that GCD, when initialized with the zero vector, has an implicit screening ability that usually selects at each iteration coordinates that are nonzero at the solution. Thus, for problems with sparse solutions, GCD can converge significantly faster than randomized coordinate descent. We present an improved convergence analysis of GCD for sparse optimization, and a formal analysis of its screening properties. We also propose and analyze an improved selection rule with stronger ability to produce sparse iterates. Numerical experiments on both synthetic and real-world data support our analysis and the effectiveness of the proposed selection rule.
BibTeX: @article{Fang2020Greed, Author = {H. Fang and Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {February}, Journal = {International Conference on Artificial Intelligence and Statistics}, Pages = {434-444}, Title = {Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization} }
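A hedged sketch of the basic method analyzed here, specialized to the lasso: the Gauss-Southwell rule picks the coordinate with the largest-magnitude gradient of the smooth part, followed by an exact soft-thresholded coordinate minimization. Names and parameters are illustrative.

```julia
# Hedged sketch: greedy coordinate descent for min ½‖Ax - b‖² + λ‖x‖₁.
# Each step picks the coordinate with the largest-magnitude gradient of the
# smooth part (Gauss-Southwell) and minimizes exactly over that coordinate.
using LinearAlgebra

soft(z, τ) = sign(z) * max(abs(z) - τ, 0.0)   # soft-thresholding operator

function gcd_lasso(A::Matrix{Float64}, b::Vector{Float64}; λ=0.1, iters=500)
    n = size(A, 2)
    x = zeros(n)
    colnorm2 = [sum(abs2, A[:, j]) for j in 1:n]
    r = copy(b)                                # residual r = b - A·x
    for _ in 1:iters
        g = -(A' * r)                          # gradient of ½‖Ax - b‖²
        j = argmax(abs.(g))                    # greedy coordinate choice
        xj = soft(x[j] - g[j] / colnorm2[j], λ / colnorm2[j])
        r .-= (xj - x[j]) .* A[:, j]           # maintain the residual
        x[j] = xj
    end
    return x
end
```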
- Bundle methods for dual atomic pursuit
Z. Fan, Y. Sun, M. P. Friedlander. Asilomar Conf. Signals, Systems, Computers (ACSSC), 2019. [arXiv] [DOI]
Abstract: The aim of structured optimization is to assemble a solution, using a given set of (possibly uncountably infinite) atoms, to fit a model to data. A two-stage algorithm based on gauge duality and the bundle method is proposed. The first stage discovers the optimal atomic support for the primal problem by solving a sequence of approximations of the dual problem using a bundle-type method. The second stage recovers the approximate primal solution using the atoms discovered in the first stage. The overall approach leads to implementable and efficient algorithms for large problems.
BibTeX: @article{Fan2019Bundle, Author = {Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2019}, Month = {November}, Volume = {59}, Pages = {264-270}, Doi = {10.1109/ieeeconf44664.2019.9048711}, Title = {Bundle methods for dual atomic pursuit} }
- Bundle-type methods for dual atomic pursuit
Z. Fan. M.Sc. Thesis, UBC, 2019. [UBC cIRcle]
BibTeX: @mastersthesis{Fan2019Bundle, Author = {Z. Fan}, Year = {2019}, Month = {August}, Title = {Bundle-type methods for dual atomic pursuit} }
- Mia Kramer, MSc (2022). Scientific Software Engineer, 1QBit.
Publications (1)
- AtomicSets.jl: The calculus of support functions in Julia
M. Kramer. M.Sc. Thesis, UBC, 2022. [UBC cIRcle]
BibTeX: @mastersthesis{Kramer2022AtomicSets, Author = {M. Kramer}, Year = {2022}, Month = {August}, Title = {AtomicSets.jl: The calculus of support functions in Julia} }
- Huang Fang, PhD (2021).
Publications (11)
- Cardinality-constrained structured data-fitting problems
Z. Fan, H. Fang, M. P. Friedlander. Open J. Math. Optim., 2023. [Journal] [DOI]
Abstract: A memory-efficient solution framework is proposed for the cardinality-constrained structured data-fitting problem. Dual-based atom-identification rules reveal the structure of the optimal primal solution from near-optimal dual solutions, which allows for a simple and computationally efficient algorithm that translates any feasible dual solution into a primal solution satisfying the cardinality constraint. Rigorous guarantees bound the quality of a near-optimal primal solution given any dual-based method that generates dual iterates converging to an optimal dual solution. Numerical experiments on real-world datasets support the analysis and demonstrate the efficiency of the proposed approach.
BibTeX: @misc{Fan2023Cardinality, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2023}, Month = {July}, Doi = {10.48550/arxiv.2107.11373}, Title = {Cardinality-constrained structured data-fitting problems} }
- Knowledge-injected federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, J. Hu, C. Li, Y. Zhang. arXiv:2208.07530, 2022. [arXiv] [GitHub] [DOI]
Abstract: Federated learning is an emerging technique for training models from decentralized data sets. In many applications, data owners participating in the federated learning system hold not only the data but also a set of domain knowledge. Such knowledge includes human know-how and craftsmanship that can be extremely helpful to the federated learning task. In this work, we propose a federated learning framework that allows the injection of participants’ domain knowledge, where the key idea is to refine the global model with knowledge locally. The scenario we consider is motivated by a real industry-level application, and we demonstrate the effectiveness of our approach to this application.
BibTeX: @misc{Fan2022Knowledge, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and J. Hu and C. Li and Y. Zhang}, Year = {2022}, Month = {August}, Doi = {10.48550/arxiv.2208.07530}, Title = {Knowledge-injected federated learning} }
- Online mirror descent and dual averaging: keeping pace in the dynamic case
H. Fang, N. J. A. Harvey, V. S. Portella, M. P. Friedlander. J. Machine Learning Research, 23(121):1-38, 2022.
Abstract: Online mirror descent (OMD) and dual averaging (DA) are two fundamental algorithms for online convex optimization. They are known to have very similar (or even identical) performance guarantees in most scenarios when a fixed learning rate is used. However, for dynamic learning rates OMD is provably inferior to DA. It is known that, with a dynamic learning rate, OMD can suffer linear regret, even in common settings such as prediction with expert advice. This hints that the relationship between OMD and DA is not fully understood at present. In this paper, we modify the OMD algorithm by a simple technique that we call stabilization. We give essentially the same abstract regret bound for stabilized OMD and DA by modifying the classical OMD convergence analysis in a careful and modular way, yielding proofs that we believe to be clean and flexible. Simple corollaries of these bounds show that OMD with stabilization and DA enjoy the same performance guarantees in many applications even under dynamic learning rates. We also shed some light on the similarities between OMD and DA and show simple conditions under which stabilized OMD and DA generate the same iterates.
BibTeX: @article{fang2022online, author = {Huang Fang and Nicholas J. A. Harvey and Victor S. Portella and Michael P. Friedlander}, title = {Online Mirror Descent and Dual Averaging: Keeping Pace in the Dynamic Case}, journal = {Journal of Machine Learning Research}, year = {2022}, volume = {23}, number = {121}, pages = {1-38}, url = {http://jmlr.org/papers/v23/21-1027.html} }
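As a small companion illustration, here is the dual-averaging side of the comparison on the probability simplex (prediction with expert advice) under a dynamic learning rate. The stabilized-OMD variant that the paper shows enjoys matching guarantees is not reproduced here; names and the rate schedule are illustrative.

```julia
# Hedged sketch: dual averaging (lazy exponentiated gradient) for prediction
# with expert advice, x_{t+1} ∝ exp(-η_t · Σ_{s≤t} g_s), under the dynamic
# learning rate η_t = √(log n / t). Returns regret against the best expert.
using LinearAlgebra

function dual_averaging_regret(losses::Matrix{Float64})
    T, n = size(losses)                  # losses[t, k] = loss of expert k
    cum = zeros(n)                       # running loss (gradient) sum
    x = fill(1 / n, n)                   # start at the uniform distribution
    incurred = 0.0
    for t in 1:T
        g = losses[t, :]
        incurred += dot(g, x)
        cum .+= g
        η = sqrt(log(n) / t)             # dynamic learning rate
        wts = exp.(-η .* cum)
        x = wts ./ sum(wts)              # DA re-solves from the origin each round
    end
    return incurred - minimum(cum)
end
```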
- A dual approach for federated learning
Z. Fan, H. Fang, M. P. Friedlander. arXiv:2201.11183, 2022. [arXiv] [GitHub] [DOI]
Abstract: We study the federated optimization problem from a dual perspective and propose a new algorithm termed federated dual coordinate descent (FedDCD), which is based on a type of coordinate descent method developed by Necoara et al. [Journal of Optimization Theory and Applications, 2017]. Additionally, we enhance the FedDCD method with inexact gradient oracles and Nesterov’s acceleration. We demonstrate theoretically that our proposed approach achieves better convergence rates than the state-of-the-art primal federated optimization algorithms under mild conditions. Numerical experiments on real-world datasets support our analysis.
BibTeX: @misc{Fan2022Dual, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.11183}, Title = {A dual approach for federated learning} }
- Improving fairness for data valuation in horizontal federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, C. Liu, Y. Zhang. Intern. Conf. Data Engineering (ICDE), 2022. [IEEE] [arXiv] [DOI]
Abstract: Federated learning is an emerging decentralized machine learning scheme that allows multiple data owners to work collaboratively while ensuring data privacy. The success of federated learning depends largely on the participation of data owners. To sustain and encourage data owners’ participation, it is crucial to fairly evaluate the quality of the data provided by the data owners and reward them correspondingly. Federated Shapley value, recently proposed by Wang et al. [Federated Learning, 2020], is a measure for data value under the framework of federated learning that satisfies many desired properties for data valuation. However, there are still factors of potential unfairness in the design of federated Shapley value because two data owners with the same local data may not receive the same evaluation. We propose a new measure called completed federated Shapley value to improve the fairness of federated Shapley value. The design depends on completing a matrix consisting of all the possible contributions by different subsets of the data owners. It is shown under mild conditions that this matrix is approximately low-rank by leveraging concepts and tools from optimization. Both theoretical analysis and empirical evaluation verify that the proposed measure does improve fairness in many circumstances.
BibTeX: @article{Fan2022Improving, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and C. Liu and Y. Zhang}, Year = {2022}, Month = {January}, Journal = {2022 IEEE 38th International Conference on Data Engineering (ICDE)}, Pages = {2440-2453}, Doi = {10.1109/icde53745.2022.00228}, Title = {Improving fairness for data valuation in horizontal federated learning} }
- Fair and efficient contribution valuation for vertical federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, Y. Zhang. arXiv:2201.02658, 2022. [arXiv] [DOI]
Abstract: Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data. Vertical federated learning, or feature-based federated learning, applies to cases in which different data sources share the same sample ID space but differ in feature space. To ensure the data owners’ long-term engagement, it is critical to objectively assess the contribution from each data source and recompense them accordingly. The Shapley value (SV) is a provably fair contribution valuation metric originating from cooperative game theory. However, computing the SV requires extensively retraining the model on each subset of data sources, which causes prohibitively high communication costs in federated learning. We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV. We show that VerFedSV not only satisfies many desirable properties for fairness but is also efficient to compute, and can be adapted to both synchronous and asynchronous vertical federated learning algorithms. Both theoretical analysis and extensive experimental results verify the fairness, efficiency, and adaptability of VerFedSV.
BibTeX: @misc{Fan2022Fair, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and Y. Zhang}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.02658}, Title = {Fair and efficient contribution valuation for vertical federated learning} }
- First-order methods for structured optimization
H. Fang. Ph.D. Thesis, UBC, 2021.
BibTeX: @phdthesis{Fang2021FirstOrder, Author = {H. Fang}, Year = {2021}, Month = {August}, Title = {First-order methods for structured optimization} }
- Fast convergence of the stochastic subgradient method under interpolation
H. Fang, Z. Fan, M. P. Friedlander. Intern. Conf. Learning Representations (ICLR), 2021.
Abstract: This paper studies the behaviour of the stochastic subgradient descent (SSGD) method applied to over-parameterized nonsmooth optimization problems that satisfy an interpolation condition. By leveraging the composite structure of the empirical risk minimization problems, we prove that when interpolation holds, SSGD converges at the same rate as the stochastic gradient descent (SGD) method applied to smooth problems. Our analysis provides a partial explanation for the empirical observation that sometimes SGD and SSGD behave similarly for training smooth and nonsmooth machine learning models. We also prove that the rates we derive are optimal for the subgradient method in the convex and interpolation setting.
- Online mirror descent and dual averaging: keeping pace in the dynamic case
H. Fang, N. J. A. Harvey, V. S. Portella, M. P. Friedlander. Intern. Conf. Machine Learning (ICML), 2020. [arXiv] [video] [DOI]
Abstract: Online mirror descent (OMD) and dual averaging (DA) are two fundamental algorithms for online convex optimization. They are known to have very similar (or even identical) performance guarantees in most scenarios when a fixed learning rate is used. However, for dynamic learning rates OMD is provably inferior to DA. It is known that, with a dynamic learning rate, OMD can suffer linear regret, even in common settings such as prediction with expert advice. This hints that the relationship between OMD and DA is not fully understood at present. In this paper, we modify the OMD algorithm by a simple technique that we call stabilization. We give essentially the same abstract regret bound for stabilized OMD and DA by modifying the classical OMD convergence analysis in a careful and modular way, yielding proofs that we believe to be clean and flexible. Simple corollaries of these bounds show that OMD with stabilization and DA enjoy the same performance guarantees in many applications even under dynamic learning rates. We also shed some light on the similarities between OMD and DA and show simple conditions under which stabilized OMD and DA generate the same iterates.
BibTeX: @article{Fang2020Online, Author = {H. Fang and N. J. A. Harvey and V. S. Portella and M. P. Friedlander}, Year = {2020}, Month = {June}, Journal = {International Conference on Machine Learning}, Volume = {1}, Pages = {3008-3017}, Title = {Online mirror descent and dual averaging: keeping pace in the dynamic case} }
- Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization
H. Fang, Z. Fan, Y. Sun, M. P. Friedlander. Intern. Conf. Artificial Intelligence and Statistics (AISTATS), 2020.
Abstract: We consider greedy coordinate descent (GCD) for composite problems with sparsity-inducing regularizers, including 1-norm regularization and non-negative constraints. Empirical evidence strongly suggests that GCD, when initialized with the zero vector, has an implicit screening ability that usually selects at each iteration coordinates that are nonzero at the solution. Thus, for problems with sparse solutions, GCD can converge significantly faster than randomized coordinate descent. We present an improved convergence analysis of GCD for sparse optimization, and a formal analysis of its screening properties. We also propose and analyze an improved selection rule with stronger ability to produce sparse iterates. Numerical experiments on both synthetic and real-world data support our analysis and the effectiveness of the proposed selection rule.
BibTeX: @article{Fang2020Greed, Author = {H. Fang and Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {February}, Journal = {International Conference on Artificial Intelligence and Statistics}, Pages = {434-444}, Title = {Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization} }
- Fast training for large-scale one-versus-all linear classifiers using tree-structured initialization
H. Fang, M. Cheng, C.-J. Hsieh, M. P. Friedlander. SIAM Intern. Conf. Data Mining (SDM), 2019. [DOI]
Abstract: We consider the problem of training one-versus-all (OVA) linear classifiers for multiclass or multilabel classification when the number of labels is large. A naive extension of OVA to this problem, even with hundreds of cores, usually requires hours for training on large real-world datasets. We propose a novel algorithm called OVA-Primal++ that speeds up the training of OVA by using a tree-structured training order, where each classifier is trained using its parent’s classifier as initialization. OVA-Primal++ is both theoretically and empirically faster than the naive OVA algorithm, and yet still enjoys the same high parallelizability and small memory footprint. Extensive experiments on multiclass and multilabel classification datasets validate the effectiveness of our method.
BibTeX: @incollection{Fang2019Fast, Author = {H. Fang and M. Cheng and C.-J. Hsieh and M. P. Friedlander}, Year = {2019}, Month = {February}, Pages = {280-288}, Doi = {10.1137/1.9781611975673.32}, Title = {Fast training for large-scale one-versus-all linear classifiers using tree-structured initialization} }
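A hedged sketch of the warm-start idea with a simple logistic-loss trainer follows; the label tree, trainer, and parameters are illustrative stand-ins, not the OVA-Primal++ implementation.

```julia
# Hedged sketch of tree-structured initialization for one-versus-all training.
# `parent[l]` is the parent of label l in a label tree (0 for roots); labels
# must be ordered so parents come before children. Each classifier starts from
# its parent's weights and takes a few logistic-regression gradient steps.
using LinearAlgebra

logistic(z) = 1 / (1 + exp(-z))

function ova_tree_train(X::Matrix{Float64}, Y::Matrix{Float64},
                        parent::Vector{Int}; step=0.1, steps=50)
    nsamples, d = size(X)
    L = length(parent)
    W = zeros(L, d)
    for l in 1:L
        w = parent[l] == 0 ? zeros(d) : copy(W[parent[l], :])  # warm start
        y = Y[:, l]                             # one-vs-all targets in {0,1}
        for _ in 1:steps
            w .-= (step / nsamples) .* (X' * (logistic.(X * w) .- y))
        end
        W[l, :] = w
    end
    return W
end
```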
- Halyun Jeong, Postdoc (2021). Adjunct Assistant Professor, UCLA.
Publications (3)
- Polar deconvolution of mixed signals
Z. Fan, H. Jeong, B. Joshi, M. P. Friedlander. IEEE Trans. Signal Processing, 70:2713-2727, 2022. [Journal] [GitHub]
Abstract: The signal demixing problem seeks to separate a superposition of multiple signals into its constituent components. This paper describes a two-stage approach that first decompresses and subsequently deconvolves the noisy and undersampled observations of the superposition. Probabilistic error bounds are given on the accuracy with which this process approximates the individual signals. The theory of polar convolution of convex sets and gauge functions plays a central role in the analysis and solution process. If the measurements are random and the noise is bounded, this approach stably recovers low-complexity and mutually incoherent signals, with high probability and with optimal sample complexity. We develop an efficient algorithm, based on level-set and conditional-gradient methods, that solves the convex optimization problems with sublinear iteration complexity and linear space requirements. Numerical experiments on both real and synthetic data confirm the theory and the efficiency of the approach.
BibTeX: @article{fan2022deconvolution, author = {Z. Fan and H. Jeong and B. Joshi and M. P. Friedlander}, title = {Polar deconvolution of mixed signals}, journal = {IEEE Transactions on Signal Processing}, volume = {70}, pages = {2713--2727}, year = {2022}, month = {May}, doi = {10.1109/TSP.2022.3178191} }
- NBIHT: An efficient algorithm for 1-bit compressed sensing with optimal error decay rate
M. P. Friedlander, H. Jeong, Y. Plan, Ö. Yılmaz. IEEE Trans. Info. Theory, 68(2):1157-1177, 2021. [DOI]
Abstract: The Binary Iterative Hard Thresholding (BIHT) algorithm is a popular reconstruction method for one-bit compressed sensing due to its simplicity and fast empirical convergence. There have been several works about BIHT, but a theoretical understanding of the corresponding approximation error and convergence rate remains open. This paper shows that the normalized version of BIHT (NBIHT) achieves an approximation error rate optimal up to logarithmic factors. More precisely, using m one-bit measurements of an s-sparse vector x, we prove that the approximation error of NBIHT is of order O(1/m) up to logarithmic factors, which matches the information-theoretic lower bound Ω(1/m) proved by Jacques, Laska, Boufounos, and Baraniuk in 2013. To our knowledge, this is the first theoretical analysis of a BIHT-type algorithm that explains the optimal rate of error decay empirically observed in the literature. This also makes NBIHT the first provable computationally efficient one-bit compressed sensing algorithm that breaks the inverse square root error decay rate O(1/√m).
BibTeX: @misc{Friedlander2021Nbiht, Author = {M. P. Friedlander and H. Jeong and Y. Plan and Ö. Yılmaz}, Year = {2021}, Month = {November}, Number = {2}, Volume = {68}, Pages = {1157-1177}, Doi = {10.1109/tit.2021.3124598}, Title = {NBIHT: An efficient algorithm for 1-bit compressed sensing with optimal error decay rate} }
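A hedged sketch of the BIHT-style iteration studied here follows; the precise normalization and step-size choices of NBIHT differ, so this is illustrative only.

```julia
# Hedged sketch of (normalized) binary iterative hard thresholding for one-bit
# compressed sensing: observe y = sign(A·x) for an s-sparse unit vector x, and
# iterate a gradient-like correction, hard thresholding, and renormalization.
using LinearAlgebra, Random

function hard_threshold(v::Vector{Float64}, s::Int)
    z = zeros(length(v))
    idx = partialsortperm(abs.(v), 1:s, rev=true)   # s largest entries
    z[idx] = v[idx]
    return z
end

function nbiht_sketch(A::Matrix{Float64}, y::Vector{Float64}, s::Int;
                      iters=200, μ=1.0)
    m, n = size(A)
    x = zeros(n)
    for _ in 1:iters
        x = hard_threshold(x .+ (μ / m) .* (A' * (y .- sign.(A * x))), s)
        nx = norm(x)
        nx > 0 && (x ./= nx)             # renormalize (the "N" in NBIHT)
    end
    return x
end
```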
- Atomic decomposition via polar alignment: the geometry of structured optimization
Z. Fan, H. Jeong, Y. Sun, M. P. Friedlander. Foundations and Trends in Optimization, 3(4):280–366, 2020. [Journal] [DOI]
Abstract: Structured optimization uses a prescribed set of atoms to assemble a solution that fits a model to data. Polarity, which extends the familiar notion of orthogonality from linear sets to general convex sets, plays a special role in a simple and geometric form of convex duality. This duality correspondence yields a general notion of alignment that leads to an intuitive and complete description of how atoms participate in the final decomposition of the solution. The resulting geometric perspective leads to variations of existing algorithms effective for large-scale problems. We illustrate these ideas with many examples, including applications in matrix completion and morphological component analysis for the separation of mixtures of signals.
BibTeX: @article{Fan2020Atomic, Author = {Z. Fan and H. Jeong and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {November}, Journal = {Foundations and Trends® in Optimization}, Number = {4}, Volume = {3}, Pages = {280-366}, Doi = {10.1561/2400000028}, Title = {Atomic decomposition via polar alignment: the geometry of structured optimization} }
- Babhru Joshi, Postdoc (2021).
Publications (1)
- Polar deconvolution of mixed signals
Z. Fan, H. Jeong, B. Joshi, M. P. Friedlander. IEEE Trans. Signal Processing, 70:2713-2727, 2022. [Journal] [GitHub]
Abstract: The signal demixing problem seeks to separate a superposition of multiple signals into its constituent components. This paper describes a two-stage approach that first decompresses and subsequently deconvolves the noisy and undersampled observations of the superposition. Probabilistic error bounds are given on the accuracy with which this process approximates the individual signals. The theory of polar convolution of convex sets and gauge functions plays a central role in the analysis and solution process. If the measurements are random and the noise is bounded, this approach stably recovers low-complexity and mutually incoherent signals, with high probability and with optimal sample complexity. We develop an efficient algorithm, based on level-set and conditional-gradient methods, that solves the convex optimization problems with sublinear iteration complexity and linear space requirements. Numerical experiments on both real and synthetic data confirm the theory and the efficiency of the approach.
BibTeX: @article{fan2022deconvolution, author = {Z. Fan and H. Jeong and B. Joshi and M. P. Friedlander}, title = {Polar deconvolution of mixed signals}, journal = {IEEE Transactions on Signal Processing}, volume = {70}, pages = {2713--2727}, year = {2022}, month = {May}, doi = {10.1109/TSP.2022.3178191} }
2010–2019
- Yifan Sun, Postdoc (2019). Assistant Professor, Stony Brook University.
Publications (4)
- Atomic decomposition via polar alignment: the geometry of structured optimization
Z. Fan, H. Jeong, Y. Sun, M. P. Friedlander. Foundations and Trends in Optimization, 3(4):280–366, 2020. [Journal] [DOI]
Abstract: Structured optimization uses a prescribed set of atoms to assemble a solution that fits a model to data. Polarity, which extends the familiar notion of orthogonality from linear sets to general convex sets, plays a special role in a simple and geometric form of convex duality. This duality correspondence yields a general notion of alignment that leads to an intuitive and complete description of how atoms participate in the final decomposition of the solution. The resulting geometric perspective leads to variations of existing algorithms effective for large-scale problems. We illustrate these ideas with many examples, including applications in matrix completion and morphological component analysis for the separation of mixtures of signals.
BibTeX: @article{Fan2020Atomic, Author = {Z. Fan and H. Jeong and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {November}, Journal = {Foundations and Trends® in Optimization}, Number = {4}, Volume = {3}, Pages = {280-366}, Doi = {10.1561/2400000028}, Title = {Atomic decomposition via polar alignment: the geometry of structured optimization} }
[bib]
Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization
H. Fang, Z. Fan, Y. Sun, M. P. Friedlander. Intern. Conf. Artificial Intelligence and Statistics (AISTATS),
2020.We consider greedy coordinate descent (GCD) for composite problems with sparsity-inducing regularizers, including 1-norm regularization and non-negative constraints. Empirical evidence strongly suggests that GCD, when initialized with the zero vector, has an implicit screening ability that usually selects at each iteration coordinates that are nonzero at the solution. Thus, for problems with sparse solutions, GCD can converge significantly faster than randomized coordinate descent. We present an improved convergence analysis of GCD for sparse optimization, and a formal analysis of its screening properties. We also propose and analyze an improved selection rule with stronger ability to produce sparse iterates. Numerical experiments on both synthetic and real-world data support our analysis and the effectiveness of the proposed selection rule.
@article{Fang2020Greed, Author = {H. Fang and Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {February}, Journal = {International Conference on Artificial Intelligence and Statistics}, Pages = {434-444}, Title = {Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization} } - [abs]
[bib]
Bundle methods for dual atomic pursuit
Z. Fan, Y. Sun, M. P. Friedlander. Asilomar Conf. Signals, Systems, Computers (ACSSC),
2019.
[arXiv]
[DOI]The aim of structured optimization is to assemble a solution, using a given set of (possibly uncountably infinite) atoms, to fit a model to data. A two-stage algorithm based on gauge duality and bundle method is proposed. The first stage discovers the optimal atomic support for the primal problem by solving a sequence of approximations of the dual problem using a bundle-type method. The second stage recovers the approximate primal solution using the atoms discovered in the first stage. The overall approach leads to implementable and efficient algorithms for large problems.
@article{Fan2019Bundle, Author = {Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2019}, Month = {November}, Volume = {59}, Pages = {264-270}, Doi = {10.1109/ieeeconf44664.2019.9048711}, Title = {Bundle methods for dual atomic pursuit} } - [abs]
[bib]
One-shot atomic detection
Y. Sun, M. P. Friedlander. IEEE Intern. Workshop Comput. Adv. Multi-Sensor Adaptive Proc. (CAMSAP),
2019.
[DOI]Feature selection in data science involves identifying the most prominent and uncorrelated features in the data, which can be useful for compression and interpretability. If these features can be easily extracted, then a model can be trained over a reduced set of weights, which leads to more efficient training and possibly more robust classifiers. There are many approaches to feature selection; in this work, we propose screening the “atoms” of a gradient of a loss function taken at a random point. We illustrate this approach on sparse and low-rank optimization problems. Despite the simplicity of the approach, we are often able to select the dominant features easily, and greatly improve the runtime and robustness in training overparametrized models.
@article{Sun2019One, Author = {Y. Sun and M. P. Friedlander}, Year = {2019}, Month = {September}, Volume = {16}, Pages = {1-5}, Doi = {10.1109/camsap45676.2019.9022441}, Title = {One-shot atomic detection} }
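The screening rule in the one-shot detection paper above is simple enough to sketch directly: rank the atoms (here, coordinates) by the magnitude of the gradient at a random point and keep the dominant ones. The sizes and threshold below are made up for illustration.

```python
# One-shot gradient screening, in the spirit of the entry above.
# Assumed toy setup: sparse least-squares with a known planted support.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((500, 1000))
x_true = np.zeros(1000); x_true[:5] = 5.0 + 5.0 * rng.random(5)
b = A @ x_true + 0.01 * rng.standard_normal(500)

x0 = 1e-3 * rng.standard_normal(1000)     # random reference point
g = A.T @ (A @ x0 - b)                    # gradient of 0.5*||Ax - b||^2 at x0
keep = np.argsort(-np.abs(g))[:20]        # one-shot screen: dominant atoms
print("true support kept:", set(range(5)) <= set(keep))

# A model can now be fit over the reduced coordinate set `keep`.
x_small, *_ = np.linalg.lstsq(A[:, keep], b, rcond=None)
```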
- ZFZhenan Fan MSc (2019)Staff Research Engineer, Huawei Technologies Canada
Publications (12)
- [abs]
[bib]
Cardinality-constrained structured data-fitting problems
Z. Fan, H. Fang, M. P. Friedlander. Open J. Math. Optim.,
2023.
[Journal]
[DOI]A memory-efficient solution framework is proposed for the cardinality-constrained structured data-fitting problem. Dual-based atom-identification rules reveal the structure of the optimal primal solution from near-optimal dual solutions, which allows for a simple and computationally efficient algorithm that translates any feasible dual solution into a primal solution satisfying the cardinality constraint. Rigorous guarantees bound the quality of a near-optimal primal solution given any dual-based method that generates dual iterates converging to an optimal dual solution. Numerical experiments on real-world datasets support the analysis and demonstrate the efficiency of the proposed approach.
@misc{Fan2023Cardinality, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2023}, Month = {July}, Doi = {10.48550/arxiv.2107.11373}, Title = {Cardinality-constrained structured data-fitting problems} } - [abs]
[bib]
Duality in structured and federated optimization: theory and applications
Z. Fan. Ph.D. Thesis, UBC,
2022.
[UBC cIRcle]This thesis explores dual approaches in mathematical optimization across two problem classes. The work characterizes dual correspondence in structured optimization to develop algorithms and models. It proposes a dual methodology for federated optimization with improved convergence properties compared to primal-based alternatives. Additionally, the thesis addresses contribution valuation in federated learning scenarios, emphasizing the role of structured optimization in fair and efficient metrics for both horizontal and vertical federated settings.
@phdthesis{Fan2022Duality, Author = {Z. Fan}, Year = {2022}, Month = {November}, Title = {Duality in structured and federated optimization: theory and applications} } - [abs]
[bib]
Knowledge-injected federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, J. Hu, C. Li, Y. Zhang. arXiv:2208.07530,
2022.
[arXiv]
[GitHub]
[DOI]Federated learning is an emerging technique for training models from decentralized data sets. In many applications, data owners participating in the federated learning system hold not only the data but also a set of domain knowledge. Such knowledge includes human know-how and craftsmanship that can be extremely helpful to the federated learning task. In this work, we propose a federated learning framework that allows the injection of participants’ domain knowledge, where the key idea is to refine the global model with knowledge locally. The scenario we consider is motivated by a real industry-level application, and we demonstrate the effectiveness of our approach to this application.
@misc{Fan2022Knowledge, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and J. Hu and C. Li and Y. Zhang}, Year = {2022}, Month = {August}, Doi = {10.48550/arxiv.2208.07530}, Title = {Knowledge-injected federated learning} } - [abs]
[bib]
Polar deconvolution of mixed signals
Z. Fan, H. Jeong, B. Joshi, M. P. Friedlander. IEEE Trans. Signal Processing, 70:2713–2727,
2022.
[Journal]
[GitHub]The signal demixing problem seeks to separate a superposition of multiple signals into its constituent components. This paper describes a two-stage approach that first decompresses and subsequently deconvolves the noisy and undersampled observations of the superposition. Probabilistic error bounds are given on the accuracy with which this process approximates the individual signals. The theory of polar convolution of convex sets and gauge functions plays a central role in the analysis and solution process. If the measurements are random and the noise is bounded, this approach stably recovers low-complexity and mutually incoherent signals, with high probability and with optimal sample complexity. We develop an efficient algorithm, based on level-set and conditional-gradient methods, that solves the convex optimization problems with sublinear iteration complexity and linear space requirements. Numerical experiments on both real and synthetic data confirm the theory and the efficiency of the approach.
@article{fan2022deconvolution, author = {Z. Fan and H. Jeong and B. Joshi and M. P. Friedlander}, title = {Polar deconvolution of mixed signals}, journal = {IEEE Transactions on Signal Processing}, volume = {70}, pages = {2713--2727}, year = {2022}, month = {May}, doi = {10.1109/TSP.2022.3178191} }
[bib]
A dual approach for federated learning
Z. Fan, H. Fang, M. P. Friedlander. arXiv:2201.11183,
2022.
[arXiv]
[GitHub]
[DOI]We study the federated optimization problem from a dual perspective and propose a new algorithm termed federated dual coordinate descent (FedDCD), which is based on a type of coordinate descent method developed by Necoara et al. [Journal of Optimization Theory and Applications, 2017]. Additionally, we enhance the FedDCD method with inexact gradient oracles and Nesterov’s acceleration. We demonstrate theoretically that our proposed approach achieves better convergence rates than the state-of-the-art primal federated optimization algorithms under mild conditions. Numerical experiments on real-world datasets support our analysis.
@misc{Fan2022Dual, Author = {Z. Fan and H. Fang and M. P. Friedlander}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.11183}, Title = {A dual approach for federated learning} } - [abs]
[bib]
Improving fairness for data valuation in horizontal federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, C. Liu, Y. Zhang. Intern Conf Data Engineering (ICDE),
2022.
[IEEE]
[arXiv]
[DOI]Federated learning is an emerging decentralized machine learning scheme that allows multiple data owners to work collaboratively while ensuring data privacy. The success of federated learning depends largely on the participation of data owners. To sustain and encourage data owners’ participation, it is crucial to fairly evaluate the quality of the data provided by the data owners and reward them correspondingly. Federated Shapley value, recently proposed by Wang et al. [Federated Learning, 2020], is a measure for data value under the framework of federated learning that satisfies many desired properties for data valuation. However, there are still factors of potential unfairness in the design of federated Shapley value because two data owners with the same local data may not receive the same evaluation. We propose a new measure called completed federated Shapley value to improve the fairness of federated Shapley value. The design depends on completing a matrix consisting of all the possible contributions by different subsets of the data owners. It is shown under mild conditions that this matrix is approximately low-rank by leveraging concepts and tools from optimization. Both theoretical analysis and empirical evaluation verify that the proposed measure does improve fairness in many circumstances.
@article{Fan2022Improving, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and C. Liu and Y. Zhang}, Year = {2022}, Month = {January}, Journal = {2022 IEEE 38th International Conference on Data Engineering (ICDE)}, Pages = {2440-2453}, Doi = {10.1109/icde53745.2022.00228}, Title = {Improving fairness for data valuation in horizontal federated learning} } - [abs]
[bib]
Fair and efficient contribution valuation for vertical federated learning
Z. Fan, H. Fang, Z. Zhou, J. Pei, M. P. Friedlander, Y. Zhang. arXiv:2201.02658,
2022.
[arXiv]
[DOI]Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data. Vertical federated learning or feature-based federated learning applies to the case in which different data sources share the same sample ID space but differ in feature space. To ensure the data owners’ long-term engagement, it is critical to objectively assess the contribution from each data source and recompense them accordingly. The Shapley value (SV) is a provably fair contribution valuation metric originating from cooperative game theory. However, computing the SV requires extensively retraining the model on each subset of data sources, which causes prohibitively high communication costs in federated learning. We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV. We show that VerFedSV not only satisfies many desirable properties for fairness but is also efficient to compute, and can be adapted to both synchronous and asynchronous vertical federated learning algorithms. Both theoretical analysis and extensive experimental results verify the fairness, efficiency, and adaptability of VerFedSV.
@misc{Fan2022Fair, Author = {Z. Fan and H. Fang and Z. Zhou and J. Pei and M. P. Friedlander and Y. Zhang}, Year = {2022}, Month = {January}, Doi = {10.48550/arxiv.2201.02658}, Title = {Fair and efficient contribution valuation for vertical federated learning} } - [abs]
Fast convergence of the stochastic subgradient method under interpolation
H. Fang, Z. Fan, M. P. Friedlander. Intern. Conf. Learning Representations (ICLR),
2021.This paper studies the behaviour of the stochastic subgradient descent (SSGD) method applied to over-parameterized nonsmooth optimization problems that satisfy an interpolation condition. By leveraging the composite structure of the empirical risk minimization problems, we prove that when interpolation holds, SSGD converges at the same rate as the stochastic gradient descent (SGD) method applied to smooth problems. Our analysis provides a partial explanation for the empirical observation that sometimes SGD and SSGD behave similarly for training smooth and nonsmooth machine learning models. We also prove that the rates we derive are optimal for the subgradient method in the convex and interpolation setting.
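For intuition about the interpolation setting in the entry above, consider a consistent system of absolute-value losses. With a Polyak-type step (valid because every term attains zero loss under interpolation), the stochastic subgradient iteration reduces to randomized Kaczmarz and converges linearly. This toy run is an illustration under that assumption, not the paper's experiment.

```python
# Toy stochastic subgradient run under interpolation:
#   min (1/m) sum_i |a_i @ x - b_i|   with   b = A @ x_star (zero loss at x_star).
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((500, 20))
x_star = rng.standard_normal(20)
b = A @ x_star                       # interpolation holds by construction

x = np.zeros(20)
for _ in range(5000):
    i = rng.integers(500)            # sample one nonsmooth term
    r = A[i] @ x - b[i]
    x -= (r / (A[i] @ A[i])) * A[i]  # Polyak step (each term's minimum is 0);
                                     # for this loss it is randomized Kaczmarz
print("distance to x_star:", np.linalg.norm(x - x_star))
```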
- [abs]
[bib]
Atomic decomposition via polar alignment: the geometry of structured optimization
Z. Fan, H. Jeong, Y. Sun, M. P. Friedlander. Foundations and Trends in Optimization, 3(4):280–366,
2020.
[Journal]
[DOI]Structured optimization uses a prescribed set of atoms to assemble a solution that fits a model to data. Polarity, which extends the familiar notion of orthogonality from linear sets to general convex sets, plays a special role in a simple and geometric form of convex duality. This duality correspondence yields a general notion of alignment that leads to an intuitive and complete description of how atoms participate in the final decomposition of the solution. The resulting geometric perspective leads to variations of existing algorithms effective for large-scale problems. We illustrate these ideas with many examples, including applications in matrix completion and morphological component analysis for the separation of mixtures of signals.
@article{Fan2020Atomic, Author = {Z. Fan and H. Jeong and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {November}, Journal = {Foundations and Trends® in Optimization}, Number = {4}, Volume = {3}, Pages = {280-366}, Doi = {10.1561/2400000028}, Title = {Atomic decomposition via polar alignment: the geometry of structured optimization} } - [abs]
[bib]
Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization
H. Fang, Z. Fan, Y. Sun, M. P. Friedlander. Intern. Conf. Artificial Intelligence and Statistics (AISTATS),
2020.We consider greedy coordinate descent (GCD) for composite problems with sparsity-inducing regularizers, including 1-norm regularization and non-negative constraints. Empirical evidence strongly suggests that GCD, when initialized with the zero vector, has an implicit screening ability that usually selects at each iteration coordinates that are nonzero at the solution. Thus, for problems with sparse solutions, GCD can converge significantly faster than randomized coordinate descent. We present an improved convergence analysis of GCD for sparse optimization, and a formal analysis of its screening properties. We also propose and analyze an improved selection rule with stronger ability to produce sparse iterates. Numerical experiments on both synthetic and real-world data support our analysis and the effectiveness of the proposed selection rule.
@article{Fang2020Greed, Author = {H. Fang and Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2020}, Month = {February}, Journal = {International Conference on Artificial Intelligence and Statistics}, Pages = {434-444}, Title = {Greed meets sparsity: understanding and improving greedy coordinate descent for sparse optimization} } - [abs]
[bib]
Bundle methods for dual atomic pursuit
Z. Fan, Y. Sun, M. P. Friedlander. Asilomar Conf. Signals, Systems, Computers (ACSSC),
2019.
[arXiv]
[DOI]The aim of structured optimization is to assemble a solution, using a given set of (possibly uncountably infinite) atoms, to fit a model to data. A two-stage algorithm based on gauge duality and bundle method is proposed. The first stage discovers the optimal atomic support for the primal problem by solving a sequence of approximations of the dual problem using a bundle-type method. The second stage recovers the approximate primal solution using the atoms discovered in the first stage. The overall approach leads to implementable and efficient algorithms for large problems.
@article{Fan2019Bundle, Author = {Z. Fan and Y. Sun and M. P. Friedlander}, Year = {2019}, Month = {November}, Volume = {59}, Pages = {264-270}, Doi = {10.1109/ieeeconf44664.2019.9048711}, Title = {Bundle methods for dual atomic pursuit} } - [bib]
Bundle-type methods for dual atomic pursuit
Z. Fan. M.Sc. Thesis, UBC,
2019.
[UBC cIRcle]
@mastersthesis{Fan2019Bundle, Author = {Z. Fan}, Year = {2019}, Month = {August}, Title = {Bundle-type methods for dual atomic pursuit} }
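The first, dual stage of the bundle approach described in the two entries above builds a piecewise-linear model of the dual function from subgradient cuts. A bare-bones cutting-plane loop (a bundle method without the stabilizing proximal term) on a stand-in one-dimensional convex function shows the mechanics; it requires scipy and is purely illustrative, with a made-up objective in place of the dual atomic-pursuit function.

```python
# Bare-bones cutting-plane sketch: minimize a convex function through a
# piecewise-linear model built from subgradient cuts. Stand-in objective.
import numpy as np
from scipy.optimize import linprog

f = lambda y: np.abs(y - 2.0) + 0.1 * y**2   # stand-in convex "dual" objective
df = lambda y: np.sign(y - 2.0) + 0.2 * y    # one of its subgradients

cuts = []                                    # cut k: t >= g_k * y + c_k
y = 0.0
for _ in range(25):
    cuts.append((df(y), f(y) - df(y) * y))
    # minimize t over (y, t) subject to all cuts and -10 <= y <= 10
    A_ub = [[g, -1.0] for g, _ in cuts]      # g*y - t <= -c
    b_ub = [-c for _, c in cuts]
    res = linprog(c=[0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(-10, 10), (None, None)])
    y = res.x[0]                             # minimizer of the current model
print("model minimizer:", round(y, 3), " f(y):", round(f(y), 3))
```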
- CBCasie Bao MSc (2019)
Publications (1)
- [bib]
An accelerated dual method for SPGL1
C. Bao. M.Sc. Thesis, UBC,
2019.
[UBC cIRcle]
@mastersthesis{Bao2019SPGL1, Author = {C. Bao}, Year = {2019}, Month = {April}, Title = {An accelerated dual method for SPGL1} }
- LLLiran Li MSc (2019)
- GGGabriel Goh PhD (2017)Scientist, OpenAI
Publications (4)
- [bib]
Optimization with costly subgradients
G. Goh. Ph.D. Thesis, UBC,
2017.
@phdthesis{Goh2017Optimization, Author = {G. Goh}, Year = {2017}, Month = {August}, Title = {Optimization with costly subgradients} } - [abs]
[bib]
Efficient evaluation of scaled proximal operators
M. P. Friedlander, G. Goh. Electronic Trans. Numerical Analysis, 46:1–22,
2017.
[DOI]Quadratic-support functions, cf. Aravkin, Burke, and Pillonetto [J. Mach. Learn. Res., 14 (2013)], constitute a parametric family of convex functions that includes a range of useful regularization terms found in applications of convex optimization. We show how an interior method can be used to efficiently compute the proximal operator of a quadratic-support function under different metrics. When the metric and the function have the right structure, the proximal map can be computed with costs nearly linear in the input size. We describe how to use this approach to implement quasi-Newton methods for a rich class of nonsmooth problems that arise, for example, in sparse optimization, image denoising, and sparse logistic regression.
@misc{Friedlander2017Efficient, Author = {M. P. Friedlander and G. Goh}, Year = {2017}, Month = {February}, Doi = {10.48550/arxiv.1603.05719}, Title = {Efficient evaluation of scaled proximal operators} } - [abs]
[bib]
Satisfying real-world goals with dataset constraints
G. Goh, A. Cotter, M. Gupta, M. P. Friedlander. Advances in Neural Information Processing Systems 29 (NIPS),
2016.
[DOI]The goal of minimizing misclassification error on a training set is often just one of several real-world goals that might be defined on different datasets. For example, one may require a classifier to also make positive predictions at some specified rate for some subpopulation (fairness), or to achieve a specified empirical recall. Other real-world goals include reducing churn with respect to a previously deployed model, or stabilizing online training. In this paper we propose handling multiple goals on multiple datasets by training with dataset constraints, using the ramp penalty to accurately quantify costs, and present an efficient algorithm to approximately optimize the resulting non-convex constrained optimization problem. Experiments on both benchmark and real-world industry datasets demonstrate the effectiveness of our approach.
@article{Goh2016Satisfying, Author = {G. Goh and A. Cotter and M. Gupta and M. P. Friedlander}, Year = {2016}, Month = {December}, Journal = {Neural Information Processing Systems}, Volume = {29}, Pages = {2423-2431}, Title = {Satisfying real-world goals with dataset constraints} } - [abs]
[bib]
Tail bounds for stochastic approximation
M. P. Friedlander, G. Goh. arXiv:1304.5586,
2013.
[arXiv]
[DOI]Stochastic-approximation gradient methods are attractive for large-scale convex optimization because they offer inexpensive iterations. They are especially popular in data-fitting and machine-learning applications where the data arrives in a continuous stream, or it is necessary to minimize large sums of functions. It is known that by appropriately decreasing the variance of the error at each iteration, the expected rate of convergence matches that of the underlying deterministic gradient method. Conditions are given under which this happens with overwhelming probability.
@misc{Friedlander2013Tail, Author = {M. P. Friedlander and G. Goh}, Year = {2013}, Month = {April}, Doi = {10.48550/arxiv.1304.5586}, Title = {Tail bounds for stochastic approximation} }
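The mechanism in the tail-bounds paper above is easy to emulate: grow the sample size geometrically so the variance of the gradient error shrinks at each iteration, recovering deterministic-like convergence. A toy least-squares instance follows; sizes, step, and growth factor are made up for illustration.

```python
# Growing-batch stochastic gradient descent on least squares: the sampled
# gradient's variance shrinks as the batch grows geometrically.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2000, 50))
b = A @ rng.standard_normal(50)

x = np.zeros(50)
batch = 8
for _ in range(200):
    idx = rng.choice(2000, size=min(batch, 2000), replace=False)
    g = A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)   # sampled gradient
    x -= 0.05 * g
    batch = int(batch * 1.3) + 1      # geometric growth -> shrinking variance
print("relative misfit:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```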
- IMIves Macêdo PhD (2015)Staff Machine Learning Scientist and AI Engineer, Ada
Publications (4)
- [abs]
[bib]
Polar convolution
M. P. Friedlander, I. Macêdo, T. K. Pong. SIAM J. Optimization, 29(2):1366–1391,
2019.
[DOI]The Moreau envelope is one of the key convexity-preserving functional operations in convex analysis, and it is central to the development and analysis of many approaches for solving convex optimization problems. This paper develops the theory for a parallel convolution operation, called the polar envelope, specialized to gauge functions. We show that many important properties of the Moreau envelope and the proximal map are mirrored by the polar envelope and its corresponding proximal map. These properties include smoothness of the envelope function, uniqueness and continuity of the proximal map, a role in duality and in the construction of algorithms for gauge optimization. We thus establish a suite of tools with which to build algorithms for this family of optimization problems.
@article{Friedlander2019Polar, Author = {M. P. Friedlander and I. Macêdo and T. K. Pong}, Year = {2019}, Month = {February}, Journal = {SIAM Journal on Optimization}, Number = {2}, Volume = {29}, Pages = {1366-1391}, Doi = {10.1137/18m1209088}, Title = {Polar convolution} } - [abs]
[bib]
Low-rank spectral optimization via gauge duality
M. P. Friedlander, I. Macêdo. SIAM J. Scientific Computing, 38(3):A1616–A1638,
2016.
[DOI]
[more]Various applications in signal processing and machine learning give rise to highly structured spectral optimization problems characterized by low-rank solutions. Two important examples that motivate this work are optimization problems from phase retrieval and from blind deconvolution, which are designed to yield rank-1 solutions. An algorithm is described that is based on solving a certain constrained eigenvalue optimization problem that corresponds to the gauge dual which, unlike the more typical Lagrange dual, has an especially simple constraint. The dominant cost at each iteration is the computation of rightmost eigenpairs of a Hermitian operator. A range of numerical examples illustrate the scalability of the approach.
@article{Friedlander2016Low, Author = {M. P. Friedlander and I. Macêdo}, Year = {2016}, Month = {June}, Journal = {SIAM Journal on Scientific Computing}, Number = {3}, Volume = {38}, Pages = {A1616-A1638}, Doi = {10.1137/15m1034283}, Title = {Low-rank spectral optimization via gauge duality} } - [abs]
[bib]
Gauge duality and low-rank spectral optimization
I. Macêdo. Ph.D. Thesis, UBC,
2015.
[UBC cIRcle]This research addresses low-rank spectral optimization through gauge duality theory. The thesis connects theoretical frameworks with practical eigenvalue problems, focusing on trace minimization in the cone of positive semidefinite matrices and affine nuclear norm minimization. The work demonstrates applications to phase recovery and image deconvolution.
@phdthesis{Macedo2015Gauge, Author = {I. Macêdo}, Year = {2015}, Month = {August}, Title = {Gauge duality and low-rank spectral optimization} } - [abs]
[bib]
Gauge optimization and duality
M. P. Friedlander, I. Macêdo, T. K. Pong. SIAM J. Optimization, 24(4):1999–2022,
2014.Gauge functions significantly generalize the notion of a norm, and gauge optimization, as defined by Freund (1987), seeks the element of a convex set that is minimal with respect to a gauge function. This conceptually simple problem can be used to model a remarkable array of useful problems, including a special case of conic optimization, and related problems that arise in machine learning and signal processing. The gauge structure of these problems allows for a special kind of duality framework. This paper explores the duality framework proposed by Freund, and proposes a particular form of the problem that exposes some useful properties of the gauge optimization framework (such as the variational properties of its value function), and yet maintains most of the generality of the abstract form of gauge optimization.
@Article{FriedlanderMacedoPong:2014, author = {M. P. Friedlander and I. Mac\^edo and T. K. Pong}, title = {Gauge optimization and duality}, year = 2014, journal = {SIAM Journal on Optimization}, number = 4, volume = 24, pages = {1999-2022}, doi = {10.1137/130940785}, url = {http://epubs.siam.org/doi/abs/10.1137/130940785} }
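The polar convolution of two gauges, studied in the polar-convolution paper earlier in this list, is (k1 &lt;&gt; k2)(x) = inf{max(k1(u), k2(v)) : u + v = x}. A brute-force one-dimensional check against the known closed form for two scaled absolute values is below; the grid and gauges are illustrative.

```python
# Numerical check of the polar convolution of two 1-D gauges:
#   (k1 <> k2)(x) = inf over u of max(k1(u), k2(x - u)).
# For alpha*|.| and beta*|.| the result is (alpha*beta/(alpha+beta))*|.|.
import numpy as np

k1 = lambda u: np.abs(u)            # alpha = 1
k2 = lambda v: 2.0 * np.abs(v)      # beta  = 2

def polar_conv(x, grid=np.linspace(-5.0, 5.0, 20001)):
    return np.min(np.maximum(k1(grid), k2(x - grid)))

for x in (0.5, 1.0, 3.0):
    print(x, round(polar_conv(x), 4), round(2.0 * abs(x) / 3.0, 4))  # agree
```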
- JEJohn "Ernie" Esser Postdoc (2014)In Memoriam (2015)
- TPTing Kei Pong Postdoc (2014)Professor, Hong Kong Polytechnic University
Publications (3)
- [abs]
[bib]
Polar convolution
M. P. Friedlander, I. Macêdo, T. K. Pong. SIAM J. Optimization, 29(2):1366–1391,
2019.
[DOI]The Moreau envelope is one of the key convexity-preserving functional operations in convex analysis, and it is central to the development and analysis of many approaches for solving convex optimization problems. This paper develops the theory for a parallel convolution operation, called the polar envelope, specialized to gauge functions. We show that many important properties of the Moreau envelope and the proximal map are mirrored by the polar envelope and its corresponding proximal map. These properties include smoothness of the envelope function, uniqueness and continuity of the proximal map, a role in duality and in the construction of algorithms for gauge optimization. We thus establish a suite of tools with which to build algorithms for this family of optimization problems.
@article{Friedlander2019Polar, Author = {M. P. Friedlander and I. Macêdo and T. K. Pong}, Year = {2019}, Month = {February}, Journal = {SIAM Journal on Optimization}, Number = {2}, Volume = {29}, Pages = {1366-1391}, Doi = {10.1137/18m1209088}, Title = {Polar convolution} } - [abs]
[bib]
Social resistance
M. P. Friedlander, N. Krislock, T. K. Pong. IEEE Comput. Sci. Eng., 18(2):98-103; reprinted in Computing Edge,
2016.
[DOI]
[Computing Edge]This case study concerns an important question in social networks: How closely related are two people in a network? We measure the strength of the relationship between two people in terms of their resistance distance, a notion of distance that has its origin in the study of electrical networks. In that context, the resistance distance measures the electrical resistance between the two nodes in a network, assuming unit resistance on each edge. Its relationship with the information distance in social networks was recently identified. For large networks, this distance measure can be expensive to compute. As the network changes, it’s useful to have an inexpensive way to make updates. We’ll study how various techniques for updating matrix factorizations can help efficiently compute the resistance distances with small changes in the network. We focus on Cholesky rank-one updates, and also briefly discuss the Sherman-Morrison-Woodbury formula (as one of the last exercises). We expect that you’re familiar with the idea of the Cholesky factorization of positive definite matrices. We’ll guide you through a derivation of the formula for the Cholesky rank-one update, and introduce two aspects that may be new to you. First, you’ll have to figure out the vector used for the rank-one update when one additional connection in the network is identified. Second, when matrices are large, you’ll observe that there can be a big difference in computational effort when the computation involves matrix-matrix multiplications rather than only matrix-vector multiplications.
@article{Friedlander2016Social, Author = {M. P. Friedlander and N. Krislock and T. K. Pong}, Year = {2016}, Month = {August}, Journal = {Computing in Science & Engineering}, Number = {2}, Volume = {18}, Pages = {98-103}, Doi = {10.1109/mcse.2016.30}, Title = {Social resistance} } - [abs]
[bib]
Gauge optimization and duality
M. P. Friedlander, I. Macêdo, T. K. Pong. SIAM J. Optimization, 24(4):1999–2022,
2014.Gauge functions significantly generalize the notion of a norm, and gauge optimization, as defined by Freund (1987), seeks the element of a convex set that is minimal with respect to a gauge function. This conceptually simple problem can be used to model a remarkable array of useful problems, including a special case of conic optimization, and related problems that arise in machine learning and signal processing. The gauge structure of these problems allows for a special kind of duality framework. This paper explores the duality framework proposed by Freund, and proposes a particular form of the problem that exposes some useful properties of the gauge optimization framework (such as the variational properties of its value function), and yet maintains most of the generality of the abstract form of gauge optimization.
@Article{FriedlanderMacedoPong:2014, author = {M. P. Friedlander and I. Mac\^edo and T. K. Pong}, title = {Gauge optimization and duality}, year = 2014, journal = {SIAM Journal on Optimization}, number = 4, volume = 24, pages = {1999-2022}, doi = {10.1137/130940785}, url = {http://epubs.siam.org/doi/abs/10.1137/130940785} }
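The Social resistance column above centers on Cholesky rank-one updates for refreshing resistance distances when a network gains an edge. Below is the standard textbook update (compute L1 with L1 L1^T = L L^T + v v^T); applying it with v = e_i - e_j on a regularized Laplacian is the usage the column has in mind. This sketch is independent of the column's own exercises.

```python
# Standard Cholesky rank-one update: O(n^2) instead of an O(n^3) refactorization.
import numpy as np

def chol_update(L, v):
    """Return L1 with L1 @ L1.T == L @ L.T + np.outer(v, v)."""
    L, v, n = L.copy(), v.copy(), len(v)
    for k in range(n):
        r = np.hypot(L[k, k], v[k])
        c, s = r / L[k, k], v[k] / L[k, k]
        L[k, k] = r
        if k + 1 < n:
            L[k + 1:, k] = (L[k + 1:, k] + s * v[k + 1:]) / c
            v[k + 1:] = c * v[k + 1:] - s * L[k + 1:, k]
    return L

rng = np.random.default_rng(5)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)               # stand-in for a regularized Laplacian
L = np.linalg.cholesky(A)
v = np.zeros(6); v[1], v[4] = 1.0, -1.0   # new edge (1,4): v = e_1 - e_4
L1 = chol_update(L, v)
print(np.allclose(L1 @ L1.T, A + np.outer(v, v)))   # True
```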
- NKNathan Krislock Postdoc (2013)Associate Professor, Northern Illinois University
Publications (1)
- [abs]
[bib]
Social resistance
M. P. Friedlander, N. Krislock, T. K. Pong. IEEE Comput. Sci. Eng., 18(2):98-103; reprinted in Computing Edge,
2016.
[DOI]
[Computing Edge]This case study concerns an important question in social networks: How closely related are two people in a network? We measure the strength of the relationship between two people in terms of their resistance distance, a notion of distance that has its origin in the study of electrical networks. In that context, the resistance distance measures the electrical resistance between the two nodes in a network, assuming unit resistance on each edge. Its relationship with the information distance in social networks was recently identified. For large networks, this distance measure can be expensive to compute. As the network changes, it’s useful to have an inexpensive way to make updates. We’ll study how various techniques for updating matrix factorizations can help efficiently compute the resistance distances with small changes in the network. We focus on Cholesky rank-one updates, and also briefly discuss the Sherman-Morrison-Woodbury formula (as one of the last exercises). We expect that you’re familiar with the idea of the Cholesky factorization of positive definite matrices. We’ll guide you through a derivation of the formula for the Cholesky rank-one update, and introduce two aspects that may be new to you. First, you’ll have to figure out the vector used for the rank-one update when one additional connection in the network is identified. Second, when matrices are large, you’ll observe that there can be a big difference in computational effort when the computation involves matrix-matrix multiplications rather than only matrix-vector multiplications.
@article{Friedlander2016Social, Author = {M. P. Friedlander and N. Krislock and T. K. Pong}, Year = {2016}, Month = {August}, Journal = {Computing in Science & Engineering}, Number = {2}, Volume = {18}, Pages = {98-103}, Doi = {10.1109/mcse.2016.30}, Title = {Social resistance} }
- AAAlexandr (Sasha) Aravkin Postdoc (2012)Associate Professor, University of Washington
Publications (6)
- [abs]
[bib]
Foundations of gauge and perspective duality
A. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, K. MacPhee. SIAM J. Optimization, 28(3):2406–2434,
2018.
[DOI]We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel-Rockafellar duality on equal footing, including explaining gauge dual variables as sensitivity measures, and showing how to recover primal solutions from those of the gauge dual. This vantage point allows a direct proof that optimal solutions of the Fenchel-Rockafellar dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. We extend the gauge duality framework to the setting in which the functional components are general nonnegative convex functions, including problems with piecewise linear quadratic functions and constraints that arise from generalized linear models used in regression.
@misc{Aravkin2018Foundations, Author = {A. Aravkin and J. V. Burke and D. Drusvyatskiy and M. P. Friedlander and K. MacPhee}, Year = {2018}, Month = {August}, Number = {3}, Volume = {28}, Pages = {2406-2434}, Doi = {10.1137/17m1119020}, Title = {Foundations of gauge and perspective duality} } - [abs]
[bib]
Level-set methods for convex optimization
A. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, S. Roy. Mathematical Programming, 174(1-2):359–390,
2018.
[Journal]
[DOI]Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective and constraint functions, and instead approximately solves a sequence of parametric level-set problems. A zero-finding procedure, based on inexact function evaluations and possibly inexact derivative information, leads to an efficient solution scheme for the original problem. We describe the theoretical and practical properties of this approach for a broad range of problems, including low-rank semidefinite optimization, sparse optimization, and generalized linear models for inference.
@misc{Aravkin2018Level, Author = {A. Aravkin and J. V. Burke and D. Drusvyatskiy and M. P. Friedlander and S. Roy}, Year = {2018}, Month = {April}, Number = {1-2}, Volume = {174}, Pages = {359-390}, Doi = {10.1007/s10107-018-1351-8}, Title = {Level-set methods for convex optimization} } - [abs]
[bib]
Variational properties of value functions
A. Aravkin, J. V. Burke, M. P. Friedlander. SIAM J. Optimization, 23(3):1689–1717,
2013.
[DOI]Regularization plays a key role in a variety of optimization formulations of inverse problems. A recurring theme in regularization approaches is the selection of regularization parameters, and their effect on the solution and on the optimal value of the optimization problem. The sensitivity of the value function to the regularization parameter can be linked directly to the Lagrange multipliers. This paper characterizes the variational properties of the value functions for a broad class of convex formulations, which are not all covered by standard Lagrange multiplier theory. An inverse function theorem is given that links the value functions of different regularization formulations (not necessarily convex). These results have implications for the selection of regularization parameters, and the development of specialized algorithms. Numerical examples illustrate the theoretical results.
@article{Aravkin2013Variational, Author = {A. Aravkin and J. V. Burke and M. P. Friedlander}, Year = {2013}, Month = {August}, Journal = {SIAM Journal on Optimization}, Number = {3}, Volume = {23}, Pages = {1689-1717}, Doi = {10.1137/120899157}, Title = {Variational properties of value functions} } - [abs]
[bib]
Fast dual variational inference for non-conjugate latent gaussian models
M. E. Khan, A. Aravkin, M. P. Friedlander, M. Seeger. Intern. Conf. Machine Learning (ICML),
2013.Latent Gaussian models (LGMs) are widely used in statistics and machine learning. Bayesian inference in non-conjugate LGMs is difficult due to intractable integrals involving the Gaussian prior and non-conjugate likelihoods. Algorithms based on variational Gaussian (VG) approximations are widely employed since they strike a favorable balance between accuracy, generality, speed, and ease of use. However, the structure of the optimization problems associated with these approximations remains poorly understood, and standard solvers take too long to converge. We derive a novel dual variational inference approach that exploits the convexity property of the VG approximations. We obtain an algorithm that solves a convex optimization problem, reduces the number of variational parameters, and converges much faster than previous methods. Using real-world data, we demonstrate these advantages on a variety of LGMs, including Gaussian process classification and latent Gaussian Markov random fields.
@article{Khan2013Fast, Author = {M. E. Khan and A. Aravkin and M. P. Friedlander and M. Seeger}, Year = {2013}, Month = {May}, Pages = {951-959}, Title = {Fast dual variational inference for non-conjugate latent gaussian models} } - [abs]
[bib]
Robust inversion via semistochastic dimensionality reduction
A. Aravkin, M. P. Friedlander, T. van Leeuwen. Intern. Conf. Acoustics, Speech, and Signal Processing (ICASSP),
2012.
[DOI]We consider a class of inverse problems where it is possible to aggregate the results of multiple experiments. This class includes problems where the forward model is the solution operator to linear ODEs or PDEs. The tremendous size of such problems motivates dimensionality reduction techniques based on randomly mixing experiments. These techniques break down, however, when robust data-fitting formulations are used, which are essential in cases of missing data, unusually large errors, and systematic features in the data unexplained by the forward model. We survey robust methods within a statistical framework, and propose a semistochastic optimization approach that allows dimensionality reduction. The efficacy of the methods is demonstrated for a large-scale seismic inverse problem using the robust Student’s t-distribution, where a useful synthetic velocity model is recovered in the extreme scenario of 60% data missing at random. The semistochastic approach achieves this recovery using 20% of the effort required by a direct robust approach.
@misc{Aravkin2012Robust, Author = {A. Aravkin and M. P. Friedlander and T. van Leeuwen}, Year = {2012}, Month = {January}, Doi = {10.48550/arxiv.1110.0895}, Title = {Robust inversion via semistochastic dimensionality reduction} } - [abs]
[bib]
Robust inversion, dimensionality reduction, and randomized sampling
A. Aravkin, M. P. Friedlander, F. Herrmann, T. van Leeuwen. Mathematical Programming, 134(1):101–125,
2012.
[DOI]We consider a class of inverse problems in which the forward model is the solution operator to linear ODEs or PDEs. This class admits several dimensionality-reduction techniques based on data averaging or sampling, which are especially useful for large-scale problems. We survey these approaches and their connection to stochastic optimization. The data-averaging approach is only viable, however, for a least-squares misfit, which is sensitive to outliers in the data and artifacts unexplained by the forward model. This motivates us to propose a robust formulation based on the Student’s t-distribution of the error. We demonstrate how the corresponding penalty function, together with the sampling approach, can obtain good results for a large-scale seismic inverse problem with 50% corrupted data.
@article{Aravkin2012Robust, Author = {A. Aravkin and M. P. Friedlander and F. Herrmann and T. van Leeuwen}, Year = {2012}, Month = {January}, Journal = {Mathematical Programming}, Number = {1}, Volume = {134}, Pages = {101-125}, Doi = {10.1007/s10107-012-0571-6}, Title = {Robust inversion, dimensionality reduction, and randomized sampling} }
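The robust formulation in the two entries above swaps the least-squares penalty for a Student's t log-likelihood, so gross outliers are downweighted rather than dominating the fit. A toy linear inversion with 20% corrupted data, using plain gradient descent and made-up sizes, illustrates the effect.

```python
# Student's t misfit vs. least squares on data with gross outliers.
# Penalty per residual: rho(r) = log(1 + r^2/nu), so rho'(r) = 2r/(nu + r^2).
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((300, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
out = rng.choice(300, 60, replace=False)
b[out] += 50 * rng.standard_normal(60)           # 20% grossly corrupted data

nu = 2.0                                         # degrees of freedom
def t_grad(x):                                   # grad of (1/m) sum rho(r_i)
    r = A @ x - b
    return A.T @ (2 * r / (nu + r**2)) / len(b)

x = np.zeros(10)
for _ in range(1000):
    x -= 0.5 * t_grad(x)                         # plain gradient descent

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print("robust error:       ", np.linalg.norm(x - x_true))
print("least-squares error:", np.linalg.norm(x_ls - x_true))
```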
- HMHassan Mansour Postdoc (2012)Research Staff, Mitsubishi Electric Research Laboratories
Publications (1)
- [abs]
[bib]
Recovering compressively sampled signals using partial support information
M. P. Friedlander, H. Mansour, R. Saab, Ö. Yılmaz. IEEE Trans. Info. Theory, 58(2):1122–1134,
2012.
[Journal Link]
[DOI]In this paper we study recovery conditions of weighted \(\ell_1\) minimization for signal reconstruction from compressed sensing measurements when partial support information is available. We show that if at least 50% of the (partial) support information is accurate, then weighted \(\ell_1\) minimization is stable and robust under weaker conditions than the analogous conditions for standard \(\ell_1\) minimization. Moreover, weighted \(\ell_1\) minimization provides better bounds on the reconstruction error in terms of the measurement noise and the compressibility of the signal to be recovered. We illustrate our results with extensive numerical experiments on synthetic data and real audio and video signals.
@article{Friedlander2012Recovering, Author = {M. P. Friedlander and H. Mansour and R. Saab and Ö. Yılmaz}, Year = {2012}, Month = {February}, Journal = {IEEE Transactions on Information Theory}, Number = {2}, Volume = {58}, Pages = {1122-1134}, Doi = {10.1109/tit.2011.2167214}, Title = {Recovering compressively sampled signals using partial support information} }
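Weighted l1 minimization as analyzed above can be emulated with ISTA and a weighted soft-threshold: coordinates in the (at least 50% accurate) support estimate receive a smaller weight and are penalized less. The sizes, weights, and regularization parameter below are illustrative, not from the paper's experiments.

```python
# ISTA with weighted soft-thresholding as a stand-in for weighted-l1 recovery.
import numpy as np

rng = np.random.default_rng(7)
m, n, k = 120, 400, 20
A = rng.standard_normal((m, n)) / np.sqrt(m)
supp = rng.choice(n, k, replace=False)
x_true = np.zeros(n); x_true[supp] = rng.standard_normal(k)
b = A @ x_true

w = np.ones(n)
w[supp[:k // 2]] = 0.1          # accurate partial support info: smaller weight

lam, step = 0.01, 0.1           # step < 1/||A||_2^2 for these dimensions
x = np.zeros(n)
for _ in range(1000):
    z = x - step * (A.T @ (A @ x - b))                       # gradient step
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam * w, 0.0)  # weighted shrink
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```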
- MSMark Schmidt Postdoc (2010)Professor, University of British Columbia
Publications (4)
- [abs]
[bib]
Coordinate descent converges faster with the Gauss-Southwell rule than random selection
J. Nutini, M. Schmidt, I. H. Laradji, M. P. Friedlander, H. Koepke. Intern. Conf. Machine Learning (ICML),
2015.
[Preprint]
[NIPS PDF]
[ICML PDF]
[ICML Supplementary]
[DOI]There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random-coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically much more expensive than random selection. However, the empirical behaviours of these algorithms contradict this theoretical result: in applications where the computational costs of the selection rules are comparable, the Gauss-Southwell selection rule tends to perform substantially better than random coordinate selection. We give a simple analysis of the Gauss-Southwell rule showing that–except in extreme cases–its convergence rate is faster than choosing random coordinates. Further, in this work we (i) show that exact coordinate optimization improves the convergence rate for certain sparse problems, (ii) propose a Gauss-Southwell-Lipschitz rule that gives an even faster convergence rate given knowledge of the Lipschitz constants of the partial derivatives, (iii) analyze the effect of approximate Gauss-Southwell rules, and (iv) analyze proximal-gradient variants of the Gauss-Southwell rule.
@article{Nutini2015Coordinate, Author = {J. Nutini and M. Schmidt and I. H. Laradji and M. P. Friedlander and H. Koepke}, Year = {2015}, Month = {July}, Pages = {1632-1641}, Title = {Coordinate descent converges faster with the Gauss-Southwell rule than random selection} } - [abs]
[bib]
Hybrid deterministic-stochastic methods for data fitting
M. P. Friedlander, M. Schmidt. SIAM J. Scientific Computing, 34(3):A1380–A1405,
2012.
[more]Many structured data-fitting applications require the solution of an optimization problem involving a sum over a potentially large number of measurements. Incremental gradient algorithms offer inexpensive iterations by sampling a subset of the terms in the sum; these methods can make great progress initially, but often slow as they approach a solution. In contrast, full-gradient methods achieve steady convergence at the expense of evaluating the full objective and gradient on each iteration. We explore hybrid methods that exhibit the benefits of both approaches. Rate-of-convergence analysis shows that by controlling the sample size in an incremental-gradient algorithm, it is possible to maintain the steady convergence rates of full-gradient methods. We detail a practical quasi-Newton implementation based on this approach. Numerical experiments illustrate its potential benefits.
@article{FriedlanderSchmidt2012, author = {M. P. Friedlander and M. Schmidt}, title = {Hybrid deterministic-stochastic methods for data fitting}, journal = {SIAM J. Scientific Computing}, volume = {34}, number = {3}, pages = {A1380–A1405}, year = {2012} } - [abs]
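The Gauss-Southwell result at the top of this list is easy to see numerically: pick the coordinate with the largest gradient magnitude instead of a random one. A toy quadratic comparison follows; it recomputes the full gradient each step purely for brevity (real GCD maintains it incrementally), and the gap depends on the problem's conditioning.

```python
# Random vs. Gauss-Southwell coordinate selection for min 0.5*||Ax - b||^2.
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((100, 50))
b = rng.standard_normal(100)
H = A.T @ A                          # coordinate curvatures on the diagonal

def coord_descent(rule, iters=300):
    x = np.zeros(50)
    for _ in range(iters):
        g = A.T @ (A @ x - b)        # full gradient, recomputed for brevity
        i = int(np.argmax(np.abs(g))) if rule == "gs" else int(rng.integers(50))
        x[i] -= g[i] / H[i, i]       # exact minimization along coordinate i
    return 0.5 * np.linalg.norm(A @ x - b) ** 2

print("random:", coord_descent("random"), " gauss-southwell:", coord_descent("gs"))
```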
[bib]
Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm
M. Schmidt, E. van den Berg, M. P. Friedlander, K. Murphy. Intern. Conf. Artificial Intelligence and Statistics (AISTATS),
2009.
[Slides]
[Software]An optimization algorithm for minimizing a smooth function over a convex set is described. Each iteration of the method computes a descent direction by minimizing, over the original constraints, a diagonal plus low-rank quadratic approximation to the function. The quadratic approximation is constructed using a limited-memory quasi-Newton update. The method is suitable for large-scale problems where evaluation of the function is substantially more expensive than projection onto the constraint set. Numerical experiments on one-norm regularized test problems indicate that the proposed method is competitive with state-of-the-art methods such as bound-constrained L-BFGS and orthant-wise descent. We further show that the method generalizes to a wide class of problems, and substantially improves on state-of-the-art methods for problems such as learning the structure of Gaussian graphical models and Markov random fields.
@article{Schmidt2009Optimizing, Author = {M. Schmidt and E. van den Berg and M. P. Friedlander and K. Murphy}, Year = {2009}, Month = {April}, Pages = {456-463}, Title = {Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm} } - [abs]
[bib]
Group sparsity via linear-time projection
E. van den Berg, M. Schmidt, M. P. Friedlander, K. Murphy. Tech. Rep. TR-2008-09, Dept of Computer Science, UBC,
2008.We present an efficient spectral projected-gradient algorithm for optimization subject to a group l1-norm constraint. Our approach is based on a novel linear-time algorithm for Euclidean projection onto the l1- and group l1-norm constraints. Numerical experiments on large data sets suggest that the proposed method is substantially more efficient and scalable than existing methods.
@article{Berg2008Group, Author = {E. van den Berg and M. Schmidt and M. P. Friedlander and K. Murphy}, Year = {2008}, Month = {June}, Title = {Group sparsity via linear-time projection} }
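The workhorse in the group-sparsity report above is Euclidean projection onto the l1 ball; for group l1, the same thresholding is applied to the vector of group norms. Below is the standard sort-based O(n log n) projection (the report uses an expected linear-time variant), shown as a self-contained sketch.

```python
# Euclidean projection onto {x : ||x||_1 <= tau}, sort-based variant.
import numpy as np

def project_l1(v, tau):
    if np.abs(v).sum() <= tau:
        return v.copy()                              # already feasible
    u = np.sort(np.abs(v))[::-1]                     # magnitudes, descending
    css = np.cumsum(u)
    rho = np.max(np.nonzero(u * np.arange(1, len(v) + 1) > css - tau))
    theta = (css[rho] - tau) / (rho + 1.0)           # optimal threshold
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

v = np.array([3.0, -1.0, 0.2, 0.2])
p = project_l1(v, 2.0)
print(p, np.abs(p).sum())   # lands on the l1 ball of radius 2
```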
- MCMitch Crowe MSc (2010)
Publications (1)
- [abs]
[bib]
Nonlinearly constrained optimization via sequential regularized linear programming
M. Crowe. M.Sc. Thesis, UBC,
2010.
[UBC cIRcle]This thesis introduces an active-set approach for solving large-scale nonlinearly constrained optimization problems. Rather than using trust-region constraints for globalization, the method employs a 2-norm regularization term in the objective. The proposed regularized linear programming (RLP) solution addresses scaling inefficiencies and infeasible subproblems, with the dual problem formulated as a bound-constrained least-squares problem.
@mastersthesis{Crowe2010Nonlinearly, Author = {M. Crowe}, Year = {2010}, Month = {August}, Title = {Nonlinearly constrained optimization via sequential regularized linear programming} }
2007–2009
- EBEwout van den Berg PhD (2009)Scientist, IBM Research
Publications (9)
- [abs]
[bib]
Sparse optimization with least-squares constraints
E. van den Berg, M. P. Friedlander. SIAM J. Optimization, 21(4):1201–1229,
2011.
[DOI]
[more]The use of convex optimization for the recovery of sparse signals from incomplete or compressed data is now common practice. Motivated by the success of basis pursuit in recovering sparse vectors, new formulations have been proposed that take advantage of different types of sparsity. In this paper we propose an efficient algorithm for solving a general class of sparsifying formulations. For several common types of sparsity we provide applications, along with details on how to apply the algorithm, and experimental results.
@article{Berg2011Sparse, Author = {E. van den Berg and M. P. Friedlander}, Year = {2011}, Month = {October}, Journal = {SIAM Journal on Optimization}, Number = {4}, Volume = {21}, Pages = {1201-1229}, Doi = {10.1137/100785028}, Title = {Sparse optimization with least-squares constraints} } - [abs]
[bib]
Theoretical and empirical results for recovery from multiple measurements
E. van den Berg, M. P. Friedlander. IEEE Trans. Info. Theory, 56(5):2516–2527,
2010.
[Code & Data]
[DOI]
[more]The joint-sparse recovery problem aims to recover, from sets of compressed measurements, unknown sparse matrices with nonzero entries restricted to a subset of rows. This is an extension of the single-measurement-vector (SMV) problem widely studied in compressed sensing. We analyze the recovery properties for two types of recovery algorithms. First, we show that recovery using sum-of-norm minimization cannot exceed the uniform recovery rate of sequential SMV using L1 minimization, and that there are problems that can be solved with one approach but not with the other. Second, we analyze the performance of the ReMBo algorithm [M. Mishali and Y. Eldar, IEEE Trans. Sig. Proc., 56 (2008)] in combination with L1 minimization, and show how recovery improves as more measurements are taken. From this analysis it follows that having more measurements than the number of nonzero rows does not improve the potential theoretical recovery rate.
@article{Berg2010Theoretical, Author = {E. van den Berg and M. P. Friedlander}, Year = {2010}, Month = {May}, Journal = {IEEE Transactions on Information Theory}, Number = {5}, Volume = {56}, Pages = {2516-2527}, Doi = {10.1109/tit.2010.2043876}, Title = {Theoretical and empirical results for recovery from multiple measurements} } - [abs]
[bib]
Convex optimization for generalized sparse recovery
E. van den Berg. Ph.D. Thesis, UBC,
2009.
[UBC cIRcle]This thesis addresses sparse signal recovery within compressed sensing. It examines joint-sparse recovery by means of sum-of-norms minimization and the ReMBo-ℓ₁ algorithm. The primary contribution involves creating a framework for solving a wide class of convex optimization problems for sparse recovery, implemented through the SPGL1 algorithm. Supporting tools include Sparco (test problem suite) and Spot (a MATLAB toolbox for linear operators).
@phdthesis{vandenBerg2009Convex, Author = {E. van den Berg}, Year = {2009}, Month = {August}, Title = {Convex optimization for generalized sparse recovery} } - [abs]
[bib]
Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm
M. Schmidt, E. van den Berg, M. P. Friedlander, K. Murphy. Intern. Conf. Artificial Intelligence and Statistics (AISTATS),
2009.
[Slides]
[Software]An optimization algorithm for minimizing a smooth function over a convex set is described. Each iteration of the method computes a descent direction by minimizing, over the original constraints, a diagonal plus low-rank quadratic approximation to the function. The quadratic approximation is constructed using a limited-memory quasi-Newton update. The method is suitable for large-scale problems where evaluation of the function is substantially more expensive than projection onto the constraint set. Numerical experiments on one-norm regularized test problems indicate that the proposed method is competitive with state-of-the-art methods such as bound-constrained L-BFGS and orthant-wise descent. We further show that the method generalizes to a wide class of problems, and substantially improves on state-of-the-art methods for problems such as learning the structure of Gaussian graphical models and Markov random fields.
@article{Schmidt2009Optimizing, Author = {M. Schmidt and E. van den Berg and M. P. Friedlander and K. Murphy}, Year = {2009}, Month = {April}, Pages = {456-463}, Title = {Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm} } - [abs]
[bib]
Sparco: a testing framework for sparse reconstruction
E. van den Berg, M. P. Friedlander, G. Hennenfent, F. Herrmann, R. Saab, Ö. Yılmaz. ACM Trans. Math. Software, 35(4):1–16,
2009.
[Code]
[more]Sparco is a framework for testing and benchmarking algorithms for sparse reconstruction. It includes a large collection of sparse reconstruction problems drawn from the imaging, compressed sensing, and geophysics literature. Sparco is also a framework for implementing new test problems and can be used as a tool for reproducible research. Sparco is implemented entirely in Matlab, and is released as open-source software under the GNU Public License.
@article{Berg2009Sparco, Author = {E. van den Berg and M. P. Friedlander and G. Hennenfent and F. Herrmann and R. Saab and Ö. Yılmaz}, Year = {2009}, Month = {February}, Title = {Sparco: a testing framework for sparse reconstruction} } - [abs]
[bib]
Probing the Pareto frontier for basis pursuit solutions
E. van den Berg, M. P. Friedlander. SIAM J. Scientific Computing, 31(2):890–912,
2008.
[SPGL1 Software]
[DOI]
[more]The basis pursuit problem seeks a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise (BPDN) fits the least-squares problem only approximately, and a single parameter determines a curve that traces the optimal trade-off between the least-squares fit and the one-norm of the solution. We prove that this curve is convex and continuously differentiable over all points of interest, and show that it gives an explicit relationship to two other optimization problems closely related to BPDN. We describe a root-finding algorithm for finding arbitrary points on this curve; the algorithm is suitable for problems that are large scale and for those that are in the complex domain. At each iteration, a spectral gradient-projection method approximately minimizes a least-squares problem with an explicit one-norm constraint. Only matrix-vector operations are required. The primal-dual solution of this problem gives function and derivative information needed for the root-finding method. Numerical experiments on a comprehensive set of test problems demonstrate that the method scales well to large problems.
@article{Berg2008Probing, Author = {E. van den Berg and M. P. Friedlander}, Journal = {SIAM Journal on Scientific Computing}, Volume = {31}, Number = {2}, Pages = {890-912}, Year = {2008}, Month = {November}, Doi = {10.1137/080714488}, Title = {Probing the Pareto frontier for basis pursuit solutions} }
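The outer loop of the root-finding idea above can be sketched as Newton's method on phi(tau) = ||b - A x_tau||_2 - sigma. The sketch below is not the SPGL1 code itself: `lasso_solve(A, b, tau)` is an assumed black-box solver for the inner one-norm-constrained least-squares subproblem, and the derivative formula reflects the primal-dual information described in the abstract:

    import numpy as np

    def pareto_newton(A, b, sigma, lasso_solve, tau=0.0, tol=1e-6, iters=20):
        """Find tau such that the optimal misfit phi(tau) equals sigma."""
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            x = lasso_solve(A, b, tau)   # inner solve: min ||Ax-b||_2 s.t. ||x||_1 <= tau
            r = b - A @ x
            phi = np.linalg.norm(r)
            if abs(phi - sigma) < tol:
                break
            dphi = -np.linalg.norm(A.T @ r, np.inf) / phi  # phi'(tau) from the dual
            tau += (sigma - phi) / dphi  # Newton step along the Pareto curve
        return x, tau

Because phi is convex and continuously differentiable, each Newton step is well defined and only matrix-vector products with A and its transpose are needed.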
- [abs]
[bib]
New insights into one-norm solvers from the Pareto curve
G. Hennenfent, E. van den Berg, M. P. Friedlander, F. Herrmann. Geophysics, 73(4):A23–A26,
2008.
[DOI]
Geophysical inverse problems typically involve a trade-off between data misfit and some prior. Pareto curves trace the optimal trade-off between these two competing aims. These curves are commonly used in problems with two-norm priors, where they are plotted on a log-log scale and are known as L-curves. For other priors, such as the sparsity-promoting one norm, Pareto curves remain relatively unexplored. We show how these curves lead to new insights into one-norm regularization. First, we confirm the theoretical properties of smoothness and convexity of these curves with a stylized and a geophysical example. Second, we exploit these crucial properties to approximate the Pareto curve for a large-scale problem. Third, we show how Pareto curves provide an objective criterion to gauge how different one-norm solvers advance towards the solution.
@article{Hennenfent2008New, Author = {G. Hennenfent and E. van den Berg and M. P. Friedlander and F. Herrmann}, Journal = {Geophysics}, Volume = {73}, Number = {4}, Pages = {A23-A26}, Year = {2008}, Month = {July}, Doi = {10.1190/1.2944169}, Title = {New insights into one-norm solvers from the Pareto curve} }
- [abs]
[bib]
Group sparsity via linear-time projection
E. van den Berg, M. Schmidt, M. P. Friedlander, K. Murphy. Tech. Rep. TR-2008-09, Dept. of Computer Science, UBC,
2008.
We present an efficient spectral projected-gradient algorithm for optimization subject to a group l1-norm constraint. Our approach is based on a novel linear-time algorithm for Euclidean projection onto the l1- and group l1-norm balls. Numerical experiments on large data sets suggest that the proposed method is substantially more efficient and scalable than existing methods.
@techreport{Berg2008Group, Author = {E. van den Berg and M. Schmidt and M. P. Friedlander and K. Murphy}, Number = {TR-2008-09}, Institution = {Dept. of Computer Science, UBC}, Year = {2008}, Month = {June}, Title = {Group sparsity via linear-time projection} }
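The projection at the heart of the report above can be sketched as follows; the paper's algorithm achieves expected linear time via a randomized pivoting scheme, whereas this simpler sort-based variant is O(n log n). The group case reduces to projecting the vector of group norms onto the l1-ball and rescaling each group, which preserves every group's direction:

    import numpy as np

    def project_l1(v, tau):
        """Euclidean projection onto the l1-ball of radius tau > 0
        (sort-based; the paper uses a randomized linear-time scheme)."""
        if np.abs(v).sum() <= tau:
            return v.copy()
        u = np.sort(np.abs(v))[::-1]                  # magnitudes, descending
        css = np.cumsum(u)
        k = np.arange(1, u.size + 1)
        rho = np.nonzero(u > (css - tau) / k)[0][-1]  # largest feasible support
        theta = (css[rho] - tau) / (rho + 1)          # soft-threshold level
        return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

    def project_group_l1(v, groups, tau):
        """Group l1-ball projection: shrink group norms, keep group directions."""
        norms = np.array([np.linalg.norm(v[g]) for g in groups])
        new_norms = project_l1(norms, tau)
        out = v.copy()
        for g, n, m in zip(groups, norms, new_norms):
            out[g] = v[g] * (m / n) if n > 0 else 0.0
        return out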
- [abs]
[bib]
In pursuit of a root
E. van den Berg, M. P. Friedlander. Tech. Rep. TR-2007-19, Dept. of Computer Science, UBC,
2007.
[more]
The basis pursuit technique is used to find a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise fits the least-squares problem only approximately, and a single parameter determines a curve that traces the trade-off between the least-squares fit and the one-norm of the solution. We show that the function that describes this curve is convex and continuously differentiable over all points of interest. The dual solution of a least-squares problem with an explicit one-norm constraint gives function and derivative information needed for a root-finding method. As a result, we can compute arbitrary points on this curve. Numerical experiments demonstrate that our method, which relies on only matrix-vector operations, scales well to large problems.
@techreport{Berg2007Pursuit, Author = {E. van den Berg and M. P. Friedlander}, Number = {TR-2007-19}, Institution = {Dept. of Computer Science, UBC}, Year = {2007}, Month = {June}, Title = {In pursuit of a root} }
- [abs]
[bib]
Sparse optimization with least-squares constraints
E. van den Berg, M. P. Friedlander. SIAM J. Optimization, 21(4):1201–1229,
2011.
[DOI]
[more]
- Marisol Garrido MSc (2008)
Publications (1)
- [abs]
[bib]
An all-at-once approach to nonnegative tensor factorizations
M. Garrido. M.Sc. Thesis, UBC,
2008.
[UBC cIRcle]
This research addresses computational methods for nonnegative tensor factorization (NTF). Rather than optimizing the factors one block at a time, the proposed approach optimizes over all factors simultaneously and combines two regularization strategies to keep the factors of the decomposition bounded and equilibrated in norm. The work includes numerical experiments and demonstrates a practical application to digital image restoration.
@mastersthesis{Garrido2008AllAtOnce, Author = {M. Garrido}, School = {University of British Columbia}, Year = {2008}, Month = {August}, Title = {An all-at-once approach to nonnegative tensor factorizations} }
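As a rough illustration of the all-at-once idea (a toy sketch, not the thesis code), the loop below takes projected-gradient steps on all three CP factors of a nonnegative third-order tensor simultaneously; the quadratic penalty `mu` is a stand-in for the thesis's regularization that keeps the factors bounded and equilibrated:

    import numpy as np

    def khatri_rao(U, V):
        """Column-wise Kronecker product of U (m x r) and V (n x r)."""
        return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])

    def ntf_all_at_once(X, r, steps=500, lr=1e-3, mu=0.1, seed=0):
        """Toy all-at-once NTF: every factor is updated in the same step."""
        rng = np.random.default_rng(seed)
        I, J, K = X.shape
        A, B, C = (rng.random((n, r)) for n in (I, J, K))
        X1 = X.reshape(I, -1)                      # mode-1 unfolding
        X2 = X.transpose(1, 0, 2).reshape(J, -1)   # mode-2 unfolding
        X3 = X.transpose(2, 0, 1).reshape(K, -1)   # mode-3 unfolding
        for _ in range(steps):
            gA = (A @ khatri_rao(B, C).T - X1) @ khatri_rao(B, C) + mu * A
            gB = (B @ khatri_rao(A, C).T - X2) @ khatri_rao(A, C) + mu * B
            gC = (C @ khatri_rao(A, B).T - X3) @ khatri_rao(A, B) + mu * C
            A, B, C = (np.maximum(F - lr * g, 0.0)  # project onto >= 0
                       for F, g in ((A, gA), (B, gB), (C, gC)))
        return A, B, C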
- Shidong Shan MSc (2008)
Publications (1)
- [abs]
[bib]
A Levenberg-Marquardt method for large-scale bound-constrained nonlinear least-squares
S. Shan. M.Sc. Thesis, UBC,
2008.
[UBC cIRcle]
The well-known Levenberg-Marquardt method is used extensively for solving nonlinear least-squares problems. This research extends the Levenberg-Marquardt method to problems with bound constraints on the variables, requiring only Jacobian matrix-vector products. The author demonstrates its effectiveness through numerical experiments and applies the approach to a practical curve-fitting problem in fluorescence optical imaging.
@mastersthesis{Shan2008LevenbergMarquardt, Author = {S. Shan}, School = {University of British Columbia}, Year = {2008}, Month = {August}, Title = {A Levenberg-Marquardt method for large-scale bound-constrained nonlinear least-squares} }
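To illustrate the problem class rather than the thesis's algorithm, the snippet below solves a small bound-constrained nonlinear least-squares fit with SciPy's trust-region reflective solver; the exponential model and synthetic data are invented for the example:

    import numpy as np
    from scipy.optimize import least_squares

    # Fit y ~ a * exp(-b * t) with both parameters constrained to be nonnegative.
    t = np.linspace(0.0, 4.0, 50)
    y = 2.5 * np.exp(-1.3 * t) + 0.05 * np.random.default_rng(0).normal(size=t.size)

    def residual(p):
        a, b = p
        return a * np.exp(-b * t) - y

    sol = least_squares(residual, x0=[1.0, 1.0], bounds=([0.0, 0.0], [np.inf, np.inf]))
    print(sol.x)  # recovered (a, b), close to (2.5, 1.3)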
- Jelena Sirovljevic MSc (2007)
Publications (1)
- [bib]
Incomplete factorization preconditioners for least squares and linear and quadratic programming
J. Sirovljevic. M.Sc. Thesis, UBC,
2007.
[UBC cIRcle]
@mastersthesis{Sirovljevic2007Incomplete, Author = {J. Sirovljevic}, School = {University of British Columbia}, Year = {2007}, Month = {August}, Title = {Incomplete factorization preconditioners for least squares and linear and quadratic programming} }
Interested in Joining?
I am always looking for motivated students interested in mathematical optimization, machine learning, and scientific computing. Prospective students should have strong mathematical foundations and programming skills.
If you're interested in joining the lab, please send me an email with your CV, transcripts, and a brief description of your research interests.