Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
Journal
Journal of Machine Learning Research
Subject
Management Science and Operations
Publishing details
Authors / Editors
Bertsimas D;Cory-Wright R;Pauphilet J
Biographies
Publication Year
2022
Abstract
Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than p = 100s of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality at the scale of selecting k = 5 covariates from p = 300 variables, and provides small bound gaps at a larger scale. We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of 1- 2% in practice within minutes for p = 100s or hours for p = 1; 000s and is therefore a viable alternative to the exact method at scale. Using real-world financial and medical data sets, we illustrate our approach's ability to derive interpretable principal components tractably at scale.
Keywords
Sparse PCA, Sparse Eigenvalues, Semidefinite Optimization
Available on ECCH
No