Home
Faculty and Research
Academic research
Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Journal

Journal of Machine Learning Research

Subject

Management Science and Operations

Publishing details

Journal of Machine Learning Research 2022 Vol 23:13 p 1-35 In Press

Authors / Editors

Bertsimas D;Cory-Wright R;Pauphilet J

Biographies

Jean Pauphilet

Publication Year

2022

Abstract

Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than p = 100s of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality at the scale of selecting k = 5 covariates from p = 300 variables, and provides small bound gaps at a larger scale. We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of 1- 2% in practice within minutes for p = 100s or hours for p = 1; 000s and is therefore a viable alternative to the exact method at scale. Using real-world financial and medical data sets, we illustrate our approach's ability to derive interpretable principal components tractably at scale.

Keywords

Sparse PCA, Sparse Eigenvalues, Semidefinite Optimization

Available on ECCH

Select up to 4 programmes to compare

Select one more to compare

Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Sign up to receive our latest news and business thinking direct to your inbox

Sign up to receive our latest course information and business thinking