Relating a set of variables X to a response y is crucial in chemometrics. A quantitative prediction objective can be enriched by qualitative data interpretation, for instance by locating the most influential features. When high-dimensional problems arise, dimension reduction techniques can be used. The most notable are projection methods (e.g. Partial Least Squares, or PLS) and variable selection methods (e.g. lasso). Sparse partial least squares (sPLS) combines both strategies by blending variable selection into PLS. The variant presented in this paper, Dual-sPLS, generalizes the classical PLS1 algorithm. It balances accurate prediction against efficient interpretation. It is based on penalizations inspired by classical regression methods (lasso, group lasso, least squares, ridge) and uses the notion of dual norm. The resulting sparsity is enforced by an intuitive shrinking ratio parameter. Dual-sPLS compares favorably to similar regression methods on simulated and real chemical data.
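To give a concrete feel for how a shrinking ratio can translate into sparsity, the sketch below soft-thresholds the first PLS1 weight direction, which is proportional to X^T y, in a lasso-like way. It is only an illustration under stated assumptions: the helper names `xty` and `sparse_weights`, the rule that picks the threshold as the k-th smallest absolute value, and the toy numbers are hypothetical and are not taken from the paper or from any Dual-sPLS implementation.

```python
import math


def xty(X, y):
    """Return z = X^T y, with X given as a list of rows (samples)."""
    n_features = len(X[0])
    return [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(n_features)]


def sparse_weights(z, shrink_ratio):
    """Soft-threshold z so that about `shrink_ratio` of its entries become zero.

    Assumption (illustrative only): the threshold nu is the k-th smallest
    |z_j|, with k = floor(shrink_ratio * p). Ties can zero out more entries.
    """
    p = len(z)
    k = int(math.floor(shrink_ratio * p))
    magnitudes = sorted(abs(v) for v in z)
    nu = magnitudes[k - 1] if k > 0 else 0.0

    # Lasso-style soft-thresholding: shrink every entry toward zero by nu.
    shrunk = [math.copysign(max(abs(v) - nu, 0.0), v) for v in z]

    # Normalize to unit Euclidean norm, as is usual for PLS weight vectors.
    norm = math.sqrt(sum(v * v for v in shrunk))
    if norm == 0.0:
        return shrunk
    return [v / norm for v in shrunk]


# Toy data: 3 samples, 4 variables (hypothetical values).
X = [
    [1.0, 0.0, 0.5, 1.0],
    [2.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
y = [1.0, 1.0, 1.0]

z = xty(X, y)                       # first PLS1 direction, up to scaling
w = sparse_weights(z, shrink_ratio=0.5)
n_zero = sum(1 for v in w if v == 0.0)
```

With a shrinking ratio of 0.5 and four variables, half of the weights are set exactly to zero, so only the two variables with the largest covariance with y remain in the component. A ratio of 0 leaves the ordinary normalized PLS1 direction unchanged.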
Authors
Louna Alsouki, Laurent Duval, Clément Marteau, Rami El Haddad, François Wahl
Related Organizations
- Bibliographic Reference
- Louna Alsouki, Laurent Duval, Clément Marteau, Rami El Haddad, François Wahl. Dual-sPLS : a Family of Dual Sparse Partial Least Squares Regressions for Feature Selection and Prediction with Tunable Sparsity; Evaluation on Simulated and Near-Infrared (NIR) Data. Chemometrics and Intelligent Laboratory Systems, 2023, 237, pp.104813. ⟨10.1016/j.chemolab.2023.104813⟩. ⟨hal-04127738⟩
- DOI
- https://doi.org/10.1016/j.chemolab.2023.104813
- HAL Collection
- Université Jean Monnet - Saint-Etienne; Sciences De l'Environnement; IFP Energies Nouvelles; CNRS - Centre national de la recherche scientifique; Institut Camille Jordan; Université Claude Bernard - Lyon I; Institut National des Sciences Appliquées de Lyon; Ecole Centrale de Lyon; CNRS-INSMI - INstitut des Sciences Mathématiques et de leurs Interactions; GIP Bretagne Environnement; Laboratoire d'excellence en Mathématiques et informatique fondamentale de Lyon; Groupe INSA; UDL; Université de Lyon; Université Saint Joseph de Beyrouth; ANR
- HAL Identifier
- hal-04127738
- Institution
- École Centrale de Lyon; Université Claude Bernard Lyon 1; Institut National des Sciences Appliquées de Lyon; Université Jean Monnet - Saint-Étienne; Université Saint-Joseph de Beyrouth; IFP Energies nouvelles
- Laboratory
- Institut Camille Jordan
- Published in
- France
Table of Contents
- Introduction
- Background
- Partial Least Squares (PLS)
- Least absolute shrinkage and selection operator
- Blending methods: sparse Partial Least Squares (sPLS)
- Dual Sparse Partial Least Squares (Dual-sPLS)
- Motivation and purposes
- Norm options (lasso, group lasso, least squares and ridge)
- Pseudo-lasso
- Pseudo-group lasso
- Pseudo-least squares and pseudo-ridge
- Simulated and real data, model settings, evaluation
- Simulated sparse data: Gaussian mixtures DSIM and DSIM
- Real data: near-infrared (NIR) spectroscopy DNIR
- Model settings: number of latent component selection
- Calibration and validation
- Comparative evaluation and discussion
- Dual-sPLS pseudo-lasso evaluation (DSIM, DNIR)
- Dual-sPLS pseudo-least squares evaluation (DSIM)
- Dual-sPLS pseudo-ridge evaluation (DSIM, DNIR)
- Conclusion and perspectives
- Declaration of competing interest
- Acknowledgements
- Detailed resolution of Dual-sPLSs
- Dual-sPLS pseudo-group lasso
- Dual-sPLS pseudo-least squares
- Dual-sPLS pseudo-ridge
- Complementary plots