Sunday, July 24, 2011

PLS Discriminant Analysis - A comparative study

PLS regression is a regression technique usually designed to predict the values taken by a group of Y variables (target variables, dependent variables) from a set of variables X (descriptors, independent variables). Initially defined for the prediction of continuous target variable, the PLS regression can be adapted to the prediction of one discrete variable - i.e. adapted to the supervised learning framework - in different ways . The approah is called "PLS Discriminant Analysis" in this context. It incorporates the valuable qualities that we know usually into this new framework: the ability to process a representation space with very high dimensionality, a large number of noisy and / or redundant descriptors.

This tutorial is the continuation of a precedent paper dedicated to the presentation of some variants of the PLS-DA. We describe the behavior of one of them (PLS-LDA - PLS Linear Discriminant Analysis) on a learning set where the number of descriptors is moderately high (278 descriptors) in relation to the number of instances (232 instances). Even if the number of descriptors is not really very high, we note in our experiment a valuable characteristic of the PLS approach: we can control the variance of the classifier by adjusting the number of latent variables.

To assess this idea, we compare the behavior of the PLS-LDA with state-of-the-art supervised learning methods such as K-nearest neighbors , SVM (Support Vector Machine from the LIBSVM library ), the Breiman's Random Forest approach , or the Fisher's Linear Discriminant Analysis .

Keywords: pls regression, linear discriminant analysis, supervised learning, support vector machine, SVM, random forest, nearest neighbor
Components: K-NN, PLS-LDA, BAGGING, RND TREE, C-SVC, TEST, DISCRETE SELECT EXAMPLES, REMOVE CONSTANT
Tutorial: en_Tanagra_PLS_DA_Comparaison.pdf
Dataset: arrhytmia.bdm
References :
S. Chevallier, D. Bertrand, A. Kohler, P. Courcoux, « Application of PLS-DA in multivariate image analysis », in J. Chemometrics, 20 : 221-229, 2006.
Garson, « Partial Least Squares Regression (PLS) », http://www2.chass.ncsu.edu/garson/PA765/pls.htm