Feature Selection

Reduce features to improve model performance

Feature selection is a dimensionality reduction technique that selects a subset of features (predictor variables) that provide the best predictive power in modeling a set of data.

Feature selection can be used to:

  • Prevent overfitting: avoid modeling with an excessive number of features, which makes a model more susceptible to memorizing specific training examples
  • Reduce model size: improve computational performance with high-dimensional data, or prepare the model for embedded deployment where memory may be limited
  • Improve interpretability: use fewer features, which may help identify those that most affect model behavior

There are several common approaches to feature selection.

Iteratively change the feature set to optimize performance or loss

Stepwise regression sequentially adds or removes features until there is no improvement in prediction. It is used with linear regression or generalized linear regression algorithms. Similarly, sequential feature selection builds up a feature set until accuracy (or a custom performance measure) stops improving.
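As a minimal sketch, sequential forward selection can be run with the toolbox function sequentialfs; here X (a numeric predictor matrix) and y (a vector of class labels) are assumed inputs, and the criterion counts misclassifications of a discriminant classifier on each held-out fold:

```matlab
% Sequential forward selection: add features one at a time until the
% cross-validated criterion stops improving. X and y are assumed inputs.
rng(1);  % reproducible cross-validation partitions
critfun = @(Xtrain, ytrain, Xtest, ytest) ...
    sum(ytest ~= predict(fitcdiscr(Xtrain, ytrain), Xtest));
selected = sequentialfs(critfun, X, y);  % logical mask of chosen features
Xreduced = X(:, selected);
```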

Rank features based on intrinsic characteristics

These methods estimate a ranking of the features, which in turn can be used to select the top-ranked features. Minimum redundancy maximum relevance (MRMR) finds features that maximize mutual information between the features and the response variable while minimizing mutual information between the features themselves. Related methods rank features according to their Laplacian scores or use a statistical test of whether each feature is independent of the response to determine feature importance.
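For example, the toolbox function fscmrmr returns an MRMR ranking for classification problems; in this sketch, X and y are assumed to hold the predictors and class labels:

```matlab
% MRMR ranking for classification; idx lists features from most to
% least important, scores holds the corresponding importance scores.
[idx, scores] = fscmrmr(X, y);
top10 = idx(1:10);        % indices of the ten highest-ranked features
Xselected = X(:, top10);  % reduced predictor matrix
```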

Neighborhood Component Analysis (NCA) and ReliefF

These methods determine feature weights by maximizing prediction accuracy based on pairwise distances and penalizing predictors that lead to misclassification.
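A minimal sketch using the toolbox functions fscnca and relieff, with X and y assumed as above; note that the NCA regularization parameter Lambda typically needs tuning (see the table below), and the value used here is only illustrative:

```matlab
% NCA: learns one nonnegative weight per feature; near-zero weights
% indicate irrelevant predictors. Lambda is a tunable regularizer.
ncaMdl = fscnca(X, y, 'Lambda', 1/size(X,1));
wNCA = ncaMdl.FeatureWeights;

% ReliefF: weights features by how well they separate each observation
% from its 10 nearest neighbors of a different class.
[ranked, wRelief] = relieff(X, y, 10);
```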

Learn feature importance along with the model

Some supervised machine learning algorithms estimate feature importance during the training process. Those estimates can be used to rank features once training is complete. Models with built-in feature selection include linear SVMs, decision trees and their ensembles (random forests, boosted trees), and generalized linear models. Similarly, in lasso regularization a shrinkage estimator drives the weights (coefficients) of redundant features to zero during training.
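Two brief sketches of embedded selection, assuming a numeric predictor matrix X and a continuous response y: lasso zeroes out redundant coefficients during fitting, while a bagged tree ensemble exposes importance estimates after training:

```matlab
% Lasso: the shrinkage penalty drives coefficients of redundant
% predictors to exactly zero during training.
[B, fitInfo] = lasso(X, y, 'CV', 10);   % 10-fold cross-validated fits
coefs = B(:, fitInfo.Index1SE);         % sparsest fit within one SE of the best
kept = find(coefs ~= 0);                % predictors that survived

% Bagged trees: estimate importance by permuting each predictor
% out-of-bag and measuring the resulting loss in accuracy.
ens = fitrensemble(X, y, 'Method', 'Bag');
imp = oobPermutedPredictorImportance(ens);
```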

MATLAB® supports the following feature selection methods:

| Algorithm  | Training Speed | Model Types                      | Accuracy | Caveats                                                            |
|------------|----------------|----------------------------------|----------|--------------------------------------------------------------------|
| NCA        | Moderate       | Better for distance-based models | High     | Needs manual tuning of regularization parameter lambda             |
| MRMR       | Fast           | Any                              | High     | Only for classification                                            |
| ReliefF    | Moderate       | Better for distance-based models | Medium   | Unable to differentiate correlated predictors                      |
| Sequential | Slow           | Any                              | High     | Doesn't rank all features                                          |
| F test     | Fast           | Any                              | Medium   | For regression; unable to differentiate correlated predictors      |
| Chi-square | Fast           | Any                              | Medium   | For classification; unable to differentiate correlated predictors  |
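The F-test and chi-square entries in the table correspond to the toolbox functions fsrftest and fscchi2; a minimal sketch, where yContinuous (a numeric response) and yClass (class labels) are hypothetical variables:

```matlab
% Univariate rankings: each score is -log(p) from a per-feature test,
% so larger scores indicate stronger evidence of dependence.
[idxReg, scoresReg] = fsrftest(X, yContinuous);  % F test, regression
[idxCls, scoresCls] = fscchi2(X, yClass);        % chi-square, classification
```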

As an alternative to feature selection, feature transformation techniques create new features (predictor variables) from the existing ones and drop the less descriptive features. Common feature transformation approaches include principal component analysis (PCA), factor analysis, and nonnegative matrix factorization (NMF).

For more information on feature selection with MATLAB, including machine learning, regression, and feature transformation, see Statistics and Machine Learning Toolbox™.

Key Points

  • Automated feature selection is a part of the complete AutoML workflow that delivers optimized models in a few simple steps.
  • Feature selection is an advanced technique to boost model performance (especially on high-dimensional data), improve interpretability, and reduce size.
  • Consider a model with built-in feature selection first; otherwise, MRMR works well for classification problems.

Example

Feature selection can help select a reasonable subset from hundreds of features automatically generated by applying wavelet scattering. The figure below shows the ranking of the top 50 features obtained by applying the MATLAB function fscmrmr to automatically generated wavelet features from human activity sensor data.

[Figure: top 50 automatically generated wavelet features ordered by predictor rank]
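A hedged sketch of how such a ranking might be produced; featureTbl (a table of generated wavelet features) and activityLabels are hypothetical names standing in for the actual data:

```matlab
% Rank the generated wavelet features with MRMR and plot the top 50.
% featureTbl and activityLabels are assumed, illustrative inputs.
[idx, scores] = fscmrmr(featureTbl, activityLabels);
bar(scores(idx(1:50)));
xlabel('Predictor rank');
ylabel('Predictor importance score');
```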

See also: Statistics and Machine Learning Toolbox, machine learning, feature engineering, regularization, feature extraction, biomedical signal processing, AutoML