Machine learning technique for building predictive models from known input and response data
Supervised learning is the most common type of machine learning. It uses a known dataset (called the training dataset), made up of input data (called features) paired with known responses, to train an algorithm to make predictions. From this labeled data, the supervised learning algorithm seeks to create a model by discovering relationships between the features and the responses. The trained model then predicts the response values for a new dataset.
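The train-then-predict workflow described above can be sketched in a few lines. This is only an illustrative sketch in plain Python (not MATLAB and not a toolbox API), using a 1-nearest-neighbor rule as a stand-in for "the algorithm". The function names, the two features, and the toy values are all assumptions made for the example.

```python
import math

def train_nearest_neighbor(features, responses):
    """'Training' a 1-nearest-neighbor model simply stores the labeled examples."""
    if len(features) != len(responses):
        raise ValueError("each feature row needs exactly one response")
    return list(zip(features, responses))

def predict(model, new_features):
    """Predict each new row's response as that of the closest training example."""
    predictions = []
    for x in new_features:
        _, nearest_response = min(model, key=lambda pair: math.dist(pair[0], x))
        predictions.append(nearest_response)
    return predictions

# Hypothetical training dataset: [height_cm, weight_kg] paired with a known label.
train_X = [[150.0, 50.0], [160.0, 55.0], [180.0, 85.0], [190.0, 95.0]]
train_y = ["small", "small", "large", "large"]

model = train_nearest_neighbor(train_X, train_y)

# New, unlabeled observations whose responses the model predicts.
new_predictions = predict(model, [[155.0, 52.0], [185.0, 90.0]])
```

Other supervised algorithms differ in what the training step computes, but they share the same shape: fit on labeled features and responses, then predict responses for new features.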
Prior to applying supervised learning, unsupervised learning is frequently used to discover patterns in the input data that suggest candidate features, and feature engineering transforms those features to be more suitable for supervised learning. In addition to identifying features, the correct category or response needs to be identified for every observation in the training set, which is a very labor-intensive step. Semi-supervised learning lets you train models with very limited labeled data and thus reduces the labeling effort.
Once the algorithm is trained, a test dataset that has not been used for training is typically used to estimate the performance of the algorithm and validate it. To obtain accurate performance results, it is critical that both the training set and the test set are a good representation of “reality” (i.e., data from the production environment) and that the model is validated correctly.
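Holding out a test set can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the helper names, the 25% test fraction, the fixed seed, and the one-feature toy data are all hypothetical, and the "model" is a deliberately simple threshold halfway between the two class means.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the labeled rows and hold out a fraction that training never sees."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train_rows):
    """Toy model: a cut-off halfway between the mean feature value of each class."""
    class0 = [x for x, y in train_rows if y == 0]
    class1 = [x for x, y in train_rows if y == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2

def accuracy(threshold, rows):
    """Fraction of rows whose predicted class (x > threshold) matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in rows)
    return correct / len(rows)

# Hypothetical labeled data: one feature, class 1 for large values.
data = [(x, 0) for x in (1.0, 1.5, 2.0, 2.5, 3.0, 3.5)] + \
       [(x, 1) for x in (6.0, 6.5, 7.0, 7.5, 8.0, 8.5)]

train_rows, test_rows = train_test_split(data)
threshold = fit_threshold(train_rows)          # fitted on training rows only
test_accuracy = accuracy(threshold, test_rows)  # estimated on unseen rows only
```

The point of the split is that `test_accuracy` is computed only on rows the model never saw. If the test rows do not resemble production data, this estimate can be misleading regardless of how the model was trained.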
You can train, validate, and tune predictive supervised learning models in MATLAB® with Deep Learning Toolbox™ and Statistics and Machine Learning Toolbox™.
Supervised Learning Algorithm Categories
Classification: Used for categorical response values, where the data can be separated into specific classes. A binary classification model has two classes, and a multiclass classification model has more than two. You can train classification models with the Classification Learner app in MATLAB.
Common classification algorithms include:
- Logistic regression
- Support vector machines (SVM)
- Neural networks
- Naïve Bayes classifier
- Decision trees
- Discriminant analysis
- Nearest neighbors (kNN)
- Ensemble classification
- Generalized additive model (GAM)
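As an illustration of binary classification, here is a minimal sketch of logistic regression (the first algorithm in the list above) fitted by batch gradient descent in plain Python. It is not the Classification Learner workflow or any toolbox function. The function names, learning rate, epoch count, and the toy "hours studied, hours slept" data are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(X, y, learning_rate=0.1, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Predicted probability of class 1 minus the true label.
            error = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias) - yi
            for j, v in enumerate(xi):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias

def predict_class(weights, bias, x):
    """Return class 1 when the predicted probability is at least 0.5."""
    probability = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return 1 if probability >= 0.5 else 0

# Hypothetical data: [hours studied, hours slept] -> fail (0) or pass (1).
X = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [4.0, 5.0], [5.0, 4.0], [5.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic_regression(X, y)
predicted = [predict_class(weights, bias, x) for x in X]
new_label = predict_class(weights, bias, [4.5, 4.5])
```

A multiclass model follows the same pattern but produces one score per class instead of a single probability.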
Regression: Used for continuous numerical response values. You can train regression models with the Regression Learner app in MATLAB.
Common regression algorithms include:
- Linear regression
- Nonlinear regression
- Generalized linear models
- Decision trees
- Neural networks
- Gaussian process regression
- Support vector machine regression
- Ensemble regression
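For a continuous response, the simplest case is linear regression with a single feature, fitted by ordinary least squares. The sketch below is plain Python for illustration only, not the Regression Learner app. The function name and the noise-free toy data (generated from y = 2x + 1) are assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data generated from y = 2x + 1 with no noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(x, y)

# Predict the numerical response for a new input value.
prediction = slope * 5.0 + intercept
```

Unlike a classifier, the model returns a number on a continuous scale, so its quality is usually judged by an error measure such as mean squared error rather than by accuracy.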
Supervised Learning Applications
Supervised learning is used in financial applications for credit scoring, algorithmic trading, and bond classification; in biological applications for tumor detection and drug discovery; in energy applications for price and load forecasting; in pattern recognition applications for speech and images; and in predictive maintenance for estimating equipment life.
See also: Statistics and Machine Learning Toolbox, Deep Learning Toolbox, machine learning, unsupervised learning, AdaBoost, linear regression, nonlinear regression, data fitting, data analysis, mathematical modeling, predictive modeling, artificial intelligence, AutoML, regularization, biomedical signal processing