fitrchains
Syntax
Mdl = fitrchains(Tbl,ResponseVarNames)
Mdl = fitrchains(___,Name=Value)
Description
Mdl = fitrchains(Tbl,ResponseVarNames) returns a trained multiresponse regression model Mdl by using regression chains. The function trains the model using the predictors in the table Tbl and the response values in the ResponseVarNames table variables. For more information, see Regression Chains.
Mdl = fitrchains(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can specify the type of model to use in the regression chains by setting the Learner name-value argument.
Examples
Train Multiresponse Regression Model with Regression Chains
Create a regression model with more than one response variable by using fitrchains.
Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Displacement, Horsepower, and so on, as well as the response variables Acceleration and MPG. Display the first eight rows of the table.
load carbig
cars = table(Displacement,Horsepower,Model_Year, ...
    Origin,Weight,Acceleration,MPG);
head(cars)
    Displacement    Horsepower    Model_Year    Origin    Weight    Acceleration    MPG
    ____________    __________    __________    ______    ______    ____________    ___
        307            130            70         USA       3504         12          18
        350            165            70         USA       3693         11.5        15
        318            150            70         USA       3436         11          18
        304            150            70         USA       3433         12          16
        302            140            70         USA       3449         10.5        17
        429            198            70         USA       4341         10          15
        454            220            70         USA       4354          9          14
        440            215            70         USA       4312          8.5        14
Categorize the cars based on whether they were made in the USA.
cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan",...
    "Germany","Sweden","Italy","England"],"NotUSA");
Partition the data into training and test sets. Use approximately 85% of the observations to train a multiresponse model, and 15% of the observations to test the performance of the trained model on new data. Use cvpartition to partition the data.
rng("default") % For reproducibility
c = cvpartition(height(cars),"Holdout",0.15);
carsTrain = cars(training(c),:);
carsTest = cars(test(c),:);
Train a multiresponse regression model by passing the carsTrain training data to the fitrchains function. By default, the function uses bagged ensembles of trees in the regression chains.
Mdl = fitrchains(carsTrain,["Acceleration","MPG"])
Mdl =
  RegressionChainEnsemble
           PredictorNames: {'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
             ResponseName: ["Acceleration"    "MPG"]
    CategoricalPredictors: 4
                NumChains: 2
            LearnedChains: {2x2 cell}
          NumObservations: 338
Mdl is a trained RegressionChainEnsemble model object. You can use dot notation to access the properties of Mdl. For example, you can specify Mdl.Learners to see the bagged ensembles used to train the model.
Evaluate the performance of the regression model on the test set by computing the test mean squared error (MSE). Smaller MSE values indicate better performance. Return the loss for each response variable separately by setting the OutputType name-value argument to "per-response".
testMSE = loss(Mdl,carsTest,["Acceleration","MPG"], ...
    OutputType="per-response")
testMSE = 1×2
2.4921 9.0568
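As a rough illustration of what a per-response MSE measures, the following sketch computes one mean squared error per response column by hand. The matrices Y and Yfit hold hypothetical values (not results from the carbig model), and the computation assumes uniform observation weights; it is not a statement of how loss is implemented.

```matlab
% Hypothetical observed and predicted responses. Rows are observations and
% columns are response variables (for example, Acceleration and MPG).
Y    = [12.0 18; 11.5 15; 11.0 18; 12.0 16];
Yfit = [12.5 17; 11.0 16; 11.5 18; 12.0 14];

% Average the squared errors down each column to get one MSE per response.
perResponseMSE = mean((Y - Yfit).^2, 1);
disp(perResponseMSE)
```

Each element of perResponseMSE corresponds to one response variable, in the same column order as Y.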
Predict the response values for the observations in the test set. Return the predicted response values as a table.
predictedY = predict(Mdl,carsTest,OutputType="table")
predictedY=60×2 table
Acceleration MPG
____________ ______
12.573 16.109
10.78 13.988
11.282 12.963
15.185 21.066
12.203 13.773
13.216 14.216
17.117 30.199
16.478 29.033
13.439 14.208
11.552 13.066
13.398 13.271
14.848 20.927
16.552 24.603
12.501 15.359
15.778 19.328
12.343 13.185
⋮
Specify Multiresponse Regression Model Properties
Train a multiresponse regression model using regression chains. Specify the type of regression models to use in the regression chains, and train the models with predicted values for response variables used as predictors.
Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Displacement, Horsepower, and so on, as well as the response variables Acceleration and MPG. Display the first eight rows of the table.
load carbig
cars = table(Displacement,Horsepower,Model_Year, ...
    Origin,Weight,Acceleration,MPG);
head(cars)
    Displacement    Horsepower    Model_Year    Origin    Weight    Acceleration    MPG
    ____________    __________    __________    ______    ______    ____________    ___
        307            130            70         USA       3504         12          18
        350            165            70         USA       3693         11.5        15
        318            150            70         USA       3436         11          18
        304            150            70         USA       3433         12          16
        302            140            70         USA       3449         10.5        17
        429            198            70         USA       4341         10          15
        454            220            70         USA       4354          9          14
        440            215            70         USA       4312          8.5        14
Categorize the cars based on whether they were made in the USA.
cars.Origin = categorical(cellstr(cars.Origin));
cars.Origin = mergecats(cars.Origin,["France","Japan",...
    "Germany","Sweden","Italy","England"],"NotUSA");
Remove observations with missing values.
cars = rmmissing(cars);
Train a multiresponse regression model by passing the cars data to the fitrchains function. Use regression chains composed of regression support vector machine (SVM) models with standardized numeric predictors. When training the SVM models, use the predicted values for the response variables that are treated as predictors.
Mdl = fitrchains(cars,["Acceleration","MPG"], ...
    Learner=templateSVM(Standardize=true), ...
    ChainPredictedResponse=true);
Mdl is a trained RegressionChainEnsemble model object. You can use dot notation to access the properties of Mdl.
Display the order of the response variables in the regression chains in Mdl, and display the trained regression SVM models in the regression chains.
Mdl.ChainOrders
ans = 2×2
1 2
2 1
Mdl.Learners
ans=2×2 cell array
{1x1 classreg.learning.regr.CompactRegressionSVM} {1x1 classreg.learning.regr.CompactRegressionSVM}
{1x1 classreg.learning.regr.CompactRegressionSVM} {1x1 classreg.learning.regr.CompactRegressionSVM}
In the first regression chain, the first SVM model uses Acceleration as the response variable. The second SVM model uses MPG as the response variable and the predicted values for Acceleration as a predictor variable. The first SVM model provides the predicted Acceleration values used by the second SVM model.
Recall that the SVM models use standardized numeric predictors. Find the means (Mu) and standard deviations (Sigma) used by the second model in the first regression chain.
Chain1Model2 = Mdl.Learners{1,2};
Mdl.PredictorNames
ans = 1x5 cell
{'Displacement'} {'Horsepower'} {'Model_Year'} {'Origin'} {'Weight'}
Chain1Model2.ExpandedPredictorNames
ans = 1x7 cell
{'x1'} {'x2'} {'x3'} {'x4 == 1'} {'x4 == 2'} {'x5'} {'x6'}
Chain1Model2.Mu
ans = 1×7
10^3 ×
0.1944 0.1045 0.0760 0 0 2.9776 0.0153
Chain1Model2.Sigma
ans = 1×7
104.6440 38.4912 3.6837 1.0000 1.0000 849.4026 2.2190
The SVM model uses five numeric predictors: Displacement (x1), Horsepower (x2), Model_Year (x3), Weight (x5), and the predicted values for Acceleration (x6). The software uses the corresponding Mu and Sigma values to standardize the predictor data before predicting with the predict object function.
The categorical predictor Origin is split into two variables (x4 == 1 and x4 == 2) after categorical expansion. The corresponding Mu and Sigma values (0 and 1, respectively) indicate that the two variables are unchanged after standardization.
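The effect of these values can be checked with a small sketch. It applies the usual standardization, (x - Mu)./Sigma, to one expanded predictor row. Mu and Sigma are copied from the output displayed above (with the 10^3 display factor applied to Mu), while xRow is a hypothetical row whose dummy-variable coding and Acceleration entry are assumptions made only for illustration.

```matlab
% Means and standard deviations as displayed for Chain1Model2.
Mu    = [194.4 104.5 76.0 0 0 2977.6 15.3];
Sigma = [104.6440 38.4912 3.6837 1 1 849.4026 2.2190];

% Hypothetical expanded predictor row, ordered as
% [x1 x2 x3 (x4 == 1) (x4 == 2) x5 x6].
xRow = [307 130 70 1 0 3504 12];

% Standardize: subtract the mean and divide by the standard deviation.
z = (xRow - Mu) ./ Sigma;
disp(z)
```

Because the dummy variables have a Mu value of 0 and a Sigma value of 1, elements 4 and 5 of z equal the corresponding elements of xRow.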
Input Arguments
Tbl — Sample data
table
Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.
Tbl must contain columns for the response variables and can contain a column for the observation weights. Each response and observation weight variable must be a numeric vector.
You must specify the response variables in Tbl by using ResponseVarNames or formula, and specify the observation weights in Tbl by using Weights.
When you specify the response variables by using ResponseVarNames, fitrchains uses the remaining variables as predictors. To use a subset of the remaining variables in Tbl as predictors, specify predictor variables by using PredictorNames.
When you define a model specification by using formula, fitrchains uses a subset of the variables in Tbl as predictor variables and response variables, as specified in formula.
Data Types: table
ResponseVarNames — Names of response variables
names of variables in Tbl
Names of the response variables, specified as the names of variables in Tbl. Each response variable must be a numeric vector.
You must specify ResponseVarNames as a string array or a cell array of character vectors. For example, if Tbl stores the response variables Y1 and Y2 as Tbl.Y1 and Tbl.Y2, respectively, then specify ResponseVarNames as ["Y1","Y2"]. Otherwise, the software treats the Y1 and Y2 columns of Tbl as predictors when training the model.
Data Types: string | cell
formula — Explanatory model of response variables and subset of predictor variables
character vector | string scalar
Explanatory model of the response variables and a subset of the predictor variables, specified as a character vector or string scalar in the form "Y1,Y2~x1+x2+x3". In this form, Y1 and Y2 represent the response variables, and x1, x2, and x3 represent the predictor variables.
To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula, except for observation weights (if specified).
The variable names in the formula must be both variable names in Tbl (Tbl.Properties.VariableNames) and valid MATLAB® identifiers. You can verify the variable names in Tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.
Data Types: char | string
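The identifier check described for formula can be sketched as follows. The variable names are hypothetical; in practice, you would read them from Tbl.Properties.VariableNames.

```matlab
% Hypothetical table variable names, one of which contains a space and is
% therefore not a valid MATLAB identifier.
names = {'Displacement','Model Year','MPG'};

% Test each name with isvarname.
isValid = cellfun(@isvarname, names);

% Convert the names so that every entry is a valid identifier.
validNames = matlab.lang.makeValidName(names);
disp(validNames)
```

After the conversion, you can assign validNames back to the table variable names before building the formula.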
Y — Response data
numeric matrix | numeric table
Response data, specified as a numeric matrix or table. Each row corresponds to an observation, and each column corresponds to a response variable. Y must have the same number of rows as the predictor data X.
Data Types: single | double | table
X — Predictor data
numeric matrix | numeric table
Predictor data, specified as a numeric matrix or table. Each row corresponds to an observation, and each column corresponds to a predictor. Optionally, when X is a table, it can contain a column for the observation weights. X and Y must have the same number of rows.
If X is a matrix, you can specify the names of the predictors in the order of their appearance in X by using the PredictorNames name-value argument.
If X is a table, you can use a subset of the variables in X as predictors. To do so, specify predictor variables by using PredictorNames.
Data Types: single | double
Note
The software treats NaN, empty character vector (''), empty string (""), <missing>, and <undefined> elements as missing data. Before training Mdl, the software removes observations with missing values in the response data, although the model retains the observations in its data properties (for example, Mdl.X and Mdl.Y). The treatment of observations with missing values in the predictor data depends on the regression model type specified by the Learner name-value argument.
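To preview which observations have missing response values, you can run a check like the following sketch. The data is hypothetical, and the sketch assumes that a missing value in any response column disqualifies the whole row; it illustrates the idea rather than the internal behavior of fitrchains.

```matlab
% Hypothetical response and predictor data with missing responses.
Y = [12 18; NaN 15; 11 NaN; 12 16];
X = [307 130; 350 165; 318 150; 304 150];

% Keep only the rows in which every response value is present.
keep = ~any(ismissing(Y), 2);
Xtrain = X(keep,:);
Ytrain = Y(keep,:);
disp(find(~keep)')   % indices of the rows with missing responses
```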
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: fitrchains(Tbl,["Y1","Y2"],Learner="svm",ChainPredictedResponse=true) creates a support vector machine (SVM) regression model with two response variables and uses predicted responses in the regression chains to train the model.
ChainOrder — Order of response variables in regression chain
[] (default) | positive integer vector
Order of the response variables in the regression chain, specified as a positive integer vector. For more information, see Regression Chains.
If you specify ChainOrder, Mdl contains only one regression chain.
Example: ChainOrder=[1 3 2]
Data Types: single | double
ChainPredictedResponse — Flag to use predicted responses in regression chains
false or 0 (default) | true or 1
Flag to use predicted responses in the regression chains, specified as a numeric or logical 0 (false) or 1 (true).
A value of 0 trains the models with the observed values of the response variables that are used as predictors.
A value of 1 trains the models with the predicted values of the response variables that are used as predictors.
For more information, see Regression Chains.
Example: ChainPredictedResponse=true
Data Types: single | double | logical
Learner — Type of regression model to train
"bag" (default) | "gam" | "gp" | "kernel" | "linear" | "lsboost" | "svm" | "tree" | template object
Type of regression model to train, specified as one of the values in this table.
Value | Regression Model Type |
---|---|
"bag" or templateEnsemble template (with the method specified as "Bag" and the weak learners specified as "Tree") | Bagged ensemble of trees |
"gam" or templateGAM template | Generalized additive model (GAM) |
"gp" or templateGP template | Gaussian process regression (GPR) |
"kernel" or templateKernel template | Kernel model |
"linear" or templateLinear template | Linear model |
"lsboost" or templateEnsemble template (with the method specified as "LSBoost" and the weak learners specified as "Tree") | Boosted ensemble of trees |
"svm" or templateSVM template | Support vector machine (SVM) |
"tree" or templateTree template | Decision tree |
Example: Learner="svm"
Example: Learner=templateEnsemble("LSBoost",50,"Tree")
MaxNumChains — Maximum number of regression chains
10 (default) | positive scalar
Maximum number of regression chains, specified as a positive scalar. Because each regression chain contains one regression model for each response variable, specify MaxNumChains to limit the total number of regression models to train.
Example: MaxNumChains=5
Data Types: single | double
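The arithmetic behind this limit can be sketched as follows. The number of response variables k is hypothetical, and the sketch assumes that the chains in a model use distinct response orders, so that the chain count cannot exceed factorial(k). How fitrchains selects chain orders is not described here; the sketch only bounds the number of models.

```matlab
k = 3;                % hypothetical number of response variables
maxNumChains = 10;    % default MaxNumChains value

% Each row of allOrders is one possible ordering of the k responses.
allOrders = perms(1:k);
numDistinctOrders = size(allOrders, 1);     % equals factorial(k)

% Upper bounds on the number of chains and on the number of models,
% with one model per response variable in each chain.
numChainsUpperBound = min(numDistinctOrders, maxNumChains);
numModelsUpperBound = numChainsUpperBound * k;
fprintf("At most %d chains and %d models\n", ...
    numChainsUpperBound, numModelsUpperBound);
```

With two response variables, the same bound gives two chains, which matches the NumChains value shown in the first example.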
CategoricalPredictors — Categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | "all"
Categorical predictors list, specified as one of the values in this table.
Value | Description |
---|---|
Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If fitrchains uses a subset of input variables as predictors, then the function indexes the predictors using only the subset. |
Logical vector | A true entry means that the corresponding predictor is categorical. The length of the vector is p. |
Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the entries in PredictorNames. Pad the names with extra blanks so each row of the character matrix has the same length. |
String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the entries in PredictorNames. |
"all" | All predictors are categorical. |
By default, if the predictor data is in a table, fitrchains assumes that a variable is categorical if it is a logical vector, categorical vector, character array, string array, or cell array of character vectors. However, learners that use decision trees assume that mathematically ordered categorical vectors are continuous variables. If the predictor data is a matrix, fitrchains assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.
The software creates dummy variables based on the Learner name-value argument and the underlying fitting function used to create the regression models in the Learners property of Mdl. For more information on how fitting functions treat categorical predictors, see Automatic Creation of Dummy Variables.
Example: CategoricalPredictors="all"
Data Types: single | double | logical | char | string | cell
Options — Options for computing in parallel and setting random streams
structure
Options for computing in parallel and setting random streams, specified as a structure. Create the Options structure using statset. This table lists the option fields and their values.
Field Name | Value | Default |
---|---|---|
UseParallel | Set this value to true to run computations in parallel. | false |
UseSubstreams | Set this value to true to run computations in a reproducible manner. To compute reproducibly, set Streams to a type that allows substreams: "mlfg6331_64" or "mrg32k3a". | false |
Streams | Specify this value as a RandStream object or cell array of such objects. Use a single object except when the UseParallel value is true and the UseSubstreams value is false. In that case, use a cell array that has the same size as the parallel pool. | If you do not specify Streams, then fitrchains uses the default stream or streams. |
Note
You need Parallel Computing Toolbox™ to run computations in parallel.
Example: Options=statset(UseParallel=true,UseSubstreams=true,Streams=RandStream("mlfg6331_64"))
Data Types: struct
PredictorNames — Predictor variable names
string array | cell array of character vectors
Predictor variable names, specified as a string array or a cell array of character vectors.
If you supply predictor data using a numeric matrix, then you can use PredictorNames to assign names to the predictor variables. The order of the names in PredictorNames must correspond to the order of the columns in the matrix. By default, PredictorNames is {'x1','x2',...}.
If you supply predictor data using a table, then you can use PredictorNames to specify which variables to use as predictors during training. PredictorNames must be a subset of the variable names in the table and cannot include the names of response variables. By default, PredictorNames contains the names of all predictor variables.
Example: PredictorNames=["SepalLength","SepalWidth","PetalLength","PetalWidth"]
Data Types: string | cell
ResponseName — Response variable names
string array | cell array of character vectors
Response variable names, specified as a string array or a cell array of character vectors.
If you supply Y, then you can use ResponseName to specify names for the response variables.
If you supply ResponseVarNames or formula, then you cannot use ResponseName.
Example: ResponseName=["Response1","Response2"]
Data Types: string | cell
Weights — Observation weights
nonnegative numeric vector | name of variable in X or Tbl
Observation weights, specified as a nonnegative numeric vector or the name of a variable in X or Tbl. The software weights each observation in X or Tbl with the corresponding value in Weights. The length of Weights must equal the number of observations in X or Tbl.
If you specify the input data as a table, then Weights can be the name of a variable in the table that contains a numeric vector. In this case, you must specify Weights as a character vector or string scalar. For example, if the weights vector W is stored as Tbl.W, then specify it as "W". Otherwise, the software treats the W column of Tbl as a predictor during the training process.
By default, Weights is ones(n,1), where n is the number of observations in X or Tbl.
Before training, fitrchains normalizes the weights to sum to 1.
Data Types: single | double | char | string
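The normalization step amounts to dividing each weight by the total, as in this minimal sketch. The weight values are hypothetical.

```matlab
% Hypothetical nonnegative observation weights.
w = [1; 2; 2; 5];

% Rescale so that the weights sum to 1. Relative weights are unchanged.
wNorm = w / sum(w);
disp(wNorm')
```

Because only the relative sizes matter after normalization, multiplying every weight by the same positive constant leaves wNorm unchanged.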
Output Arguments
Mdl — Multiresponse regression model
RegressionChainEnsemble model object
Multiresponse regression model, returned as a RegressionChainEnsemble model object. To access the properties of Mdl, use dot notation.
Algorithms
Regression Chains
A regression chain is a sequence of regression models in which the response variables for previous models become predictor variables for subsequent models. If the training data consists of p predictor variables and k response variables, then a regression chain includes exactly k models, each with a different response variable. The first model has p predictors, the second model has p+1 predictors, and so on, with the last model having p+k–1 predictors.
For example, suppose that the predictor data in X or Tbl consists of three variables, x1, x2, and x3, and the response data in Y or Tbl consists of two variables, y1 and y2. A regression chain with the chain order [2 1] (ChainOrder) consists of a model trained on the predictor data [x1, x2, x3] and the response variable y2, followed by a model trained on the predictor data [x1, x2, x3, y2] and the response variable y1.
If you specify to use predicted responses in regression chains (ChainPredictedResponse), the predictor data for the second model is [x1, x2, x3, yfit2], where yfit2 contains the predicted responses returned by the first model.
In general, fitrchains returns an ensemble of regression chains Mdl, where each row of Mdl.Learners corresponds to one regression chain.
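The chain with order [2 1] can be written out by hand, as in the following sketch. It uses synthetic data and ordinary least squares as a stand-in learner, both of which are assumptions made for illustration; fitrchains itself uses bagged ensembles of trees by default. Passing the predicted y2 forward when predicting for new observations is likewise an assumption about the general technique, not a description of fitrchains internals.

```matlab
rng(0)                                   % for reproducibility
n = 200;
p = 3;
X  = randn(n,p);                         % predictors x1, x2, x3
y2 = X*[1; -2; 0.5] + 0.1*randn(n,1);    % synthetic response y2
y1 = 0.8*y2 + X(:,1) + 0.1*randn(n,1);   % synthetic response y1, depends on y2

% Chain order [2 1]. Model 1: fit y2 on [x1 x2 x3] plus an intercept.
A1 = [ones(n,1) X];
b1 = A1 \ y2;

% Model 2: fit y1 on [x1 x2 x3 y2], that is, p+1 predictors plus an intercept.
A2 = [ones(n,1) X y2];
b2 = A2 \ y1;

% Predict for new observations by passing the predicted y2 along the chain.
Xnew  = randn(5,p);
yhat2 = [ones(5,1) Xnew] * b1;
yhat1 = [ones(5,1) Xnew yhat2] * b2;
Yhat  = [yhat1 yhat2];                   % columns in the original response order
disp(Yhat)
```

The second design matrix has one more predictor column than the first, which mirrors the p and p+1 predictor counts described above.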
References
[1] Spyromitros-Xioufis, Eleftherios, Grigorios Tsoumakas, William Groves, and Ioannis Vlahavas. "Multi-Target Regression via Input Space Expansion: Treating Targets as Inputs." Machine Learning 104, no. 1 (July 2016): 55–98. https://doi.org/10.1007/s10994-016-5546-z.
Extended Capabilities
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, specify the Options name-value argument in the call to this function and set the UseParallel field of the options structure to true using statset:
Options=statset(UseParallel=true)
For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).
Version History
Introduced in R2024b
See Also
RegressionChainEnsemble | CompactRegressionChainEnsemble | loss | predict