
RegressionTree class

Superclasses: CompactRegressionTree

Regression tree

Description

A decision tree with binary splits for regression. An object of class RegressionTree can predict responses for new data with the predict method. The object contains the data used for training, so it can compute resubstitution predictions.
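As a sketch of resubstitution, assuming the `carsmall` sample data and the same predictors as in the Construct Regression Tree example, the following compares `resubPredict` with calling `predict` on the training rows. The variable names (`keep`, `yResub`, `yPred`, `mseManual`) are illustrative. Rows with missing values are removed first so that the rows passed to `predict` are exactly the rows used in training.

```matlab
% Load the sample data and drop rows with missing values.
load carsmall
X = [Weight, Cylinders];
keep = ~any(isnan([X, MPG]), 2);
X = X(keep, :);
Y = MPG(keep);

% Train a tree; the second predictor (Cylinders) is treated as categorical.
tree = fitrtree(X, Y, 'CategoricalPredictors', 2, 'MinParentSize', 20);

% Resubstitution predictions for the stored training data.
yResub = resubPredict(tree);

% The same predictions, obtained by passing the training predictors to predict.
yPred = predict(tree, X);

% Resubstitution mean squared error, and the same quantity computed by hand.
mseResub = resubLoss(tree);
mseManual = mean((Y - yResub).^2);
```

Removing incomplete rows up front is a choice made for this sketch; `fitrtree` also drops them itself, which is why `NumObservations` can be smaller than the number of input rows.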

Construction

Create a RegressionTree object by using fitrtree.
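`fitrtree` also accepts a table of data. In this minimal sketch, assuming the `carsmall` sample data, the table variable names become `PredictorNames` and `ResponseName`, and a formula selects a subset of the predictors. The names `tbl`, `treeTbl`, and `treeFormula` are illustrative.

```matlab
load carsmall
% Collect the variables in a table; the table variable names are stored
% in the model as the predictor and response names.
tbl = table(Weight, Cylinders, MPG);

% Use MPG as the response and all other table variables as predictors.
treeTbl = fitrtree(tbl, 'MPG', 'CategoricalPredictors', {'Cylinders'});

% Alternatively, use a formula to train on Weight only.
treeFormula = fitrtree(tbl, 'MPG ~ Weight');
```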

Properties

`BinEdges`	Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors. The software bins numeric predictors only if you specify the `'NumBins'` name-value argument as a positive integer scalar when training a model with tree learners. The `BinEdges` property is empty if the `'NumBins'` value is empty (default). You can reproduce the binned predictor data `Xbinned` by using the `BinEdges` property of the trained model `mdl`:

    X = mdl.X; % Predictor data
    Xbinned = zeros(size(X));
    edges = mdl.BinEdges;
    % Find indices of binned predictors.
    idxNumeric = find(~cellfun(@isempty,edges));
    if iscolumn(idxNumeric)
        idxNumeric = idxNumeric';
    end
    for j = idxNumeric
        x = X(:,j);
        % Convert x to array if x is a table.
        if istable(x)
            x = table2array(x);
        end
        % Group x into bins by using the discretize function.
        xbinned = discretize(x,[-inf; edges{j}; inf]);
        Xbinned(:,j) = xbinned;
    end

`Xbinned` contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. `Xbinned` values are 0 for categorical predictors. If `X` contains `NaN`s, then the corresponding `Xbinned` values are `NaN`s.
`CategoricalPredictors`	Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty (`[]`).
`CategoricalSplit`	An n-by-2 cell array, where `n` is the number of categorical splits in `tree`. Each row in `CategoricalSplit` gives left and right values for a categorical split. For each branch node with categorical split `j` based on a categorical predictor variable `z`, the left child is chosen if `z` is in `CategoricalSplit{j,1}` and the right child is chosen if `z` is in `CategoricalSplit{j,2}`. The splits are in the same order as the nodes of the tree. To find the nodes for these splits, select the `'categorical'` entries of `CutType` from top to bottom.
`Children`	An n-by-2 array containing the numbers of the child nodes for each node in `tree`, where n is the number of nodes. Leaf nodes have child node `0`.
`CutCategories`	An n-by-2 cell array of the categories used at branches in `tree`, where n is the number of nodes. For each branch node `i` based on a categorical predictor variable `x`, the left child is chosen if `x` is among the categories listed in `CutCategories{i,1}`, and the right child is chosen if `x` is among those listed in `CutCategories{i,2}`. Both columns of `CutCategories` are empty for branch nodes based on continuous predictors and for leaf nodes. `CutPoint` contains the cut points for `'continuous'` cuts, and `CutCategories` contains the set of categories.
`CutPoint`	An n-element vector of the values used as cut points in `tree`, where n is the number of nodes. For each branch node `i` based on a continuous predictor variable `x`, the left child is chosen if `x<CutPoint(i)` and the right child is chosen if `x>=CutPoint(i)`. `CutPoint` is `NaN` for branch nodes based on categorical predictors and for leaf nodes.
`CutType`	An n-element cell array indicating the type of cut at each node in `tree`, where n is the number of nodes. For each node `i`, `CutType{i}` is one of the following: `'continuous'` if the cut is defined in the form `x < v` for a variable `x` and cut point `v`; `'categorical'` if the cut is defined by whether a variable `x` takes a value in a set of categories; or `''` if `i` is a leaf node. `CutPoint` contains the cut points for `'continuous'` cuts, and `CutCategories` contains the set of categories.
`CutPredictor`	An n-element cell array of the names of the variables used for branching in each node in `tree`, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, `CutPredictor` contains an empty character vector. `CutPoint` contains the cut points for `'continuous'` cuts, and `CutCategories` contains the set of categories.
`CutPredictorIndex`	An n-element array of numeric indices for the variables used for branching in each node in `tree`, where n is the number of nodes. For more information, see `CutPredictor`.
`ExpandedPredictorNames`	Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then `ExpandedPredictorNames` includes the names that describe the expanded variables. Otherwise, `ExpandedPredictorNames` is the same as `PredictorNames`.
`HyperparameterOptimizationResults`	Description of the cross-validation optimization of hyperparameters, stored as a `BayesianOptimization` object or a table of hyperparameters and associated values. This property is nonempty when the `OptimizeHyperparameters` name-value argument is nonempty at creation. The value depends on the setting of the `Optimizer` field of the `HyperparameterOptimizationOptions` name-value argument at creation: for `'bayesopt'` (default), it is an object of class `BayesianOptimization`; for `'gridsearch'` or `'randomsearch'`, it is a table of the hyperparameters used, the observed objective function values (cross-validation loss), and the rank of the observations from lowest (best) to highest (worst).
`IsBranchNode`	An n-element logical vector `ib` that is `true` for each branch node and `false` for each leaf node of `tree`, where n is the number of nodes.
`ModelParameters`	Object holding parameters of `tree`.
`NumObservations`	Number of observations in the training data, a numeric scalar. `NumObservations` can be less than the number of rows of input data `X` when there are missing values in `X` or response `Y`.
`NodeError`	An n-element vector `e` of the errors of the nodes in `tree`, where n is the number of nodes. `e(i)` is the mean squared error for node `i`.
`NodeMean`	An n-element numeric array with mean values in each node of `tree`, where n is the number of nodes in the tree. Every element in `NodeMean` is the average of the true `Y` values over all observations in the node.
`NodeProbability`	An n-element vector `p` of the probabilities of the nodes in `tree`, where n is the number of nodes. The probability of a node is computed as the proportion of observations from the original data that satisfy the conditions for the node.
`NodeRisk`	An n-element vector of the risk of the nodes in the tree, where n is the number of nodes. The risk for each node is the node error weighted by the node probability.
`NodeSize`	An n-element vector `sizes` of the sizes of the nodes in `tree`, where n is the number of nodes. The size of a node is defined as the number of observations from the data used to create the tree that satisfy the conditions for the node.
`NumNodes`	The number of nodes `n` in `tree`.
`Parent`	An n-element vector `p` containing the number of the parent node for each node in `tree`, where n is the number of nodes. The parent of the root node is `0`.
`PredictorNames`	A cell array of names for the predictor variables, in the order in which they appear in `X`.
`PruneAlpha`	Numeric vector with one element per pruning level. If the pruning level ranges from 0 to M, then `PruneAlpha` has M + 1 elements sorted in ascending order. `PruneAlpha(1)` is for pruning level 0 (no pruning), `PruneAlpha(2)` is for pruning level 1, and so on.
`PruneList`	An n-element numeric vector with the pruning levels in each node of `tree`, where n is the number of nodes. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node.
`ResponseName`	A character vector that specifies the name of the response variable (`Y`).
`ResponseTransform`	Function handle for transforming the raw response values (mean squared error). The function handle must accept a matrix of response values and return a matrix of the same size. The default `'none'` means `@(x)x`, or no transformation. Add or change a `ResponseTransform` function by using dot notation:

    tree.ResponseTransform = @function

`RowsUsed`	An n-element logical vector indicating which rows of the original predictor data (`X`) were used in fitting. If the software uses all rows of `X`, then `RowsUsed` is an empty array (`[]`).
`SurrogateCutCategories`	An n-element cell array of the categories used for surrogate splits in `tree`, where n is the number of nodes in `tree`. For each node `k`, `SurrogateCutCategories{k}` is a cell array. The length of `SurrogateCutCategories{k}` is equal to the number of surrogate predictors found at this node. Every element of `SurrogateCutCategories{k}` is either an empty character vector for a continuous surrogate predictor, or is a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists categories assigned to the left child by this surrogate split, and the second element of this two-element cell array lists categories assigned to the right child by this surrogate split. The order of the surrogate split variables at each node is matched to the order of variables in `SurrogateCutPredictor`. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, `SurrogateCutCategories` contains an empty cell.
`SurrogateCutFlip`	An n-element cell array of the numeric cut assignments used for surrogate splits in `tree`, where n is the number of nodes in `tree`. For each node `k`, `SurrogateCutFlip{k}` is a numeric vector. The length of `SurrogateCutFlip{k}` is equal to the number of surrogate predictors found at this node. Every element of `SurrogateCutFlip{k}` is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either –1 or +1. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z < C and the cut assignment for this surrogate split is +1, or if Z ≥ C and the cut assignment for this surrogate split is –1. Similarly, the right child is chosen if Z ≥ C and the cut assignment for this surrogate split is +1, or if Z < C and the cut assignment for this surrogate split is –1. The order of the surrogate split variables at each node is matched to the order of variables in `SurrogateCutPredictor`. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, `SurrogateCutFlip` contains an empty array.
`SurrogateCutPoint`	An n-element cell array of the numeric values used for surrogate splits in `tree`, where n is the number of nodes in `tree`. For each node `k`, `SurrogateCutPoint{k}` is a numeric vector. The length of `SurrogateCutPoint{k}` is equal to the number of surrogate predictors found at this node. Every element of `SurrogateCutPoint{k}` is either `NaN` for a categorical surrogate predictor, or a numeric cut for a continuous surrogate predictor. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z < C and `SurrogateCutFlip` for this surrogate split is +1, or if Z ≥ C and `SurrogateCutFlip` for this surrogate split is –1. Similarly, the right child is chosen if Z ≥ C and `SurrogateCutFlip` for this surrogate split is +1, or if Z < C and `SurrogateCutFlip` for this surrogate split is –1. The order of the surrogate split variables at each node is matched to the order of variables in `SurrogateCutPredictor`. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, `SurrogateCutPoint` contains an empty cell.
`SurrogateCutType`	An n-element cell array indicating types of surrogate splits at each node in `tree`, where n is the number of nodes in `tree`. For each node `k`, `SurrogateCutType{k}` is a cell array with the types of the surrogate split variables at this node. The variables are sorted in descending order by their predictive measure of association with the optimal predictor, and only variables with a positive predictive measure are included. The order of the surrogate split variables at each node is matched to the order of variables in `SurrogateCutPredictor`. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, `SurrogateCutType` contains an empty cell. A surrogate split type can be either `'continuous'` if the cut is defined in the form `Z < V` for a variable `Z` and cut point `V`, or `'categorical'` if the cut is defined by whether `Z` takes a value in a set of categories.
`SurrogateCutPredictor`	An n-element cell array of the names of the variables used for surrogate splits in each node in `tree`, where n is the number of nodes in `tree`. Every element of `SurrogateCutPredictor` is a cell array with the names of the surrogate split variables at this node. The variables are sorted in descending order by their predictive measure of association with the optimal predictor, and only variables with a positive predictive measure are included. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, `SurrogateCutPredictor` contains an empty cell.
`SurrogatePredictorAssociation`	An n-element cell array of the predictive measures of association for surrogate splits in `tree`, where n is the number of nodes in `tree`. For each node `k`, `SurrogatePredictorAssociation{k}` is a numeric vector. The length of `SurrogatePredictorAssociation{k}` is equal to the number of surrogate predictors found at this node. Every element of `SurrogatePredictorAssociation{k}` gives the predictive measure of association between the optimal split and this surrogate split. The order of the surrogate split variables at each node is the order of variables in `SurrogateCutPredictor`. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, `SurrogatePredictorAssociation` contains an empty cell.
`W`	The scaled `weights`, a vector with length `n`, the number of rows in `X`.
`X`	A matrix or table of predictor values. Each column of `X` represents one variable, and each row represents one observation.
`Y`	A numeric column vector with the same number of rows as `X`. Each entry in `Y` is the response to the data in the corresponding row of `X`.
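The node properties `IsBranchNode`, `CutPredictorIndex`, `CutPoint`, `Children`, and `NodeMean` are enough to follow an observation from the root to a leaf by hand. This sketch assumes a tree with only continuous predictors, no missing values in the query points, and the default `ResponseTransform` of `'none'`; it does not handle categorical splits (`CutCategories`) or surrogate splits. The choice of `Weight` and `Displacement` as predictors and the query points in `Xnew` are illustrative.

```matlab
load carsmall
% Continuous predictors only, so every split has the form x < CutPoint.
tree = fitrtree([Weight, Displacement], MPG, 'MinParentSize', 20);

Xnew = [3000 150; 4000 300; 2200 100];   % illustrative query points
yManual = zeros(size(Xnew, 1), 1);
for r = 1:size(Xnew, 1)
    node = 1;                                 % start at the root node
    while tree.IsBranchNode(node)
        j = tree.CutPredictorIndex(node);     % predictor used at this node
        if Xnew(r, j) < tree.CutPoint(node)
            node = tree.Children(node, 1);    % left child
        else
            node = tree.Children(node, 2);    % right child
        end
    end
    yManual(r) = tree.NodeMean(node);         % leaf prediction: mean response
end

yPredict = predict(tree, Xnew);
```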

Object Functions

`compact`	Compact regression tree
`crossval`	Cross-validated decision tree
`cvloss`	Regression error by cross validation
`gather`	Gather properties of Statistics and Machine Learning Toolbox object from GPU
`lime`	Local interpretable model-agnostic explanations (LIME)
`loss`	Regression error
`nodeVariableRange`	Retrieve variable range of decision tree node
`partialDependence`	Compute partial dependence
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
`predict`	Predict responses using regression tree
`predictorImportance`	Estimates of predictor importance for regression tree
`prune`	Produce sequence of regression subtrees by pruning
`resubLoss`	Regression error by resubstitution
`resubPredict`	Predict resubstitution response of tree
`shapley`	Shapley values
`surrogateAssociation`	Mean predictive measure of association for surrogate splits in regression tree
`view`	View regression tree
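A short sketch of `prune` and `resubLoss` working together, assuming the `carsmall` data: `resubLoss` with the `'Subtrees'` name-value argument returns one loss per pruning level, and `prune` with `'Level'` returns the corresponding smaller tree. The variable names `levels`, `L`, and `pruned` are illustrative.

```matlab
load carsmall
tree = fitrtree([Weight, Cylinders], MPG, 'CategoricalPredictors', 2);

% Resubstitution MSE at every pruning level, from 0 (full tree) to the root.
levels = 0:max(tree.PruneList);
L = resubLoss(tree, 'Subtrees', levels);

% Prune the tree by one level and compare the number of nodes.
pruned = prune(tree, 'Level', 1);
```

Because each pruning level removes splits from the previous subtree, the resubstitution loss should not decrease as the level increases. Choosing a level therefore usually relies on a cross-validated estimate, for example from `cvloss`, rather than on resubstitution loss.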

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects.
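Because `RegressionTree` is a value class, assigning the object to a new variable creates an independent copy. In this minimal sketch, assuming the `carsmall` data and a doubling transform chosen only for illustration, changing `ResponseTransform` on the copy leaves the original model untouched.

```matlab
load carsmall
tree = fitrtree([Weight, Cylinders], MPG, 'CategoricalPredictors', 2);

tree2 = tree;                          % copy the object (value semantics)
tree2.ResponseTransform = @(y) 2*y;    % modify only the copy

x = [4000 4];
yOriginal = predict(tree, x);          % uses the original (no) transform
yCopy = predict(tree2, x);             % uses the doubling transform
```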

Examples


Construct Regression Tree


Load the sample data.

load carsmall

Construct a regression tree using the sample data. The response variable is miles per gallon, MPG.

tree = fitrtree([Weight, Cylinders],MPG,...
                'CategoricalPredictors',2,'MinParentSize',20,...
                'PredictorNames',{'W','C'})

tree = 
  RegressionTree
           PredictorNames: {'W'  'C'}
             ResponseName: 'Y'
    CategoricalPredictors: 2
        ResponseTransform: 'none'
          NumObservations: 94

Predict the mileage of 4,000-pound cars with 4, 6, and 8 cylinders.

MPG4Kpred = predict(tree,[4000 4; 4000 6; 4000 8])

MPG4Kpred = 3×1

   19.2778
   19.2778
   14.3889


Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

The predict and update functions support code generation.
To integrate the prediction of a regression tree model into Simulink®, you can use the RegressionTree Predict block in the Statistics and Machine Learning Toolbox™ library or a MATLAB® Function block with the predict function.
When you train a regression tree model by using fitrtree, the following restrictions apply.
- The value of the ResponseTransform name-value argument cannot be an anonymous function. For fixed-point code generation, the value must be 'none' (default).
- You cannot use surrogate splits; that is, the value of the Surrogate name-value argument must be 'off'.
- Fixed-point code generation and code generation with a coder configurer do not support categorical predictors (logical, categorical, char, string, or cell). You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model.

For more information, see Introduction to Code Generation.
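The following sketch trains a tree that respects the restrictions above, assuming the `carsmall` data. `Cylinders` is expanded into 0/1 indicator columns with `dummyvar`, so no `CategoricalPredictors` argument is needed, and `Surrogate` is set explicitly to `'off'` (its default). Passing a `categorical` array to `dummyvar` is one way to get one column per observed level; the saving step appears only as a comment, and its file name is hypothetical.

```matlab
load carsmall
% One indicator column per observed cylinder count (4, 6, 8).
C = dummyvar(categorical(Cylinders));
X = [Weight, C];

% Numeric predictors only, no surrogate splits, default ResponseTransform.
tree = fitrtree(X, MPG, 'Surrogate', 'off');

% saveLearnerForCoder(tree, 'treeModel')   % hypothetical file name
```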

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

The following object functions fully support GPU arrays:
The following object functions offer limited support for GPU arrays:
The object functions execute on a GPU if any of the following apply:
- The model was fitted with GPU arrays.
- The predictor data that you pass to the object function is a GPU array.
- The response data that you pass to the object function is a GPU array.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2011a
