stepwisefit
Fit linear regression model using stepwise regression
Syntax
b = stepwisefit(X,y)
b = stepwisefit(X,y,Name,Value)
[b,se,pval,finalmodel,stats] = stepwisefit(___)
Description
b = stepwisefit(X,y) returns a vector b of coefficient estimates from stepwise regression of the response vector y on the predictor variables in matrix X. stepwisefit begins with an initial constant model and takes forward or backward steps to add or remove variables, until a stopping criterion is satisfied.
b = stepwisefit(X,y,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify a nonconstant initial model, or a maximum number of steps that stepwisefit can take.
[b,se,pval,finalmodel,stats] = stepwisefit(___) also returns a specification of the variables in the final regression model finalmodel, and statistics stats about the final model.
Examples
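The following is a minimal usage sketch rather than an official example. It assumes the hald sample data set that ships with Statistics and Machine Learning Toolbox (variables ingredients and heat); substitute your own predictor matrix and response vector as needed.

```matlab
% Load sample data (assumption: hald is available on the path).
load hald
X = ingredients;   % predictor matrix, one column per candidate term
y = heat;          % response vector

% Default call: constant initial model and default tolerances.
% 'Display','off' suppresses the step-by-step printout.
[b,se,pval,finalmodel,stats] = stepwisefit(X,y,'Display','off');

% finalmodel is a logical vector marking the predictors that were kept.
selected = find(finalmodel);
fprintf('Selected predictor columns: %s\n', mat2str(selected));

% Coefficient estimates of the selected terms, then the fitted intercept.
disp(b(finalmodel))
disp(stats.intercept)
```

The intercept is not part of b; it is read here from the intercept field of stats.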
More About
Algorithms
Stepwise regression is a method for adding terms to and removing terms from a multilinear model based on their statistical significance. This method begins with an initial model and then takes successive steps to modify the model by adding or removing terms. At each step, the p-value of an F-statistic is computed to test models with and without a potential term. If a term is not currently in the model, the null hypothesis is that the term would have a zero coefficient if added to the model. If there is sufficient evidence to reject the null hypothesis, the term is added to the model. Conversely, if a term is currently in the model, the null hypothesis is that the term has a zero coefficient. If there is insufficient evidence to reject the null hypothesis, the term is removed from the model. The method proceeds as follows:
1. Fit the initial model.
2. If any terms not in the model have p-values less than an entry tolerance, add the one with the smallest p-value and repeat this step. For example, assume the initial model is the default constant model and the entry tolerance is the default 0.05. The algorithm first fits all models consisting of the constant plus another term and identifies the term that has the smallest p-value, for example term4. If the term4 p-value is less than 0.05, then term4 is added to the model. Next, the algorithm performs a search among all models consisting of the constant, term4, and another term. If a term not in the model has a p-value less than 0.05, the term with the smallest p-value is added to the model and the process is repeated. When no further terms exist that can be added to the model, the algorithm proceeds to step 3.
3. If any terms in the model have p-values greater than an exit tolerance, remove the one with the largest p-value and go to step 2; otherwise, end.
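The steps above can be sketched as a short function. This is an illustrative sketch only, not the stepwisefit source: the names stepwiseSketch, termPValues, and residualSS are hypothetical, the p-values come from a partial F-test on one term at a time, and the code assumes more observations than candidate terms plus one. It relies on fcdf from Statistics and Machine Learning Toolbox.

```matlab
function in = stepwiseSketch(X, y, pEnter, pRemove)
% Hypothetical sketch of forward/backward stepwise selection.
% in is a logical row vector marking the columns of X in the model.
p = size(X, 2);
in = false(1, p);                 % step 1: constant-only initial model
for iter = 1:2*p                  % guard against cycling (assumed limit)
    pv = termPValues(X, y, in);
    out = find(~in);
    [pmin, k] = min(pv(out));
    if ~isempty(out) && pmin < pEnter
        in(out(k)) = true;        % step 2: add the most significant term
        continue
    end
    cur = find(in);
    [pmax, k] = max(pv(cur));
    if ~isempty(cur) && pmax > pRemove
        in(cur(k)) = false;       % step 3: drop the least significant term
        continue
    end
    break                         % no single step changes the model
end
end

function pv = termPValues(X, y, in)
% p-value of the F-test comparing models with and without each term.
n = size(X, 1);
pv = zeros(1, size(X, 2));
for j = 1:size(X, 2)
    inWith = in;    inWith(j) = true;
    inWithout = in; inWithout(j) = false;
    ssFull = residualSS(X(:, inWith), y);
    ssReduced = residualSS(X(:, inWithout), y);
    dfe = n - 1 - nnz(inWith);    % error degrees of freedom, full model
    F = (ssReduced - ssFull) / (ssFull / dfe);
    pv(j) = fcdf(F, 1, dfe, 'upper');
end
end

function s = residualSS(Z, y)
% Residual sum of squares of a least-squares fit with an intercept.
A = [ones(size(Z, 1), 1) Z];
r = y - A * (A \ y);
s = r' * r;
end
```

A call such as in = stepwiseSketch(X, y, 0.05, 0.10) returns the selected columns. Keeping the entry tolerance no larger than the exit tolerance avoids adding and removing the same term repeatedly; the iteration limit is only a safety net.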
In each step of the algorithm, stepwisefit uses the method of least squares to estimate the model coefficients. After adding a term to the model at an earlier stage, the algorithm might subsequently drop that term if it is no longer helpful in combination with other terms added later. The method terminates when no single step improves the model. However, the final model is not guaranteed to be optimal, that is, to have the best fit to the data. A different initial model or a different sequence of steps might lead to a better fit. In this sense, stepwise models are locally optimal, but are not necessarily globally optimal.
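One way to probe this sensitivity to the starting point is to run the search from two different initial models and compare the results. The sketch below assumes the hald sample data and uses the InModel name-value argument; the choice of columns 2 and 3 as the nonconstant start is arbitrary and only for illustration.

```matlab
load hald
X = ingredients;
y = heat;

% Run 1: default constant initial model.
[~,~,~,modelA,statsA] = stepwisefit(X,y,'Display','off');

% Run 2: nonconstant initial model containing columns 2 and 3.
[~,~,~,modelB,statsB] = stepwisefit(X,y, ...
    'InModel',[false true true false],'Display','off');

% Compare the selected terms and the root mean squared error of each fit.
fprintf('Constant start:    terms %s, RMSE %.4f\n', ...
    mat2str(find(modelA)), statsA.rmse);
fprintf('Nonconstant start: terms %s, RMSE %.4f\n', ...
    mat2str(find(modelB)), statsB.rmse);
```

If the two runs select different terms, neither is guaranteed to be the best possible subset; each is only a model that no single step improves.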
Alternative Functionality
You can create a model using fitlm, and then manually adjust the model using step, addTerms, and removeTerms.
Use stepwiselm if you have data in a table, you have a mix of continuous and categorical predictors, or you want to specify model formulas that can potentially include higher-order and interaction terms.
Use stepwiseglm to create stepwise generalized linear models (for example, if you have a binary response variable and want to fit a classification model).
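As a rough illustration of the first two alternatives, the sketch below builds a table from the hald sample data and fits it with stepwiselm, then with fitlm followed by step. The variable names x1 through x4 and the choice to limit the search to linear terms are assumptions made for this example.

```matlab
load hald
tbl = array2table([ingredients heat], ...
    'VariableNames', {'x1','x2','x3','x4','heat'});

% stepwiselm: start from a constant model, allow at most linear terms.
% The last table variable (heat) is used as the response by default.
mdl1 = stepwiselm(tbl,'constant','Upper','linear','Verbose',0);
disp(mdl1.Formula)

% fitlm plus step: fit a constant model, then take one stepwise step.
mdl2 = fitlm(tbl,'heat ~ 1');
mdl2 = step(mdl2,'Upper','linear','NSteps',1,'Verbose',0);
disp(mdl2.Formula)
```

Both calls return LinearModel objects, so the fitted model can be inspected or passed to addTerms and removeTerms afterward.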
References
[1] Draper, Norman R., and Harry Smith. Applied Regression Analysis. Hoboken, NJ: Wiley-Interscience, 1998. pp. 307–312.
Version History
Introduced before R2006a
See Also
stepwise | addedvarplot | regress | stepwiselm | stepwiseglm