A nested model is simply a regression model that contains a subset of the predictor variables in another regression model.
For example, suppose we have the following regression model (let’s call it Model A) that predicts the number of points scored by a basketball player based on four predictor variables:
Points = β0 + β1(minutes) + β2(height) + β3(position) + β4(shots) + ε
One example of a nested model (let’s call it Model B) would be the following model with only two of the predictor variables from Model A:
Points = β0 + β1(minutes) + β2(height) + ε
We would say that Model B is nested in Model A because Model B contains a subset of the predictor variables from Model A.
However, suppose we had another model (let’s call it Model C) that contains three predictor variables:
Points = β0 + β1(minutes) + β2(height) + β3(free throws attempted) + ε
We would not say that Model C is nested in Model A because Model C contains a predictor variable (free throws attempted) that Model A does not. Neither model's predictors are a subset of the other's.
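The subset rule above can be checked mechanically. The following is a minimal sketch, assuming each model is described only by a list of its predictor names; the function name is_nested and the variable names are illustrative and do not come from any library.

```python
def is_nested(candidate, full):
    """Return True if every predictor in `candidate` also appears in `full`."""
    return set(candidate) <= set(full)


# Predictor names for the three example models (illustrative labels)
model_a = ["minutes", "height", "position", "shots"]
model_b = ["minutes", "height"]
model_c = ["minutes", "height", "free_throws_attempted"]

print(is_nested(model_b, model_a))  # True: B uses only predictors found in A
print(is_nested(model_c, model_a))  # False: free_throws_attempted is not in A
print(is_nested(model_a, model_c))  # False: position and shots are not in C
```

This check only compares predictor names. It assumes both models use the same response variable and are fit to the same observations.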
The Importance of Nested Models
We often use nested models in practice when we want to know if a model with a full set of predictor variables can fit a dataset better than a model with a subset of those predictor variables.
For example, in the scenario above we might fit a full model using minutes played, height, position, and shots attempted to predict the number of points scored by basketball players.
However, we might suspect that position and shots attempted don’t do a very good job of predicting points scored.
Thus, we might fit a nested model that only uses minutes played and height to predict points scored.
We can then compare the two models to determine if there is a statistically significant difference in how well they fit the data.
If there is no significant difference between the models, we can drop position and shots attempted as predictor variables since they don’t significantly improve the model.
How to Analyze Nested Models
To determine whether a “full” model fits the data significantly better than a nested model, we typically perform a likelihood ratio test, which uses the following null and alternative hypotheses:
H0: The full model and the nested model fit the data equally well. Thus, you should use the nested model.
HA: The full model fits the data significantly better than the nested model. Thus, you should use the full model.
A likelihood ratio test produces a Chi-Square test statistic and a corresponding p-value.
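For reference, the test statistic is usually written as follows. The symbols are notation introduced here rather than taken from the text above: ℓ is the maximized log-likelihood of each model and p is the number of coefficients each model estimates.

```latex
\lambda_{LR} = -2\left(\ell_{\text{nested}} - \ell_{\text{full}}\right)
             = 2\left(\ell_{\text{full}} - \ell_{\text{nested}}\right),
\qquad
\lambda_{LR} \overset{H_0}{\sim} \chi^2_{k} \ \text{(approximately, in large samples)},
\qquad
k = p_{\text{full}} - p_{\text{nested}}
```

For Model A versus Model B, k = 2 if position enters the model as a single coefficient. If position is dummy coded with several indicator variables, k would be larger.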
If the p-value of the test is below a certain significance level (e.g. 0.05), then we can reject the null hypothesis and conclude that the full model offers a significantly better fit.
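The decision rule above can be sketched in Python using only NumPy and SciPy. This is an illustration under stated assumptions, not a definitive implementation: the data are simulated (not real basketball data), position is simplified to a single 0/1 variable, the errors are assumed to be normally distributed, and the helper names gaussian_loglik and likelihood_ratio_test are made up for this example.

```python
import numpy as np
from scipy.stats import chi2


def gaussian_loglik(X, y):
    """Maximized Gaussian log-likelihood of a least-squares fit of y on X.

    X must already contain a column of ones for the intercept.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)


def likelihood_ratio_test(X_nested, X_full, y):
    """Return the chi-square statistic, degrees of freedom, and p-value."""
    stat = 2 * (gaussian_loglik(X_full, y) - gaussian_loglik(X_nested, y))
    df = X_full.shape[1] - X_nested.shape[1]  # number of extra coefficients
    return stat, df, chi2.sf(stat, df)


# Simulated data (hypothetical): shots affects points, position does not
rng = np.random.default_rng(0)
n = 200
minutes = rng.uniform(10, 40, n)
height = rng.normal(200, 8, n)
shots = rng.uniform(2, 20, n)
position = rng.integers(0, 2, n).astype(float)  # simplified 0/1 coding
points = 1.0 + 0.3 * minutes + 0.02 * height + 0.9 * shots + rng.normal(0, 2, n)

ones = np.ones(n)
X_nested = np.column_stack([ones, minutes, height])                # Model B
X_full = np.column_stack([ones, minutes, height, position, shots])  # Model A

stat, df, p_value = likelihood_ratio_test(X_nested, X_full, points)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.4g}")
if p_value < 0.05:
    print("Reject H0: the full model fits significantly better.")
else:
    print("Fail to reject H0: the nested model is adequate.")
```

Because the simulated points depend strongly on shots, the test should reject the null hypothesis here. With real data the outcome depends on how much the extra predictors actually improve the fit.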
The following tutorials explain how to perform a likelihood ratio test using R and Python: