Two terms that students often confuse in statistics are interpolation and extrapolation.
Here’s the difference:
Interpolation refers to predicting values that fall inside the range of the observed data points.
Extrapolation refers to predicting values that fall outside the range of the observed data points.
The following example illustrates the difference between the two terms.
Example: Interpolation vs. Extrapolation
Suppose we have a dataset of (x, y) points, and we decide to fit a simple linear regression model to them.
We could then use the fitted regression model to predict the values of points both inside and outside of the range of data points.
When we use the fitted regression model to predict the values of points inside the existing range of data points, it is known as interpolation.
Conversely, when we use the fitted regression model to predict the values of points outside the existing range, it is known as extrapolation.
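To make the distinction concrete, here is a minimal sketch in plain Python. The data values are invented for illustration and are not the dataset from this example, and fit_line and kind_of_prediction are hypothetical helper names. The sketch fits a least-squares line and labels each prediction by whether the new x value lies within the observed range.

```python
# Hypothetical data, invented for illustration (not the article's dataset).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def kind_of_prediction(x_new, xs):
    """Label a prediction by whether x_new lies within the observed x range."""
    return "interpolation" if min(xs) <= x_new <= max(xs) else "extrapolation"


intercept, slope = fit_line(xs, ys)

# One point inside the observed range (1 to 8) and one outside it.
for x_new in (4.5, 12):
    y_hat = intercept + slope * x_new
    print(f"x = {x_new}: predicted y = {y_hat:.2f} ({kind_of_prediction(x_new, xs)})")
```

The model produces a prediction in both cases without complaint. Nothing in the fitted line itself signals that x = 12 lies beyond the data, which is why the range check has to be made separately.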
The Potential Danger of Extrapolation
When we perform extrapolation, we assume that the same pattern that exists inside the current range of data points also exists outside of the range.
However, this can be a dangerous assumption because the pattern outside the current range of data points may be quite different.
For this reason, it can be dangerous to use extrapolation to predict the values of data points that fall outside of the range of values that was used to build the regression model.
In practice, it’s often fine to use extrapolation to predict the values of points that fall just slightly outside the range of existing values. The further outside the range a point falls, however, the more likely it is that the predicted value will differ substantially from the actual value.
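One way to see why distance from the data matters is the textbook standard error for predicting a single new observation in simple linear regression, s * sqrt(1 + 1/n + (x_new - x_mean)^2 / Sxx), which grows as x_new moves away from the mean of the observed x values. The sketch below evaluates it on invented data; prediction_se is a hypothetical helper name. This formula assumes the linear pattern still holds at x_new, so it likely understates the real risk when the pattern changes outside the range.

```python
import math

# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
intercept = y_mean - slope * x_mean

# Residual standard error, using n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))


def prediction_se(x_new):
    """Standard error of a single new observation predicted at x_new."""
    return s * math.sqrt(1 + 1 / n + (x_new - x_mean) ** 2 / sxx)


# The first value is inside the observed range; the rest are increasingly far outside.
for x_new in (4.5, 9, 15, 30):
    print(f"x = {x_new}: prediction standard error = {prediction_se(x_new):.3f}")
```

The standard error is smallest at the mean of the observed x values and increases steadily with distance from it, so a prediction just past the edge of the data is only slightly less precise than one inside, while a prediction far beyond it is much less precise.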
When to Use Extrapolation
Determining whether extrapolation is reasonable often requires domain-specific expertise.
For example, suppose a marketing department at a business fits a simple linear regression model using advertising spend as the predictor variable and total revenue as the response variable.
In this scenario, it may be reasonable to assume that a steady increase in advertising spend will lead to a predictable increase in total revenue, so we may be quite confident in our ability to extrapolate values.
However, consider a scenario where a biologist wants to use total fertilizer to predict plant growth.
She may decide to fit a simple linear regression model to the data points, but since there is an upper limit on how tall plants can grow, it probably doesn’t make sense to use extrapolation to predict the values of points outside of the range of values used to fit the model.
In this scenario, we may be considerably less confident in our ability to extrapolate values.
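The fertilizer scenario can be sketched with a simulated example. The saturating growth curve below, along with the constants MAX_HEIGHT and HALF_SAT, is an assumption invented purely for illustration and is not real plant data. A line is fitted only to low fertilizer amounts and then compared with the assumed true curve both inside and outside that range.

```python
# Assumed "true" growth curve, invented for illustration: height levels off
# at MAX_HEIGHT no matter how much fertilizer is applied.
MAX_HEIGHT = 100.0  # hypothetical upper limit on plant height (cm)
HALF_SAT = 5.0      # hypothetical fertilizer amount that yields half of MAX_HEIGHT


def true_height(fertilizer):
    """Assumed saturating relationship between fertilizer and height."""
    return MAX_HEIGHT * fertilizer / (HALF_SAT + fertilizer)


# Observations are available only for fertilizer amounts 1 through 5.
xs = [1, 2, 3, 4, 5]
ys = [true_height(x) for x in xs]

# Least-squares line fitted to the observed range.
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
intercept = y_mean - slope * x_mean


def linear_prediction(fertilizer):
    """Height predicted by the fitted line."""
    return intercept + slope * fertilizer


# 3.5 is inside the fitted range; 8 and 20 are outside it.
for f in (3.5, 8, 20):
    predicted = linear_prediction(f)
    actual = true_height(f)
    print(f"fertilizer = {f}: line predicts {predicted:.1f}, "
          f"assumed truth {actual:.1f}, error {predicted - actual:.1f}")
```

Under these assumptions the line tracks the curve closely inside the fitted range, but far outside it the line predicts a height above the assumed upper limit, which no plant in this simulation could reach.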
The Takeaway: Extrapolation can make sense in some fields more than others, but there is always a potential danger that the pattern that exists within the range of values used to fit the model does not exist outside of the range.