Robust regression is a method we can use as an alternative to ordinary least squares regression when there are outliers or influential observations in the dataset we’re working with.
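To perform robust regression in R, we can use the rlm() function from the MASS package, which uses the following basic syntax:

rlm(formula, data)

where formula describes the model (for example, y ~ x1 + x2) and data is the data frame that contains the variables.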
The following step-by-step example shows how to perform robust regression in R for a given dataset.
Step 1: Create the Data
First, let’s create a fake dataset to work with:
#create data
df <- data.frame(x1=c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3, 11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2=c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12, 25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                 y=c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32, 44, 60, 61, 63, 63, 64, 61, 67, 59, 70))

#view first six rows of data
head(df)

  x1 x2   y
1  1  7  17
2  3  7 170
3  3  4  19
4  4 29 194
5  4 13  24
6  6 34   2
Step 2: Perform Ordinary Least Squares Regression
Next, let’s fit an ordinary least squares regression model and create a plot of the standardized residuals.
In practice, we often consider any standardized residual with an absolute value greater than 3 to be an outlier.
#fit ordinary least squares regression model
ols <- lm(y~x1+x2, data=df)

#create plot of y-values vs. standardized residuals
plot(df$y, rstandard(ols), ylab='Standardized Residuals', xlab='y')
abline(h=0)
From the plot we can see that there are two observations with standardized residuals around 3.
This is an indication that there are two potential outliers in the dataset and thus we may benefit from performing robust regression instead.
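The visual check above can also be done numerically. The following is a minimal sketch, assuming the model regresses y on both x1 and x2; the names std_res, top3, and flagged are illustrative and not part of the original example. It lists the three largest standardized residuals in absolute value and the rows that exceed the rule-of-thumb cutoff of 3:

```r
#recreate the example data so this snippet runs on its own
df <- data.frame(x1=c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3, 11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2=c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12, 25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                 y=c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32, 44, 60, 61, 63, 63, 64, 61, 67, 59, 70))

#fit OLS model (formula assumed: y on both predictors)
ols <- lm(y ~ x1 + x2, data=df)

#standardized residuals, one per observation, named by row
std_res <- rstandard(ols)

#three largest standardized residuals in absolute value
top3 <- head(sort(abs(std_res), decreasing=TRUE), 3)
print(round(top3, 2))

#rows whose standardized residual exceeds the cutoff of 3 (may be empty)
flagged <- which(abs(std_res) > 3)
print(flagged)
```

Because the cutoff of 3 is only a rule of thumb, observations that fall just below it can still be worth a closer look.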
Step 3: Perform Robust Regression
Next, let’s use the rlm() function to fit a robust regression model:
library(MASS)

#fit robust regression model
robust <- rlm(y~x1+x2, data=df)
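By default, rlm() fits a Huber M-estimator using iterated re-weighted least squares, so each observation ends up with a final weight that shows how strongly it was down-weighted. The sketch below is one way to inspect this, assuming the same y ~ x1 + x2 formula for both models; the name weights_df is illustrative. It compares the coefficient estimates and prints the five observations with the smallest weights:

```r
library(MASS)

#recreate the example data so this snippet runs on its own
df <- data.frame(x1=c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3, 11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2=c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12, 25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                 y=c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32, 44, 60, 61, 63, 63, 64, 61, 67, 59, 70))

#fit both models (formula assumed: y on both predictors)
ols <- lm(y ~ x1 + x2, data=df)
robust <- rlm(y ~ x1 + x2, data=df)

#side-by-side coefficient estimates
print(round(cbind(OLS=coef(ols), Robust=coef(robust)), 3))

#final weights from the fitting process; values below 1 mark down-weighted rows
weights_df <- data.frame(obs=seq_len(nrow(df)), y=df$y, weight=round(robust$w, 3))

#five observations with the smallest weights
print(weights_df[order(weights_df$weight), ][1:5, ])
```

Observations with weights well below 1 are the ones the robust fit treats as unusual, which gives a useful cross-check against the residual plot from the OLS model.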
To determine if this robust regression model offers a better fit to the data compared to the OLS model, we can calculate the residual standard error of each model.
The residual standard error (RSE) is a way to measure the standard deviation of the residuals in a regression model. The lower the value for RSE, the more closely a model is able to fit the data.
The following code shows how to calculate the RSE for each model:
#find residual standard error of ols model
summary(ols)$sigma

[1] 49.41848

#find residual standard error of robust model
summary(robust)$sigma

[1] 9.369349
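We can see that the RSE for the robust regression model is much lower than that of the ordinary least squares regression model, which suggests that the robust regression model fits the bulk of the data more closely. Keep in mind that rlm() reports a robust estimate of the residual scale, so this comparison is a rough guide rather than a formal test.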
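To make the RSE definition concrete for the OLS model, it can be computed by hand as the square root of the residual sum of squares divided by the residual degrees of freedom (here 20 observations minus 3 estimated coefficients). This is a minimal sketch, assuming the y ~ x1 + x2 formula; the names res and rse_manual are illustrative:

```r
#recreate the example data so this snippet runs on its own
df <- data.frame(x1=c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3, 11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2=c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12, 25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                 y=c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32, 44, 60, 61, 63, 63, 64, 61, 67, 59, 70))

#fit OLS model (formula assumed: y on both predictors)
ols <- lm(y ~ x1 + x2, data=df)

#raw residuals from the fitted model
res <- residuals(ols)

#RSE = sqrt(residual sum of squares / residual degrees of freedom)
rse_manual <- sqrt(sum(res^2) / df.residual(ols))

#the manual value should agree with the one reported by summary()
print(rse_manual)
print(summary(ols)$sigma)
```

The same hand calculation does not reproduce summary(robust)$sigma, since rlm() estimates the residual scale robustly rather than from the plain sum of squared residuals.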
Additional Resources
How to Perform Simple Linear Regression in R
How to Perform Multiple Linear Regression in R
How to Perform Polynomial Regression in R