Friday, December 20, 2024

How to Perform Robust Regression in R (Step-by-Step)

Robust regression is a method we can use as an alternative to ordinary least squares regression when there are outliers or influential observations in the dataset we’re working with.

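To perform robust regression in R, we can use the rlm() function from the MASS package. It takes a model formula and a data frame in the same way as lm(), for example rlm(y ~ x1 + x2, data=df), where df is a data frame that contains the response y and the predictors x1 and x2.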
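As a minimal sketch of what a call to rlm() looks like, the snippet below fits a robust model to a small toy dataset. The data frame toy and its values are hypothetical and are not part of this tutorial's example; they only illustrate the formula-plus-data interface and the default Huber M-estimation.

```r
library(MASS)

# Hypothetical toy data (not from the article): y is roughly 2*x,
# except for one deliberately corrupted value in row 5
toy <- data.frame(x = 1:10,
                  y = c(2.1, 3.9, 6.2, 7.8, 40, 12.2, 13.8, 16.1, 17.9, 20.2))

# General form: rlm(formula, data = ...), mirroring lm()
# With default settings this is Huber M-estimation
fit <- rlm(y ~ x, data = toy)

coef(fit)   # estimated intercept and slope
fit$w       # final weights; values below 1 mark down-weighted rows
```

The corrupted row should receive a weight below 1, which is how the fit limits its influence on the coefficients.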

The following step-by-step example shows how to perform robust regression in R for a given dataset.

Step 1: Create the Data

First, let’s create a fake dataset to work with:

#create data
df <- data.frame(x1=c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3,
                      11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2=c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12,
                      25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                  y=c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32,
                      44, 60, 61, 63, 63, 64, 61, 67, 59, 70))

#view first six rows of data
head(df)

  x1 x2   y
1  1  7  17
2  3  7 170
3  3  4  19
4  4 29 194
5  4 13  24
6  6 34   2

Step 2: Perform Ordinary Least Squares Regression

Next, let’s fit an ordinary least squares regression model and create a plot of the standardized residuals.

In practice, we often consider any standardized residual with an absolute value greater than 3 to be an outlier.

#fit ordinary least squares regression model
ols <- lm(y~x1+x2, data=df)

#create plot of y-values vs. standardized residuals
plot(df$y, rstandard(ols), ylab='Standardized Residuals', xlab='y') 
abline(h=0)

From the plot we can see that there are two observations with standardized residuals around 3.

This is an indication that there are two potential outliers in the dataset and thus we may benefit from performing robust regression instead.
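The same check can be done numerically rather than by reading the plot. The sketch below assumes the OLS model regresses y on x1 and x2 using the dataset from Step 1, and lists the observations with the largest standardized residuals in absolute value.

```r
# Data from Step 1; the model formula y ~ x1 + x2 is an assumption
df <- data.frame(x1 = c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3,
                        11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2 = c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12,
                        25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                 y  = c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32,
                        44, 60, 61, 63, 63, 64, 61, 67, 59, 70))
ols <- lm(y ~ x1 + x2, data = df)

# Standardized residuals, one per observation
std_res <- rstandard(ols)

# Three largest standardized residuals in absolute value, named by row
head(sort(abs(std_res), decreasing = TRUE), 3)

# Rows exceeding the rule-of-thumb cutoff of 3 (possibly none)
which(abs(std_res) > 3)
```

Sorting by absolute value is useful when the suspicious observations sit close to the cutoff, because a strict threshold test may miss them.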

Step 3: Perform Robust Regression

Next, let’s use the rlm() function to fit a robust regression model:

library(MASS)

#fit robust regression model
robust <- rlm(y~x1+x2, data=df)

To determine if this robust regression model offers a better fit to the data compared to the OLS model, we can calculate the residual standard error of each model.

The residual standard error (RSE) is a way to measure the standard deviation of the residuals in a regression model. The lower the value for RSE, the more closely a model is able to fit the data.
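For an OLS fit, this definition can be checked by hand: the RSE is the square root of the residual sum of squares divided by the residual degrees of freedom. The sketch below assumes the OLS model regresses y on x1 and x2 using the dataset from Step 1; the names rss and rse_manual are introduced here only for illustration.

```r
# Data from Step 1; the model formula y ~ x1 + x2 is an assumption
df <- data.frame(x1 = c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3,
                        11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2 = c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12,
                        25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                 y  = c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32,
                        44, 60, 61, 63, 63, 64, 61, 67, 59, 70))
ols <- lm(y ~ x1 + x2, data = df)

# RSE = sqrt(residual sum of squares / residual degrees of freedom)
rss <- sum(residuals(ols)^2)
rse_manual <- sqrt(rss / df.residual(ols))
rse_manual

# Compare with the value reported by summary()
all.equal(rse_manual, summary(ols)$sigma)
```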

The following code shows how to calculate the RSE for each model:

#find residual standard error of ols model
summary(ols)$sigma

[1] 49.41848

#find residual standard error of robust model
summary(robust)$sigma

[1] 9.369349

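We can see that the RSE for the robust regression model is much lower than that of the ordinary least squares regression model, which suggests that the robust regression model fits the bulk of the data more closely. Note that the value summary() reports for an rlm() fit is a robust scale estimate (by default based on the median absolute deviation of the residuals), so it is far less sensitive to the two outlying observations than the OLS value.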
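One way to see why the two fits differ is to compare their coefficients and inspect the weights that rlm() assigned to each observation. The sketch below assumes both models regress y on x1 and x2 using the dataset from Step 1; it relies on the w component of the fitted rlm object, which stores the final weights from the iteratively reweighted least squares procedure.

```r
library(MASS)

# Data from Step 1; the model formula y ~ x1 + x2 is an assumption
df <- data.frame(x1 = c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3,
                        11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2 = c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12,
                        25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                 y  = c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32,
                        44, 60, 61, 63, 63, 64, 61, 67, 59, 70))
ols    <- lm(y ~ x1 + x2, data = df)
robust <- rlm(y ~ x1 + x2, data = df)

# Coefficients of both models side by side
round(cbind(ols = coef(ols), robust = coef(robust)), 3)

# Observations that the robust fit down-weighted (weight below 1)
data.frame(df, weight = round(robust$w, 3))[robust$w < 1, ]
```

Observations with very small weights contribute little to the robust coefficients, whereas OLS gives every observation the same weight.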

Additional Resources

How to Perform Simple Linear Regression in R
How to Perform Multiple Linear Regression in R
How to Perform Polynomial Regression in R
