Friday, July 5, 2024

How to Interpret Diagnostic Plots in R

Linear regression models are used to describe the relationship between one or more predictor variables and a response variable.

However, once we’ve fit a regression model, it’s a good idea to also produce diagnostic plots of the model’s residuals to check that a linear model is appropriate for the data we’re working with.

This tutorial explains how to create and interpret diagnostic plots for a given regression model in R.

Example: Create & Interpret Diagnostic Plots in R

Suppose we fit a simple linear regression model using ‘hours studied’ to predict ‘exam score’ for students in a certain class:

#create data frame
df <- data.frame(hours=c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6),
                 score=c(67, 65, 68, 77, 73, 79, 81, 88, 80, 67, 84, 93, 90, 91))

#fit linear regression model
model = lm(score ~ hours, data=df)

We can use the plot() function to produce four diagnostic plots for this regression model:

#produce diagnostic plots for regression model
plot(model)

[Figure: diagnostic plots in R]
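
As a minimal sketch of how the four plots can be arranged or drawn one at a time, the block below rebuilds the example data so it runs on its own. The name old_par is only illustrative. The which argument of plot() for lm objects selects individual plots; in base R, 1 is Residuals vs. Fitted, 2 is Normal Q-Q, 3 is Scale-Location and 5 is Residuals vs. Leverage.

```r
# Recreate the example data and model so this block is self-contained
df <- data.frame(hours = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6),
                 score = c(67, 65, 68, 77, 73, 79, 81, 88, 80, 67, 84, 93, 90, 91))
model <- lm(score ~ hours, data = df)

# Show the four default diagnostic plots in a single 2x2 grid
old_par <- par(mfrow = c(2, 2))  # remember the previous layout
plot(model)
par(old_par)                     # restore the previous layout

# Draw only one plot, here Residuals vs. Leverage
plot(model, which = 5)
```

Using a grid avoids having to step through the plots one by one in an interactive session.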

Diagnostic Plot #1: Residuals vs. Leverage Plot

This plot is used to identify influential observations. Any point that falls outside the Cook’s distance contours (the dashed lines) is considered an influential observation.

[Figure: residuals vs. leverage plot in R]

In our example we can see that observation #10 lies closest to the border of Cook’s distance, but it doesn’t fall outside of the dashed line. This means there aren’t any overly influential points in our dataset.
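
The same check can be sketched numerically with the base R functions cooks.distance() and hatvalues(). The variable names cooks_d and leverage are illustrative, and the 0.5 cutoff is an assumption based on the contour levels that plot() draws by default for lm objects (0.5 and 1); other rules of thumb exist.

```r
# Recreate the example data and model so this block is self-contained
df <- data.frame(hours = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6),
                 score = c(67, 65, 68, 77, 73, 79, 81, 88, 80, 67, 84, 93, 90, 91))
model <- lm(score ~ hours, data = df)

# Cook's distance and leverage (hat value) for each observation
cooks_d  <- cooks.distance(model)
leverage <- hatvalues(model)

# The three observations with the largest Cook's distance
head(sort(cooks_d, decreasing = TRUE), 3)

# Observations beyond the assumed 0.5 cutoff (empty if there are none)
flagged <- which(cooks_d > 0.5)
flagged
```

For this dataset the largest Cook’s distance belongs to observation #10 and no observation exceeds 0.5, which matches what the plot shows.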

Diagnostic Plot #2: Scale-Location Plot

This plot is used to check the assumption of equal variance (also called “homoscedasticity”) among the residuals in our regression model. If the red line is roughly horizontal across the plot and the points are spread about evenly around it, then the assumption of equal variance is likely met.

[Figure: scale-location plot in R]

In our example we can see that the red line isn’t exactly horizontal across the plot, but it doesn’t deviate too wildly at any point. We would likely declare that the assumption of equal variance is not violated in this case.
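
To make clear what this plot displays, here is a rough reconstruction using only base R. The y-axis is the square root of the absolute standardized residuals, and the red line is approximated with lowess(); the name sqrt_abs_std is illustrative, and the smoother settings are assumed to be the lowess() defaults.

```r
# Recreate the example data and model so this block is self-contained
df <- data.frame(hours = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6),
                 score = c(67, 65, 68, 77, 73, 79, 81, 88, 80, 67, 84, 93, 90, 91))
model <- lm(score ~ hours, data = df)

# Square root of the absolute standardized residuals
sqrt_abs_std <- sqrt(abs(rstandard(model)))

# Plot against the fitted values and add a smoothed trend line
plot(fitted(model), sqrt_abs_std,
     xlab = "Fitted values",
     ylab = "sqrt(|standardized residuals|)",
     main = "Scale-Location (manual sketch)")
lines(lowess(fitted(model), sqrt_abs_std), col = "red")
```

A trend line that climbs or falls steadily across the fitted values would suggest that the residual variance changes with the predicted score.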

Related: Understanding Heteroscedasticity in Regression Analysis

Diagnostic Plot #3: Normal Q-Q Plot

This plot is used to determine if the residuals of the regression model are normally distributed. If the points in this plot fall roughly along a straight diagonal line, then we can assume the residuals are normally distributed.

In our example we can see that the points fall roughly along the straight diagonal line. Observations #10 and #8 deviate a bit from the line at the tail ends, but not enough to declare that the residuals are non-normally distributed.
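
The Q-Q plot can also be drawn directly with qqnorm() and qqline(). The sketch below additionally runs a Shapiro-Wilk test as a numerical complement; that test is not part of the original walkthrough and is included here only as an optional cross-check. With just 14 observations it has little power, so it should be read alongside the plot rather than in place of it. The names std_res and sw are illustrative.

```r
# Recreate the example data and model so this block is self-contained
df <- data.frame(hours = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6),
                 score = c(67, 65, 68, 77, 73, 79, 81, 88, 80, 67, 84, 93, 90, 91))
model <- lm(score ~ hours, data = df)

# Standardized residuals, the quantity shown on the Q-Q plot's y-axis
std_res <- rstandard(model)

# Normal Q-Q plot with a reference line
qqnorm(std_res, main = "Normal Q-Q (manual sketch)")
qqline(std_res, lty = 3)

# Optional cross-check: Shapiro-Wilk test on the raw residuals
sw <- shapiro.test(residuals(model))
sw$p.value
```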

Diagnostic Plot #4: Residuals vs. Fitted Plot

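This plot is used to determine if the residuals exhibit non-linear patterns. If the red line across the center of the plot is roughly horizontal, then there is no clear non-linear pattern left in the residuals and we can assume that the relationship between the predictor and the response is approximately linear.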

In our example we can see that the red line deviates from a perfect horizontal line but not severely. We would likely declare that the residuals show no strong non-linear pattern and that a linear regression model is appropriate for this dataset.
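
One way to probe this further, sketched below, is to redraw the plot by hand and then compare the linear model with one that adds a squared term. The quadratic comparison is an addition for illustration, not a step from the walkthrough above, and the names res, fit and model_quad are hypothetical.

```r
# Recreate the example data and model so this block is self-contained
df <- data.frame(hours = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6),
                 score = c(67, 65, 68, 77, 73, 79, 81, 88, 80, 67, 84, 93, 90, 91))
model <- lm(score ~ hours, data = df)

# Residuals vs. fitted values with a zero line and a smoothed trend
res <- residuals(model)
fit <- fitted(model)
plot(fit, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. Fitted (manual sketch)")
abline(h = 0, lty = 3)
lines(lowess(fit, res), col = "red")

# Illustrative check: does adding a squared term improve the fit?
model_quad <- lm(score ~ hours + I(hours^2), data = df)
comparison <- anova(model, model_quad)
comparison
```

A large p-value in the second row of the comparison table would be consistent with the linear model being adequate, while a small one would point toward curvature.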

Additional Resources

The Four Assumptions of Linear Regression
What Are Residuals in Statistics?
How to Create a Residual Plot in R
How to Interpret a Scale-Location Plot
