How to Create a Histogram of Residuals in R

One of the main assumptions of linear regression is that the residuals are normally distributed.

One way to visually check this assumption is to create a histogram of the residuals and observe whether or not the distribution follows a “bell-shape” reminiscent of the normal distribution.

This tutorial provides a step-by-step example of how to create a histogram of residuals for a regression model in R.

Step 1: Create the Data

First, let’s create some fake data to work with:

#make this example reproducible
set.seed(0)

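#create data (sample size and rnorm() parameters are assumed, so the rows below may differ)
x1 <- rnorm(200, 2, 1)
x2 <- rnorm(200, 4, 2)
y <- rnorm(200, 2, 3)
data <- data.frame(x1, x2, y)

#view first six rows of data
head(data)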

        x1        x2          y
1 3.262954 6.3455776 -1.1371530
2 1.673767 1.6696701 -0.6886338
3 3.329799 2.1520303  5.8081615
4 3.272429 4.1397409  3.7815228
5 2.414641 0.6088427  4.3269030
6 0.460050 5.7301563  6.6721111

Step 2: Fit the Regression Model

Next, we’ll fit a multiple linear regression model to the data:

#fit multiple linear regression model
model <- lm(y ~ x1 + x2, data = data)
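
Before plotting, it can help to pull the residuals out of the fitted model and look at them numerically. The following is a minimal sketch that assumes simulated data similar to Step 1 (the sample size and distribution parameters are illustrative assumptions) and uses base R's resid() accessor, which returns the same values as model$residuals:

```r
# illustrative data; sample size and parameters are assumptions
set.seed(0)
data <- data.frame(x1 = rnorm(200, 2, 1),
                   x2 = rnorm(200, 4, 2),
                   y  = rnorm(200, 2, 3))

# fit the same kind of model as in this step
model <- lm(y ~ x1 + x2, data = data)

# resid() is the accessor for the residuals stored in the model
res <- resid(model)

length(res)    # one residual per observation
summary(res)   # quartiles, mean, and extremes of the residuals
mean(res)      # essentially zero for a model that includes an intercept
```

A mean very close to zero is expected here and says nothing about normality; the shape of the distribution is what the histogram in the next step is for.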

Step 3: Create a Histogram of Residuals

Lastly, we’ll use the ggplot2 visualization package to create a histogram of the residuals from the model:

#load ggplot2
library(ggplot2)

#create histogram of residuals
ggplot(data = data, aes(x = model$residuals)) +
    geom_histogram(fill = 'steelblue', color = 'black') +
    labs(title = 'Histogram of Residuals', x = 'Residuals', y = 'Frequency')

[Figure: Histogram of residuals in R]
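
As an alternative to referencing model$residuals inside aes(), the residuals can be stored in their own data frame so that the plot maps a named column. This is only a sketch: the data are simulated with assumed parameters, and res_df and residual are hypothetical names chosen for illustration. Setting bins = 30 matches the ggplot2 default and avoids the message ggplot2 prints when no bin count is given.

```r
library(ggplot2)

# illustrative data; sample size and parameters are assumptions
set.seed(0)
data <- data.frame(x1 = rnorm(200, 2, 1),
                   x2 = rnorm(200, 4, 2),
                   y  = rnorm(200, 2, 3))
model <- lm(y ~ x1 + x2, data = data)

# put the residuals in their own data frame so aes() refers to a column
res_df <- data.frame(residual = resid(model))

# same histogram as above, with the bin count stated explicitly
p <- ggplot(res_df, aes(x = residual)) +
    geom_histogram(bins = 30, fill = 'steelblue', color = 'black') +
    labs(title = 'Histogram of Residuals', x = 'Residuals', y = 'Frequency')

print(p)
```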

Note that we can also specify the number of bins to place the residuals in by using the bins argument of geom_histogram(). By default, ggplot2 uses 30 bins.

The fewer the bins, the wider the bars will be in the histogram. For example, we could specify 20 bins:

#create histogram of residuals
ggplot(data = data, aes(x = model$residuals)) +
    geom_histogram(bins = 20, fill = 'steelblue', color = 'black') +
    labs(title = 'Histogram of Residuals', x = 'Residuals', y = 'Frequency')

[Figure: Residual histogram in R]

Or we could specify 10 bins:

#create histogram of residuals
ggplot(data = data, aes(x = model$residuals)) +
    geom_histogram(bins = 10, fill = 'steelblue', color = 'black') +
    labs(title = 'Histogram of Residuals', x = 'Residuals', y = 'Frequency')
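
To see how the number of bins relates to bar width, a rough calculation is to divide the range of the residuals by the bin count. The sketch below uses simulated data with assumed parameters; ggplot2 places its breaks slightly differently, so these widths are approximations rather than the exact values it uses. geom_histogram() also accepts a binwidth argument if you would rather set the width directly.

```r
# illustrative data; sample size and parameters are assumptions
set.seed(0)
data <- data.frame(x1 = rnorm(200, 2, 1),
                   x2 = rnorm(200, 4, 2),
                   y  = rnorm(200, 2, 3))
model <- lm(y ~ x1 + x2, data = data)
res <- resid(model)

# candidate bin counts: the two used above plus the ggplot2 default
bins <- c(10, 20, 30)

# rough bar width if the residual range is split evenly into that many bins
approx_width <- diff(range(res)) / bins

data.frame(bins, approx_width)
```

Halving the number of bins doubles the approximate width, which is why the 10-bin histogram has visibly wider bars than the 20-bin one.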

No matter how many bins we specify, we can see that the residuals are roughly normally distributed.
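
One way to make the comparison with a bell shape more concrete is to overlay a normal density curve that has the same mean and standard deviation as the residuals. The following is a sketch under stated assumptions: the data are simulated with illustrative parameters, res_df is a hypothetical name, and after_stat() requires ggplot2 3.3.0 or later. The histogram is drawn on the density scale so that it is comparable to the curve.

```r
library(ggplot2)

# illustrative data; sample size and parameters are assumptions
set.seed(0)
data <- data.frame(x1 = rnorm(200, 2, 1),
                   x2 = rnorm(200, 4, 2),
                   y  = rnorm(200, 2, 3))
model <- lm(y ~ x1 + x2, data = data)
res_df <- data.frame(residual = resid(model))

# histogram on the density scale plus a normal curve with matching mean and sd
p <- ggplot(res_df, aes(x = residual)) +
    geom_histogram(aes(y = after_stat(density)), bins = 20,
                   fill = 'steelblue', color = 'black') +
    stat_function(fun = dnorm,
                  args = list(mean = mean(res_df$residual),
                              sd = sd(res_df$residual)),
                  color = 'red') +
    labs(title = 'Histogram of Residuals with Normal Curve',
         x = 'Residuals', y = 'Density')

print(p)
```

If the bars track the red curve reasonably well, the residuals are plausibly close to normal.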

We could also perform a formal statistical test for normality, such as the Shapiro-Wilk, Kolmogorov-Smirnov, or Jarque-Bera test.
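
As a sketch of what this might look like, the code below runs the Shapiro-Wilk and Kolmogorov-Smirnov tests with base R's shapiro.test() and ks.test() on residuals from simulated data (the data parameters are illustrative assumptions). The Jarque-Bera test is left out because it is not part of base R and needs an add-on package. Note that the Kolmogorov-Smirnov p-value is only approximate here, since the mean and standard deviation are estimated from the same residuals being tested.

```r
# illustrative data; sample size and parameters are assumptions
set.seed(0)
data <- data.frame(x1 = rnorm(200, 2, 1),
                   x2 = rnorm(200, 4, 2),
                   y  = rnorm(200, 2, 3))
model <- lm(y ~ x1 + x2, data = data)
res <- resid(model)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)

# Kolmogorov-Smirnov test against a normal with the estimated mean and sd
ks <- ks.test(res, "pnorm", mean = mean(res), sd = sd(res))

sw$p.value
ks$p.value
```

In both tests a small p-value is evidence against normality, while a large one means the test found no evidence of a departure.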

However, keep in mind that these tests become very sensitive as the sample size grows: with a large sample, even a small and practically unimportant departure from normality can be enough for them to conclude that the residuals are not normal.

For this reason, it’s often easier to assess normality by creating a histogram of the residuals.
