Sunday, July 7, 2024

How to Calculate Leverage Statistics in R

In statistics, an observation is considered an outlier if its value for the response variable is much larger or much smaller than the values of the other observations in the dataset.

Similarly, an observation is considered to have high leverage if its value (or values) for the predictor variables are much more extreme than those of the other observations in the dataset.
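To make the definition concrete, the sketch below computes leverage by hand as the diagonal of the hat matrix H = X(X'X)^(-1)X' and compares it with R's hatvalues(). The model formula mpg ~ disp + hp is an assumption made for illustration, and the names X, H and manual_leverage are hypothetical.

```r
# Minimal sketch: leverage is the diagonal of the hat matrix H = X (X'X)^-1 X'.
# The formula below is an assumption made for illustration.
data(mtcars)
model <- lm(mpg ~ disp + hp, data = mtcars)

# design matrix, including the intercept column
X <- model.matrix(model)

# hat matrix and its diagonal (one leverage value per observation)
H <- X %*% solve(crossprod(X)) %*% t(X)
manual_leverage <- diag(H)

# compare with the built-in function
all.equal(unname(manual_leverage), unname(hatvalues(model)))

# leverage values sum to the number of estimated coefficients
sum(manual_leverage)
length(coef(model))
```

Note that the response variable never enters the calculation: leverage depends only on the predictor values.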

An early step in a regression analysis is to take a closer look at the observations that have high leverage, since they can have a large impact on the fitted model.
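One rough way to see this impact is to refit the model without the highest-leverage observation and compare the coefficients. The following is only a sketch: the formula mpg ~ disp + hp is assumed for illustration, and the names top and model_drop are hypothetical.

```r
# Minimal sketch: compare coefficients with and without the
# highest-leverage observation. The formula is an assumption.
data(mtcars)
model <- lm(mpg ~ disp + hp, data = mtcars)

# row name of the observation with the largest leverage
top <- names(which.max(hatvalues(model)))

# refit the same model without that observation
model_drop <- lm(mpg ~ disp + hp, data = mtcars[rownames(mtcars) != top, ])

# coefficients side by side
cbind(all_rows = coef(model), without_top = coef(model_drop))
```

A large shift between the two columns would suggest that the removed observation is influential; a small shift would suggest that its high leverage has little practical effect on the fit.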

This tutorial shows a step-by-step example of how to calculate and visualize the leverage for each observation in a model in R.

Step 1: Build a Regression Model

First, we’ll build a multiple linear regression model using the built-in mtcars dataset in R:

#load the dataset
data(mtcars)

#fit a regression model
model <- lm(mpg ~ disp + hp, data = mtcars)

#view model summary
summary(model)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.735904   1.331566  23.083  < 2e-16 ***

Step 2: Calculate the Leverage for each Observation

Next, we’ll use the hatvalues() function to calculate the leverage for each observation in the model:

#calculate leverage for each observation in the model
hats <- as.data.frame(hatvalues(model))

#display leverage stats for each observation
hats

                    hatvalues(model)
Mazda RX4                 0.04235795
Mazda RX4 Wag             0.04235795
Datsun 710                0.06287776
Hornet 4 Drive            0.07614472
Hornet Sportabout         0.08097817
Valiant                   0.05945972
Duster 360                0.09828955
Merc 240D                 0.08816960
Merc 230                  0.05102253
Merc 280                  0.03990060
Merc 280C                 0.03990060
Merc 450SE                0.03890159
Merc 450SL                0.03890159
Merc 450SLC               0.03890159
Cadillac Fleetwood        0.19443875
Lincoln Continental       0.16042361
Chrysler Imperial         0.12447530
Fiat 128                  0.08346304
Honda Civic               0.09493784
Toyota Corolla            0.08732818
Toyota Corona             0.05697867
Dodge Challenger          0.06954069
AMC Javelin               0.05767659
Camaro Z28                0.10011654
Pontiac Firebird          0.12979822
Fiat X1-9                 0.08334018
Porsche 914-2             0.05785170
Lotus Europa              0.08193899
Ford Pantera L            0.13831817
Ferrari Dino              0.12608583
Maserati Bora             0.49663919
Volvo 142E                0.05848459

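Typically we take a closer look at observations whose leverage is greater than 2(k+1)/n, where k is the number of predictors in the model and n is the number of observations. Leverage values always lie between 0 and 1 and sum to k+1, so this cutoff is twice the average leverage.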

An easy way to do this is to sort the observations based on their leverage value, descending:

#sort observations by leverage, descending
hats[order(-hats['hatvalues(model)']), ]

 [1] 0.49663919 0.19443875 0.16042361 0.13831817 0.12979822 0.12608583
 [7] 0.12447530 0.10011654 0.09828955 0.09493784 0.08816960 0.08732818
[13] 0.08346304 0.08334018 0.08193899 0.08097817 0.07614472 0.06954069
[19] 0.06287776 0.05945972 0.05848459 0.05785170 0.05767659 0.05697867
[25] 0.05102253 0.04235795 0.04235795 0.03990060 0.03990060 0.03890159
[31] 0.03890159 0.03890159
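The sorted output above drops the car names. As an alternative sketch, sorting the named vector returned by hatvalues() directly keeps each name attached to its value. The formula mpg ~ disp + hp is assumed for illustration, and lev_sorted is a hypothetical name.

```r
# Minimal sketch: sort leverage values while keeping observation names.
# The formula is an assumption made for illustration.
data(mtcars)
model <- lm(mpg ~ disp + hp, data = mtcars)

# hatvalues() returns a named numeric vector, so sort() preserves the names
lev_sorted <- sort(hatvalues(model), decreasing = TRUE)

# five largest leverage values with their car names
head(lev_sorted, 5)
```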

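We can see that the largest leverage value is 0.4966, which belongs to the Maserati Bora. Because leverage values always lie between 0 and 1, they are judged against a relative cutoff, commonly 2(k+1)/n, which for this model is 2(2+1)/32 = 0.1875. Two observations exceed it, the Maserati Bora (0.4966) and the Cadillac Fleetwood (0.1944), so both have high leverage.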
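The comparison against a cutoff can also be done in code. The sketch below uses the common rule of thumb of twice the average leverage, 2(k+1)/n, and assumes the model formula mpg ~ disp + hp; the names lev, k, n and cutoff are hypothetical.

```r
# Minimal sketch: flag observations whose leverage exceeds 2(k+1)/n.
# The formula and the choice of cutoff are assumptions for illustration.
data(mtcars)
model <- lm(mpg ~ disp + hp, data = mtcars)

lev <- hatvalues(model)
k <- length(coef(model)) - 1   # number of predictors (excluding the intercept)
n <- nobs(model)               # number of observations used in the fit

# rule-of-thumb cutoff: twice the average leverage
cutoff <- 2 * (k + 1) / n
cutoff

# observations above the cutoff
lev[lev > cutoff]
```

Other cutoffs, such as three times the average leverage, are also used in practice, so the flagged set should be treated as a list of observations to inspect rather than a definitive verdict.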

Step 3: Visualize the Leverage for each Observation

Lastly, we can create a quick plot to visualize the leverage for each observation:

#plot leverage values for each observation
plot(hatvalues(model), type = 'h')

[Figure: leverage in R]

The x-axis displays the index of each observation in the dataset and the y-axis displays the corresponding leverage statistic.
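The plot is easier to read with a reference line and labels on the points that stand out. The sketch below adds a dashed line at twice the average leverage and labels the observations above it. The model formula mpg ~ disp + hp and the cutoff are assumptions for illustration.

```r
# Minimal sketch: leverage plot with a rule-of-thumb cutoff line and labels.
# The formula and the cutoff are assumptions made for illustration.
data(mtcars)
model <- lm(mpg ~ disp + hp, data = mtcars)

lev <- hatvalues(model)

# twice the average leverage: 2 * (number of coefficients) / n
cutoff <- 2 * length(coef(model)) / nobs(model)

# vertical-line plot of leverage by observation index
plot(lev, type = "h", xlab = "Observation index", ylab = "Leverage")

# dashed horizontal line at the cutoff
abline(h = cutoff, lty = 2)

# label the observations that exceed the cutoff
high <- which(lev > cutoff)
text(high, lev[high], labels = names(lev)[high], pos = 2, cex = 0.7)
```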

Additional Resources

How to Perform Simple Linear Regression in R
How to Perform Multiple Linear Regression in R
How to Create a Residual Plot in R
