What is an Influential Observation in Statistics?

In statistics, an influential observation is an observation in a dataset that, when removed, dramatically changes the coefficient estimates of a regression model.

The most common way to measure the influence of observations is to use Cook’s distance, which quantifies how much all of the fitted values in a regression model change when the ith observation is deleted.

As a rule of thumb, any observation with a Cook’s distance greater than 1 is considered to be influential.
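To make the calculation concrete, here is a minimal sketch of Cook’s distance for a simple linear regression, written in plain Python. It uses the standard closed form D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2, where e_i is the residual, h_i is the leverage, and p is the number of estimated coefficients. The function name cooks_distance and the data are hypothetical and chosen only for illustration; they are not the dataset from the example in this article.

```python
def cooks_distance(x, y):
    """Cook's distance for each point in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares estimates
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residuals, mean squared error, and leverage of each point
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2
    return [
        e ** 2 / (p * mse) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]


# Hypothetical data: seven points on the line y = 2x plus one unusual point
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2, 4, 6, 8, 10, 12, 14, 10]

distances = cooks_distance(x, y)

# Apply the rule of thumb: flag observations with Cook's distance above 1
flagged = [i for i, d in enumerate(distances) if d > 1]
print(flagged)
```

With this made-up data, only the last point (index 7) exceeds the threshold of 1.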

The following example shows how to calculate and interpret Cook’s distance for a given dataset to detect potential influential observations.

Example: Detecting Influential Observations

Suppose we have the following dataset with 14 values:

Now suppose we fit a simple linear regression model. The regression output is shown below:

Using statistical software, we can calculate the following values for Cook’s distance for each observation:

Notice that the last observation has a value significantly greater than 1 for Cook’s distance, which tells us that it’s an influential observation.

Suppose we remove this value from the dataset and fit a new simple linear regression model. The output for this model is shown below:

Notice that the regression coefficients for the intercept and x both changed dramatically. This tells us that removing the influential observation from the dataset completely changed the fitted regression model.
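The same before-and-after comparison can be sketched in a few lines of plain Python. The helper fit_line and the data are hypothetical and serve only to illustrate the idea; they do not reproduce the regression output from this example.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: the last point is the suspected influential observation
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2, 4, 6, 8, 10, 12, 14, 10]

# Fit once with every observation and once without the last one
full_intercept, full_slope = fit_line(x, y)
reduced_intercept, reduced_slope = fit_line(x[:-1], y[:-1])

print(f"All observations:    intercept={full_intercept:.2f}, slope={full_slope:.2f}")
print(f"Without last point:  intercept={reduced_intercept:.2f}, slope={reduced_slope:.2f}")
```

For this made-up data the intercept moves from 6.25 to 0 and the slope from about 0.33 to 2, which is the kind of shift that marks an observation as influential.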

The following plots show the difference between these two fitted regression equations:

Notice how much the one influential observation changes the regression line. By removing this observation, we were able to find a regression line that fits the remaining data much more closely.

Notes

It’s important to note that Cook’s distance should be used as a way to identify potentially influential observations. However, just because an observation is influential doesn’t necessarily mean that it should be deleted from the dataset.

First, you should verify that the observation isn’t a result of a data entry error or some other odd occurrence. If it turns out to be a legitimate value, you can then decide to deal with it in one of the following ways:

Delete it from the dataset.
Leave it in the dataset.
Replace it with an alternative value like the mean or median.

Depending on your specific scenario, one of these options may make more sense than the others.

How to Calculate Cook’s Distance in Practice

The following tutorials explain how to calculate Cook’s distance for a given dataset in Python and R:

How to Calculate Cook’s Distance in Python
How to Calculate Cook’s Distance in R
