How to Perform a Shapiro-Wilk Test in Python

The Shapiro-Wilk test is a test of normality. Its null hypothesis is that the sample was drawn from a normally distributed population, so it is used to assess whether the data are consistent with a normal distribution.

To perform a Shapiro-Wilk test in Python, we can use the scipy.stats.shapiro() function, which has the following syntax:

scipy.stats.shapiro(x)

where:

  • x: An array of sample data.

This function returns a test statistic and a corresponding p-value.

If the p-value is below the chosen significance level (commonly α = .05), we reject the null hypothesis of normality: we have sufficient evidence to say that the sample data does not come from a normal distribution.
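The decision rule described here can be sketched as a small helper. This is only an illustration: rejects_normality() is a hypothetical name (it is not part of SciPy), the default alpha of 0.05 is an assumed convention, and the two input samples are evenly spaced quantiles chosen so that the result does not depend on a random seed.

```python
import numpy as np
from scipy.stats import expon, norm, shapiro


def rejects_normality(x, alpha=0.05):
    """Return True if the Shapiro-Wilk test rejects normality at level alpha.

    Hypothetical helper for illustration; not part of SciPy.
    """
    result = shapiro(x)
    return bool(result.pvalue < alpha)


# Deterministic illustrative samples (an assumption of this sketch):
# evenly spaced quantiles, so no random seed is involved.
probs = (np.arange(100) + 0.5) / 100
bell_shaped = norm.ppf(probs)  # quantiles of a standard normal distribution
skewed = expon.ppf(probs)      # quantiles of a right-skewed exponential distribution

print(rejects_normality(bell_shaped))
print(rejects_normality(skewed))
```

The helper is expected to report no rejection for the bell-shaped sample and a rejection for the skewed one. Returning a plain bool keeps the decision separate from the choice of alpha, which the caller can override.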

This tutorial shows two examples of how to use this function in practice.

Example 1: Shapiro-Wilk Test on Normally Distributed Data

Suppose we have the following sample data:

from numpy.random import seed
from numpy.random import randn

#set seed (i.e. make this example reproducible)
seed(0)

#generate dataset of 100 random values that follow a standard normal distribution
data = randn(100)

The following code shows how to perform a Shapiro-Wilk test on this sample of 100 data values to determine if it came from a normal distribution:

from scipy.stats import shapiro

#perform Shapiro-Wilk test
shapiro(data)

ShapiroResult(statistic=0.9926937818527222, pvalue=0.8689165711402893)

From the output we can see that the test statistic is 0.9927 and the corresponding p-value is 0.8689.

Since the p-value is not less than .05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the sample data does not come from a normal distribution.

This result shouldn’t be surprising since we generated the sample data using the randn() function, which generates random values that follow a standard normal distribution.
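In a script, as opposed to an interactive session, a bare shapiro(data) call displays nothing, so the result has to be captured and printed. The sketch below regenerates the same sample and shows two ways to read the result: tuple-style unpacking and attribute access. The variable names stat, p and result are illustrative choices.

```python
from numpy.random import seed, randn
from scipy.stats import shapiro

# Recreate the sample from Example 1
seed(0)
data = randn(100)

# Option 1: unpack the result like a tuple
stat, p = shapiro(data)

# Option 2: keep the result object and read its attributes
result = shapiro(data)

print(f"W = {stat:.4f}, p-value = {p:.4f}")
print(f"W = {result.statistic:.4f}, p-value = {result.pvalue:.4f}")
```

Both print statements report the same two numbers; which form to use is mostly a matter of taste.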

Example 2: Shapiro-Wilk Test on Non-Normally Distributed Data

Now suppose we have the following sample data:

from numpy.random import seed
from numpy.random import poisson

#set seed (i.e. make this example reproducible)
seed(0)

#generate dataset of 100 values that follow a Poisson distribution with mean=5
data = poisson(5, 100)

The following code shows how to perform a Shapiro-Wilk test on this sample of 100 data values to determine if it came from a normal distribution:

from scipy.stats import shapiro

#perform Shapiro-Wilk test
shapiro(data)

ShapiroResult(statistic=0.9581913948059082, pvalue=0.002994443289935589)

From the output we can see that the test statistic is 0.9582 and the corresponding p-value is 0.00299.

Since the p-value is less than .05, we reject the null hypothesis. We have sufficient evidence to say that the sample data does not come from a normal distribution.

This result also shouldn’t be surprising since we generated the sample data using the poisson() function, which generates random values that follow a Poisson distribution.
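To see the two examples side by side, the sketch below runs the test on both samples in one loop and applies the same .05 threshold to each. The dictionary names samples and decisions are illustrative, and the seed is reset before each draw so that both samples match the ones generated in the examples.

```python
from numpy.random import seed, randn, poisson
from scipy.stats import shapiro

alpha = 0.05  # significance level used in both examples

# Reset the seed before each draw so the samples match Examples 1 and 2
seed(0)
normal_data = randn(100)
seed(0)
poisson_data = poisson(5, 100)

samples = {"normal": normal_data, "poisson": poisson_data}
decisions = {}

for name, sample in samples.items():
    stat, p = shapiro(sample)
    decisions[name] = "reject" if p < alpha else "fail to reject"
    print(f"{name}: W = {stat:.4f}, p = {p:.5f} -> {decisions[name]} normality")
```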

Additional Resources

The following tutorials explain how to perform other normality tests in R and Python:

How to Perform a Shapiro-Wilk Test in R
How to Perform an Anderson-Darling Test in Python
How to Perform a Kolmogorov-Smirnov Test in Python
