10.7 C
London
Sunday, July 7, 2024
HomeSoftware TutorialsPythonHow to Perform an Anderson-Darling Test in Python

How to Perform an Anderson-Darling Test in Python

Related stories

Learn About Opening an Automobile Repair Shop in India

Starting a car repair shop is quite a good...

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

  Unlocking the Power: Embracing the Benefits of Tax-Free Investing For...

Income Splitting in Canada for 2023

  Income Splitting in Canada for 2023 The federal government’s expanded...

Can I Deduct Home Office Expenses on my Tax Return 2023?

Can I Deduct Home Office Expenses on my Tax...

Canadian Tax – Personal Tax Deadline 2022

  Canadian Tax – Personal Tax Deadline 2022 Resources and Tools...

An Anderson-Darling Test is a goodness of fit test that measures how well your data fit a specified distribution.

This test is most commonly used to determine whether or not your data follow a normal distribution.

This type of test is useful for testing for normality, which is a common assumption used in many statistical tests including regression, ANOVA, t-tests, and many others.

Example: Anderson-Darling Test in Python

To conduct an Anderson-Darling Test in Python, we can use the anderson() function from the scipy.stats library, which uses the following syntax:

anderson(x, dist=’norm’)

where:

  • x: array of sample data
  • dist: the type of distribution to test against. Default is ‘norm’ but you can also specify ‘expon’ or ‘logistic.’

For example, here’s how to perform an Anderson-Darling Test on a sample of 50 normally distributed random variables:

import numpy as np

#create data
np.random.seed(0)
data = np.random.normal(size=50)

#perform Anderson-Darling Test 
from scipy.stats import anderson
anderson(data)

AndersonResult(statistic=0.15006999533388665,
               critical_values=array([0.538, 0.613, 0.736, 0.858, 1.021]),
               significance_level=array([15. , 10. ,  5. ,  2.5,  1. ]))

The test statistic is 0.150. We can compare this value to each critical value that corresponds to each significance level to see if the test results are significant. For example:

  • The critical value for α = 0.01 is 1.021. Because the test statistic (0.150) is not greater than this critical value, the results are not significant at a significance level of 0.01.
  • The critical value for α = 0.025 is 0.858. Because the test statistic (0.150) is not greater than this critical value, the results are not significant at a significance level of 0.025.

And so on.

We can see that the test results are not significant at any significance level, which means we would not reject the null hypothesis of the test. Thus, we don’t have sufficient evidence to say that the sample data is not normally distributed.

This result shouldn’t be surprising considering we used the np.rand.normal() function to generate a sample of 50 normally distributed values.

Consider instead if we performed the Anderson-Darling Test on a sample of 50 random integers between 0 and 10:

import numpy as np

#create data
np.random.seed(0)
data = np.random.randint(0, 10, size=50)

#perform Anderson-Darling Test 
from scipy.stats import anderson
anderson(data)

AndersonResult(statistic=1.1926463985076836,
               critical_values=array([0.538, 0.613, 0.736, 0.858, 1.021]),
               significance_level=array([15. , 10. ,  5. ,  2.5,  1. ]))

The test statistic is 1.1926. We can compare this value to each critical value that corresponds to each significance level to see if the test results are significant. For example:

  • The critical value for α = 0.01 is 1.021. Because the test statistic (1.1926) is greater than this critical value, the results are significant at a significance level of 0.01.
  • The critical value for α = 0.025 is 0.858. Because the test statistic (1.1926) is greater than this critical value, the results are significant at a significance level of 0.025.

And so on.

We can see that the test results are significant at every significance level, which means we would reject the null hypothesis of the test no matter which significance level we choose to use. Thus, we have sufficient evidence to say that the sample data is not normally distributed.

This result also shouldn’t be surprising considering we used the np.rand.randint() function to generate a sample of 50 random integers between 0 and 10, which is unlikely to follow a normal distribution.

You can find more Python tutorials here.

Subscribe

- Never miss a story with notifications

- Gain full access to our premium content

- Browse free from up to 5 devices at once

Latest stories