Friday, July 5, 2024
How to Perform a Correlation Test in Python (With Example)

One way to quantify the relationship between two variables is to use the Pearson correlation coefficient, which measures the linear association between two variables.

It always takes a value between -1 and 1, where:

  • -1 indicates a perfectly negative linear correlation
  • 0 indicates no linear correlation
  • 1 indicates a perfectly positive linear correlation
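The three reference values can be illustrated with small made-up arrays. The sketch below is only an illustration: the arrays `x_demo`, `y_up`, `y_down`, and `y_curve` are hypothetical and do not come from the example in this tutorial. Note that the last pair has a perfect nonlinear relationship (y = x²) and still gives a coefficient of about 0, because the coefficient only measures linear association.

```python
from scipy.stats import pearsonr

# Hypothetical demo data (not part of the tutorial's example)
x_demo = [-2, -1, 0, 1, 2]
y_up = [2 * v + 1 for v in x_demo]      # exact increasing line
y_down = [-3 * v + 4 for v in x_demo]   # exact decreasing line
y_curve = [v ** 2 for v in x_demo]      # symmetric curve, no linear trend

# pearsonr returns (coefficient, p-value); index 0 is the coefficient
r_up = pearsonr(x_demo, y_up)[0]
r_down = pearsonr(x_demo, y_down)[0]
r_curve = pearsonr(x_demo, y_curve)[0]

print(r_up, r_down, r_curve)  # approximately 1, -1, and 0
```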

To determine if a correlation coefficient is statistically significant, you can calculate the corresponding t-score and p-value.

The formula to calculate the t-score of a correlation coefficient (r) is:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$

where n is the number of paired observations.

The p-value is then calculated as the corresponding two-sided p-value for the t-distribution with n-2 degrees of freedom.
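As a rough check of the formula, the t-score and two-sided p-value can be computed by hand and compared with what SciPy reports. This is a minimal sketch, assuming the same sample arrays used in the example that follows; the variable names `t_score`, `p_manual`, and `p_scipy` are chosen here for illustration.

```python
import math
from scipy import stats

# Sample arrays from the tutorial's example
x = [3, 4, 4, 5, 7, 8, 10, 12, 13, 15]
y = [2, 4, 4, 5, 4, 7, 8, 19, 14, 10]

n = len(x)

# Correlation coefficient and SciPy's own p-value
r, p_scipy = stats.pearsonr(x, y)

# t = r * sqrt(n - 2) / sqrt(1 - r^2)
t_score = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-sided p-value from the t-distribution with n - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_score), df=n - 2)

print(t_score, p_manual, p_scipy)
```

The two p-values should agree up to floating-point error, since both describe the same test.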

Example: Correlation Test in Python

To determine if the correlation coefficient between two variables is statistically significant, you can perform a correlation test in Python using the pearsonr function from the SciPy library.

This function returns the correlation coefficient between two variables along with the two-tailed p-value.

For example, suppose we have the following two arrays in Python:

#create two arrays
x = [3, 4, 4, 5, 7, 8, 10, 12, 13, 15]
y = [2, 4, 4, 5, 4, 7, 8, 19, 14, 10]

We can import the pearsonr function and calculate the Pearson correlation coefficient between the two arrays:

from scipy.stats import pearsonr

#calculate correlation coefficient and p-value between x and y
pearsonr(x, y)

(0.8076177030748631, 0.004717255828132089)

Here’s how to interpret the output:

  • Pearson correlation coefficient (r): 0.8076
  • Two-tailed p-value: 0.0047

Since the correlation coefficient is close to 1, this tells us that there is a strong positive association between the two variables.

And since the corresponding p-value is less than .05, we conclude that there is a statistically significant association between the two variables.

Note that we can also extract the correlation coefficient and the p-value individually from the result of the pearsonr function:

#extract correlation coefficient (rounded to 4 decimal places)
r = round(pearsonr(x, y)[0], 4)

print(r)

0.8076

#extract p-value (rounded to 4 decimal places) 
p = round(pearsonr(x, y)[1], 4)

print(p) 

0.0047

These rounded values are easier to read than the raw output of the pearsonr function.
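Both values can also be taken from a single call by unpacking the result, which avoids computing the correlation twice. The sketch below assumes a significance level of 0.05, matching the threshold used in the interpretation above; the names `alpha` and `significant` are introduced here for illustration.

```python
from scipy.stats import pearsonr

x = [3, 4, 4, 5, 7, 8, 10, 12, 13, 15]
y = [2, 4, 4, 5, 4, 7, 8, 19, 14, 10]

# Unpack the coefficient and p-value from one call
r, p = pearsonr(x, y)

# Assumed significance level for the decision
alpha = 0.05
significant = p < alpha

print(f"r = {r:.4f}, p = {p:.4f}, significant at {alpha}: {significant}")
```

In recent SciPy versions pearsonr returns a result object with statistic and pvalue attributes rather than a plain tuple, but it still unpacks in the same way.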

Additional Resources

The following tutorials provide additional information about correlation coefficients:

An Introduction to the Pearson Correlation Coefficient
What is Considered to Be a “Strong” Correlation?
The Five Assumptions for Pearson Correlation
