Monday, July 21, 2025

How to Calculate Point-Biserial Correlation in Python

Point-biserial correlation is used to measure the relationship between a binary variable, x, and a continuous variable, y.

Similar to the Pearson correlation coefficient, the point-biserial correlation coefficient takes on a value between -1 and 1 where:

  • -1 indicates a perfectly negative correlation between two variables
  • 0 indicates no correlation between two variables
  • 1 indicates a perfectly positive correlation between two variables
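The point-biserial coefficient is the Pearson coefficient computed with the binary variable coded as 0 and 1, so it shares the same range and interpretation. The short sketch below assumes NumPy and SciPy are installed and reuses the x and y values from the example that follows. It checks this numerically by comparing pointbiserialr() with numpy.corrcoef():

```python
import numpy as np
from scipy import stats

# Example data from this tutorial: x is binary (coded 0/1), y is continuous
x = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
y = [12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12]

# Point-biserial correlation from SciPy (unpacks into coefficient and p-value)
r_pb, p_pb = stats.pointbiserialr(x, y)

# Ordinary Pearson correlation on the same 0/1-coded data
r_pearson = np.corrcoef(x, y)[0, 1]

# The two coefficients agree up to floating-point rounding
print(r_pb, r_pearson)
```

Both coefficients should come out at about 0.218, which is the value reported in the example.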

This tutorial explains how to calculate the point-biserial correlation between two variables in Python.

Example: Point-Biserial Correlation in Python

Suppose we have a binary variable, x, and a continuous variable, y:

x = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
y = [12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12]

We can use the pointbiserialr() function from the scipy.stats module to calculate the point-biserial correlation between the two variables.

Note that this function returns a correlation coefficient along with a corresponding p-value:

import scipy.stats as stats

# calculate point-biserial correlation
stats.pointbiserialr(x, y)

PointbiserialrResult(correlation=0.21816, pvalue=0.51928)

The point-biserial correlation coefficient is 0.21816 and the corresponding p-value is 0.51928.

Because the correlation coefficient is positive, y tends to take on higher values when x equals 1 than when x equals 0.
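One way to see what the positive sign means is to compare the average of y within each group of x. This is a minimal sketch, assuming NumPy is available; the variable names are illustrative:

```python
import numpy as np

x = np.array([0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0])
y = np.array([12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12])

# Boolean masks select the y values belonging to each group of x
mean_when_1 = y[x == 1].mean()  # average of 14, 17, 23, 19, 8
mean_when_0 = y[x == 0].mean()  # average of 12, 17, 11, 22, 11, 12

print(mean_when_1, mean_when_0)
```

The group with x = 1 averages 16.2, compared with about 14.17 for the group with x = 0, which is consistent with the positive coefficient.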

Because the p-value (0.51928) is not less than 0.05, this correlation is not statistically significant at the 0.05 level.
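The significance check can also be written directly in code. This is a sketch that assumes a significance level of 0.05, matching the threshold used above; the name alpha is illustrative:

```python
from scipy import stats

x = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
y = [12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12]

alpha = 0.05  # significance level used in this tutorial

# Unpack the coefficient and p-value from the returned result
r, p_value = stats.pointbiserialr(x, y)

if p_value < alpha:
    print(f"r = {r:.5f}, p = {p_value:.5f}: statistically significant at alpha = {alpha}")
else:
    print(f"r = {r:.5f}, p = {p_value:.5f}: not statistically significant at alpha = {alpha}")
```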

You can find the exact details of how this correlation is calculated in the SciPy documentation for scipy.stats.pointbiserialr.
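For reference, the coefficient can also be computed by hand from the textbook formula r = (M1 - M0) / s * sqrt(n1 * n0 / n^2). Here M1 and M0 are the means of y in each group, n1 and n0 are the group sizes, and s is the population standard deviation of y. The p-value comes from a t statistic with n - 2 degrees of freedom. This sketch is not SciPy's own implementation, but it should reproduce SciPy's numbers up to floating-point rounding:

```python
import numpy as np
from scipy import stats

x = np.array([0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0])
y = np.array([12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12], dtype=float)

n = len(y)
n1 = np.sum(x == 1)  # size of the x = 1 group
n0 = np.sum(x == 0)  # size of the x = 0 group
m1 = y[x == 1].mean()
m0 = y[x == 0].mean()
s = y.std()  # population standard deviation (ddof=0)

# Point-biserial coefficient from the group means
r = (m1 - m0) / s * np.sqrt(n1 * n0 / n**2)

# Two-sided p-value from a t distribution with n - 2 degrees of freedom
t_stat = r * np.sqrt((n - 2) / (1 - r**2))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

# Compare with SciPy's result
r_scipy, p_scipy = stats.pointbiserialr(x, y)
print(r, r_scipy)
print(p_value, p_scipy)
```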
