
How to Calculate Conditional Probability in Python


The conditional probability that event A occurs, given that event B has occurred, is calculated as follows:

P(A|B) = P(A∩B) / P(B)

where:

P(A∩B) = the probability that event A and event B both occur.

P(B) = the probability that event B occurs.
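As a quick check of the formula, here is a minimal sketch that applies it to a fair six-sided die. The die, and the events "roll is even" and "roll is greater than 3", are illustrative assumptions and are not part of the survey example used in this tutorial.

```python
from fractions import Fraction

# Sample space: the six faces of a fair die (illustrative assumption)
outcomes = {1, 2, 3, 4, 5, 6}
A = {x for x in outcomes if x % 2 == 0}  # event A: roll is even
B = {x for x in outcomes if x > 3}       # event B: roll is greater than 3

p_b = Fraction(len(B), len(outcomes))            # P(B) = 3/6
p_a_and_b = Fraction(len(A & B), len(outcomes))  # P(A∩B) = 2/6
p_a_given_b = p_a_and_b / p_b                    # P(A|B) = P(A∩B) / P(B)

print(p_a_given_b)  # 2/3
```

Of the three outcomes in B (4, 5, 6), two are even, which agrees with the result of 2/3.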

The following example shows how to use this formula to calculate conditional probabilities in Python.

Example: Calculate Conditional Probability in Python

Suppose we send out a survey to 300 individuals asking them which sport they like best: baseball, basketball, football, or soccer.

We can create the following table in Python to hold the survey responses:

import pandas as pd
import numpy as np

#create pandas DataFrame with raw data
df = pd.DataFrame({'gender': np.repeat(np.array(['Male', 'Female']), 150),
                   'sport': np.repeat(np.array(['Baseball', 'Basketball', 'Football',
                                                'Soccer', 'Baseball', 'Basketball',
                                                'Football', 'Soccer']), 
                                    (34, 40, 58, 18, 34, 52, 20, 44))})

#produce contingency table to summarize raw data
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

#view contingency table
survey_data

sport	Baseball	Basketball	Football	Soccer	 All
gender					
Female	      34	        52	      20	    44	 150
Male	      34	        40	      58	    18	 150
All	      68	        92	      78	    62	 300

Related: How to Use pd.crosstab() to Create Contingency Tables in Python

We can use the following syntax to extract values from the table:

#extract value in second row and first column (Male, Baseball)
survey_data.iloc[1, 0]

34
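Position-based indexing depends on the row and column order of the table. As an alternative, the same cell can be looked up by its labels with .loc. The sketch below rebuilds the survey table so that it runs on its own; the variable name male_baseball is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the survey data and the contingency table from the example
df = pd.DataFrame({'gender': np.repeat(np.array(['Male', 'Female']), 150),
                   'sport': np.repeat(np.array(['Baseball', 'Basketball', 'Football',
                                                'Soccer', 'Baseball', 'Basketball',
                                                'Football', 'Soccer']),
                                      (34, 40, 58, 18, 34, 52, 20, 44))})
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

# Look up the cell by row label and column label instead of by position
male_baseball = survey_data.loc['Male', 'Baseball']
print(male_baseball)  # 34
```

Label-based lookups keep working if the rows or columns are later reordered, which can make the calculation easier to read and audit.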

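We can use the following syntax to calculate the probability that an individual is male, given that baseball is their favorite sport. This divides the number of males who prefer baseball (34) by the total number of individuals who prefer baseball (68):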

#calculate probability of being male, given that individual prefers baseball
survey_data.iloc[1, 0] / survey_data.iloc[2, 0]

0.5

And we can use the following syntax to calculate the probability that basketball is an individual's favorite sport, given that they're female. This divides the number of females who prefer basketball (52) by the total number of females (150):

#calculate probability of preferring basketball, given that individual is female
survey_data.iloc[0, 1] / survey_data.iloc[0, 4]

0.3466666666666667
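Dividing two counts is a shortcut for the formula P(A|B) = P(A∩B) / P(B), because the survey total cancels out. The sketch below makes that explicit for the basketball example. It builds the table of counts directly, and names such as counts, p_joint, and p_female are assumptions introduced for illustration.

```python
import pandas as pd

# Counts from the contingency table above (without the 'All' margins)
counts = pd.DataFrame({'Baseball': [34, 34],
                       'Basketball': [52, 40],
                       'Football': [20, 58],
                       'Soccer': [44, 18]},
                      index=['Female', 'Male'])

total = counts.to_numpy().sum()  # 300 survey responses

# P(A∩B): probability of being female AND preferring basketball
p_joint = counts.loc['Female', 'Basketball'] / total

# P(B): probability of being female
p_female = counts.loc['Female'].sum() / total

# P(A|B) = P(A∩B) / P(B)
p_basketball_given_female = float(p_joint / p_female)
print(round(p_basketball_given_female, 4))  # 0.3467
```

The result matches the ratio of counts, 52 / 150, up to floating-point rounding.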

We can use this basic approach to calculate any conditional probability we’d like from the contingency table.
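To get every conditional probability at once rather than one cell at a time, one option is the normalize argument of pd.crosstab(). The sketch below is a possible approach and not part of the original example; the variable names p_sport_given_gender and p_gender_given_sport are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the survey data from the example
df = pd.DataFrame({'gender': np.repeat(np.array(['Male', 'Female']), 150),
                   'sport': np.repeat(np.array(['Baseball', 'Basketball', 'Football',
                                                'Soccer', 'Baseball', 'Basketball',
                                                'Football', 'Soccer']),
                                      (34, 40, 58, 18, 34, 52, 20, 44))})

# Each row sums to 1: P(sport | gender)
p_sport_given_gender = pd.crosstab(df['gender'], df['sport'], normalize='index')

# Each column sums to 1: P(gender | sport)
p_gender_given_sport = pd.crosstab(df['gender'], df['sport'], normalize='columns')

print(p_sport_given_gender.round(3))
print(p_gender_given_sport.round(3))
```

With these tables, p_sport_given_gender.loc['Female', 'Basketball'] and p_gender_given_sport.loc['Male', 'Baseball'] correspond to the two probabilities calculated above.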

Additional Resources

The following tutorials provide additional information on dealing with probability:

Law of Total Probability
How to Find the Mean of a Probability Distribution
How to Find the Standard Deviation of a Probability Distribution
