Tuesday, March 11, 2025
How to Create Frequency Tables in Python

A frequency table is a table that displays how often each category or value occurs in a dataset. This type of table is particularly useful for understanding the distribution of values in a dataset.

This tutorial explains how to create frequency tables in Python.

One-Way Frequency Table for a Series

To find the frequencies of individual values in a pandas Series, you can use the value_counts() function:

import pandas as pd

#define Series
data = pd.Series([1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 5])

#find frequencies of each value
data.value_counts()

3    4
1    3
4    2
5    1
2    1

You can add the argument sort=False if you don’t want the data values sorted by frequency:

data.value_counts(sort=False)

1    3
2    1
3    4
4    2
5    1

The way to interpret the output is as follows:

  • The value “1” occurs 3 times in the Series.
  • The value “2” occurs 1 time in the Series.
  • The value “3” occurs 4 times in the Series.

And so on.
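If it helps, the counts can also be ordered by the data values with sort_index(), and turned into relative frequencies with the normalize=True argument of value_counts(). The following is a minimal sketch using the same Series; the variable names counts and proportions are illustrative.

```python
import pandas as pd

data = pd.Series([1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 5])

# counts ordered by the data values rather than by frequency
counts = data.value_counts().sort_index()

# relative frequencies: each count divided by the total number of values (11)
proportions = data.value_counts(normalize=True).sort_index()

print(counts)
print(proportions)
```

Here the value 3 accounts for 4 of the 11 observations, so its relative frequency is 4/11.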

One-Way Frequency Table for a DataFrame

To find the frequencies of values in a column of a pandas DataFrame, you can use the crosstab() function, which uses the following syntax:

crosstab(index, columns)

where:

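  • index: the values to group by in the rows (for example, a DataFrame column)
  • columns: the values to group by in the columns; for a one-way table, pass a single string such as 'count', which becomes the name of the frequency column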

For example, suppose we have a DataFrame with information about the letter grade, age, and gender of 10 different students in a class. Here’s how to find the frequency for each letter grade:

#create data
df = pd.DataFrame({'Grade': ['A','A','A','B','B', 'B', 'B', 'C', 'D', 'D'],
                   'Age': [18, 18, 18, 19, 19, 20, 18, 18, 19, 19],
                   'Gender': ['M','M', 'F', 'F', 'F', 'M', 'M', 'F', 'M', 'F']})

#view data
df

	Grade	Age	Gender
0	    A	 18	     M
1	    A	 18	     M
2	    A	 18	     F
3	    B	 19	     F
4	    B	 19	     F
5	    B	 20	     M
6	    B	 18	     M
7	    C	 18	     F
8	    D	 19	     M
9	    D	 19	     F 	  

#find frequency of each letter grade
pd.crosstab(index=df['Grade'], columns='count')

col_0	count
Grade	
A	    3
B	    4
C	    1
D	    2

The way to interpret this is as follows:

  • 3 students received an ‘A’ in the class.
  • 4 students received a ‘B’ in the class.
  • 1 student received a ‘C’ in the class.
  • 2 students received a ‘D’ in the class.
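As a quick cross-check, the same counts can be obtained with value_counts() on the column itself. The sketch below rebuilds only the Grade column from the example; the variable names tab and vc are illustrative, not part of the original tutorial.

```python
import pandas as pd

# Grade column from the example DataFrame
df = pd.DataFrame({'Grade': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'D', 'D']})

# one-way frequency table via crosstab (a DataFrame with one column named 'count')
tab = pd.crosstab(index=df['Grade'], columns='count')

# the same frequencies via value_counts, ordered by grade label
vc = df['Grade'].value_counts().sort_index()

print(tab['count'].tolist())
print(vc.tolist())
```

The main practical difference is the return type: crosstab() gives a DataFrame, while value_counts() gives a Series.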

We can use a similar syntax to find the frequency counts for other columns. For example, here’s how to find frequency by age:

pd.crosstab(index=df['Age'], columns='count') 

col_0	count
Age	
18   	    5
19	    4
20	    1

The way to interpret this is as follows:

  • 5 students are 18 years old.
  • 4 students are 19 years old.
  • 1 student is 20 years old.

You can also easily display the frequencies as proportions of the entire dataset by dividing by the sum:

#define crosstab
tab = pd.crosstab(index=df['Age'], columns='count')

#find proportions 
tab/tab.sum()

col_0	count
Age	
18	  0.5
19	  0.4
20	  0.1

The way to interpret this is as follows:

  • 50% of students are 18 years old.
  • 40% of students are 19 years old.
  • 10% of students are 20 years old.
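An alternative to dividing by the sum manually is the normalize argument of crosstab(). The sketch below assumes normalize=True, which divides every cell by the grand total; it rebuilds only the Age column from the example, and the name prop is illustrative.

```python
import pandas as pd

# Age column from the example DataFrame
df = pd.DataFrame({'Age': [18, 18, 18, 19, 19, 20, 18, 18, 19, 19]})

# normalize=True divides each count by the total number of rows
prop = pd.crosstab(index=df['Age'], columns='count', normalize=True)

print(prop)
```

For a one-way table this should match tab/tab.sum(), since the only column sums to the total number of rows.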

Two-Way Frequency Tables for a DataFrame

You can also create a two-way frequency table to display the frequencies for two different variables in the dataset. For example, here’s how to create a two-way frequency table for the variables Age and Grade:

pd.crosstab(index=df['Age'], columns=df['Grade'])


Grade	A	B	C	D
Age				
18	3	1	1	0
19	0	2	0	2
20	0	1	0	0

The way to interpret this is as follows:

  • There are 3 students who are 18 years old and received an ‘A’ in the class.
  • There is 1 student who is 18 years old and received a ‘B’ in the class.
  • There is 1 student who is 18 years old and received a ‘C’ in the class.
  • There are 0 students who are 18 years old and received a ‘D’ in the class.

And so on.
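Row and column totals are often useful alongside a two-way table. As a minimal sketch, the margins=True argument of crosstab() adds a row and a column labelled 'All' containing those totals; the name tab_totals is illustrative.

```python
import pandas as pd

# Grade and Age columns from the example DataFrame
df = pd.DataFrame({'Grade': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'D', 'D'],
                   'Age': [18, 18, 18, 19, 19, 20, 18, 18, 19, 19]})

# margins=True appends an 'All' row and an 'All' column with the totals
tab_totals = pd.crosstab(index=df['Age'], columns=df['Grade'], margins=True)

print(tab_totals)
```

The 'All' column reproduces the one-way frequencies by age, and the 'All' row reproduces the one-way frequencies by grade.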

You can find the complete documentation for the crosstab() function in the official pandas documentation.
