You can use the following methods to calculate a cumulative count in a pandas DataFrame:
Method 1: Cumulative Count by Group
df['cum_count'] = df.groupby('col1').cumcount()
Method 2: Cumulative Count by Multiple Groups
df['cum_count'] = df.groupby(['col1', 'col2']).cumcount()
The following examples shows how to use each method in practice with the following pandas DataFrame:
import pandas as pd #create DataFrame df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'position': ['G', 'G', 'G', 'F', 'G','G', 'F', 'F'], 'points': [14, 22, 25, 34, 30, 12, 10, 18]}) #view DataFrame print(df) team position points 0 A G 14 1 A G 22 2 A G 25 3 A F 34 4 B G 30 5 B G 12 6 B F 10 7 B F 18
Example 1: Cumulative Count by Group in Pandas
We can use the following syntax to create a new column called team_cum_count that displays the cumulative count for each team in the DataFrame:
#calculate cumulative count by team
df['team_cum_count'] = df.groupby('team').cumcount()
#view updated DataFrame
print(df)
team position points team_cum_count
0 A G 14 0
1 A G 22 1
2 A G 25 2
3 A F 34 3
4 B G 30 0
5 B G 12 1
6 B F 10 2
7 B F 18 3
The new column called team_cum_count contains the cumulative count of each team, starting with a value of zero.
If you’d like the count to start at one instead, simply add one to the end of the line:
#calculate cumulative count (starting at 1) by team
df['team_cum_count'] = df.groupby('team').cumcount() + 1
#view updated DataFrame
print(df)
team position points team_cum_count
0 A G 14 1
1 A G 22 2
2 A G 25 3
3 A F 34 4
4 B G 30 1
5 B G 12 2
6 B F 10 3
7 B F 18 4
The new column called team_cum_count contains the cumulative count of each team, starting with a value of one.
Example 2: Calculate Cumulative Count by Group in Pandas
We can use the following syntax to create a new column called team_pos_cum_count that displays the cumulative count for each team and position in the DataFrame:
#calculate cumulative count by team
df['team_pos_cum_count'] = df.groupby(['team', 'position']).cumcount()
#view updated DataFrame
print(df)
team position points team_pos_cum_count
0 A G 14 0
1 A G 22 1
2 A G 25 2
3 A F 34 0
4 B G 30 0
5 B G 12 1
6 B F 10 0
7 B F 18 1
The new column called team_pos_cum_count contains the cumulative count of each team and position starting with a value of zero.
Note: You can find the complete documentation for the cumcount function in pandas here.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
How to Sum Specific Columns in Pandas
How to Sum Columns Based on a Condition in Pandas
How to Calculate a Reversed Cumulative Sum in Pandas