Sunday, July 7, 2024
Pandas: How to Rename Columns in Groupby Function

You can use the following basic syntax to give custom names to the aggregated columns produced by a groupby() operation in pandas:

df.groupby('group_col').agg(sum_col1=('col1', 'sum'),
                            mean_col2=('col2', 'mean'),
                            max_col3=('col3', 'max'))

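This particular example calculates three aggregated columns and names them sum_col1, mean_col2, and max_col3. Each keyword argument is the name of a new column, and its value is a tuple of the source column and the aggregation to apply to it.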

The following example shows how to use this syntax in practice.

Example: Rename Columns in Groupby Function in Pandas

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 6, 6, 5, 8, 7, 7, 9],
                   'rebounds': [4, 13, 15, 10, 7, 7, 5, 11]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      30        5         4
1    A      22        6        13
2    A      19        6        15
3    A      14        5        10
4    B      14        8         7
5    B      11        7         7
6    B      20        7         5
7    B      28        9        11

We can use the following syntax to group the rows by the team column, then calculate three aggregated columns while providing specific names to the aggregated columns:

#calculate several aggregated columns by group and rename aggregated columns
df.groupby('team').agg(sum_points=('points', 'sum'),
                       mean_assists=('assists', 'mean'),
                       max_rebounds=('rebounds', 'max'))

      sum_points  mean_assists  max_rebounds
team
A             85          5.50            15
B             73          7.75            11

Notice that the three aggregated columns have the custom names that we provided in the agg() function.
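Notice also that team is the index of the result, not a regular column. If we want team as an ordinary column alongside the renamed aggregates, one option is to chain reset_index() after agg(). The following is a minimal sketch of that idea using the same DataFrame; the variable name summary is only an illustrative choice.

```python
import pandas as pd

# same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 6, 6, 5, 8, 7, 7, 9],
                   'rebounds': [4, 13, 15, 10, 7, 7, 5, 11]})

# aggregate with custom names, then move 'team' from the index to a column
summary = (df.groupby('team')
             .agg(sum_points=('points', 'sum'),
                  mean_assists=('assists', 'mean'),
                  max_rebounds=('rebounds', 'max'))
             .reset_index())

print(summary)
```

The resulting DataFrame has a default integer index and four columns: team, sum_points, mean_assists, and max_rebounds.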

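Also note that we could pass NumPy functions to agg() to calculate the sum, mean, and max values if we’d like. Recent versions of pandas emit a FutureWarning when NumPy functions such as np.sum are passed this way and recommend the string names instead, so the strings 'sum', 'mean', and 'max' are the safer choice in new code.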

import numpy as np

#calculate several aggregated columns by group and rename aggregated columns
df.groupby('team').agg(sum_points=('points', np.sum),
                       mean_assists=('assists', np.mean),
                       max_rebounds=('rebounds', np.max))

      sum_points  mean_assists  max_rebounds
team
A             85          5.50            15
B             73          7.75            11

These results match the ones from the previous example.
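The keyword syntax only works when the new column names are valid Python identifiers. As a sketch of two related options, the example below uses pd.NamedAgg, which spells out the column and aggfunc fields explicitly, and then unpacks a dictionary with ** to create names that contain spaces. The names 'total points' and 'max rebounds' and the variables named and spaced are illustrative choices, not part of the original example.

```python
import pandas as pd

# same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 6, 6, 5, 8, 7, 7, 9],
                   'rebounds': [4, 13, 15, 10, 7, 7, 5, 11]})

# option 1: pd.NamedAgg makes the (column, aggfunc) pair explicit
named = df.groupby('team').agg(
    sum_points=pd.NamedAgg(column='points', aggfunc='sum'),
    mean_assists=pd.NamedAgg(column='assists', aggfunc='mean'))

# option 2: unpack a dict when the new names are not valid identifiers
spaced = df.groupby('team').agg(**{'total points': ('points', 'sum'),
                                   'max rebounds': ('rebounds', 'max')})

print(named)
print(spaced)
```

Both calls use the same named aggregation mechanism as the earlier examples, so the aggregated values are unchanged and only the way the names are supplied differs.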

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to List All Column Names in Pandas
How to Sort Columns by Name in Pandas
How to Drop Duplicate Columns in Pandas
