You can use the following basic syntax to combine rows with the same column values in a pandas DataFrame:
#define how to aggregate various fields
agg_functions = {'field1': 'first', 'field2': 'sum', 'field3': 'sum'}

#create new DataFrame by combining rows with same id values
df_new = df.groupby(df['id']).aggregate(agg_functions)
The following example shows how to use this syntax in practice.
Example: Combine Rows with Same Column Values in Pandas
Suppose we have the following pandas DataFrame that contains information about sales and returns made by various employees at a company:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'id': [101, 101, 102, 103, 103, 103],
                   'employee': ['Dan', 'Dan', 'Rick', 'Ken', 'Ken', 'Ken'],
                   'sales': [4, 1, 3, 2, 5, 3],
                   'returns': [1, 2, 2, 1, 3, 2]})

#view DataFrame
print(df)

    id employee  sales  returns
0  101      Dan      4        1
1  101      Dan      1        2
2  102     Rick      3        2
3  103      Ken      2        1
4  103      Ken      5        3
5  103      Ken      3        2
We can use the following syntax to combine rows that have the same value in the id column and then aggregate the remaining columns:
#define how to aggregate various fields
agg_functions = {'employee': 'first', 'sales': 'sum', 'returns': 'sum'}

#create new DataFrame by combining rows with same id values
df_new = df.groupby(df['id']).aggregate(agg_functions)

#view new DataFrame
print(df_new)

    employee  sales  returns
id
101      Dan      5        3
102     Rick      3        2
103      Ken     10        6
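The new DataFrame combines all of the rows in the original DataFrame that share the same value in the id column. For each id, it keeps the first value in the employee column and calculates the sum of the values in the sales and returns columns. Note that id is now the index of the new DataFrame rather than a regular column.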
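If you would rather keep id as a regular column instead of the index, one option is to pass as_index=False to groupby(). The following is a minimal sketch that reuses the example data from above; the name df_flat is only an illustrative choice:

```python
import pandas as pd

#recreate the example DataFrame
df = pd.DataFrame({'id': [101, 101, 102, 103, 103, 103],
                   'employee': ['Dan', 'Dan', 'Rick', 'Ken', 'Ken', 'Ken'],
                   'sales': [4, 1, 3, 2, 5, 3],
                   'returns': [1, 2, 2, 1, 3, 2]})

#define how to aggregate various fields
agg_functions = {'employee': 'first', 'sales': 'sum', 'returns': 'sum'}

#combine rows with same id values, keeping id as a regular column
df_flat = df.groupby('id', as_index=False).agg(agg_functions)

#view new DataFrame
print(df_flat)
```

The result has one row per id with a default integer index, and id appears as the first column. Calling reset_index() on the grouped result should give an equivalent DataFrame.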
Note: Refer to the pandas documentation for a complete list of aggregations available to use with the groupby() method.
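As an illustration of using other aggregations, the sketch below applies 'mean' and 'max' alongside 'sum' through pandas named aggregation, which also lets you choose the output column names. The output names total_sales, avg_sales, and max_returns are hypothetical names picked for this example:

```python
import pandas as pd

#recreate the example DataFrame
df = pd.DataFrame({'id': [101, 101, 102, 103, 103, 103],
                   'employee': ['Dan', 'Dan', 'Rick', 'Ken', 'Ken', 'Ken'],
                   'sales': [4, 1, 3, 2, 5, 3],
                   'returns': [1, 2, 2, 1, 3, 2]})

#combine rows with same id values using several different aggregations
#each keyword is an output column name mapped to (input column, aggregation)
df_summary = df.groupby('id').agg(
    employee=('employee', 'first'),
    total_sales=('sales', 'sum'),
    avg_sales=('sales', 'mean'),
    max_returns=('returns', 'max'),
)

#view new DataFrame
print(df_summary)
```

Named aggregation is convenient when you want more than one summary of the same column, such as both the sum and the mean of sales.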