You can use the following methods to coalesce the values from multiple columns of a pandas DataFrame into one column:
Method 1: Coalesce Values by Default Column Order
df['coalesce'] = df.bfill(axis=1).iloc[:, 0]
Method 2: Coalesce Values Using Specific Column Order
df['coalesce'] = df[['col3', 'col1', 'col2']].bfill(axis=1).iloc[:, 0]
The following examples show how to use each method in practice with the following pandas DataFrame:
import pandas as pd import numpy as np #create DataFrame df = pd.DataFrame({'points': [np.nan, np.nan, 19, np.nan, 14], 'assists': [np.nan, 7, 7, 9, np.nan], 'rebounds': [3, 4, np.nan, np.nan, 6]}) #view DataFrame print(df) points assists rebounds 0 NaN NaN 3.0 1 NaN 7.0 4.0 2 19.0 7.0 NaN 3 NaN 9.0 NaN 4 14.0 NaN 6.0
Method 1: Coalesce Values by Default Column Order
The following code shows how to coalesce the values in the points, assists, and rebounds columns into one column, using the first non-null value across the three columns as the coalesced value:
#create new column that contains first non-null value from three existing columns
df['coalesce'] = df.bfill(axis=1).iloc[:, 0]
#view updated DataFrame
print(df)
points assists rebounds coalesce
0 NaN NaN 3.0 3.0
1 NaN 7.0 4.0 7.0
2 19.0 7.0 NaN 19.0
3 NaN 9.0 NaN 9.0
4 14.0 NaN 6.0 14.0
Here’s how the value in the coalesce column was chosen:
- First row: The first non-null value was 3.0.
- Second row: The first non-null value was 7.0.
- Third row: The first non-null value was 19.0.
- Fourth row: The first non-null value was 9.0.
- Fifth row: The first non-null value was 14.0.
Method 2: Coalesce Values Using Specific Column Order
The following code shows how to coalesce the values in the three columns by analyzing the columns in the following order: assists, rebounds, points.
#coalesce values in specific column order
df['coalesce'] = df[['assists', 'rebounds', 'points']].bfill(axis=1).iloc[:, 0]
#view updated DataFrame
print(df)
points assists rebounds coalesce
0 NaN NaN 3.0 3.0
1 NaN 7.0 4.0 7.0
2 19.0 7.0 NaN 7.0
3 NaN 9.0 NaN 9.0
4 14.0 NaN 6.0 6.0
Here’s the logic that was used to decide which value to place in the coalesce column:
- If the value in the assists column is non-null then use that value.
- Otherwise, if the value in the rebounds column is non-null then use that value.
- Otherwise, if the value in the points column is non-null then use that value.
Note: You can find the complete documentation for the bfill() function here.
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
How to Combine Two Columns in Pandas
How to Sum Specific Columns in Pandas
How to Sort by Multiple Columns in Pandas