Sunday, July 7, 2024

How to Compare Two Columns in Pandas (With Examples)

Often you may want to compare two columns in a pandas DataFrame and write the results of the comparison to a third column.

You can do this with NumPy's np.select function, using the following syntax:

conditions = [(condition1), (condition2)]
choices = ["choice1", "choice2"]

df["new_column_name"] = np.select(conditions, choices, default)

Here’s what this code does:

  • conditions is a list of boolean conditions to check between the two columns
  • choices is a list of values to return, one for each condition and in the same order
  • np.select returns, for each row, the choice for the first condition that is true, or default if none is true
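As a minimal sketch of how np.select evaluates its arguments, the following uses two small NumPy arrays. The names x and y and the labels are illustrative only and are not part of the example further down. The second condition overlaps the first on purpose, to show that the first true condition decides the result.

```python
import numpy as np

# illustrative arrays (hypothetical names, not from the soccer example)
x = np.array([1, 5, 3])
y = np.array([2, 2, 3])

# conditions are checked in order; the first one that is True wins
conditions = [x > y, x >= y]
choices = ["greater", "greater_or_equal"]

# elements where no condition is True receive the default
result = np.select(conditions, choices, default="less")
print(result)
```

The middle element satisfies both conditions but is labelled "greater", because that condition comes first in the list.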

The following example shows how to use this code in practice.

Example: Compare Two Columns in Pandas

Suppose we have the following DataFrame that shows the number of goals scored by two soccer teams in five different matches:

import numpy as np
import pandas as pd

#create DataFrame
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})
             
#view DataFrame      
df

   A_points  B_points
0         1         4
1         3         5
2         3         2
3         3         3
4         5         2

We can use the following code to compare the number of goals by row and output the winner of the match in a third column:

#define conditions
conditions = [df['A_points'] > df['B_points'],
              df['A_points'] < df['B_points']]

#define choices
choices = ['A', 'B']

#create new column in DataFrame that displays results of comparisons
df['winner'] = np.select(conditions, choices, default='Tie')

#view the DataFrame
df

   A_points  B_points  winner
0         1         4       B
1         3         5       B
2         3         2       A
3         3         3     Tie
4         5         2       A

The results of the comparison are shown in the new column called winner.
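When only two outcomes are needed, np.where may be a simpler alternative to np.select. The sketch below assumes the same DataFrame as above; the column name A_won and the labels 'Yes' and 'No' are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# same data as in the example above
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})

# 'Yes' where team A scored more than team B, 'No' otherwise (ties included)
df['A_won'] = np.where(df['A_points'] > df['B_points'], 'Yes', 'No')

print(df)
```

Note that a tie is reported as 'No' here, so np.select remains the better fit when ties need their own label.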

Notes

Here are a few things to keep in mind when comparing two columns in a pandas DataFrame:

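  • The number of conditions and choices must be equal; otherwise np.select raises a ValueError.
  • The default value specifies the value to display in the new column if none of the conditions are met. If you omit it, np.select uses 0, so set it explicitly when the choices are text labels.
  • Both NumPy and pandas must be imported to make this code work.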
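The first note can be checked with a short sketch. It assumes the same DataFrame as in the example and deliberately passes two conditions but only one choice; the variable error_raised is a hypothetical name used only to record the outcome.

```python
import numpy as np
import pandas as pd

# same data as in the example above
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})

conditions = [df['A_points'] > df['B_points'],
              df['A_points'] < df['B_points']]

# two conditions but only one choice: np.select rejects the mismatch
try:
    np.select(conditions, ['A'], default='Tie')
    error_raised = False
except ValueError:
    error_raised = True

print(error_raised)
```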

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Rename Columns in Pandas
How to Add a Column to a Pandas DataFrame
How to Change the Order of Columns in Pandas DataFrame
