2.4 C
London
Friday, December 20, 2024
HomePandas in PythonGeneral Functions in PythonHow to Perform an Anti-Join in Pandas

How to Perform an Anti-Join in Pandas

Related stories

Learn About Opening an Automobile Repair Shop in India

Starting a car repair shop is quite a good...

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

  Unlocking the Power: Embracing the Benefits of Tax-Free Investing For...

Income Splitting in Canada for 2023

  Income Splitting in Canada for 2023 The federal government’s expanded...

Can I Deduct Home Office Expenses on my Tax Return 2023?

Can I Deduct Home Office Expenses on my Tax...

Canadian Tax – Personal Tax Deadline 2022

  Canadian Tax – Personal Tax Deadline 2022 Resources and Tools...

An anti-join allows you to return all rows in one dataset that do not have matching values in another dataset.

You can use the following syntax to perform an anti-join between two pandas DataFrames:

outer = df1.merge(df2, how='outer', indicator=True)

anti_join = outer[(outer._merge=='left_only')].drop('_merge', axis=1)

The following example shows how to use this syntax in practice.

Example: Perform an Anti-Join in Pandas

Suppose we have the following two pandas DataFrames:

import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})

print(df1)

  team  points
0    A      18
1    B      22
2    C      19
3    D      14
4    E      30

#create second DataFrame
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                    'points': [18, 22, 19, 22, 29]})

print(df2)

  team  points
0    A      18
1    B      22
2    C      19
3    F      22
4    G      29

We can use the following code to return all rows in the first DataFrame that do not have a matching team in the second DataFrame:

#perform outer join
outer = df1.merge(df2, how='outer', indicator=True)

#perform anti-join
anti_join = outer[(outer._merge=='left_only')].drop('_merge', axis=1)

#view results
print(anti_join)

  team  points
3    D      14
4    E      30

We can see that there are exactly two teams from the first DataFrame that do not have a matching team name in the second DataFrame.

The anti-join worked as expected.

The end result is one DataFrame that only contains the rows where the team name belongs to the first DataFrame but not the second DataFrame.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Do an Inner Join in Pandas
How to Do a Left Join in Pandas
How to Do a Cross Join in Pandas

Subscribe

- Never miss a story with notifications

- Gain full access to our premium content

- Browse free from up to 5 devices at once

Latest stories