An anti-join allows you to return all rows in one dataset that do not have matching values in another dataset.
You can use the following syntax to perform an anti-join between two pandas DataFrames:
outer = df1.merge(df2, how='outer', indicator=True) anti_join = outer[(outer._merge=='left_only')].drop('_merge', axis=1)
The following example shows how to use this syntax in practice.
Example: Perform an Anti-Join in Pandas
Suppose we have the following two pandas DataFrames:
import pandas as pd
#create first DataFrame
df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
'points': [18, 22, 19, 14, 30]})
print(df1)
team points
0 A 18
1 B 22
2 C 19
3 D 14
4 E 30
#create second DataFrame
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
'points': [18, 22, 19, 22, 29]})
print(df2)
team points
0 A 18
1 B 22
2 C 19
3 F 22
4 G 29
We can use the following code to return all rows in the first DataFrame that do not have a matching team in the second DataFrame:
#perform outer join outer = df1.merge(df2, how='outer', indicator=True) #perform anti-join anti_join = outer[(outer._merge=='left_only')].drop('_merge', axis=1) #view results print(anti_join) team points 3 D 14 4 E 30
We can see that there are exactly two teams from the first DataFrame that do not have a matching team name in the second DataFrame.
The anti-join worked as expected.
The end result is one DataFrame that only contains the rows where the team name belongs to the first DataFrame but not the second DataFrame.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
How to Do an Inner Join in Pandas
How to Do a Left Join in Pandas
How to Do a Cross Join in Pandas