Monday, June 2, 2025

Pandas: Get Rows Which Are Not in Another DataFrame


You can use the following basic syntax to get the rows in one pandas DataFrame which are not in another DataFrame:

#merge two DataFrames and create indicator column
df_all = df1.merge(df2.drop_duplicates(), on=['col1','col2'],
                   how='left', indicator=True)

#create DataFrame with rows that exist in first DataFrame only
df1_only = df_all[df_all['_merge'] == 'left_only']
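The two statements above can be wrapped in a small reusable function. This is only a sketch: the name rows_only_in_first is hypothetical (it is not part of pandas), and the default of comparing on every shared column when on is omitted is an assumption made for illustration.

```python
import pandas as pd

def rows_only_in_first(df1, df2, on=None):
    """Return rows of df1 whose values in the `on` columns have no match in df2.

    Hypothetical helper for illustration, not a pandas function.
    """
    #assumption: when `on` is omitted, compare on every column the frames share
    if on is None:
        on = [col for col in df1.columns if col in df2.columns]

    #keep only the key columns of df2 so no extra columns leak into the result
    right = df2[on].drop_duplicates()

    #left merge with an indicator column, then keep the unmatched rows
    merged = df1.merge(right, on=on, how='left', indicator=True)
    return merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

#small made-up frames for illustration
left = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['x', 'y', 'z']})
right = pd.DataFrame({'col1': [2, 4], 'col2': ['y', 'w']})

result = rows_only_in_first(left, right, on=['col1', 'col2'])
print(result)
```

Selecting only the key columns from the second DataFrame before merging means the result keeps exactly the columns of the first DataFrame.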

The following example shows how to use this syntax in practice.

Example: Get Rows in Pandas DataFrame Which Are Not in Another DataFrame

Suppose we have the following two pandas DataFrames:

import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team' : ['A', 'B', 'C', 'D', 'E'], 
                    'points' : [12, 15, 22, 29, 24]}) 

print(df1)

  team  points
0    A      12
1    B      15
2    C      22
3    D      29
4    E      24

#create second DataFrame
df2 = pd.DataFrame({'team' : ['A', 'D', 'F', 'G', 'H'],
                    'points' : [12, 29, 15, 19, 10]})

print(df2)

  team  points
0    A      12
1    D      29
2    F      15
3    G      19
4    H      10

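We can use the following syntax to left-merge the two DataFrames and create an indicator column named _merge, which records whether each row of the first DataFrame also appears in the second DataFrame ('both') or only in the first ('left_only'):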

#merge two DataFrames and create indicator column
df_all = df1.merge(df2.drop_duplicates(), on=['team','points'],
                   how='left', indicator=True)

#view result
print(df_all)

  team  points     _merge
0    A      12       both
1    B      15  left_only
2    C      22  left_only
3    D      29       both
4    E      24  left_only
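The call to drop_duplicates() on the second DataFrame is there to keep matched rows from being repeated in the merged result. The sketch below illustrates this with made-up frames (the names first and second and their values are assumptions, not the tutorial's data) in which the second frame contains the same row twice.

```python
import pandas as pd

#made-up data: the row ('A', 12) appears twice in the second DataFrame
first = pd.DataFrame({'team': ['A', 'B'], 'points': [12, 15]})
second = pd.DataFrame({'team': ['A', 'A'], 'points': [12, 12]})

#without drop_duplicates(), the matching row from `first` is repeated
with_dups = first.merge(second, on=['team', 'points'],
                        how='left', indicator=True)

#with drop_duplicates(), each row of `first` appears exactly once
no_dups = first.merge(second.drop_duplicates(), on=['team', 'points'],
                      how='left', indicator=True)

print(len(with_dups))  #3 rows: 'A' twice, 'B' once
print(len(no_dups))    #2 rows: same length as `first`
```

The 'left_only' rows are the same in both cases, because a row without a match is never repeated by a left merge. Dropping duplicates mainly keeps the intermediate merged DataFrame the same length as the first DataFrame.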

We can then use the following syntax to keep only the rows in the first DataFrame that are not in the second DataFrame:

#create DataFrame with rows that exist in first DataFrame only
df1_only = df_all[df_all['_merge'] == 'left_only']

#view DataFrame
print(df1_only)

  team  points     _merge
1    B      15  left_only
2    C      22  left_only
4    E      24  left_only

Lastly, we can drop the _merge column if we’d like:

#drop '_merge' column
df1_only = df1_only.drop('_merge', axis=1)

#view DataFrame
print(df1_only)

  team  points
1    B      15
2    C      22
4    E      24

The result is a DataFrame in which all of the rows exist in the first DataFrame but not in the second DataFrame.
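Note that merging on columns ignores the original index and returns a fresh 0-based index, so the labels 1, 2 and 4 above line up with the first DataFrame only because it uses the default index. If the first DataFrame has meaningful index labels, one alternative (not the approach used in this tutorial) is to build a boolean mask with MultiIndex.isin. The sketch below assumes the same data as the example, with made-up index labels r1 to r5 on the first DataFrame.

```python
import pandas as pd

#same data as the example, but with made-up index labels on df1
df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [12, 15, 22, 29, 24]},
                   index=['r1', 'r2', 'r3', 'r4', 'r5'])
df2 = pd.DataFrame({'team': ['A', 'D', 'F', 'G', 'H'],
                    'points': [12, 29, 15, 19, 10]})

#turn each (team, points) pair into one key per row
cols = ['team', 'points']
keys1 = pd.MultiIndex.from_frame(df1[cols])
keys2 = pd.MultiIndex.from_frame(df2[cols])

#keep rows of df1 whose key does not appear in df2
df1_only = df1[~keys1.isin(keys2)]
print(df1_only)
```

Because the mask is applied directly to the first DataFrame, the selected rows keep their original labels (r2, r3 and r5 here) and no _merge column needs to be dropped.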

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Add Column from One DataFrame to Another in Pandas
How to Change the Order of Columns in Pandas
How to Sort Columns by Name in Pandas
