2.4 C
London
Friday, December 20, 2024
HomePandas in PythonGeneral Functions in PythonPandas: How to Check if Two DataFrames Are Equal

Pandas: How to Check if Two DataFrames Are Equal

Related stories

Learn About Opening an Automobile Repair Shop in India

Starting a car repair shop is quite a good...

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

  Unlocking the Power: Embracing the Benefits of Tax-Free Investing For...

Income Splitting in Canada for 2023

  Income Splitting in Canada for 2023 The federal government’s expanded...

Can I Deduct Home Office Expenses on my Tax Return 2023?

Can I Deduct Home Office Expenses on my Tax...

Canadian Tax – Personal Tax Deadline 2022

  Canadian Tax – Personal Tax Deadline 2022 Resources and Tools...

You can use the following basic syntax to check if two pandas DataFrames are equal:

df1.equals(df2)

This will return a value of True or False.

If two DataFrames are not equal, then you can use the following syntax to find the rows in the second DataFrame that do not exist in the first DataFrame:

#perform outer join on two DataFrames
all_df = df1.merge(df2, indicator=True, how='outer')

#find which rows only exist in second DataFrame
only_df2 = all_df[all_df['_merge'] == 'right_only']
only_df2 = only_df2.drop('_merge', axis=1)

The following example shows how to use this syntax in practice.

Example: Check if Two pandas DataFrames Are Equal

Suppose we have the following two pandas DataFrames:

import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team' : ['A', 'B', 'C', 'D', 'E'], 
                    'points' : [12, 15, 22, 29, 24]}) 

print(df1)

  team  points
0    A      12
1    B      15
2    C      22
3    D      29
4    E      24

#create second DataFrame
df2 = pd.DataFrame({'team' : ['A', 'D', 'F', 'G', 'H'],
                    'points' : [12, 29, 15, 19, 10]})

print(df2)

  team  points
0    A      12
1    D      29
2    F      15
3    G      19
4    H      10

We can use the following syntax to check if the two DataFrames are equal:

#check if two DataFrames are equal
df1.equals(df2)

False

The output returns False, which means the two DataFrames are not equal.

We can then use the following syntax to find which rows exist in the second DataFrame but not in the first:

#perform outer join on two DataFrames
all_df = df1.merge(df2, indicator=True, how='outer')

#find which rows only exist in second DataFrame
only_df2 = all_df[all_df['_merge'] == 'right_only']
only_df2 = only_df2.drop('_merge', axis=1)

#view results
print(only_df2)

  team  points
5    F      15
6    G      19
7    H      10

From the output we can see that there are three rows in the second DataFrame that do not exist in the first DataFrame.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: Add Column from One DataFrame to Another
Pandas: Get Rows Which Are Not in Another DataFrame
Pandas: How to Check if Multiple Columns are Equal

Subscribe

- Never miss a story with notifications

- Gain full access to our premium content

- Browse free from up to 5 devices at once

Latest stories