You can use the following basic syntax to check if two pandas DataFrames are equal:
df1.equals(df2)
This will return a value of True or False.
If two DataFrames are not equal, then you can use the following syntax to find the rows in the second DataFrame that do not exist in the first DataFrame:
#perform outer join on two DataFrames all_df = df1.merge(df2, indicator=True, how='outer') #find which rows only exist in second DataFrame only_df2 = all_df[all_df['_merge'] == 'right_only'] only_df2 = only_df2.drop('_merge', axis=1)
The following example shows how to use this syntax in practice.
Example: Check if Two pandas DataFrames Are Equal
Suppose we have the following two pandas DataFrames:
import pandas as pd #create first DataFrame df1 = pd.DataFrame({'team' : ['A', 'B', 'C', 'D', 'E'], 'points' : [12, 15, 22, 29, 24]}) print(df1) team points 0 A 12 1 B 15 2 C 22 3 D 29 4 E 24 #create second DataFrame df2 = pd.DataFrame({'team' : ['A', 'D', 'F', 'G', 'H'], 'points' : [12, 29, 15, 19, 10]}) print(df2) team points 0 A 12 1 D 29 2 F 15 3 G 19 4 H 10
We can use the following syntax to check if the two DataFrames are equal:
#check if two DataFrames are equal df1.equals(df2) False
The output returns False, which means the two DataFrames are not equal.
We can then use the following syntax to find which rows exist in the second DataFrame but not in the first:
#perform outer join on two DataFrames all_df = df1.merge(df2, indicator=True, how='outer') #find which rows only exist in second DataFrame only_df2 = all_df[all_df['_merge'] == 'right_only'] only_df2 = only_df2.drop('_merge', axis=1) #view results print(only_df2) team points 5 F 15 6 G 19 7 H 10
From the output we can see that there are three rows in the second DataFrame that do not exist in the first DataFrame.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
Pandas: Add Column from One DataFrame to Another
Pandas: Get Rows Which Are Not in Another DataFrame
Pandas: How to Check if Multiple Columns are Equal