Friday, December 20, 2024
Pandas: How to Find the Difference Between Two Columns

To find the difference between any two columns in a pandas DataFrame, you can use the following syntax:

df['difference'] = df['column1'] - df['column2']
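As a minimal sketch of this syntax, the snippet below uses a small made-up DataFrame; the column names column1 and column2 and the values are placeholders, not data from this tutorial. It also shows Series.sub(), the method form of the same subtraction.

```python
import pandas as pd

# Hypothetical data; 'column1' and 'column2' are placeholder names
df = pd.DataFrame({'column1': [10, 20, 30],
                   'column2': [1, 5, 40]})

# Element-wise subtraction, aligned on the row index
df['difference'] = df['column1'] - df['column2']

# Series.sub() performs the same subtraction as the - operator
df['difference_sub'] = df['column1'].sub(df['column2'])

print(df)
```

If either column contains a missing value in a row, the difference for that row is NaN. Series.sub() accepts a fill_value argument that may help in that case.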

The following examples show how to use this syntax in practice.

Example 1: Find Difference Between Two Columns

Suppose we have the following pandas DataFrame that shows the total sales for two regions (A and B) during eight consecutive sales periods:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8],
                   'A_sales': [12, 14, 15, 13, 18, 20, 19, 24],
                   'B_sales': [14, 19, 20, 22, 24, 20, 17, 23]})

#view DataFrame
df

   period  A_sales  B_sales
0       1       12       14
1       2       14       19
2       3       15       20
3       4       13       22
4       5       18       24
5       6       20       20
6       7       19       17
7       8       24       23

The following code shows how to calculate the difference between the sales in region B and region A for each sales period:

#add new column to represent difference between B sales and A sales
df['diff'] = df['B_sales'] - df['A_sales']

#view DataFrame
df

   period  A_sales  B_sales  diff
0       1       12       14     2
1       2       14       19     5
2       3       15       20     5
3       4       13       22     9
4       5       18       24     6
5       6       20       20     0
6       7       19       17    -2
7       8       24       23    -1

We could also calculate the absolute difference in sales by calling the pandas Series.abs() method on the result:

#overwrite the diff column with the absolute difference between B sales and A sales
df['diff'] = (df['B_sales'] - df['A_sales']).abs()

#view DataFrame
df

   period  A_sales  B_sales  diff
0       1       12       14     2
1       2       14       19     5
2       3       15       20     5
3       4       13       22     9
4       5       18       24     6
5       6       20       20     0
6       7       19       17     2
7       8       24       23     1
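Note that the code above overwrites the diff column, so the sign of each difference is lost. A small sketch of one way to keep both versions is to write the absolute values to a separate column; the name abs_diff is a hypothetical choice for illustration.

```python
import pandas as pd

# Same sales data as in the example above
df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8],
                   'A_sales': [12, 14, 15, 13, 18, 20, 19, 24],
                   'B_sales': [14, 19, 20, 22, 24, 20, 17, 23]})

# Signed difference: positive when region B sold more than region A
df['diff'] = df['B_sales'] - df['A_sales']

# Absolute difference stored in its own column ('abs_diff' is an assumed name)
df['abs_diff'] = df['diff'].abs()

print(df[['period', 'diff', 'abs_diff']])
```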

Example 2: Find Difference Between Columns Based on Condition

We can also filter the DataFrame to only show rows where the difference between the columns is less than or greater than some value.

For example, the following code returns only the rows where the sales in region A are greater than the sales in region B:

#add new column to represent difference between B sales and A sales
df['diff'] = df['B_sales'] - df['A_sales']

#display rows where sales in region A is greater than sales in region B
df[df['diff'] < 0]

   period  A_sales  B_sales  diff
6       7       19       17    -2
7       8       24       23    -1
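The same boolean-mask pattern can be used with other thresholds. The sketch below is illustrative only: the cutoff values 5 and 2 and the variable names b_ahead, close, and close_between are arbitrary choices, not part of the original example.

```python
import pandas as pd

# Same sales data as in the examples above
df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8],
                   'A_sales': [12, 14, 15, 13, 18, 20, 19, 24],
                   'B_sales': [14, 19, 20, 22, 24, 20, 17, 23]})
df['diff'] = df['B_sales'] - df['A_sales']

# Rows where region B outsold region A by at least 5 (arbitrary cutoff)
b_ahead = df[df['diff'] >= 5]

# Rows where the two regions are within 2 of each other, in either direction
close = df[df['diff'].abs() <= 2]

# Series.between() expresses the same range check (inclusive on both ends by default)
close_between = df[df['diff'].between(-2, 2)]

print(b_ahead)
print(close)
```

Each expression inside the brackets produces a boolean Series, and indexing the DataFrame with it keeps only the rows where the value is True.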

Additional Resources

Pandas: How to Find the Difference Between Two Rows
Pandas: How to Group and Aggregate by Multiple Columns
