Tuesday, July 2, 2024

How to Drop Duplicate Columns in Pandas (With Examples)

You can use the following basic syntax to drop duplicate columns in pandas:

df.T.drop_duplicates().T

The following examples show how to use this syntax in practice.

Example: Drop Duplicate Columns in Pandas

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame with duplicate columns
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [25, 12, 15, 14, 19, 23, 25, 29],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#rename the columns so that two of them share the name 'points'
df.columns = ['team', 'points', 'points', 'rebounds']

#view DataFrame
df

	team	points	points	rebounds
0	A	25	25	11
1	A	12	12	8
2	A	15	15	10
3	A	14	14	6
4	B	19	19	6
5	B	23	23	5
6	B	25	25	9
7	B	29	29	12

We can use the following code to remove the duplicate ‘points’ column:

#remove duplicate columns
df.T.drop_duplicates().T

	team	points	rebounds
0	A	25	11
1	A	12	8
2	A	15	10
3	A	14	6
4	B	19	6
5	B	23	5
6	B	25	9
7	B	29	12

Notice that the duplicate ‘points’ column has been removed while all other columns remain in the DataFrame.
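One side effect is worth checking before relying on this result. Transposing a DataFrame that mixes text and numeric columns produces columns with the generic object dtype, so the numeric columns may no longer be stored as integers after the double transpose. The following sketch shows two possible ways to handle this, assuming the same data as the example above. The variable names (transposed, restored, keep, masked) are only illustrative.

```python
import pandas as pd

# same data as the example above, with a repeated 'points' column name
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [25, 12, 15, 14, 19, 23, 25, 29],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})
df.columns = ['team', 'points', 'points', 'rebounds']

# the double transpose returns the columns with the generic object dtype
transposed = df.T.drop_duplicates().T

# option 1: ask pandas to re-infer a more specific dtype for each column
restored = transposed.infer_objects()

# option 2: build a boolean mask of non-duplicated columns and
# select from the original DataFrame, which keeps the original dtypes
keep = ~df.T.duplicated().to_numpy()
masked = df.loc[:, keep]

print(transposed.dtypes)
print(restored.dtypes)
print(masked.dtypes)
```

The second option never replaces the original columns, so it may be the safer choice when the dtypes matter for later calculations.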

It’s also worth noting that the df.T.drop_duplicates().T syntax compares the values in each column rather than the column names, so it will also remove columns that have different names but contain identical values.

For example, suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame with duplicate columns
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'points2': [25, 12, 15, 14, 19, 23, 25, 29],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
df

	team	points	points2	rebounds
0	A	25	25	11
1	A	12	12	8
2	A	15	15	10
3	A	14	14	6
4	B	19	19	6
5	B	23	23	5
6	B	25	25	9
7	B	29	29	12

Notice that the ‘points’ and ‘points2’ columns contain identical values.

We can use the following code to remove the duplicate ‘points2’ column:

#remove duplicate columns
df.T.drop_duplicates().T

	team	points	rebounds
0	A	25	11
1	A	12	8
2	A	15	10
3	A	14	6
4	B	19	6
5	B	23	5
6	B	25	9
7	B	29	12
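Because the comparison is based on values, the reverse case also matters: two columns that share a name but hold different values are not treated as duplicates by df.T.drop_duplicates().T. If the goal is instead to keep one column per name, a name-based mask built with the columns.duplicated() method is one possible approach. The sketch below uses a small hypothetical DataFrame (not the data from the examples above) to contrast the two behaviors.

```python
import pandas as pd

# hypothetical data: two columns share the name 'points' but hold different values
df = pd.DataFrame([['A', 25, 5], ['B', 12, 7]],
                  columns=['team', 'points', 'points'])

# value-based: both 'points' columns are kept because their contents differ
by_value = df.T.drop_duplicates().T

# name-based: keep only the first column for each repeated name
by_name = df.loc[:, ~df.columns.duplicated()]

print(by_value)
print(by_name)
```

Which approach fits depends on whether a duplicate means a repeated name or repeated contents in your data.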

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Drop Duplicate Rows in a Pandas DataFrame
How to Drop Columns in Pandas
How to Exclude Columns in Pandas
