How to Drop Duplicate Rows in a Pandas DataFrame

The easiest way to drop duplicate rows in a pandas DataFrame is to use the drop_duplicates() function, which has the following syntax:

df.drop_duplicates(subset=None, keep='first', inplace=False)

where:

subset: Which columns to consider when identifying duplicates. Default is all columns.
keep: Indicates which duplicates (if any) to keep.
- 'first': Drop all duplicate rows except the first occurrence. This is the default.
- 'last': Drop all duplicate rows except the last occurrence.
- False: Drop every row that has a duplicate, including the first occurrence.
inplace: If True, modify the DataFrame in place and return None. If False (the default), return a new DataFrame and leave the original unchanged.
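
As a minimal sketch of how the inplace parameter behaves, the snippet below runs both variants on the example DataFrame used throughout this tutorial. The variable names (deduped, df_copy, result) are illustrative only and are not part of pandas.

```python
import pandas as pd

# same example data used in the rest of this tutorial
df = pd.DataFrame({'team': ['a', 'b', 'b', 'c', 'c', 'd'],
                   'points': [3, 7, 7, 8, 8, 9],
                   'assists': [8, 6, 7, 9, 9, 3]})

# inplace=False (the default): a new DataFrame is returned, df is unchanged
deduped = df.drop_duplicates()
rows_original = len(df)
rows_deduped = len(deduped)

# inplace=True: the DataFrame itself is modified and the call returns None
df_copy = df.copy()
result = df_copy.drop_duplicates(inplace=True)

print(rows_original, rows_deduped, result, len(df_copy))
```

Because inplace=True returns None, assigning its result back to the original variable (df = df.drop_duplicates(inplace=True)) would overwrite the DataFrame with None, which is a common mistake.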

This tutorial provides several examples of how to use this function in practice on the following DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['a', 'b', 'b', 'c', 'c', 'd'],
                   'points': [3, 7, 7, 8, 8, 9],
                   'assists': [8, 6, 7, 9, 9, 3]})

#display DataFrame
print(df)

  team  points  assists
0    a       3        8
1    b       7        6
2    b       7        7
3    c       8        9
4    c       8        9
5    d       9        3

Example 1: Remove Duplicates Across All Columns

The following code shows how to remove rows that have duplicate values across all columns:

df.drop_duplicates()

  team  points  assists
0    a       3        8
1    b       7        6
2    b       7        7
3    c       8        9
5    d       9        3

By default, the drop_duplicates() function deletes all duplicates except the first.

However, we could use the keep=False argument to drop every row that has a duplicate, so that no copy of it is kept:

df.drop_duplicates(keep=False)

  team  points  assists
0    a       3        8
1    b       7        6
2    b       7        7
5    d       9        3
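
For comparison, the remaining keep option, keep='last', can be sketched on the same DataFrame as follows. The name last_kept is just an illustrative variable.

```python
import pandas as pd

# same example data used in the rest of this tutorial
df = pd.DataFrame({'team': ['a', 'b', 'b', 'c', 'c', 'd'],
                   'points': [3, 7, 7, 8, 8, 9],
                   'assists': [8, 6, 7, 9, 9, 3]})

# keep='last' retains the final occurrence of each duplicated row
last_kept = df.drop_duplicates(keep='last')

print(last_kept)
```

Here rows 3 and 4 are identical, so row 4 is retained instead of row 3. The row values are the same as with keep='first'; only the surviving index label differs.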

Example 2: Remove Duplicates Across Specific Columns

The following code shows how to remove rows that have duplicate values across just the columns titled team and points:

df.drop_duplicates(subset=['team', 'points'])

  team  points  assists
0    a       3        8
1    b       7        6
3    c       8        9
5    d       9        3
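
The subset and keep arguments can also be combined. The sketch below keeps the last row for each team and points pair, then renumbers the index with reset_index(drop=True), since drop_duplicates() otherwise preserves the original index labels. The name latest is illustrative only.

```python
import pandas as pd

# same example data used in the rest of this tutorial
df = pd.DataFrame({'team': ['a', 'b', 'b', 'c', 'c', 'd'],
                   'points': [3, 7, 7, 8, 8, 9],
                   'assists': [8, 6, 7, 9, 9, 3]})

# compare rows on team and points only, keeping the last match of each pair
latest = df.drop_duplicates(subset=['team', 'points'], keep='last')

# drop the gaps left in the index by the removed rows
latest = latest.reset_index(drop=True)

print(latest)
```

Note that for team b this keeps the row with 7 assists rather than the one with 6, so the choice of keep matters whenever the columns outside subset differ.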

Additional Resources

How to Drop Duplicate Columns in Pandas
How to Sort Values in a Pandas DataFrame
How to Filter a Pandas DataFrame on Multiple Conditions
How to Insert a Column Into a Pandas DataFrame
