Thursday, July 4, 2024

How to Select Unique Rows in a Pandas DataFrame

You can use the following syntax to select unique rows in a pandas DataFrame:

df = df.drop_duplicates()

To select rows that are unique across only specific columns, pass those columns to the subset argument:

df = df.drop_duplicates(subset=['col1', 'col2', ...])
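The same selection can also be written as a boolean mask. DataFrame.duplicated() flags every row that repeats an earlier row, and negating that mask keeps only the first occurrence of each row. This is a minimal sketch, assuming small illustrative data with placeholder columns col1 and col2 (hypothetical names, not from the tutorial):

```python
import pandas as pd

# Illustrative data; col1 and col2 are hypothetical placeholder columns
df = pd.DataFrame({'col1': [1, 1, 2], 'col2': ['x', 'x', 'y']})

# duplicated() marks rows that repeat an earlier row (keep='first' by default)
mask = df.duplicated(subset=['col1', 'col2'])

# Keeping the rows where the mask is False matches drop_duplicates()
unique_rows = df[~mask]

print(unique_rows.equals(df.drop_duplicates(subset=['col1', 'col2'])))  # True
```

The mask form is handy when you want to inspect or count the duplicates before removing them.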

The examples below apply this syntax to the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

#view DataFrame
df

	a	b	c
0	4	2	2
1	4	2	2
2	3	6	9
3	8	8	9
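Before dropping anything, it can help to check which rows pandas treats as duplicates. A short sketch using DataFrame.duplicated() on the same example DataFrame:

```python
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

# True for each row that repeats an earlier row across all columns
dup_mask = df.duplicated()
print(dup_mask.tolist())  # [False, True, False, False]

# Number of rows that drop_duplicates() would remove
print(dup_mask.sum())  # 1
```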

Example 1: Select Unique Rows Across All Columns

The following code shows how to select unique rows across all columns of the pandas DataFrame:

#drop duplicate rows, storing the result in a new DataFrame so df stays unchanged
df_unique = df.drop_duplicates()

#view result
df_unique

	a	b	c
0	4	2	2
2	3	6	9
3	8	8	9

The first and second rows (index 0 and 1) were identical, so pandas kept the first and dropped the second.
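Note that drop_duplicates() keeps the original index labels, which is why the result above skips from 0 to 2. If you prefer a fresh 0-based index, pandas 1.0 and later accept ignore_index=True. A short sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

# ignore_index=True relabels the remaining rows 0, 1, ..., n-1
df_unique = df.drop_duplicates(ignore_index=True)
print(df_unique)
```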

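By default, drop_duplicates() keeps the first occurrence of each duplicated row (keep='first'). You can keep the last occurrence instead; note that this is applied to the original DataFrame, not to the result of the previous step:

#drop duplicates from the original DataFrame, keeping the last occurrence
df_last = df.drop_duplicates(keep='last')

#view result
df_last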

	a	b	c
1	4	2	2
2	3	6	9
3	8	8	9
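Besides 'first' and 'last', the keep argument also accepts False, which drops every row that has a duplicate instead of keeping one copy. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

# keep=False removes all members of each group of duplicated rows
df_no_dups = df.drop_duplicates(keep=False)
print(df_no_dups)
```

Here both copies of the row (4, 2, 2) are removed, leaving only the rows at index 2 and 3.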

Example 2: Select Unique Rows Across Specific Columns

The following code shows how to select the rows of the original DataFrame that are unique across just column 'c':

#drop rows whose value in column 'c' already appeared in an earlier row
df_c = df.drop_duplicates(subset=['c'])

#view result
df_c

	a	b	c
0	4	2	2
2	3	6	9

Rows 1 and 3 were dropped because their values in column 'c' (2 and 9) had already appeared in rows 0 and 2.
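The subset and keep arguments can be combined. For example, this sketch keeps the last row for each distinct value in column 'c' of the original DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

# For each distinct value of 'c', keep the last row in which it appears
df_last_c = df.drop_duplicates(subset=['c'], keep='last')
print(df_last_c)
```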

