You can use the following syntax to select unique rows in a pandas DataFrame:
df = df.drop_duplicates()
And you can use the following syntax to select unique rows across specific columns in a pandas DataFrame:
df = df.drop_duplicates(subset=['col1', 'col2', ...])
The examples below show how to use this syntax in practice with the following pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

#view DataFrame
df

   a  b  c
0  4  2  2
1  4  2  2
2  3  6  9
3  8  8  9
Example 1: Select Unique Rows Across All Columns
The following code shows how to select unique rows across all columns of the pandas DataFrame:
#drop duplicates from DataFrame
df = df.drop_duplicates()

#view DataFrame
df

   a  b  c
0  4  2  2
2  3  6  9
3  8  8  9
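The first and second rows (index 0 and 1) were duplicates, so pandas dropped the second one.
By default, drop_duplicates() keeps the first occurrence of each duplicated row. However, you can pass keep='last' to keep the last occurrence instead. The output below assumes the call is made on the original DataFrame, not on the deduplicated result from the previous step: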
#drop duplicates from DataFrame, keep last duplicate
df = df.drop_duplicates(keep='last')

#view DataFrame
df

   a  b  c
1  4  2  2
2  3  6  9
3  8  8  9
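Besides 'first' and 'last', the keep argument also accepts False, which drops every row that has a duplicate instead of retaining one copy. The following is a minimal sketch that rebuilds the example DataFrame so it runs on its own; the variable names no_dupes and dup_mask are illustrative and do not appear in the examples above:

```python
import pandas as pd

#recreate the example DataFrame
df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

#drop every row that appears more than once (no copy is kept)
no_dupes = df.drop_duplicates(keep=False)

#flag rows that repeat an earlier row (True marks a duplicate)
dup_mask = df.duplicated()

print(no_dupes)
print(dup_mask.tolist())
```

Here only the rows at index 2 and 3 remain in no_dupes, and dup_mask marks the row at index 1 as the repeat. Checking duplicated() first can be a useful way to see which rows drop_duplicates() would remove before overwriting df.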
Example 2: Select Unique Rows Across Specific Columns
The following code shows how to select unique rows based only on column 'c', again starting from the original DataFrame:
#drop duplicates from column 'c' in DataFrame
df = df.drop_duplicates(subset=['c'])

#view DataFrame
df

   a  b  c
0  4  2  2
2  3  6  9
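Two rows were dropped from the DataFrame: the rows at index 1 and 3, whose values in column 'c' (2 and 9) had already appeared in the rows at index 0 and 2.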
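The subset argument also accepts more than one column, in which case rows count as duplicates only when they match on every listed column. The sketch below illustrates this on the same example data and also shows the ignore_index argument, which renumbers the result from 0; it assumes pandas 1.0 or later, and the names by_b_c and by_c_reset are illustrative:

```python
import pandas as pd

#recreate the example DataFrame
df = pd.DataFrame({'a': [4, 4, 3, 8],
                   'b': [2, 2, 6, 8],
                   'c': [2, 2, 9, 9]})

#rows are duplicates only when both 'b' and 'c' match
by_b_c = df.drop_duplicates(subset=['b', 'c'])

#unique rows by column 'c', with the index renumbered from 0
by_c_reset = df.drop_duplicates(subset=['c'], ignore_index=True)

print(by_b_c)
print(by_c_reset)
```

With both 'b' and 'c' in the subset, only the row at index 1 is dropped, because the rows at index 2 and 3 share a value in 'c' but differ in 'b'.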