You can use the following methods to select unique rows from a data frame in R:
Method 1: Select Unique Rows Across All Columns
library(dplyr)
df %>% distinct()
Method 2: Select Unique Rows Based on One Column
library(dplyr)
df %>% distinct(column1, .keep_all=TRUE)
Method 3: Select Unique Rows Based on Multiple Columns
library(dplyr)
df %>% distinct(column1, column2, .keep_all=TRUE)
This tutorial explains how to use each method in practice with the following data frame:
#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))
#view data frame
df
team position points
1 A G 10
2 A G 10
3 A F 8
4 A F 14
5 B G 15
6 B G 15
7 B F 17
8 B F 17
Example 1: Select Unique Rows Across All Columns
The following code shows how to select rows that have unique values across all columns in the data frame:
library(dplyr)
#select rows with unique values across all columns
df %>% distinct()
team position points
1 A G 10
2 A F 8
3 A F 14
4 B G 15
5 B F 17
We can see that there are five unique rows in the data frame. Rows 2, 6, and 8 were exact duplicates of rows 1, 5, and 7, so they were removed.
Note: When duplicate rows are encountered, only the first occurrence is kept.
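To see the "first occurrence is kept" behavior from another angle, we can compare distinct() with base R's duplicated() function, which flags every row that repeats an earlier row. This is a minimal sketch, assuming dplyr is installed. The variable names base_unique and dplyr_unique are just illustrative.

```r
library(dplyr)

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

#duplicated() is TRUE for every row that repeats an earlier row,
#so negating it keeps only the first occurrence of each unique row
base_unique <- df[!duplicated(df), ]

#base R keeps the original row names (1, 3, 4, 5, 7); reset them to match distinct()
rownames(base_unique) <- NULL

#select rows with unique values across all columns
dplyr_unique <- df %>% distinct()

#check whether both approaches return the same rows in the same order
all.equal(base_unique, dplyr_unique)
```

Both approaches scan the rows from top to bottom, which is why the first copy of each duplicated row is the one that survives.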
Example 2: Select Unique Rows Based on One Column
The following code shows how to select unique rows based on the team column only.
library(dplyr)
#select rows with unique values based on team column only
df %>% distinct(team, .keep_all=TRUE)
team position points
1 A G 10
2 B G 15
Since the team column contains only two unique values (A and B), only the first row for each team is kept.
Note: The argument .keep_all=TRUE tells distinct() to keep all other columns in the output. Without it, only the team column is returned.
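The following sketch shows the difference that .keep_all makes on the same data frame. It assumes dplyr is installed; team_only and team_all are illustrative names.

```r
library(dplyr)

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

#without .keep_all, only the column used for deduplication is returned
team_only <- df %>% distinct(team)
names(team_only)

#with .keep_all=TRUE, the other columns are taken from the first row for each team
team_all <- df %>% distinct(team, .keep_all=TRUE)
names(team_all)
team_all
```

Both results have one row per team; they differ only in which columns are kept.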
Example 3: Select Unique Rows Based on Multiple Columns
The following code shows how to select unique rows based on the team and position columns only.
library(dplyr)
#select rows with unique values based on team and position columns only
df %>% distinct(team, position, .keep_all=TRUE)
team position points
1 A G 10
2 A F 8
3 B G 15
4 B F 17
Four rows are returned, since there are four unique combinations of values across the team and position columns.
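Because distinct() always keeps the first occurrence, sorting the data frame beforehand controls which row is kept for each combination. As an illustration, assuming we want the row with the highest points value for each team and position combination, we can sort with arrange() first. This is a sketch of one common approach, not the only one; top_points is an illustrative name.

```r
library(dplyr)

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

#sort by points in descending order so the highest-scoring row comes first,
#then keep the first row for each team and position combination
top_points <- df %>%
  arrange(desc(points)) %>%
  distinct(team, position, .keep_all=TRUE)

top_points
```

For team A and position F, this keeps the row with 14 points instead of the row with 8 points that Example 3 returned. The output is also ordered by points rather than by the original row order.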