Friday, December 20, 2024

How to Select Unique Rows in a Data Frame in R

You can use the following methods to select unique rows from a data frame in R:

Method 1: Select Unique Rows Across All Columns

library(dplyr)

df %>% distinct()

Method 2: Select Unique Rows Based on One Column

library(dplyr)

df %>% distinct(column1, .keep_all=TRUE)

Method 3: Select Unique Rows Based on Multiple Columns

library(dplyr)

df %>% distinct(column1, column2, .keep_all=TRUE)

This tutorial explains how to use each method in practice with the following data frame:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

#view data frame
df

  team position points
1    A        G     10
2    A        G     10
3    A        F      8
4    A        F     14
5    B        G     15
6    B        G     15
7    B        F     17
8    B        F     17

Example 1: Select Unique Rows Across All Columns

The following code shows how to select rows that have unique values across all columns in the data frame. This removes every row that exactly duplicates an earlier row:

library(dplyr)

#select rows with unique values across all columns
df %>% distinct()

  team position points
1    A        G     10
2    A        F      8
3    A        F     14
4    B        G     15
5    B        F     17

We can see that there are five unique rows in the data frame. The three duplicate rows (rows 2, 6 and 8 of the original data frame) are removed.

Note: When duplicate rows are encountered, only the first occurrence is kept.
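To see exactly which rows distinct() treats as duplicates, here is a minimal sketch that cross-checks Example 1 with base R's duplicated(), a function the original tutorial does not use. It assumes the same df as above and that dplyr is installed; the names dupes and base_unique are illustrative only.

```r
library(dplyr)

# same data frame as in the tutorial
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

# TRUE for each row that repeats an earlier row (rows 2, 6 and 8 here)
dupes <- duplicated(df)

# number of duplicate rows that distinct() drops
sum(dupes)  # 3

# base R equivalent of df %>% distinct(): keep only first occurrences
base_unique <- df[!dupes, ]

# base R keeps the original row names (1, 3, 4, 5, 7); reset them before comparing
rownames(base_unique) <- NULL

# both approaches return the same rows in the same order
all.equal(distinct(df), base_unique)
```

Because duplicated() flags every repeat after the first, negating it keeps the first occurrence of each row, which matches the note above.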

Example 2: Select Unique Rows Based on One Column

The following code shows how to select unique rows based on the team column only.

library(dplyr)

#select rows with unique values based on team column only
df %>% distinct(team, .keep_all=TRUE)

  team position points
1    A        G     10
2    B        G     15

Since the team column contains only two unique values, two rows are returned: the first row in which each team appears (rows 1 and 5 of the original data frame).

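Note: The argument .keep_all=TRUE tells distinct() to keep all other columns in the output, with their values taken from the first row for each unique team. Without it, only the team column is returned.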
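A minimal sketch of what .keep_all changes, assuming the same df and an installed dplyr; teams_only and teams_all are illustrative names, and base R's which(!duplicated()) is used only to show which original rows supply the kept values.

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

# without .keep_all, only the column(s) passed to distinct() are returned
teams_only <- df %>% distinct(team)
names(teams_only)  # "team"

# with .keep_all=TRUE, every column is kept
teams_all <- df %>% distinct(team, .keep_all=TRUE)
names(teams_all)   # "team" "position" "points"

# the kept values come from the first row of each team in df
which(!duplicated(df$team))  # 1 5
```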

Example 3: Select Unique Rows Based on Multiple Columns

The following code shows how to select unique rows based on the team and position columns only.

library(dplyr)

#select rows with unique values based on team and position columns only
df %>% distinct(team, position, .keep_all=TRUE)

  team position points
1    A        G     10
2    A        F      8
3    B        G     15
4    B        F     17

Four rows are returned, since there are four unique combinations of values across the team and position columns.
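To check the number of combinations without printing the rows themselves, here is a short sketch using dplyr's n_distinct() and count(), neither of which appears in the original tutorial. It again assumes the same df and an installed dplyr.

```r
library(dplyr)

df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

# number of unique team/position combinations
n_distinct(df$team, df$position)  # 4

# equivalent: count the rows that distinct() returns
nrow(distinct(df, team, position))  # 4

# how many rows of df fall into each combination (column n)
df %>% count(team, position)
```

Here every combination occurs in exactly two rows of df, so distinct() keeps one row from each pair.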

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Filter for Unique Values Using dplyr
How to Filter by Multiple Conditions Using dplyr
How to Count Number of Occurrences in Columns in R
