Thursday, July 4, 2024

How to Remove Duplicate Rows in R so None are Left

You can use the following methods in R to remove duplicate rows from a data frame so that none are left in the resulting data frame:

Method 1: Use Base R

new_df <- df[!(duplicated(df) | duplicated(df, fromLast=TRUE)), ]

Method 2: Use dplyr

library(dplyr)

new_df <- df %>%
          group_by(across(everything())) %>%
          filter(n()==1)

The following examples show how to use each method in practice with the following data frame:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(20, 20, 28, 14, 13, 18, 27, 13))

#view data frame
df

  team points
1    A     20
2    A     20
3    A     28
4    A     14
5    B     13
6    B     18
7    B     27
8    B     13

Example 1: Use Base R

The following code shows how to use functions from base R to remove duplicate rows from the data frame so that none are left:

#create new data frame that removes duplicates so none are left
new_df <- df[!(duplicated(df) | duplicated(df, fromLast=TRUE)), ]

#view new data frame
new_df

  team points
3    A     28
4    A     14
6    B     18
7    B     27

Notice that every row belonging to a duplicated set, including its first occurrence, has been removed. Only the rows that appear exactly once in the original data frame remain.
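To see why both calls to duplicated() are needed, it can help to look at the logical vectors they return. The sketch below rebuilds the example data frame and shows each vector side by side; the names dup_forward, dup_backward and in_dup_set are made up for illustration and are not part of the original code.

```r
# Same example data frame as above
df <- data.frame(team = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points = c(20, 20, 28, 14, 13, 18, 27, 13))

# Scanning top to bottom: flags each repeat after its first occurrence
dup_forward <- duplicated(df)

# Scanning bottom to top: flags each repeat before its last occurrence
dup_backward <- duplicated(df, fromLast = TRUE)

# A row belongs to a duplicated set if either scan flags it
in_dup_set <- dup_forward | dup_backward

# View the flags next to the data
data.frame(df, dup_forward, dup_backward, in_dup_set)

# Keeping only the rows where in_dup_set is FALSE gives rows 3, 4, 6 and 7
df[!in_dup_set, ]
```

Using duplicated(df) alone would keep the first copy of each duplicated row (rows 1 and 5 here), which is why the fromLast=TRUE scan is combined with it.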

Example 2: Use dplyr

The following code shows how to use functions from the dplyr package in R to remove duplicate rows from the data frame so that none are left:

library(dplyr)

#create new data frame that removes duplicates so none are left
new_df <- df %>%
          group_by(across(everything())) %>%
          filter(n()==1)

#view new data frame
new_df

# A tibble: 4 x 2
# Groups:   team, points [4]
  team  points
  <chr>  <dbl>
1 A         28
2 A         14
3 B         18
4 B         27

Again, every row that has a duplicate has been removed, so only the rows that appear exactly once remain.

The rows match those from the base R method, although dplyr returns them as a grouped tibble with renumbered rows rather than a data frame that keeps the original row names.
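If you want to confirm that the two methods agree, one possible check is sketched below. It assumes the same example data frame; the names base_result and dplyr_result are illustrative. The ungroup() call is an addition to the original code that drops the grouping left behind by group_by(), which is usually what later steps expect.

```r
library(dplyr)

# Same example data frame as above
df <- data.frame(team = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points = c(20, 20, 28, 14, 13, 18, 27, 13))

# Method 1: base R
base_result <- df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]

# Method 2: dplyr, with the grouping dropped at the end
dplyr_result <- df %>%
  group_by(across(everything())) %>%
  filter(n() == 1) %>%
  ungroup()

# Base R keeps the original row names (3, 4, 6, 7), so reset them first
rownames(base_result) <- NULL

# Compare the two results as plain data frames
all.equal(base_result, as.data.frame(dplyr_result))
```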

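Note: For very large data frames, the dplyr method is often faster than the base R method, but the difference depends on the size and structure of the data, so it is worth timing both on your own data.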
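A rough way to compare the two methods on your own machine is to time them with system.time(). The sketch below uses a synthetic data frame; the size n_rows, the seed and the column values are arbitrary assumptions chosen for illustration, and the timings will vary with hardware, package versions and the shape of the data.

```r
library(dplyr)

# Synthetic data (hypothetical size and values, for illustration only)
set.seed(1)
n_rows <- 1e5
big_df <- data.frame(team = sample(LETTERS, n_rows, replace = TRUE),
                     points = sample(1:5000, n_rows, replace = TRUE))

# Time the base R method
base_time <- system.time(
  base_result <- big_df[!(duplicated(big_df) |
                          duplicated(big_df, fromLast = TRUE)), ]
)

# Time the dplyr method
dplyr_time <- system.time(
  dplyr_result <- big_df %>%
    group_by(across(everything())) %>%
    filter(n() == 1) %>%
    ungroup()
)

# Elapsed seconds for each method
base_time["elapsed"]
dplyr_time["elapsed"]
```

A single run like this is only indicative. Repeating the timing several times, or on data shaped like your real data, gives a more reliable comparison.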

Additional Resources

The following tutorials explain how to perform other common functions in R:

How to Remove Rows in R Based on Condition
How to Remove Rows with NA in One Specific Column in R
