Wednesday, July 23, 2025

How to Use the setdiff Function in R (With Examples)

The setdiff() function in R returns the elements of its first argument that do not occur in its second argument. This function uses the following syntax:

setdiff(x, y)

where:

  • x, y: Vectors or data frames containing a sequence of items

This tutorial provides several examples of how to use this function in practice.
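Before the worked examples, a minimal sketch of two behaviours worth keeping in mind: the result depends on argument order, and base R's setdiff() drops duplicate values from its result. The vectors x and y below are illustrative and not taken from the examples in this tutorial.

```r
# Illustrative vectors (assumed for this sketch)
x <- c(1, 1, 2, 3, 3)
y <- c(3, 4)

# Values in x that do not occur in y; duplicates are removed
x_not_y <- setdiff(x, y)
print(x_not_y)   # 1 2

# Swapping the arguments answers a different question
y_not_x <- setdiff(y, x)
print(y_not_x)   # 4
```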

Example 1: Setdiff with Numeric Vectors

The following code shows how to use setdiff() to identify all of the values in vector a that do not occur in vector b:

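#define vectors (assumed values, chosen to match the output shown below)
a <- c(1, 3, 4, 5, 9, 10)
b <- c(1, 2, 3, 4, 5, 6)

#find all values in a that do not occur in b
setdiff(a, b)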

[1]  9 10

There are two values that occur in vector a that do not occur in vector b: 9 and 10.

If we reverse the order of the vectors in the setdiff() function, we can instead identify all of the values in vector b that do not occur in vector a:

#find all values in b that do not occur in a
setdiff(b, a)

[1] 2 6

There are two values that occur in vector b that do not occur in vector a: 2 and 6.
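Because setdiff() is one-directional, finding every value that appears in exactly one of the two vectors takes both calls. One way to sketch this is to combine the two results with union(). The vectors here are assumed values, chosen so that the two differences are 9, 10 and 2, 6 as in the outputs above.

```r
# Assumed vectors, consistent with the outputs shown in this example
a <- c(1, 3, 4, 5, 9, 10)
b <- c(1, 2, 3, 4, 5, 6)

# Values that occur in exactly one of the two vectors
sym_diff <- union(setdiff(a, b), setdiff(b, a))
print(sym_diff)   # 9 10 2 6
```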

Example 2: Setdiff with Character Vectors

The following code shows how to use setdiff() to identify all of the values in vector char1 that do not occur in vector char2:

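#define character vectors (assumed values, chosen to match the output shown below)
char1 <- c('A', 'B', 'C', 'D', 'E')
char2 <- c('A', 'B', 'E', 'F', 'G')

#find all values in char1 that do not occur in char2
setdiff(char1, char2)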

[1] "C" "D"
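With character vectors, matching is exact, so "a" and "A" count as different values. If case should be ignored, one option is to normalise both vectors first, for example with tolower(). The vectors s1 and s2 below are hypothetical and only serve to illustrate the point.

```r
# Hypothetical vectors that differ only in letter case for some entries
s1 <- c("a", "B", "c")
s2 <- c("A", "b")

# Exact matching: nothing in s1 matches s2
exact_diff <- setdiff(s1, s2)
print(exact_diff)    # "a" "B" "c"

# Case-insensitive comparison after lower-casing both vectors
nocase_diff <- setdiff(tolower(s1), tolower(s2))
print(nocase_diff)   # "c"
```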

Example 3: Setdiff with Data Frames

The following code shows how to use setdiff() to identify all of the values in one data frame column that do not appear in the same column of a second data frame:

#define data frames
df1 <- data.frame(team=c('A', 'B', 'C', 'D'),
                  conference=c('West', 'West', 'East', 'East'),
                  points=c(88, 97, 94, 104))

df2 <- data.frame(team=c('A', 'B', 'C', 'D'),
                  conference=c('West', 'West', 'East', 'East'),
                  points=c(88, 97, 98, 99))

#find differences between the points columns in the two data frames
setdiff(df1$points, df2$points)

[1]  94 104

We can see that the values 94 and 104 occur in the points column of the first data frame, but not in the points column of the second data frame.
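The call above returns only the differing values, not the rows they came from. As a sketch, the result can be combined with %in% to pull the full rows out of the first data frame. The data frames are redefined here with just the team and points columns so the snippet runs on its own; the name rows_only_in_df1 is introduced for illustration.

```r
# Same team and points values as in the example above
df1 <- data.frame(team = c('A', 'B', 'C', 'D'),
                  points = c(88, 97, 94, 104))
df2 <- data.frame(team = c('A', 'B', 'C', 'D'),
                  points = c(88, 97, 98, 99))

# Points values that occur in df1 but not in df2
only_in_df1 <- setdiff(df1$points, df2$points)

# Keep the rows of df1 whose points value is one of those
rows_only_in_df1 <- df1[df1$points %in% only_in_df1, ]
print(rows_only_in_df1)
```

This keeps the rows for teams C and D, whose points values (94 and 104) do not appear in df2.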
