Friday, December 27, 2024
R: How to Merge Data Frames Based on Multiple Columns

You can use the following basic syntax to merge two data frames in R based on multiple columns:

merge(df1, df2, by.x=c('col1', 'col2'), by.y=c('col1', 'col2'))
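
When the key columns have the same names in both data frames, the by argument alone should be enough. The following is a minimal sketch using two small data frames, a and b, which are made up purely for illustration:

```r
#hypothetical data frames whose key columns share the same names
a = data.frame(col1=c(1, 2, 3),
               col2=c('x', 'y', 'z'),
               val1=c(10, 20, 30))

b = data.frame(col1=c(1, 2, 4),
               col2=c('x', 'y', 'z'),
               val2=c(100, 200, 400))

#by is shorthand for by.x and by.y when the column names match
short = merge(a, b, by=c('col1', 'col2'))
long = merge(a, b, by.x=c('col1', 'col2'), by.y=c('col1', 'col2'))

#both calls keep only the rows where col1 and col2 both match
identical(short, long)
```

The by.x and by.y form is only required when the key columns are named differently in the two data frames.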

The following example shows how to use this syntax in practice.

Example: Merge Data Frames on Multiple Columns

Suppose we have the following two data frames in R:

#define data frames
df1 = data.frame(playerID=c(1, 2, 3, 4, 5, 6),
                 team=c('A', 'B', 'B', 'B', 'C', 'C'),
                 points=c(19, 22, 25, 29, 34, 39))

df2 = data.frame(playerID=c(1, 2, 3, 4),
                 tm=c('A', 'B', 'B', 'B'),
                 rebounds=c(7, 8, 8, 14))

#view first data frame
df1

  playerID team points
1        1    A     19
2        2    B     22
3        3    B     25
4        4    B     29
5        5    C     34
6        6    C     39

#view second data frame
df2 

  playerID tm rebounds
1        1  A        7
2        2  B        8
3        3  B        8
4        4  B       14

Notice that both data frames have a playerID column, but the team column is named differently in each one:

  • The first data frame has the column ‘team’
  • The second data frame has the column ‘tm’
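
One way to check which column names the two data frames share is to compare their names directly. This is a small sketch that redefines the same data frames so it can be run on its own; the variable names shared and only_df2 are just for illustration:

```r
#define data frames (same as above)
df1 = data.frame(playerID=c(1, 2, 3, 4, 5, 6),
                 team=c('A', 'B', 'B', 'B', 'C', 'C'),
                 points=c(19, 22, 25, 29, 34, 39))

df2 = data.frame(playerID=c(1, 2, 3, 4),
                 tm=c('A', 'B', 'B', 'B'),
                 rebounds=c(7, 8, 8, 14))

#column names that appear in both data frames
shared = intersect(names(df1), names(df2))
shared

#column names that appear only in the second data frame
only_df2 = setdiff(names(df2), names(df1))
only_df2
```

Only playerID is shared by name, so a merge that should also match on team has to be told that ‘team’ and ‘tm’ correspond to each other.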

To merge these data frames on both the playerID and the team columns, we need to use the by.x and by.y arguments, which list the key columns of the first and second data frame in matching order.

We can use the following code to perform this merge:

#merge two data frames
merged = merge(df1, df2, by.x=c('playerID', 'team'), by.y=c('playerID', 'tm'))

#view merged data frame
merged

  playerID team points rebounds
1        1    A     19        7
2        2    B     22        8
3        3    B     25        8
4        4    B     29       14

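By default, merge() performs an inner join, so the final merged data frame contains only the four players whose playerID and team values appear in both original data frames. The key columns keep their names from the first data frame, which is why the result has a ‘team’ column rather than ‘tm’.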
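
If we would rather keep every player from the first data frame, one option is the all.x argument of merge(), which turns the merge into a left join. A minimal sketch, redefining the same data frames so it can be run on its own (the name left_merged is just for illustration):

```r
#define data frames (same as above)
df1 = data.frame(playerID=c(1, 2, 3, 4, 5, 6),
                 team=c('A', 'B', 'B', 'B', 'C', 'C'),
                 points=c(19, 22, 25, 29, 34, 39))

df2 = data.frame(playerID=c(1, 2, 3, 4),
                 tm=c('A', 'B', 'B', 'B'),
                 rebounds=c(7, 8, 8, 14))

#keep all rows of df1, filling unmatched rebounds with NA
left_merged = merge(df1, df2, by.x=c('playerID', 'team'),
                    by.y=c('playerID', 'tm'), all.x=TRUE)

#view merged data frame
left_merged
```

Here all six players are kept, and players 5 and 6 have NA in the rebounds column because they have no matching row in the second data frame.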

Additional Resources

The following tutorials explain how to perform other common functions related to data frames in R:

How to Do a Left Join in R
How to Perform a VLOOKUP in R
How to Append Rows to Data Frame in R
