How to Calculate Correlation By Group in R

You can use the following basic syntax to calculate the correlation between two variables by group in R:

library(dplyr)

df %>%
  group_by(group_var) %>%
  summarize(cor=cor(var1, var2))

This particular syntax calculates the correlation between var1 and var2, grouped by group_var.
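
One practical caveat: by default, cor() returns NA for any group that has a missing value in either variable. The sketch below shows one way to handle this, by passing use = "complete.obs" so that incomplete pairs are dropped within each group. The data frame toy and its columns grp, x, and y are made up for illustration and are not part of the example in this tutorial.

```r
library(dplyr)

# hypothetical toy data with one missing value per group (illustration only)
toy <- data.frame(grp=c('a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'),
                  x=c(1, 2, 3, NA, 1, 2, 3, 4),
                  y=c(2, 4, 6, 8, 4, 3, 2, NA))

# use = "complete.obs" tells cor() to skip rows where x or y is NA
res <- toy %>%
  group_by(grp) %>%
  summarize(cor=cor(x, y, use="complete.obs"))

res
```

In this toy data the complete pairs in each group fall exactly on a line, so group a should come out as 1 and group b as -1. cor() also accepts a method argument (for example method = "spearman") if a rank-based correlation is preferred.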

The following example shows how to use this syntax in practice.

Example: Calculate Correlation By Group in R

Suppose we have the following data frame that contains information about basketball players on various teams:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(18, 22, 19, 14, 14, 11, 20, 28),
                 assists=c(2, 7, 9, 3, 12, 10, 14, 21))

#view data frame
df

  team points assists
1    A     18       2
2    A     22       7
3    A     19       9
4    A     14       3
5    B     14      12
6    B     11      10
7    B     20      14
8    B     28      21

We can use the following syntax from the dplyr package to calculate the correlation between points and assists, grouped by team:

library(dplyr)

df %>%
  group_by(team) %>%
  summarize(cor=cor(points, assists))

# A tibble: 2 x 2
  team    cor
  <chr> <dbl>
1 A     0.603
2 B     0.982

From the output we can see:

  • The correlation coefficient between points and assists for team A is 0.603.
  • The correlation coefficient between points and assists for team B is 0.982.

Both correlation coefficients are positive, so points and assists tend to increase together on both teams. The relationship is considerably stronger for team B (0.982) than for team A (0.603).
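
As a cross-check that does not rely on dplyr, the same per-group correlations can be sketched in base R by splitting the data frame by team and applying cor() to each piece. This is an alternative route to the same numbers, not the method used in the example; the names by_team and d are chosen here for illustration.

```r
# same data as in the example
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(18, 22, 19, 14, 14, 11, 20, 28),
                 assists=c(2, 7, 9, 3, 12, 10, 14, 21))

# split the rows by team, then compute cor() within each piece
by_team <- sapply(split(df, df$team),
                  function(d) cor(d$points, d$assists))

# round to three decimals to compare with the tibble output
round(by_team, 3)
```

The result is a named numeric vector with one entry per team, and the rounded values should match the 0.603 and 0.982 reported for teams A and B.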

Related: What is Considered to Be a “Strong” Correlation?

Additional Resources

The following tutorials explain how to perform other common operations in R:

How to Count Unique Values by Group in R
How to Calculate the Sum by Group in R
How to Calculate the Mean by Group in R
How to Calculate Summary Statistics by Group in R
