You can use the following basic syntax to calculate the correlation between two variables by group in R:
library(dplyr)
df %>%
  group_by(group_var) %>%
  summarize(cor=cor(var1, var2))
This syntax calculates the correlation between var1 and var2 separately for each value of group_var. By default, cor() returns the Pearson correlation coefficient.
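The same pattern can be extended with the optional method and use arguments of cor(). The following is a minimal sketch, not part of the original example: the data frame and its values are hypothetical and made up purely for illustration, and the column names pearson and spearman are arbitrary choices.

```r
library(dplyr)

# Hypothetical data (illustration only); var2 contains one missing value
df <- data.frame(group_var=c('g1', 'g1', 'g1', 'g1', 'g2', 'g2', 'g2', 'g2'),
                 var1=c(1, 2, 3, 4, 5, 6, 7, 8),
                 var2=c(2, 1, NA, 5, 9, 7, 8, 3))

# use='complete.obs' drops rows with a missing value before computing;
# method='spearman' requests the rank-based coefficient instead of Pearson
res <- df %>%
  group_by(group_var) %>%
  summarize(pearson=cor(var1, var2, use='complete.obs'),
            spearman=cor(var1, var2, use='complete.obs', method='spearman'))

res
```

Without the use argument, cor() would return NA for any group that contains a missing value, so it is worth checking for NAs before interpreting grouped results.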
The following example shows how to use this syntax in practice.
Example: Calculate Correlation By Group in R
Suppose we have the following data frame that contains information about basketball players on various teams:
#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(18, 22, 19, 14, 14, 11, 20, 28),
                 assists=c(2, 7, 9, 3, 12, 10, 14, 21))

#view data frame
df

  team points assists
1    A     18       2
2    A     22       7
3    A     19       9
4    A     14       3
5    B     14      12
6    B     11      10
7    B     20      14
8    B     28      21
We can use the following syntax from the dplyr package to calculate the correlation between points and assists, grouped by team:
library(dplyr)
df %>%
  group_by(team) %>%
  summarize(cor=cor(points, assists))

# A tibble: 2 x 2
  team    cor
1 A     0.603
2 B     0.982
From the output we can see:
- The correlation coefficient between points and assists for team A is 0.603.
- The correlation coefficient between points and assists for team B is 0.982.
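Both correlation coefficients are positive, which means that on each team, players with more points also tend to have more assists. The relationship is considerably stronger for team B (0.982) than for team A (0.603), although each estimate is based on only four players.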
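As a cross-check, the same per-team correlations can be reproduced without dplyr. The following is a sketch of one possible base R approach using split() and sapply(); the object name by_team is just an illustrative choice.

```r
# Data frame from the example above
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(18, 22, 19, 14, 14, 11, 20, 28),
                 assists=c(2, 7, 9, 3, 12, 10, 14, 21))

# Split the rows by team, then compute the correlation within each subset
by_team <- sapply(split(df, df$team), function(d) cor(d$points, d$assists))

# Round to three decimals to compare with the tibble output
round(by_team, 3)
```

The rounded values should match the cor column returned by summarize(), which confirms that group_by() simply applies cor() to each team's rows separately.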
Related: What is Considered to Be a “Strong” Correlation?
Additional Resources
The following tutorials explain how to perform other common operations in R:
How to Count Unique Values by Group in R
How to Calculate the Sum by Group in R
How to Calculate the Mean by Group in R
How to Calculate Summary Statistics by Group in R