Correlations in Stata: Pearson, Spearman, and Kendall

In statistics, correlation measures the strength and direction of the relationship between two variables. A correlation coefficient ranges from -1 to 1: -1 indicates a perfect negative relationship, 0 indicates no linear (Pearson) or monotonic (Spearman, Kendall) relationship, and 1 indicates a perfect positive relationship.

There are three common ways to measure correlation:

Pearson Correlation: Measures the strength of the linear relationship between two continuous variables (e.g. height and weight).

Spearman Correlation: Measures the strength of the monotonic relationship between two ranked variables (e.g. rank of a student’s math exam score vs. rank of their science exam score in a class).

Kendall’s Correlation: A rank-based alternative to Spearman Correlation, typically used when the sample size is small and there are many tied ranks.

This tutorial explains how to find all three types of correlations in Stata.

Loading the Data

For each of the following examples we will use a dataset called auto. You can load this dataset by typing the following into the Command box:

use https://www.stata-press.com/data/r13/auto

We can get a quick look at the dataset by typing the following into the Command box:

summarize

[Figure: output of the summarize command in Stata]

We can see that there are 12 total variables in the dataset.
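As a quick cross-check of the variable count, the sketch below uses describe and misstable summarize. It assumes the copy of auto that ships with Stata (loaded with sysuse) matches the web copy used above; the display line is only for illustration.

```stata
* Load the copy of auto that ships with Stata
* (assumed to be the same data as the web copy used in this tutorial)
sysuse auto, clear

* List every variable with its storage type and label
describe

* describe leaves the counts behind in r(k) and r(N)
display "variables: " r(k) "   observations: " r(N)

* Report which variables contain missing values and how many
misstable summarize
```

Knowing which variables have missing values matters later, because it determines how many observations each correlation is based on.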

How to Find Pearson Correlation in Stata

We can find the Pearson Correlation Coefficient between the variables weight and length by using the pwcorr command:

pwcorr weight length

[Figure: Pearson correlation output in Stata]

The Pearson Correlation Coefficient between these two variables is 0.9460. To determine whether this coefficient is statistically significant, we can request the p-value by adding the sig option:

pwcorr weight length, sig

[Figure: Pearson correlation output with significance level in Stata]

The p-value is displayed as 0.000, which means it is smaller than 0.0005 rather than exactly zero. Since this is less than 0.05, the correlation between these two variables is statistically significant.
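To see the p-value at full precision, one option is to recompute it from the results that correlate stores. This is a minimal sketch, assuming the usual t test for a Pearson coefficient with n - 2 degrees of freedom; the scalar names (rho_wl, n_wl, t_wl, p_wl) are made up for this example.

```stata
sysuse auto, clear

* correlate stores the coefficient in r(rho) and the sample size in r(N)
quietly correlate weight length
scalar rho_wl = r(rho)
scalar n_wl   = r(N)

* t statistic for H0: rho = 0, with n - 2 degrees of freedom
scalar t_wl = rho_wl * sqrt(n_wl - 2) / sqrt(1 - rho_wl^2)

* Two-sided p-value from the upper tail of the t distribution
scalar p_wl = 2 * ttail(n_wl - 2, abs(t_wl))

* Scientific notation shows the value instead of a rounded zero
display "r = " %6.4f rho_wl "   t = " %8.3f t_wl "   p = " %9.2e p_wl
```

Scalars are used instead of local macros so that a negative coefficient is squared correctly.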

To find the Pearson Correlation Coefficient for multiple variables, simply type in a list of variables after the pwcorr command:

pwcorr weight length displacement, sig

[Figure: Pearson correlation output for multiple variables in Stata]

Here is how to interpret the output:

  • Pearson Correlation between weight and length = 0.9460 | p-value = 0.000
  • Pearson Correlation between weight and displacement = 0.8949 | p-value = 0.000
  • Pearson Correlation between displacement and length = 0.8351 | p-value = 0.000
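When several pairs are tested at once, a few more pwcorr options can make the table easier to read. The sketch below is one possible combination, not the only sensible one: obs prints the number of observations behind each pair, star(0.05) marks coefficients with p < 0.05, and bonferroni adjusts the p-values for the number of comparisons.

```stata
sysuse auto, clear

* Coefficients, p-values, and pairwise sample sizes,
* with a star next to coefficients significant at the 5% level
pwcorr weight length displacement, sig obs star(0.05)

* Same table with Bonferroni-adjusted p-values
pwcorr weight length displacement, sig bonferroni
```

Because none of these three variables has missing values, every pair here is based on all 74 observations.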

How to Find Spearman Correlation in Stata

We can find the Spearman Correlation Coefficient between the variables trunk and rep78 by using the spearman command:

spearman trunk rep78

[Figure: Spearman correlation output in Stata]

Here is how to interpret the output:

  • Number of obs: This is the number of pairwise observations used to calculate the Spearman Correlation Coefficient. Because there were some missing values for the variable rep78, Stata used only 69 (rather than the full 74) pairwise observations.
  • Spearman’s rho: This is the Spearman correlation coefficient. In this case, it’s -0.2235, indicating there is a negative correlation between the two variables. As one increases, the other tends to decrease.
  • Prob > |t|: This is the p-value associated with the hypothesis test. In this case, the p-value is 0.0649, which indicates there is not a statistically significant correlation between the two variables at α = 0.05.
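Spearman's rho can be thought of as a Pearson correlation computed on ranks. The sketch below illustrates this by ranking both variables by hand and comparing the result with the spearman command. The variable names rank_trunk and rank_rep78 and the scalar rho_ranks are made up for this example.

```stata
sysuse auto, clear

* Rank each variable using only the rows where both are observed
* (egen's rank() gives tied values their average rank by default)
egen rank_trunk = rank(trunk) if !missing(trunk, rep78)
egen rank_rep78 = rank(rep78) if !missing(trunk, rep78)

* Pearson correlation of the two rank variables
quietly correlate rank_trunk rank_rep78
scalar rho_ranks = r(rho)

* Built-in Spearman correlation for comparison
quietly spearman trunk rep78
display "Pearson on ranks: " %7.4f rho_ranks "   spearman: " %7.4f r(rho)
```

The two numbers should agree, which is a useful sanity check when explaining what the coefficient measures.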

We can find the Spearman Correlation Coefficient for multiple variables by listing more variables after the spearman command. To display both the correlation coefficient and the corresponding p-value for each pair, add the stats(rho p) option:

spearman trunk rep78 gear_ratio, stats(rho p)

[Figure: Spearman correlation output for multiple variables in Stata]

Here is how to interpret the output:

  • Spearman Correlation between trunk and rep78 = -0.2235 | p-value = 0.0649
  • Spearman Correlation between trunk and gear_ratio = -0.5187 | p-value = 0.0000
  • Spearman Correlation between gear_ratio and rep78 = 0.4275 | p-value = 0.0002
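One detail worth checking with three or more variables is how missing values are handled. By default spearman drops a row from every pair if any listed variable is missing (casewise deletion), while the pw option uses pairwise deletion. The sketch below prints the number of observations alongside each coefficient so the difference is visible; no output is shown here because the exact values depend on which rows are kept.

```stata
sysuse auto, clear

* Default: rows missing any of the three variables are dropped for all pairs
spearman trunk rep78 gear_ratio, stats(rho obs p)

* pw: each pair uses every row where both of its variables are observed
spearman trunk rep78 gear_ratio, stats(rho obs p) pw
```

In this dataset only rep78 has missing values, so the two runs can differ only for the pair trunk and gear_ratio.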

How to Find Kendall’s Correlation in Stata

We can find Kendall’s Correlation Coefficient between the variables trunk and rep78 by using the ktau command:

ktau trunk rep78

[Figure: Kendall's correlation output in Stata]

Here is how to interpret the output:

  • Number of obs: This is the number of pairwise observations used to calculate Kendall’s Correlation Coefficient. Because there were some missing values for the variable rep78, Stata used only 69 (rather than the full 74) pairwise observations.
  • Kendall’s tau-b: This is Kendall’s correlation coefficient between the two variables. We typically use this value instead of tau-a because tau-b makes adjustments for ties. In this case, tau-b = -0.1752, indicating a negative correlation between the two variables.
  • Prob > |z|: This is the p-value associated with the hypothesis test. In this case, the p-value is 0.0662, which indicates there is not a statistically significant correlation between the two variables at α = 0.05.
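The values in this output are also available as stored results, which is convenient when they need to be reused in later commands. A minimal sketch that prints tau-a next to tau-b for the same pair:

```stata
sysuse auto, clear

* ktau stores both versions of the coefficient and the p-value in r()
quietly ktau trunk rep78
display "tau-a = " %7.4f r(tau_a) "   tau-b = " %7.4f r(tau_b) "   p = " %6.4f r(p)
```

Because tau-b divides by a tie-adjusted quantity that is never larger than the one used for tau-a, its absolute value is at least as large as that of tau-a.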

We can find Kendall’s Correlation Coefficient for multiple variables by listing more variables after the ktau command. To display both the correlation coefficient and the corresponding p-value for each pair, add the stats(taub p) option:

ktau trunk rep78 gear_ratio, stats(taub p)

[Figure: Kendall's tau output for multiple variables in Stata]

Here is how to interpret the output:

  • Kendall’s Correlation between trunk and rep78 = -0.1752 | p-value = 0.0662
  • Kendall’s Correlation between trunk and gear_ratio = -0.3753 | p-value = 0.0000
  • Kendall’s Correlation between gear_ratio and rep78 = 0.3206 | p-value = 0.0006
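To compare the three measures on the same pair of variables, the stored results of each command can be collected and printed together. This is a small illustrative sketch; the scalar names (pearson_r, spearman_rho, kendall_taub) are made up for this example.

```stata
sysuse auto, clear

* Pearson (correlate drops rows with a missing value in either variable)
quietly correlate trunk rep78
scalar pearson_r = r(rho)

* Spearman
quietly spearman trunk rep78
scalar spearman_rho = r(rho)

* Kendall's tau-b
quietly ktau trunk rep78
scalar kendall_taub = r(tau_b)

display "Pearson r      = " %7.4f pearson_r
display "Spearman rho   = " %7.4f spearman_rho
display "Kendall tau-b  = " %7.4f kendall_taub
```

All three are computed on the same 69 complete observations, so differences between them reflect the measures themselves rather than the sample.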
