Pearson residuals are used in a Chi-Square Test of Independence to analyze the difference between observed cell counts and expected cell counts in a contingency table.
The formula to calculate a Pearson residual is:
rij = (Oij – Eij) / √Eij
where:
- rij: The Pearson residual for the cell in the ith row and jth column
- Oij: The observed value for the cell in the ith row and jth column
- Eij: The expected value for the cell in the ith row and jth column
A similar metric is the Standardized (adjusted) Pearson residual, which is calculated as:
rij = (Oij – Eij) / √(Eij(1-pi+)(1-p+j))
where:
- rij: The standardized Pearson residual for the cell in the ith row and jth column
- Oij: The observed value for the cell in the ith row and jth column
- Eij: The expected value for the cell in the ith row and jth column
- pi+: The total of the ith row divided by the grand total
- p+j: The total of the jth column divided by the grand total
Under the null hypothesis of independence, standardized Pearson residuals approximately follow a standard normal distribution (mean of 0, standard deviation of 1). A standardized Pearson residual whose absolute value exceeds a chosen threshold (e.g. 2 or 3) indicates a lack of fit in that cell.
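The two formulas above can be sketched as small Python functions. The function names and the single cell used to try them out are hypothetical, chosen only for illustration; they are not taken from the survey example that follows.

```python
import math

def pearson_residual(observed, expected):
    """(O - E) / sqrt(E) for a single cell."""
    return (observed - expected) / math.sqrt(expected)

def standardized_residual(observed, expected, row_prop, col_prop):
    """(O - E) / sqrt(E * (1 - p_i+) * (1 - p_+j)) for a single cell."""
    return (observed - expected) / math.sqrt(
        expected * (1 - row_prop) * (1 - col_prop)
    )

# Hypothetical cell (illustrative numbers only): observed 30, expected 25,
# in a row holding 40% and a column holding 25% of the grand total.
r = pearson_residual(observed=30, expected=25)
z = standardized_residual(observed=30, expected=25, row_prop=0.4, col_prop=0.25)
print(round(r, 3), round(z, 3))
```

Because the extra factors (1-pi+) and (1-p+j) are both less than 1, the standardized residual is always larger in absolute value than the plain Pearson residual for the same cell.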
The following example shows how to calculate Pearson residuals in practice.
Example: Calculating Pearson Residuals
Suppose researchers want to use a Chi-Square Test of Independence to determine whether or not gender is associated with political party preference.
They decide to take a simple random sample of 500 voters and survey them on their political party preference.
The following contingency table shows the results of the survey:
Gender | Republican | Democrat | Independent | Total |
Male | 120 | 90 | 40 | 250 |
Female | 110 | 95 | 45 | 250 |
Total | 230 | 185 | 85 | 500 |
Before we calculate the Pearson residuals, we must first calculate the expected counts for each cell in the contingency table. We can use the following formula to do so:
Expected value = (row sum * column sum) / table sum.
For example, the expected value for Male Republicans is: (230*250) / 500 = 115.
We can repeat this formula to obtain the expected value for each cell in the table:
Gender | Republican | Democrat | Independent | Total |
Male | 115 | 92.5 | 42.5 | 250 |
Female | 115 | 92.5 | 42.5 | 250 |
Total | 230 | 185 | 85 | 500 |
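As a rough sketch, the expected counts can be computed from the observed table in plain Python. The dictionary layout and variable names here are assumptions made for illustration.

```python
# Observed counts from the survey table above.
observed = {
    "Male":   {"Republican": 120, "Democrat": 90, "Independent": 40},
    "Female": {"Republican": 110, "Democrat": 95, "Independent": 45},
}

# Row totals, column totals, and the grand total.
row_totals = {gender: sum(cells.values()) for gender, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for party, count in cells.items():
        col_totals[party] = col_totals.get(party, 0) + count
grand_total = sum(row_totals.values())

# Expected value = (row sum * column sum) / table sum, for every cell.
expected = {
    gender: {
        party: row_totals[gender] * col_totals[party] / grand_total
        for party in col_totals
    }
    for gender in observed
}

for gender, cells in expected.items():
    print(gender, cells)
```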
Next, we can calculate the Pearson residual for each cell in the table.
For example, the Pearson residual for the cell that contains Male Republicans would be calculated as:
- rij = (Oij – Eij) / √Eij
- rij = (120 – 115) / √115
- rij = 0.466
We can repeat this formula to obtain the Pearson residual for each cell in the table:
Gender | Republican | Democrat | Independent |
Male | 0.466 | -0.260 | -0.383 |
Female | -0.466 | 0.260 | 0.383 |
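One way to compute the Pearson residual for every cell at once is sketched below, using the observed and expected counts from the tables above. The variable names are illustrative assumptions.

```python
import math

parties = ["Republican", "Democrat", "Independent"]
observed = {"Male": [120, 90, 40], "Female": [110, 95, 45]}
expected = {"Male": [115, 92.5, 42.5], "Female": [115, 92.5, 42.5]}

# Pearson residual for each cell: (O - E) / sqrt(E)
residuals = {
    gender: [
        (o - e) / math.sqrt(e)
        for o, e in zip(observed[gender], expected[gender])
    ]
    for gender in observed
}

for gender, row in residuals.items():
    print(gender, dict(zip(parties, (round(r, 3) for r in row))))
```

Since both rows have the same total, the residuals in each column are equal in size and opposite in sign.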
Next, we can calculate the Standardized Pearson residual for each cell in the table.
For example, the Standardized Pearson residual for the cell that contains Male Republicans would be calculated as:
- rij = (Oij – Eij) / √(Eij(1-pi+)(1-p+j))
- rij = (120 – 115) / √(115(1-250/500)(1-230/500))
- rij = 0.897
We can repeat this formula to obtain the Standardized Pearson residual for each cell in the table:
Gender | Republican | Democrat | Independent |
Male | 0.897 | -0.463 | -0.595 |
Female | -0.897 | 0.463 | 0.595 |
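The standardized residuals can also be computed directly from the observed counts. The sketch below assumes a nested-list layout (rows are Male and Female; columns are Republican, Democrat, Independent) and flags cells using a threshold of 2, one of the example cut-offs mentioned earlier.

```python
import math

observed = [
    [120, 90, 40],   # Male
    [110, 95, 45],   # Female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

standardized = []
for i, row in enumerate(observed):
    out = []
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected count
        p_row = row_totals[i] / n               # p_i+
        p_col = col_totals[j] / n               # p_+j
        out.append((o - e) / math.sqrt(e * (1 - p_row) * (1 - p_col)))
    standardized.append(out)

# Cells whose standardized residual exceeds the chosen threshold.
threshold = 2
flagged = [
    (i, j)
    for i, row in enumerate(standardized)
    for j, z in enumerate(row)
    if abs(z) > threshold
]

for row in standardized:
    print([round(z, 3) for z in row])
print("flagged cells:", flagged)
```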
We can see that none of the standardized Pearson residuals has an absolute value greater than 3 (or even 2), which indicates that no individual cell shows a notable lack of fit.
If we use a Chi-Square Test of Independence calculator to perform the test, we’ll find that the p-value is 0.649198.
Since this p-value is not less than .05, we do not have sufficient evidence to say that there is an association between gender and political party preference.
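The p-value can be checked by hand with a short sketch. The Chi-Square test statistic equals the sum of the squared Pearson residuals, and this 2 x 3 table has (2-1)(3-1) = 2 degrees of freedom. For exactly 2 degrees of freedom, the upper-tail probability of the Chi-Square distribution has the closed form exp(-x/2), which is what the code below relies on. That shortcut does not apply to other table sizes, where a statistics library (for example `scipy.stats.chi2_contingency`) would be the usual choice.

```python
import math

observed = [
    [120, 90, 40],   # Male
    [110, 95, 45],   # Female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Test statistic: sum over all cells of (O - E)^2 / E,
# i.e. the sum of the squared Pearson residuals.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed-form upper tail, valid only when df == 2.
if df != 2:
    raise ValueError("exp(-x/2) shortcut only holds for 2 degrees of freedom")
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 3), df, round(p_value, 6))
```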
Additional Resources
The following tutorials explain how to perform a Chi-Square Test of Independence using different statistical software:
An Introduction to the Chi-Square Test of Independence
How to Perform a Chi-Square Test of Independence in Excel
How to Perform a Chi-Square Test of Independence in R
Chi-Square Test of Independence Calculator