A Chi-Square Goodness of Fit Test is used to determine whether or not a categorical variable follows a hypothesized distribution.
This tutorial explains how to perform a Chi-Square Goodness of Fit Test in Excel.
Example: Chi-Square Goodness of Fit Test in Excel
A shop owner claims that an equal number of customers come into his shop each weekday. To test this hypothesis, an independent researcher records the number of customers that come into the shop each day in a given week and finds the following:
- Monday: 50 customers
- Tuesday: 60 customers
- Wednesday: 40 customers
- Thursday: 47 customers
- Friday: 53 customers
We will use the following steps to perform a Chi-Square goodness of fit test to determine if the data is consistent with the shop owner’s claim.
Step 1: Input the data.
First, we will enter the expected number of customers for each day in one column and the observed number of customers for each day in another column.
Note: There were 250 customers in total. Thus, if the shop owner expects an equal number to come into the shop each day, he would expect 250 / 5 = 50 customers per day.
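For readers who want to check the expected counts outside Excel, here is a minimal Python sketch of the same arithmetic. The variable names are illustrative and not part of the original spreadsheet.

```python
# Observed customer counts from the example (Monday through Friday).
observed = {
    "Monday": 50,
    "Tuesday": 60,
    "Wednesday": 40,
    "Thursday": 47,
    "Friday": 53,
}

# Total customers recorded during the week.
total = sum(observed.values())

# Under the shop owner's claim, each weekday gets an equal share of the total.
expected_per_day = total / len(observed)

print(f"total = {total}, expected per day = {expected_per_day}")
```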
Step 2: Calculate (O-E)² / E for each category.
The Chi-Square test statistic for the Goodness of Fit test is X² = Σ (O-E)² / E
where:
- Σ: the summation symbol, which means “sum” over all categories
- O: observed value
- E: expected value
For each row, we calculate (O-E)² / E from the observed and expected values. For example, for Tuesday this is (60-50)² / 50 = 2.
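The per-row calculation can also be sketched in Python as a cross-check. This is only an illustration under the example's assumptions (five weekdays, 50 expected customers each); the list names are hypothetical.

```python
# Observed counts for Monday through Friday, and the expected count
# of 50 per day under the shop owner's claim.
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
observed = [50, 60, 40, 47, 53]
expected = [50.0] * len(observed)

# (O - E)^2 / E for each day: the contribution of each category
# to the Chi-Square test statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

for day, value in zip(days, contributions):
    print(f"{day}: {value:.2f}")
```

The days with the largest deviations from 50 (Tuesday and Wednesday) contribute the most to the statistic.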
Step 3: Calculate the Chi-Square test statistic and the corresponding p-value.
Lastly, we will calculate the Chi-Square test statistic by summing the (O-E)² / E values from the previous step, and then find the corresponding p-value with the CHISQ.DIST.RT function.
Note: The Excel function CHISQ.DIST.RT(x, deg_freedom) returns the right-tailed probability of the Chi-Square distribution for a test statistic x and the given degrees of freedom. The degrees of freedom are calculated as n-1, where n is the number of categories. In this case, deg_freedom = 5 - 1 = 4.
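As a rough cross-check of the Excel result, the sketch below computes the test statistic and a right-tailed p-value using only Python's standard library. The helper chi2_sf_even_df is a hypothetical name introduced here. It relies on a closed form of the Chi-Square right-tail probability that holds only when the degrees of freedom are even, which happens to be the case in this example (4). It is not a general replacement for CHISQ.DIST.RT.

```python
import math


def chi2_sf_even_df(x, df):
    """Right-tail probability P(X >= x) of a Chi-Square distribution.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which is valid only for positive, even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only covers positive even degrees of freedom")
    half = x / 2.0
    term = 1.0   # current term (x/2)^k / k!, starting at k = 0
    total = 0.0
    for k in range(df // 2):
        total += term
        term *= half / (k + 1)
    return math.exp(-half) * total


observed = [50, 60, 40, 47, 53]
expected = [sum(observed) / len(observed)] * len(observed)

# Test statistic: sum of (O - E)^2 / E over all categories.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus one.
df = len(observed) - 1

p_value = chi2_sf_even_df(statistic, df)
print(f"X2 = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
```

If SciPy is available, scipy.stats.chisquare is the more usual way to run this test in Python; the hand-rolled helper above is used only to keep the sketch dependency-free.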
Step 4: Interpret the results.
The X² test statistic for the test is 4.36 and the corresponding p-value is 0.3595. Since this p-value is not less than 0.05, we fail to reject the null hypothesis that customers are equally distributed across the weekdays. This means we do not have sufficient evidence to say that the true distribution of customers is different from the distribution that the shop owner claimed.