
How to Calculate the P-Value of an F-Statistic in R

An F-test produces an F-statistic. To find the p-value associated with an F-statistic in R, you can use the following command:

pf(fstat, df1, df2, lower.tail = FALSE)

  • fstat – the value of the F-statistic
  • df1 – the numerator degrees of freedom
  • df2 – the denominator degrees of freedom
  • lower.tail – whether to return the probability in the lower tail of the F distribution. This is TRUE by default, so we set it to FALSE to get the upper-tail probability, which is the p-value.

For example, here is how to find the p-value associated with an F-statistic of 5, with 3 numerator degrees of freedom and 14 denominator degrees of freedom:

pf(5, 3, 14, lower.tail = FALSE)

#[1] 0.01457807
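
As a quick check on what lower.tail = FALSE does, the sketch below compares the upper-tail probability with one minus the default lower-tail probability. The variable names (fstat, df1, df2, p_upper, p_complement) are illustrative and chosen to mirror the argument list above.

```r
# Same inputs as the example above
fstat <- 5
df1   <- 3
df2   <- 14

# Upper-tail probability requested directly
p_upper <- pf(fstat, df1, df2, lower.tail = FALSE)

# Complement of the default lower-tail probability
p_complement <- 1 - pf(fstat, df1, df2)

# The two agree up to floating-point tolerance
all.equal(p_upper, p_complement)
```

For very small p-values, lower.tail = FALSE is usually the safer form, since subtracting from 1 can lose precision.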

One of the most common uses of an F-test is for testing the overall significance of a regression model. In the following example, we show how to calculate the p-value of the F-statistic for a regression model.

Example: Calculating p-value from F-statistic

Suppose we have a dataset that shows the total number of hours studied, total prep exams taken, and final exam score received for 12 different students:

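#create dataset
#note: the source preserves only the first six study_hours values (they match
#the head() output below); the last six are missing and are marked NA here
data <- data.frame(study_hours = c(3, 7, 16, 14, 12, 7, NA, NA, NA, NA, NA, NA),
                   prep_exams = c(2, 6, 5, 2, 7, 4, 4, 2, 8, 4, 1, 3),
                   final_score = c(76, 88, 96, 90, 98, 80, 86, 89, 68, 75, 72, 76))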

#view first six rows of dataset
head(data)

#  study_hours prep_exams final_score
#1           3          2          76
#2           7          6          88
#3          16          5          96
#4          14          2          90
#5          12          7          98
#6           7          4          80

Next, we can fit a linear regression model to this data using study hours and prep exams as the predictor variables and final score as the response variable. Then, we can view the output of the model:

#fit regression model
model <- lm(final_score ~ study_hours + prep_exams, data = data)

#view output of the model
summary(model)

#Call:
#lm(formula = final_score ~ study_hours + prep_exams, data = data)
#
#Residuals:
#    Min      1Q  Median      3Q     Max 
#-13.128  -5.319   2.168   3.458   9.341 
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept)   66.990      6.211  10.785  1.9e-06 ***
#study_hours    1.300      0.417   3.117   0.0124 *  
#prep_exams     1.117      1.025   1.090   0.3041    
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 7.327 on 9 degrees of freedom
#Multiple R-squared:  0.5308,	Adjusted R-squared:  0.4265 
#F-statistic: 5.091 on 2 and 9 DF,  p-value: 0.0332

On the very last line of the output we can see that the F-statistic for the overall regression model is 5.091. This F-statistic has 2 degrees of freedom for the numerator and 9 degrees of freedom for the denominator. R automatically calculates that the p-value for this F-statistic is 0.0332.
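
The F-statistic on that line can also be recovered from the R-squared value in the same output. The sketch below assumes the standard identity F = (R² / k) / ((1 - R²) / (n - k - 1)) for a model with an intercept, where k is the number of predictors. The variable names are illustrative, and because the printed R-squared is rounded, the result agrees with 5.091 only approximately.

```r
# Values read from the summary() output above
r_squared <- 0.5308   # Multiple R-squared (rounded in the printout)
k         <- 2        # number of predictors = numerator df
df_resid  <- 9        # residual df (n - k - 1) = denominator df

# Overall F-statistic implied by R-squared
f_from_r2 <- (r_squared / k) / ((1 - r_squared) / df_resid)
f_from_r2

# Corresponding upper-tail p-value
pf(f_from_r2, k, df_resid, lower.tail = FALSE)
```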

To reproduce the p-value of 0.0332 ourselves, we can pass the F-statistic and its degrees of freedom to pf():

pf(5.091, 2, 9, lower.tail = FALSE)

#[1] 0.0331947

Notice that we get the same answer (but with more decimals displayed) as the linear regression output above.
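
Rather than copying numbers from the printed summary, the F-statistic and its degrees of freedom can be read from the fitted model object. The following is a minimal sketch that uses R's built-in mtcars dataset as a stand-in (an assumption made so the example is self-contained; it is not the dataset from this article) and a hypothetical object name, fit.

```r
# Stand-in model on the built-in mtcars data (not the article's dataset)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary()$fstatistic is a named numeric vector: value, numdf, dendf
fs <- summary(fit)$fstatistic
fs

# Upper-tail probability of the F distribution = overall model p-value
p_value <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)
p_value
```

This avoids the small rounding differences that come from retyping the printed F-statistic by hand.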
