How to Handle: glm.fit: fitted probabilities numerically 0 or 1 occurred

One warning message you may encounter in R is:

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 

This warning occurs when you fit a logistic regression model and the predicted probabilities of one or more observations in your data frame are indistinguishable from 0 or 1.

It’s worth noting that this is a warning message and not an error. Even if you receive this warning, your logistic regression model will still be fit, but it is worth examining the original data frame to see whether outliers, or predictors that split the two response classes almost perfectly, are causing the warning to appear.

This tutorial shares how to address this warning message in practice.

How to Reproduce the Warning

Suppose we fit a logistic regression model to the following data frame in R:

#create data frame
df <- data.frame(y = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
                 x1 = c(3, 3, 4, 4, 3, 2, 5, 8, 9, 9, 9, 8, 9, 9, 9),
                 x2 = c(8, 7, 7, 6, 5, 6, 5, 2, 2, 3, 4, 3, 7, 4, 4))

#fit logistic regression model
model <- glm(y ~ x1 + x2, data=df, family=binomial)

#view model summary
summary(model)

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 

Call:
glm(formula = y ~ x1 + x2, family = binomial, data = df)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-1.729e-05  -2.110e-08   2.110e-08   2.110e-08   1.515e-05  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    -75.205 307338.933       0        1
x1              13.309  28512.818       0        1
x2              -2.793  37342.280       0        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2.0728e+01  on 14  degrees of freedom
Residual deviance: 5.6951e-10  on 12  degrees of freedom
AIC: 6

Number of Fisher Scoring iterations: 24

Our logistic regression model is successfully fit to the data, but we receive a warning message that fitted probabilities numerically 0 or 1 occurred.
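
One possible way to see why the coefficients and standard errors in this output are so large is to check whether a single predictor splits the two classes of y. The sketch below recreates the example data and compares the range of x1 within each class; the names x1_ranges and x1_separates are illustrative only.

```r
# Recreate the example data so this block runs on its own
df <- data.frame(y = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
                 x1 = c(3, 3, 4, 4, 3, 2, 5, 8, 9, 9, 9, 8, 9, 9, 9),
                 x2 = c(8, 7, 7, 6, 5, 6, 5, 2, 2, 3, 4, 3, 7, 4, 4))

# Smallest and largest x1 value within each class of y
x1_ranges <- tapply(df$x1, df$y, range)
x1_ranges

# TRUE when every y = 0 row has a smaller x1 than every y = 1 row
x1_separates <- max(df$x1[df$y == 0]) < min(df$x1[df$y == 1])
x1_separates
```

In this data frame the largest x1 among the y = 0 rows is 5 and the smallest x1 among the y = 1 rows is 8, so x1 alone splits the classes and the fitted probabilities are pushed toward 0 and 1.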

If we use the fitted logistic regression model to predict the response value for each observation in the original data frame, we can see that nearly all of the predicted probabilities are indistinguishable from 0 or 1:

#use fitted model to predict response values
df$y_pred = predict(model, df, type="response")

#view updated data frame
df

   y x1 x2       y_pred
1  0  3  8 2.220446e-16
2  0  3  7 2.220446e-16
3  0  4  7 2.220446e-16
4  0  4  6 2.220446e-16
5  0  3  5 2.220446e-16
6  0  2  6 2.220446e-16
7  0  5  5 1.494599e-10
8  1  8  2 1.000000e+00
9  1  9  2 1.000000e+00
10 1  9  3 1.000000e+00
11 1  9  4 1.000000e+00
12 1  8  3 1.000000e+00
13 1  9  7 1.000000e+00
14 1  9  4 1.000000e+00
15 1  9  4 1.000000e+00
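
Rather than reading the table by eye, the rows with extreme fitted probabilities can be listed programmatically. This is a minimal sketch: the tolerance eps is an assumption chosen to resemble the check that glm.fit is understood to apply, and flagged is an illustrative name.

```r
# Recreate the example data and model so this block runs on its own
df <- data.frame(y = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
                 x1 = c(3, 3, 4, 4, 3, 2, 5, 8, 9, 9, 9, 8, 9, 9, 9),
                 x2 = c(8, 7, 7, 6, 5, 6, 5, 2, 2, 3, 4, 3, 7, 4, 4))
model <- suppressWarnings(glm(y ~ x1 + x2, data = df, family = binomial))

# Fitted probabilities for the rows used to fit the model
p_hat <- fitted(model)

# Assumed tolerance for "numerically 0 or 1"
eps <- 10 * .Machine$double.eps

# Row numbers whose fitted probability is within eps of 0 or 1
flagged <- which(p_hat < eps | p_hat > 1 - eps)

# Show the flagged rows next to their fitted probabilities
cbind(df[flagged, ], p_hat = p_hat[flagged])
```

If flagged covers most of the data frame, as it is likely to here, the warning reflects the whole fit rather than a handful of unusual rows.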

How to Handle the Warning

There are three ways to deal with this warning message:

(1) Ignore it. 

In some cases, you can simply ignore this warning message because it doesn’t necessarily indicate that something is wrong with the logistic regression model. It simply means that one or more observations in the data frame have predicted values indistinguishable from 0 or 1.
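
If you decide the warning is harmless for your use case, one option is to silence only this message while letting any other warning through. The sketch below assumes a hypothetical helper named fit_quietly; it relies on base R's withCallingHandlers() and the "muffleWarning" restart.

```r
# Recreate the example data so this block runs on its own
df <- data.frame(y = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
                 x1 = c(3, 3, 4, 4, 3, 2, 5, 8, 9, 9, 9, 8, 9, 9, 9),
                 x2 = c(8, 7, 7, 6, 5, 6, 5, 2, 2, 3, 4, 3, 7, 4, 4))

# Hypothetical helper: fit a logistic regression and muffle only the
# "fitted probabilities numerically 0 or 1" warning
fit_quietly <- function(formula, data) {
  withCallingHandlers(
    glm(formula, data = data, family = binomial),
    warning = function(w) {
      if (grepl("fitted probabilities numerically 0 or 1",
                conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

model <- fit_quietly(y ~ x1 + x2, df)
coef(model)
```

Unlike wrapping the call in suppressWarnings(), this approach still surfaces unrelated warnings such as convergence problems.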

(2) Increase the sample size.

In other cases, this warning message appears when you’re working with small data frames where there’s simply not enough data to provide a reliable model fit. To address this warning, increase the number of observations that you feed into the model.
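
The effect of sample size can be explored with a small simulation. This is only a sketch on made-up data: the helper warns_at_n, the slope of 0.5, the seed, and the sample sizes are all assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed so the simulation is repeatable

# Hypothetical helper: simulate n rows where y depends weakly on x, fit a
# logistic regression, and report whether the 0-or-1 warning was raised
warns_at_n <- function(n) {
  x <- rnorm(n)
  y <- rbinom(n, size = 1, prob = plogis(0.5 * x))
  hit <- FALSE
  withCallingHandlers(
    glm(y ~ x, family = binomial),
    warning = function(w) {
      if (grepl("numerically 0 or 1", conditionMessage(w), fixed = TRUE)) {
        hit <<- TRUE
      }
      invokeRestart("muffleWarning")  # keep the console quiet
    }
  )
  hit
}

# Share of simulated fits that raise the warning at each sample size
small_share <- mean(replicate(100, warns_at_n(10)))
large_share <- mean(replicate(100, warns_at_n(500)))
c(n_10 = small_share, n_500 = large_share)
```

With only a few rows, the simulated classes can end up perfectly split by chance; with hundreds of rows drawn from the same process the classes overlap and the warning should not appear.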

(3) Remove outliers.

In other cases, this warning occurs when there are outliers in the original data frame and only a small number of observations have fitted probabilities close to 0 or 1. Removing these outliers often makes the warning message go away.
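
As a rough illustration of this approach, the sketch below builds a made-up data frame with one extreme predictor value, drops it using a robust z-score, and refits the model. The simulated data, the cutoff of 10, and the names sim, clean, full and refit are all assumptions, not part of the original example.

```r
set.seed(42)  # arbitrary seed so the simulation is repeatable

# Hypothetical data: 200 ordinary rows plus one extreme row at x = 100
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.5 * x))
sim <- rbind(data.frame(x = x, y = y),
             data.frame(x = 100, y = 1))

# Fit on all rows; warnings are suppressed because the extreme row may trigger one
full <- suppressWarnings(glm(y ~ x, data = sim, family = binomial))

# Robust z-score: distance from the median in units of the MAD
z <- abs(sim$x - median(sim$x)) / mad(sim$x)

# Keep rows within an assumed cutoff of 10 and refit
clean <- sim[z <= 10, ]
refit <- glm(y ~ x, data = clean, family = binomial)

# Compare the most extreme fitted probability before and after
c(full = max(fitted(full)), refit = max(fitted(refit)))
```

Dropping rows changes what the model describes, so it is worth confirming that a flagged row is a genuine data problem before removing it.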

Additional Resources

The following tutorials explain how to handle other warnings and errors in R:

How to Fix in R: invalid model formula in ExtractVars
How to Fix in R: argument is not numeric or logical: returning NA
How to Fix: randomForest.default(m, y, …) : NA/NaN/Inf in foreign function call
