Logistic regression is a method we can use to fit a regression model when the response variable is binary.
Logistic regression uses a method known as maximum likelihood estimation to find an equation of the following form:
log[p(X) / (1-p(X))] = β0 + β1X1 + β2X2 + … + βpXp
where:
- Xj: The jth predictor variable
- βj: The coefficient estimate for the jth predictor variable
The formula on the right side of the equation predicts the log odds of the response variable taking on a value of 1.
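To make the link between log odds and probability concrete, the equation above can be solved for p(X). This is standard algebra and is not specific to SAS:

```latex
\[
\log\frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p
\quad\Longrightarrow\quad
p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}
            {1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}
\]
```

For example, log odds of 0 correspond to a probability of 0.5, and positive log odds correspond to probabilities above 0.5.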
The following step-by-step example shows how to fit a logistic regression model in SAS.
Step 1: Create the Dataset
First, we’ll create a dataset that contains information on the following three variables for 18 students:
- Acceptance into a certain college (1 = yes, 0 = no)
- GPA (scale of 1 to 4)
- ACT score (scale of 1 to 36)
/*create dataset*/
data my_data;
    input acceptance gpa act;
    datalines;
1 3 30
0 1 21
0 2 26
0 1 24
1 3 29
1 3 34
0 3 31
1 2 29
0 1 21
1 2 21
0 1 15
1 3 32
1 4 31
1 4 29
0 1 24
1 4 29
1 3 21
1 4 34
;
run;

/*view dataset*/
proc print data=my_data;
run;
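Before fitting a model, it can help to confirm that the data were read correctly. The following is a minimal sketch, not part of the original example, that counts the two acceptance outcomes and summarizes the predictors:

```sas
/*count how many students were accepted (1) and not accepted (0)*/
proc freq data=my_data;
    tables acceptance;
run;

/*summarize the two predictor variables*/
proc means data=my_data n mean min max;
    var gpa act;
run;
```

With the 18 students listed in the dataset, the frequency table should show 11 acceptances and 7 rejections.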
Step 2: Fit the Logistic Regression Model
Next, we’ll use proc logistic to fit the logistic regression model, using “acceptance” as the response variable and “gpa” and “act” as the predictor variables.
Note: We must specify descending so that SAS models the probability that the response variable takes on a value of 1. By default, proc logistic models the probability of the lowest ordered response value, which is 0 in this dataset.
/*fit logistic regression model*/
proc logistic data=my_data descending;
model acceptance = gpa act;
run;
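As an alternative to the descending option, the event of interest can be named directly in the model statement with the event= response option. This sketch should fit the same model as the code above:

```sas
/*fit the same model, naming 1 as the event to be modeled*/
proc logistic data=my_data;
    model acceptance(event='1') = gpa act;
run;
```

Naming the event explicitly can be easier to read, since the modeled outcome is visible in the model statement itself.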
The first table of interest is titled Model Fit Statistics.
From this table we can see the AIC value of the model, which turns out to be 16.595. AIC rewards goodness of fit and penalizes the number of parameters, so a lower AIC value indicates a better trade-off between fit and model complexity.
However, there is no threshold for what is considered a “good” AIC value. Rather, we use AIC to compare the fit of several models fit to the same dataset. The model with the lowest AIC value is generally considered the best.
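For instance, one way to put this AIC value in context is to fit a simpler candidate model to the same dataset and compare. The sketch below drops act. The choice of this reduced model is an assumption for illustration, and its AIC is not reported in the original output:

```sas
/*fit a reduced model with gpa as the only predictor*/
proc logistic data=my_data descending;
    model acceptance = gpa;
run;
```

If the AIC shown in the Model Fit Statistics table for this reduced model is lower than 16.595, the simpler model would be preferred by this criterion.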
The next table of interest is titled Testing Global Null Hypothesis: BETA=0.
From this table we can see the Likelihood Ratio Chi-square value of 13.4620 with a corresponding p-value of 0.0012.
Since this p-value is less than .05, this tells us that the logistic regression model as a whole is statistically significant.
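The reported p-value can be reproduced from the test statistic. The likelihood ratio statistic is compared against a chi-square distribution whose degrees of freedom equal the number of predictors, which is 2 here (gpa and act). A minimal sketch using the SAS probchi function:

```sas
/*upper-tail chi-square probability for the likelihood ratio statistic*/
data _null_;
    chi_sq = 13.4620;   /*Likelihood Ratio Chi-square from the output*/
    df = 2;             /*one degree of freedom per predictor*/
    p_value = 1 - probchi(chi_sq, df);
    put p_value= 6.4;   /*write the result to the SAS log*/
run;
```

The value written to the log should be about 0.0012, in line with the p-value in the table.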
Next, we can analyze the coefficient estimates in the table titled Analysis of Maximum Likelihood Estimates.
From this table we can see the coefficients for gpa and act, which indicate the average change in log odds of getting accepted into the university for a one unit increase in each variable.
For example:
- A one-unit increase in GPA value is associated with an average increase of 2.9665 in the log odds of getting accepted into the university.
- A one-unit increase in ACT score is associated with an average decrease of 0.1145 in the log odds of getting accepted into the university.
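Log odds are hard to interpret directly, so it is common to exponentiate the coefficients to obtain odds ratios. Using the two estimates above:

```latex
\[
e^{2.9665} \approx 19.42, \qquad e^{-0.1145} \approx 0.89
\]
```

In other words, each additional GPA point is associated with the odds of acceptance being multiplied by roughly 19, while each additional ACT point is associated with the odds being multiplied by about 0.89 (a decrease of about 11%), holding the other predictor fixed. proc logistic also reports these values in its Odds Ratio Estimates table.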
The corresponding p-values in the output indicate whether each predictor variable has a statistically significant association with the probability of getting accepted:
- P-value of GPA: 0.0679
- P-value of ACT: 0.6289
Using a significance level of .05, neither predictor is statistically significant on its own. GPA (p = 0.0679) is significant only at the more lenient .10 level, while ACT score (p = 0.6289) is far from significant at any conventional level.
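To see what the fitted model implies for each student, the predicted probabilities of acceptance can be saved with the output statement. This is a sketch that goes slightly beyond the original example. The dataset name preds and the variable name pred_prob are hypothetical:

```sas
/*refit the model and save each student's predicted probability*/
proc logistic data=my_data descending;
    model acceptance = gpa act;
    output out=preds p=pred_prob;
run;

/*view the original data alongside the predicted probabilities*/
proc print data=preds;
run;
```

Each row of preds contains the original variables plus pred_prob, the estimated probability that acceptance equals 1 for that student.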
Additional Resources
The following tutorials explain how to fit other regression models in SAS:
How to Perform Simple Linear Regression in SAS
How to Perform Multiple Linear Regression in SAS