How to Create a Confusion Matrix in R (Step-by-Step)

Logistic regression is a type of regression we can use when the response variable is binary.

One common way to evaluate the quality of a logistic regression model is to create a confusion matrix, which is a 2×2 table that shows the predicted values from the model vs. the actual values from the test dataset.

The following step-by-step example shows how to create a confusion matrix in R.

Step 1: Fit the Logistic Regression Model

For this example we’ll use the Default dataset from the ISLR package. We’ll use student status, bank balance, and annual income to predict the probability that a given individual defaults on their loan.

The following code shows how to fit a logistic regression model to this dataset:

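#load necessary packages
#(InformationValue is loaded after caret, so its confusionMatrix(),
#sensitivity(), and specificity() are the versions called below)
library(caret)
library(InformationValue)
library(ISLR)

#load dataset
data <- Default

#split dataset into training (70%) and testing (30%) sets
set.seed(1)
sample <- sample(c(TRUE, FALSE), nrow(data), replace=TRUE, prob=c(0.7,0.3))
train <- data[sample, ]
test <- data[!sample, ]

#fit logistic regression model
model <- glm(default~student+balance+income, family="binomial", data=train)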
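
The split in the code above relies on a logical vector: each row is independently flagged TRUE (training) or FALSE (testing) with probabilities 0.7 and 0.3. A minimal base-R sketch of that idea on a small made-up data frame (the toy data frame and its column names are hypothetical and not part of the Default dataset):

```r
# Toy data frame standing in for a real dataset (hypothetical values)
set.seed(1)
toy <- data.frame(id = 1:20, x = rnorm(20))

# Draw one TRUE/FALSE flag per row: TRUE with probability 0.7, FALSE with 0.3
in_train <- sample(c(TRUE, FALSE), nrow(toy), replace = TRUE, prob = c(0.7, 0.3))

# Rows flagged TRUE form the training set; the rest form the testing set
toy_train <- toy[in_train, ]
toy_test  <- toy[!in_train, ]

# Every row lands in exactly one of the two sets
nrow(toy_train) + nrow(toy_test) == nrow(toy)
```

Because each row is drawn independently, the split is only approximately 70/30, and the exact sizes depend on the seed.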

Step 2: Create the Confusion Matrix

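Next, we’ll use the confusionMatrix() function to create a confusion matrix. Because the InformationValue package is loaded after caret, the call below uses the InformationValue version of the function, which takes the actual values and the predicted probabilities: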

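#use model to predict probability of default
predicted <- predict(model, test, type="response")

#convert defaults from "Yes" and "No" to 1's and 0's
test$default <- ifelse(test$default=="Yes", 1, 0)

#find optimal cutoff probability to use to maximize accuracy
optimal <- optimalCutoff(test$default, predicted)[1]

#create confusion matrix (uses the default cutoff of 0.5 unless threshold is specified)
confusionMatrix(test$default, predicted)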

     0  1
0 2912 64
1   21 39
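
In the output above, the columns are the actual values and the rows are the predicted values, so 39 defaulters were correctly flagged and 64 were missed. To make the mechanics concrete, the following base-R sketch builds the same kind of table by hand from a few made-up values. The vectors and the 0.5 cutoff are illustrative assumptions, not output from the model above:

```r
# Made-up actual outcomes (1 = default, 0 = no default) and predicted probabilities
actual <- c(0, 0, 0, 1, 1, 0, 1, 0)
prob   <- c(0.10, 0.40, 0.60, 0.80, 0.30, 0.20, 0.90, 0.05)

# Classify as 1 when the predicted probability exceeds the cutoff (assumed 0.5)
cutoff <- 0.5
pred   <- as.integer(prob > cutoff)

# Cross-tabulate with rows = predicted, columns = actual;
# fixing the factor levels keeps the table 2x2 even if a class is never predicted
cm <- table(Predicted = factor(pred, levels = c(0, 1)),
            Actual    = factor(actual, levels = c(0, 1)))
print(cm)
```

The diagonal cells of the table hold the correct predictions, and the off-diagonal cells hold the two kinds of error.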

Step 3: Evaluate the Confusion Matrix

We can also calculate the following metrics using the confusion matrix:

Sensitivity: The “true positive rate” – the percentage of individuals who actually defaulted that the model correctly predicted would default.
Specificity: The “true negative rate” – the percentage of individuals who did not default that the model correctly predicted would not default.
Total misclassification rate: The percentage of total incorrect classifications made by the model.

The following code shows how to calculate these metrics:

#calculate sensitivity
sensitivity(test$default, predicted)

[1] 0.3786408

#calculate specificity
specificity(test$default, predicted)

[1] 0.9928401
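
Both values can be checked by hand from the four counts in the confusion matrix from Step 2. The sketch below assumes, consistent with the numbers reported above, that the columns of that matrix hold the actual classes and the rows hold the predicted classes. The variable names are only illustrative:

```r
# Counts copied from the confusion matrix in Step 2
tn <- 2912  # predicted 0, actual 0 (true negatives)
fn <- 64    # predicted 0, actual 1 (false negatives)
fp <- 21    # predicted 1, actual 0 (false positives)
tp <- 39    # predicted 1, actual 1 (true positives)

# Sensitivity: share of actual defaulters that were predicted to default
sens <- tp / (tp + fn)

# Specificity: share of actual non-defaulters that were predicted not to default
spec <- tn / (tn + fp)

round(c(sensitivity = sens, specificity = spec), 7)
```

The results, 0.3786408 and 0.9928401, match the values returned by sensitivity() and specificity() above.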

#calculate total misclassification error rate
misClassError(test$default, predicted, threshold=optimal)

[1] 0.027

The total misclassification error rate is 2.7% for this model.

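In general, the lower this rate, the better the model predicts outcomes. An error rate of 2.7% looks very good, but it should be read alongside the sensitivity: only 103 of the 3,036 individuals in the test set (about 3.4%) actually defaulted, and the model identified just 37.9% of them, so it is far better at recognizing non-defaulters than defaulters.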
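
The optimal cutoff passed to misClassError() above is the probability threshold chosen to maximize accuracy. As a rough illustration of that idea, the base-R sketch below tries a grid of cutoffs on made-up data and keeps the one with the lowest misclassification rate. It is a simplified stand-in under stated assumptions, not the InformationValue implementation, and the helper name misclass_rate and the toy vectors are hypothetical:

```r
# Made-up actual outcomes and predicted probabilities (hypothetical values)
actual <- c(0, 0, 0, 1, 1, 0, 1, 0, 0, 1)
prob   <- c(0.10, 0.40, 0.60, 0.80, 0.30, 0.20, 0.90, 0.05, 0.35, 0.55)

# Hypothetical helper: share of observations misclassified at a given cutoff
misclass_rate <- function(actual, prob, cutoff) {
  mean(as.integer(prob > cutoff) != actual)
}

# Evaluate a grid of candidate cutoffs from 0.01 to 0.99
cutoffs <- seq(0.01, 0.99, by = 0.01)
rates   <- sapply(cutoffs, function(k) misclass_rate(actual, prob, k))

# Keep the first cutoff that achieves the lowest error rate
best_cutoff <- cutoffs[which.min(rates)]
best_rate   <- min(rates)

c(cutoff = best_cutoff, error = best_rate)
```

Ties are resolved here by taking the first cutoff with the lowest error; a different tie rule would return a different cutoff with the same error rate.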
