
How to Plot a Decision Tree in R (With Example)


In machine learning, a decision tree is a model that uses a set of predictor variables to build a series of splitting rules that predict the value of a response variable.

The easiest way to plot a decision tree in R is to use the prp() function from the rpart.plot package.
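As a quick, minimal sketch of this workflow (using R's built-in mtcars dataset rather than the example data below):

```r
library(rpart)       # fits decision trees
library(rpart.plot)  # provides the prp() plotting function

# fit a small regression tree predicting mpg from weight and horsepower
fit <- rpart(mpg ~ wt + hp, data = mtcars)

# plot the fitted tree
prp(fit)
```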

The following example shows how to use this function in practice.

Example: Plotting a Decision Tree in R

For this example, we’ll use the Hitters dataset from the ISLR package, which contains various information about 263 professional baseball players.

We will use this dataset to build a regression tree that uses home runs and years played to predict the salary of a given player.

The following code shows how to fit this regression tree and how to use the prp() function to plot the tree:

library(ISLR)
library(rpart)
library(rpart.plot)

#build the initial decision tree
tree <- rpart(Salary ~ Years + HmRun, data=Hitters, control=rpart.control(cp=.0001))

#identify best cp value to use
best <- tree$cptable[which.min(tree$cptable[,"xerror"]),"CP"]

#produce a pruned tree based on the best cp value
pruned_tree <- prune(tree, cp=best)

#plot the pruned tree
prp(pruned_tree)

Note that we can also customize the appearance of the decision tree by using the faclen, extra, roundint, and digits arguments within the prp() function:

#plot decision tree using custom arguments
prp(pruned_tree,
    faclen=0, #use full names for factor labels
    extra=1, #display number of observations for each terminal node
    roundint=F, #don't round to integers in output
    digits=5) #display 5 decimal places in output

(Image: plotting a decision tree in R)

We can see that the tree has six terminal nodes.

Each terminal node shows the predicted salary of players in that node along with the number of observations from the original dataset that belong to that node.

For example, we can see that in the original dataset there were 90 players with less than 4.5 years of experience and their average salary was $225.83k.

(Image: interpreting a regression tree in R)

We can also use the tree to predict a given player’s salary based on their years of experience and average home runs.

For example, a player who has 7 years of experience and 4 average home runs has a predicted salary of $502.81k.

(Image: regression tree example in R)
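This prediction can be reproduced in code with predict(). Below is a sketch that refits and prunes the tree from scratch so it runs on its own; a seed is added because the choice of the best cp value relies on cross-validation, which is random:

```r
library(ISLR)   # Hitters dataset
library(rpart)

set.seed(1)  # cp selection below uses cross-validation, which is random

# refit and prune the regression tree from the example above
tree <- rpart(Salary ~ Years + HmRun, data = Hitters,
              control = rpart.control(cp = .0001))
best <- tree$cptable[which.min(tree$cptable[, "xerror"]), "CP"]
pruned_tree <- prune(tree, cp = best)

# predicted salary (in $1000s) for a player with 7 years and 4 home runs
new_player <- data.frame(Years = 7, HmRun = 4)
pred <- predict(pruned_tree, newdata = new_player)
pred
```

The printed value should be close to the $502.81k quoted above, though the exact pruning (and therefore the prediction) can vary slightly with the cross-validation folds.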

This is one advantage of using a decision tree: We can easily visualize and interpret the results.

Additional Resources

The following tutorials provide additional information about decision trees:

An Introduction to Classification and Regression Trees
Decision Tree vs. Random Forests: What’s the Difference?
How to Fit Classification and Regression Trees in R
