Thursday, July 24, 2025

How to Label Outliers in Boxplots in ggplot2


This tutorial provides a step-by-step example of how to label outliers in boxplots in ggplot2.

Step 1: Create the Data Frame

First, let’s create the following data frame that contains information on points scored by 60 different basketball players on three different teams:

#make this example reproducible
set.seed(1)

#create data frame
df <- data.frame(team=rep(c('A', 'B', 'C'), each=20),
                 player=rep(LETTERS[1:20], times=3),
                 points=round(rnorm(n=60, mean=30, sd=10), 2))

#view head of data frame
head(df)

  team player points
1    A      A  23.74
2    A      B  31.84
3    A      C  21.64
4    A      D  45.95
5    A      E  33.30
6    A      F  21.80

Note: We used the set.seed() function to ensure that this example is reproducible.
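
As a quick illustration of what set.seed() buys us, the sketch below generates the points column twice from the same seed and checks that both runs match. The helper name make_points is hypothetical and exists only for this check; it is not part of the tutorial's workflow.

```r
# Hypothetical helper: regenerate the points column from a fixed seed
make_points <- function() {
  set.seed(1)
  round(rnorm(n = 60, mean = 30, sd = 10), 2)
}

first_run  <- make_points()
second_run <- make_points()

# Same seed, same draws: the two vectors are identical
identical(first_run, second_run)

# The first values match the head of the data frame shown above
head(first_run, 3)
```

Without the call to set.seed(), each run would draw different random values and the outliers discussed later would change.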

Step 2: Define a Function to Identify Outliers

In ggplot2, an observation is defined as an outlier if it meets one of the following two requirements:

  • The observation is more than 1.5 times the interquartile range (IQR) below the first quartile (Q1).
  • The observation is more than 1.5 times the interquartile range (IQR) above the third quartile (Q3).
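
To make the two cutoffs concrete, here is a minimal sketch that computes them by hand for a small made-up vector. The values in x and the names lower_fence and upper_fence are assumptions chosen for illustration, and the quartiles come from R's default quantile() method.

```r
# Made-up scores; 50 sits far above the rest
x <- c(10, 12, 14, 16, 18, 20, 22, 24, 50)

q1  <- unname(quantile(x, .25))  # first quartile
q3  <- unname(quantile(x, .75))  # third quartile
iqr <- IQR(x)                    # interquartile range, q3 - q1

# Cutoffs 1.5 * IQR beyond each quartile
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Observations outside the cutoffs count as outliers
x[x < lower_fence | x > upper_fence]
```

Here Q1 is 14 and Q3 is 22, so the IQR is 8 and the cutoffs are 2 and 34. Only the value 50 falls outside them.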

We can create the following function in R to label observations as outliers if they meet one of these two requirements:

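find_outlier <- function(x) {
  #TRUE if a value lies more than 1.5*IQR below Q1 or above Q3
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}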
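
The function judges each value against the quartiles of whatever vector it receives, which is why it matters whether it is applied per group or to all values at once. The sketch below uses two made-up groups (low and high are hypothetical names and values) to show the difference.

```r
find_outlier <- function(x) {
  x < quantile(x, .25) - 1.5 * IQR(x) | x > quantile(x, .75) + 1.5 * IQR(x)
}

# Two made-up groups on very different scales
low  <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)
high <- c(101, 102, 103, 104, 105, 106, 107, 108, 109)

# Applied per group: 30 is flagged in the low group, nothing in the high group
low[find_outlier(low)]
high[find_outlier(high)]

# Applied to the pooled values: the IQR is inflated by the gap between
# the groups, so nothing is flagged
pooled <- c(low, high)
pooled[find_outlier(pooled)]
```

Since a boxplot by team draws one box per team, the function should be applied within each team so that the flagged values match the dots in the plot.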

Related: How to Interpret Interquartile Range

Step 3: Label Outliers in Boxplots in ggplot2

Next, we can use the following code to label outliers in boxplots in ggplot2:

library(ggplot2)
library(dplyr)

#add new column to data frame that indicates if each observation is an outlier
df <- df %>%
        group_by(team) %>%
        mutate(outlier = ifelse(find_outlier(points), points, NA))

#create box plot of points by team and label outliers
ggplot(df, aes(x=team, y=points)) +
  geom_boxplot() +
  geom_text(aes(label=outlier), na.rm=TRUE, hjust=-.5)

[Figure: boxplots of points by team in ggplot2, with the outliers labeled]

Notice that two outliers are labeled in the plot.

The first outlier is a player on team A who scored 7.85 points and the other outlier is a player on team B who scored 10.11 points.
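
One way to confirm which rows were flagged, without reading the values off the plot, is to list them directly. The sketch below is an assumption-laden alternative to the dplyr pipeline: it rebuilds the data and uses base R's ave() to apply the outlier rule within each team; the names is_out and outliers are hypothetical.

```r
# Rebuild the example data
set.seed(1)
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 20),
                 player = rep(LETTERS[1:20], times = 3),
                 points = round(rnorm(n = 60, mean = 30, sd = 10), 2))

find_outlier <- function(x) {
  x < quantile(x, .25) - 1.5 * IQR(x) | x > quantile(x, .75) + 1.5 * IQR(x)
}

# ave() applies find_outlier within each team and returns 0/1,
# so convert the result back to logical
is_out <- as.logical(ave(df$points, df$team, FUN = find_outlier))

# Keep only the flagged rows
outliers <- df[is_out, ]
print(outliers)
```

The printed rows should be the two players described above: one on team A with 7.85 points and one on team B with 10.11 points.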

Note that we could also use a different variable to label these outliers.

For example, we could swap out points for player in the mutate() function to instead label the outliers based on the player name:

library(ggplot2)
library(dplyr)

#add new column to data frame that indicates if each observation is an outlier
df <- df %>%
        group_by(team) %>%
        mutate(outlier = ifelse(find_outlier(points), player, NA))

#create box plot of points by team and label outliers
ggplot(df, aes(x=team, y=points)) +
  geom_boxplot() +
  geom_text(aes(label=outlier), na.rm=TRUE, hjust=-.5)

The outlier on team A now has a label of N and the outlier on team B now has a label of D, since these are the names of the players with outlier values for points.

Note: The hjust argument in geom_text() is used to push the label horizontally to the right so that it doesn’t overlap the dot in the plot.
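
Because hjust is measured relative to the width of each label, the gap it produces changes with label length (for example 7.85 versus N). A possible alternative, sketched below, is the nudge_x argument of geom_text(), which shifts labels by a fixed amount in x-axis units. The value 0.05 is an arbitrary choice, and the outlier column is built here with base R's ave() rather than dplyr so the block stands alone.

```r
library(ggplot2)

# Rebuild the example data
set.seed(1)
df <- data.frame(team = rep(c('A', 'B', 'C'), each = 20),
                 player = rep(LETTERS[1:20], times = 3),
                 points = round(rnorm(n = 60, mean = 30, sd = 10), 2))

find_outlier <- function(x) {
  x < quantile(x, .25) - 1.5 * IQR(x) | x > quantile(x, .75) + 1.5 * IQR(x)
}

# Flag outliers within each team and keep their points value as the label
is_out <- as.logical(ave(df$points, df$team, FUN = find_outlier))
df$outlier <- ifelse(is_out, df$points, NA)

# hjust = 0 left-aligns each label; nudge_x moves it right by a fixed amount
p <- ggplot(df, aes(x = team, y = points)) +
  geom_boxplot() +
  geom_text(aes(label = outlier), na.rm = TRUE, hjust = 0, nudge_x = 0.05)
p
```

On a discrete axis like team, adjacent categories are one unit apart, so a small nudge_x keeps the label close to its dot.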

Additional Resources

The following tutorials explain how to perform other common tasks in ggplot2:

How to Change Font Size in ggplot2
How to Remove a Legend in ggplot2
How to Rotate Axis Labels in ggplot2
