Sunday, July 7, 2024

How to Create Categorical Variable from Continuous in R

You can use the cut() function in R to create a categorical variable from a continuous one.

This function uses the following basic syntax:

df$cat_variable <- cut(df$continuous_variable,
                       breaks=c(5, 10, 15, 20, 25),
                       labels=c('A', 'B', 'C', 'D'))

Note that breaks specifies the cut points used to split the continuous variable, and labels specifies the name given to each resulting interval. Because n break points define n - 1 intervals, labels must contain one fewer element than breaks.
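As a minimal sketch of how breaks and labels line up, the snippet below cuts a hypothetical vector x (invented for illustration, not part of this tutorial's data) using five break points and four labels:

```r
# Hypothetical values, one inside each interval
x <- c(6, 12, 18, 24)

# Five break points define four intervals, so four labels are supplied
grp <- cut(x, breaks = c(5, 10, 15, 20, 25), labels = c('A', 'B', 'C', 'D'))

print(grp)
```

Each value is assigned the label of the interval it falls into, in the same order as the breaks.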

The following example shows how to use this syntax in practice.

Example: Create Categorical Variable from Continuous in R

Suppose we have the following data frame in R:

#create data frame
df <- data.frame(team=c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                 points=c(78, 82, 86, 94, 99, 104, 109, 110))

#view data frame
df

  team points
1    A     78
2    B     82
3    C     86
4    D     94
5    E     99
6    F    104
7    G    109
8    H    110

Currently points is a continuous variable.

We can use the cut() function to cut it into a categorical variable:

#add new column that cuts 'points' into categories
df$cat <- cut(df$points,
              breaks=c(70, 80, 90, 100, 110),
              labels=c('Bad', 'OK', 'Good', 'Great'))

#view updated data frame
df

  team points   cat
1    A     78   Bad
2    B     82    OK
3    C     86    OK
4    D     94  Good
5    E     99  Good
6    F    104 Great
7    G    109 Great
8    H    110 Great

We created a new categorical variable called cat that classifies each team in the data frame as Bad, OK, Good, or Great based on their points.
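One detail worth checking is how cut() treats values that sit exactly on a break. By default the intervals are open on the left and closed on the right, so a value equal to the lowest break falls outside every interval and becomes NA. The sketch below uses a hypothetical vector pts (not the tutorial's data) to compare the default with the include.lowest and right arguments of cut():

```r
# Hypothetical boundary values, chosen for illustration only
pts <- c(70, 80, 110)
brks <- c(70, 80, 90, 100, 110)

# Default: intervals are (70,80], (80,90], ..., so 70 is not in any interval
default_cut <- cut(pts, breaks = brks)

# include.lowest = TRUE closes the first interval: [70,80]
lowest_cut <- cut(pts, breaks = brks, include.lowest = TRUE)

# right = FALSE makes intervals closed on the left: [70,80), ..., [100,110)
left_cut <- cut(pts, breaks = brks, right = FALSE)

print(default_cut)
print(lowest_cut)
print(left_cut)
```

With the default settings 70 becomes NA, while with right = FALSE it is 110 that becomes NA. If the data can contain values equal to the smallest or largest break, choosing one of these arguments deliberately avoids silent missing values.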

We can use the class() function to check the class of this new variable:

#check class of 'cat' column
class(df$cat)

[1] "factor"

We can see that the cat variable is a factor.
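The factor returned by cut() is unordered by default, even though its levels follow the order of the breaks. If the categories should be comparable (for example, to select everything rated Good or better), the ordered_result argument of cut() can be used. The following is a sketch that rebuilds the points values in a standalone vector; the names cat_var and cat_ord are chosen here for illustration:

```r
points <- c(78, 82, 86, 94, 99, 104, 109, 110)
brks <- c(70, 80, 90, 100, 110)
labs <- c('Bad', 'OK', 'Good', 'Great')

# Default result: a plain (unordered) factor
cat_var <- cut(points, breaks = brks, labels = labs)
print(levels(cat_var))      # level order follows the break order
print(is.ordered(cat_var))

# ordered_result = TRUE returns an ordered factor instead
cat_ord <- cut(points, breaks = brks, labels = labs, ordered_result = TRUE)
print(is.ordered(cat_ord))

# Ordered factors support comparisons against a level name
print(points[cat_ord >= 'Good'])
```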

We can also use the table() function to count the occurrences of each category in the cat variable:

#count occurrences of each category in 'cat' variable
table(df$cat)

  Bad    OK  Good Great 
    1     2     2     3 
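If proportions are more useful than raw counts, the result of table() can be passed to prop.table(). This is a small sketch that recreates the categories in a standalone vector (cat_var is a name chosen here for illustration):

```r
points <- c(78, 82, 86, 94, 99, 104, 109, 110)

cat_var <- cut(points,
               breaks = c(70, 80, 90, 100, 110),
               labels = c('Bad', 'OK', 'Good', 'Great'))

# Counts per category, then each count divided by the total
counts <- table(cat_var)
shares <- prop.table(counts)

print(shares)
```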

Note that if you don’t provide a labels argument to the cut() function, R will simply use the interval range of values as the labels:

#add new column that cuts 'points' into categories
df$cat <- cut(df$points, breaks=c(70, 80, 90, 100, 110))

#view updated data frame
df

  team points       cat
1    A     78   (70,80]
2    B     82   (80,90]
3    C     86   (80,90]
4    D     94  (90,100]
5    E     99  (90,100]
6    F    104 (100,110]
7    G    109 (100,110]
8    H    110 (100,110]

In some cases you may prefer these interval labels to custom ones, since they show the exact cutoffs used for each category.
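Interval labels are also what you get when breaks is given as a single number rather than a vector of cut points. In that case cut() divides the range of the data into that many equal-width intervals. A brief sketch, again using a standalone copy of the points values (auto_cat is a hypothetical name):

```r
points <- c(78, 82, 86, 94, 99, 104, 109, 110)

# breaks = 4 asks for four equal-width intervals spanning the data range
auto_cat <- cut(points, breaks = 4)

print(levels(auto_cat))
print(table(auto_cat))
```

Because the cut points are derived from the data, the resulting intervals change whenever the minimum or maximum changes, so explicit breaks are usually the safer choice when categories need to stay fixed.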

Additional Resources

The following tutorials explain how to perform other common operations in R:

How to Convert Categorical Variables to Numeric in R
How to Create Categorical Variables in R
How to Plot Categorical Data in R
