The unite() function from the tidyr package can be used to unite multiple data frame columns into a single column.
This function uses the following basic syntax:
unite(data, col, ..., sep)
where:
- data: Name of the data frame
- col: Name of the new united column
- ...: Columns to unite (for example, a vector of column names)
- sep: Separator placed between the values in the new united column (default is "_")
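As a quick illustration of the arguments above, here is a minimal sketch on a hypothetical toy data frame (`toy` and its column names are made up for this illustration). It also uses `remove = FALSE`, an additional unite() argument not listed above, which keeps the input columns.

```r
library(tidyr)

# Hypothetical toy data frame, used only for illustration
toy <- data.frame(id = 1:3,
                  first = c("a", "b", "c"),
                  second = c("x", "y", "z"))

# With sep omitted, unite() falls back to its default separator "_"
default_sep <- unite(toy, col = "combined", first, second)

# remove = FALSE keeps the input columns alongside the new united column
kept_inputs <- unite(toy, col = "combined", first, second,
                     sep = "-", remove = FALSE)

print(default_sep)
print(kept_inputs)
```

By default the input columns are dropped once they have been united, so `remove = FALSE` is worth considering when the original values are still needed later.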
The following examples show how to use this function in practice.
Example 1: Unite Two Columns into One Column
Suppose we have the following data frame in R:
#create data frame
df <- data.frame(player=c('A', 'A', 'B', 'B', 'C', 'C'),
                 year=c(1, 2, 1, 2, 1, 2),
                 points=c(22, 29, 18, 11, 12, 19),
                 assists=c(2, 3, 6, 8, 5, 2))

#view data frame
df

  player year points assists
1      A    1     22       2
2      A    2     29       3
3      B    1     18       6
4      B    2     11       8
5      C    1     12       5
6      C    2     19       2
We can use the unite() function to unite the “points” and “assists” columns into a single column:
library(tidyr)

#unite points and assists columns into single column
unite(df, col='points-assists', c('points', 'assists'), sep='-')

  player year points-assists
1      A    1           22-2
2      A    2           29-3
3      B    1           18-6
4      B    2           11-8
5      C    1           12-5
6      C    2           19-2
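Two details of this result are easy to miss: the united column holds character strings rather than numbers, and a name containing a hyphen is non-syntactic in R. The sketch below rebuilds the same data frame so that it runs on its own; the object name `united` is chosen here purely for illustration.

```r
library(tidyr)

# Same data frame as in Example 1, rebuilt so this block is self-contained
df <- data.frame(player = c('A', 'A', 'B', 'B', 'C', 'C'),
                 year = c(1, 2, 1, 2, 1, 2),
                 points = c(22, 29, 18, 11, 12, 19),
                 assists = c(2, 3, 6, 8, 5, 2))

united <- unite(df, col = 'points-assists', c('points', 'assists'), sep = '-')

# The united column is character, even though both inputs were numeric
col_class <- class(united[["points-assists"]])
print(col_class)

# A hyphenated name needs backticks with $, or a string with [[ ]]
print(united$`points-assists`)
```

If the united column will be referenced often, a syntactic name such as `points_assists` avoids the need for backticks.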
Example 2: Unite More Than Two Columns
Suppose we have the following data frame in R:
#create data frame
df2 <- data.frame(player=c('A', 'A', 'B', 'B', 'C', 'C'),
                  year=c(1, 2, 1, 2, 1, 2),
                  points=c(22, 29, 18, 11, 12, 19),
                  assists=c(2, 3, 6, 8, 5, 2),
                  blocks=c(2, 3, 3, 2, 1, 0))

#view data frame
df2

  player year points assists blocks
1      A    1     22       2      2
2      A    2     29       3      3
3      B    1     18       6      3
4      B    2     11       8      2
5      C    1     12       5      1
6      C    2     19       2      0
We can use the unite() function to unite the “points”, “assists”, and “blocks” columns into a single column:
library(tidyr)

#unite points, assists, and blocks columns into single column
unite(df2, col='stats', c('points', 'assists', 'blocks'), sep='/')

  player year  stats
1      A    1 22/2/2
2      A    2 29/3/3
3      B    1 18/6/3
4      B    2 11/8/2
5      C    1 12/5/1
6      C    2 19/2/0
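When the columns to unite sit next to each other, they can also be selected as a range instead of being listed by name. The sketch below assumes the same `df2` as above (rebuilt here so the block runs on its own); the names `by_name` and `by_range` are illustrative.

```r
library(tidyr)

# Same data frame as in Example 2, rebuilt so this block is self-contained
df2 <- data.frame(player = c('A', 'A', 'B', 'B', 'C', 'C'),
                  year = c(1, 2, 1, 2, 1, 2),
                  points = c(22, 29, 18, 11, 12, 19),
                  assists = c(2, 3, 6, 8, 5, 2),
                  blocks = c(2, 3, 3, 2, 1, 0))

# Listing the columns by name, as in the example above
by_name <- unite(df2, col = 'stats', c('points', 'assists', 'blocks'), sep = '/')

# points:blocks selects every column from points through blocks
by_range <- unite(df2, col = 'stats', points:blocks, sep = '/')

# Both selections should produce the same united column
print(identical(by_name$stats, by_range$stats))
```

The range form depends on column order, so listing names explicitly is the safer choice if the layout of the data frame may change.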
Additional Resources
The goal of the tidyr package is to create “tidy” data, which has the following characteristics:
- Every column is a variable.
- Every row is an observation.
- Every cell is a single value.
The tidyr package uses four core functions to create tidy data:
1. The spread() function.
2. The gather() function.
3. The separate() function.
4. The unite() function.
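If you can master these four functions, you will be able to handle most of the reshaping needed to create “tidy” data. Note that as of tidyr 1.0.0, spread() and gather() are superseded by pivot_wider() and pivot_longer(), which are recommended for new code.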
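To show how two of these functions relate, here is a minimal sketch in which separate() reverses a unite() call. It assumes the data frame from Example 2 (rebuilt here); the names `united` and `restored` are illustrative. separate() still runs in current tidyr releases, although newer versions mark it as superseded.

```r
library(tidyr)

# Same data frame as in Example 2, rebuilt so this block is self-contained
df2 <- data.frame(player = c('A', 'A', 'B', 'B', 'C', 'C'),
                  year = c(1, 2, 1, 2, 1, 2),
                  points = c(22, 29, 18, 11, 12, 19),
                  assists = c(2, 3, 6, 8, 5, 2),
                  blocks = c(2, 3, 3, 2, 1, 0))

# Unite three columns into one character column
united <- unite(df2, col = 'stats', c('points', 'assists', 'blocks'), sep = '/')

# Split the united column back apart; convert = TRUE turns the
# resulting character pieces back into numeric values
restored <- separate(united, col = 'stats',
                     into = c('points', 'assists', 'blocks'),
                     sep = '/', convert = TRUE)

print(restored)
```

The round trip only works cleanly because the separator never appears inside the original values.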