You can use the following syntax to group data by hour and perform some aggregation in R:
library(dplyr)
library(lubridate)

#group by hours in time column and calculate sum of sales
df %>%
  group_by(time=floor_date(time, '1 hour')) %>%
  summarize(sum_sales=sum(sales))
This particular example groups the values by hour in a column called time and then calculates the sum of values in the sales column for each hour.
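To see why this groups by hour, it can help to look at what floor_date() does to individual timestamps. The sketch below uses three made-up timestamps (not part of the data in this tutorial) and rounds each one down to the start of its hour, so values in the same hour collapse to the same grouping key:

```r
library(lubridate)

# Three illustrative timestamps (assumed values, for demonstration only)
x <- as.POSIXct(c('2022-01-01 01:14:00', '2022-01-01 01:59:59',
                  '2022-01-01 02:00:00'))

# Round each timestamp down to the start of its hour
floored <- floor_date(x, '1 hour')

# Show the floored values as text
floored_txt <- format(floored, '%Y-%m-%d %H:%M:%S')
floored_txt
# "2022-01-01 01:00:00" "2022-01-01 01:00:00" "2022-01-01 02:00:00"
```

The first two timestamps share the key 01:00:00, so group_by() would place them in the same group.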
The following example shows how to use this syntax in practice.
Example: Group Data by Hour in R
Suppose we have the following data frame that shows the number of sales made at various times throughout the day for some store:
#create data frame
df <- data.frame(time=as.POSIXct(c('2022-01-01 01:14:00', '2022-01-01 01:24:15',
                                   '2022-01-01 02:52:19', '2022-01-01 02:54:00',
                                   '2022-01-01 04:05:10', '2022-01-01 05:35:09')),
                 sales=c(18, 20, 15, 14, 10, 9))
#view data frame
df
                 time sales
1 2022-01-01 01:14:00    18
2 2022-01-01 01:24:15    20
3 2022-01-01 02:52:19    15
4 2022-01-01 02:54:00    14
5 2022-01-01 04:05:10    10
6 2022-01-01 05:35:09     9
We can use the following syntax to group the time column by hours and calculate the sum of sales for each hour:
library(dplyr)
library(lubridate)

#group by hours in time column and calculate sum of sales
df %>%
  group_by(time=floor_date(time, '1 hour')) %>%
  summarize(sum_sales=sum(sales))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
  time                sum_sales
  <dttm>                  <dbl>
1 2022-01-01 01:00:00        38
2 2022-01-01 02:00:00        29
3 2022-01-01 04:00:00        10
4 2022-01-01 05:00:00         9
From the output we can see:
- A total of 38 sales were made during the hour beginning at 01:00.
- A total of 29 sales were made during the hour beginning at 02:00.
- A total of 10 sales were made during the hour beginning at 04:00.
- A total of 9 sales were made during the hour beginning at 05:00.
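The output has no row for the hour beginning at 03:00 because the data frame contains no observations in that hour. If you want every hour to appear, one possible approach is to fill the gaps after summarizing. The sketch below assumes the tidyr package is installed (it is not used elsewhere in this tutorial) and uses its complete() function to add the missing hours with a sum of 0:

```r
library(dplyr)
library(lubridate)
library(tidyr)

#create data frame (same data as in the example above)
df <- data.frame(time=as.POSIXct(c('2022-01-01 01:14:00', '2022-01-01 01:24:15',
                                   '2022-01-01 02:52:19', '2022-01-01 02:54:00',
                                   '2022-01-01 04:05:10', '2022-01-01 05:35:09')),
                 sales=c(18, 20, 15, 14, 10, 9))

#group by hour, sum sales, then add a row for each hour with no data
hourly <- df %>%
  group_by(time=floor_date(time, '1 hour')) %>%
  summarize(sum_sales=sum(sales)) %>%
  complete(time=seq(min(time), max(time), by='hour'),
           fill=list(sum_sales=0))

#view result
hourly
```

The result should contain five rows, one for each hour from 01:00 through 05:00, with a sum_sales of 0 for the hour beginning at 03:00.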
Note that we can also perform some other aggregation.
For example, we could calculate the mean of the sales values recorded within each hour:
library(dplyr)
library(lubridate)

#group by hours in time column and calculate mean of sales
df %>%
  group_by(time=floor_date(time, '1 hour')) %>%
  summarize(mean_sales=mean(sales))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
  time                mean_sales
  <dttm>                   <dbl>
1 2022-01-01 01:00:00       19
2 2022-01-01 02:00:00       14.5
3 2022-01-01 04:00:00       10
4 2022-01-01 05:00:00        9
From the output we can see:
- The mean sales value for the hour beginning at 01:00 was 19.
- The mean sales value for the hour beginning at 02:00 was 14.5.
- The mean sales value for the hour beginning at 04:00 was 10.
- The mean sales value for the hour beginning at 05:00 was 9.
Feel free to group your own data frame by hour and calculate any specific metric you’d like by modifying the metric in the summarize() function.
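As a minimal sketch of that idea, several metrics can be calculated in a single summarize() call. The column names sum_sales, mean_sales and n_rows below are arbitrary choices for this illustration, and n() is the dplyr function that counts the rows in each group:

```r
library(dplyr)
library(lubridate)

#create data frame (same data as in the example above)
df <- data.frame(time=as.POSIXct(c('2022-01-01 01:14:00', '2022-01-01 01:24:15',
                                   '2022-01-01 02:52:19', '2022-01-01 02:54:00',
                                   '2022-01-01 04:05:10', '2022-01-01 05:35:09')),
                 sales=c(18, 20, 15, 14, 10, 9))

#group by hour and calculate sum, mean, and row count in one step
hourly_stats <- df %>%
  group_by(time=floor_date(time, '1 hour')) %>%
  summarize(sum_sales=sum(sales),
            mean_sales=mean(sales),
            n_rows=n())

#view result
hourly_stats
```

Each metric becomes its own column in the result, with one row per hour that has at least one observation.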
Additional Resources
The following tutorials explain how to perform other common operations in R:
How to Group Data by Month in R
How to Group Data by Week in R