Monday, July 7, 2025

How to Resample Time Series Data in Python (With Examples)


Resampling time series data means summarizing or aggregating it over a new time period, for example turning hourly observations into weekly totals.

We can use the following basic syntax to resample time series data with pandas (the DataFrame must have a DatetimeIndex):

#find sum of values in column1 by month
monthly_sum = df['column1'].resample('M').sum()

#find mean of values in column1 by week
weekly_mean = df['column1'].resample('W').mean()
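To see the syntax in action, here is a minimal, self-contained sketch. It assumes a small hypothetical DataFrame `df` with one made-up value per day for three weeks, starting on Monday, January 6, 2020. It sticks to the 'W' alias, which means the same thing in older and newer pandas versions.

```python
import pandas as pd

# Hypothetical data: one value per day for 21 days, starting Monday 2020-01-06
df = pd.DataFrame(
    {"column1": range(21)},
    index=pd.date_range("2020-01-06", periods=21, freq="D"),
)

# Mean of column1 by week (weeks end on Sunday and are labeled by that Sunday)
weekly_mean = df["column1"].resample("W").mean()

# Sum of column1 by week
weekly_sum = df["column1"].resample("W").sum()

print(weekly_mean)  # 3.0, 10.0, 17.0 for the weeks ending 2020-01-12, -19, -26
print(weekly_sum)   # 21, 70, 119 for the same weeks
```

Each week collects seven consecutive daily values, so the first week sums 0 through 6 and the result is labeled with the Sunday that ends it.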

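The string passed to resample() is an offset alias that sets the new time period. Common aliases include:

  • S: Seconds
  • min: Minutes
  • H: Hours
  • D: Days
  • W: Weeks (ending on Sunday by default)
  • M: Months (labeled by the last day of the month)
  • Q: Quarters (labeled by the last day of the quarter)
  • A: Years (labeled by the last day of the year)

In pandas 2.2 and later, several of these aliases were renamed: 's', 'h', 'ME', 'QE', and 'YE' replace 'S', 'H', 'M', 'Q', and 'A'. The older forms are deprecated there and emit a FutureWarning.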
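Resampling works on datetime labels. If the timestamps are stored in an ordinary column rather than in the index, resample() accepts an `on=` parameter naming that column. The sketch below assumes hypothetical column names `timestamp` and `value` and made-up hourly data spanning two days, stored as strings as they often are after reading a CSV file.

```python
import pandas as pd

# Hypothetical data: hourly readings for two days, with timestamps stored as strings
stamps = [f"2020-01-0{day} {hour:02d}:00" for day in (1, 2) for hour in range(24)]
df = pd.DataFrame({"timestamp": stamps, "value": range(48)})

# Convert the strings to datetimes so pandas can resample on them
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Daily totals, grouped on the 'timestamp' column instead of the index
daily = df.resample("D", on="timestamp")["value"].sum()
print(daily)  # 276 for 2020-01-01, 852 for 2020-01-02
```

An equivalent approach is to call `df.set_index("timestamp")` first and then resample the indexed DataFrame as in the syntax shown earlier.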

The following example shows how to resample time series data in practice.

Example: Resample Time Series Data in Python

Suppose we have the following pandas DataFrame that shows the total sales a company made each hour from January 6 to December 27, 2020:

import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(0)

#create DataFrame with hourly index
df = pd.DataFrame(index=pd.date_range('2020-01-06', '2020-12-27', freq='h'))

#add column to show sales by hour
df['sales'] = np.random.randint(low=0, high=20, size=len(df.index))

#view first five rows of DataFrame
df.head()

                     sales
2020-01-06 00:00:00     12
2020-01-06 01:00:00     15
2020-01-06 02:00:00      0
2020-01-06 03:00:00      3
2020-01-06 04:00:00      3

We can create a line plot to visualize the hourly sales data:

import matplotlib.pyplot as plt

#plot hourly sales data
plt.plot(df.index, df.sales, linewidth=3)
plt.show()

With 8,545 hourly values packed into a single plot, the line is too dense to interpret, so we may instead summarize the sales data by week:

#create new DataFrame with a 'sales' column that summarizes total sales by week
weekly_df = df[['sales']].resample('W').sum()

#view first five rows of DataFrame
weekly_df.head()

            sales
2020-01-12   1519
2020-01-19   1589
2020-01-26   1540
2020-02-02   1562
2020-02-09   1614

This new DataFrame shows the total sales for each week, with each row labeled by the Sunday that ends the week.
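A single weekly total is not the only possible summary. As a sketch, assuming the same hourly `df` built earlier (rebuilt below so the block runs on its own), `agg()` can compute several statistics per week in one pass; the choice of sum, mean, and max here is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the hourly sales data from the example above
np.random.seed(0)
df = pd.DataFrame(index=pd.date_range("2020-01-06", "2020-12-27", freq="h"))
df["sales"] = np.random.randint(low=0, high=20, size=len(df.index))

# Weekly total, average sales per hour, and busiest single hour
weekly_stats = df["sales"].resample("W").agg(["sum", "mean", "max"])
print(weekly_stats.head())
```

Each statistic becomes its own column, indexed by the same week-ending Sundays as the weekly sums.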

We can then create a time series plot using this weekly data:

import matplotlib.pyplot as plt

#plot weekly sales data
plt.plot(weekly_df.index, weekly_df.sales, linewidth=3)
plt.show()

This plot is much easier to read because it shows only 51 weekly totals instead of the 8,545 hourly values in the first plot.
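The row counts quoted above can be checked directly. This sketch rebuilds the example data and counts how many hourly rows fall into each weekly bin. It also shows that the last week is partial: the data stop at midnight on Sunday, December 27, so that week holds only 145 hours and its total is not directly comparable to the full weeks.

```python
import numpy as np
import pandas as pd

# Rebuild the hourly sales data from the example above
np.random.seed(0)
df = pd.DataFrame(index=pd.date_range("2020-01-06", "2020-12-27", freq="h"))
df["sales"] = np.random.randint(low=0, high=20, size=len(df.index))

# Number of hourly observations in each weekly bin
hours_per_week = df["sales"].resample("W").count()

print(len(df))                 # 8545 hourly rows
print(len(hours_per_week))     # 51 weekly rows
print(hours_per_week.iloc[0])  # 168 hours in a full week
print(hours_per_week.iloc[-1]) # 145 hours in the final, partial week
```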

Note: In this example, we summarized the sales data by week, but we could also summarize it by month ('M') or quarter ('Q') to plot even fewer data points.
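A sketch of the monthly and quarterly versions follows. Because pandas 2.2 renamed the month-end and quarter-end aliases, it assumes you want code that runs on both older and newer versions and picks the alias from `pd.__version__`. The names `month_alias` and `quarter_alias` are hypothetical helpers, not part of pandas.

```python
import numpy as np
import pandas as pd

# Rebuild the hourly sales data from the example above
np.random.seed(0)
df = pd.DataFrame(index=pd.date_range("2020-01-06", "2020-12-27", freq="h"))
df["sales"] = np.random.randint(low=0, high=20, size=len(df.index))

# pandas 2.2 renamed 'M' and 'Q' to 'ME' and 'QE'; older versions only accept 'M' and 'Q'
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
uses_new_aliases = (major, minor) >= (2, 2)
month_alias = "ME" if uses_new_aliases else "M"
quarter_alias = "QE" if uses_new_aliases else "Q"

# Total sales by month (12 rows) and by quarter (4 rows)
monthly_df = df[["sales"]].resample(month_alias).sum()
quarterly_df = df[["sales"]].resample(quarter_alias).sum()

print(monthly_df)
print(quarterly_df)
```

If you only target pandas 2.2 or later, you can pass 'ME' and 'QE' directly and drop the version check.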

