8.3 C
London
Sunday, March 9, 2025
HomePandas in PythonGeneral Functions in PythonHow to Create a Lag Column in Pandas (With Examples)

How to Create a Lag Column in Pandas (With Examples)

Related stories

Learn About Opening an Automobile Repair Shop in India

Starting a car repair shop is quite a good...

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

  Unlocking the Power: Embracing the Benefits of Tax-Free Investing For...

Income Splitting in Canada for 2023

  Income Splitting in Canada for 2023 The federal government’s expanded...

Can I Deduct Home Office Expenses on my Tax Return 2023?

Can I Deduct Home Office Expenses on my Tax...

Canadian Tax – Personal Tax Deadline 2022

  Canadian Tax – Personal Tax Deadline 2022 Resources and Tools...

You can use the shift() function in pandas to create a column that displays the lagged values of another column.

This function uses the following basic syntax:

df['lagged_col1'] = df['col1'].shift(1)

Note that the value in the shift() function indicates the number of values to calculate the lag for.

The following example shows how to use this syntax in practice.

Example: Create a Lag Column in Pandas

Suppose we have the following pandas DataFrame that shows the sales made by some store on 10 consecutive days:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [18, 10, 14, 13, 19, 24, 25, 29, 15, 18]})
#view DataFrame
print(df)

   day  sales
0    1     18
1    2     10
2    3     14
3    4     13
4    5     19
5    6     24
6    7     25
7    8     29
8    9     15
9   10     18

We can use the shift() function to create a lag column that displays the sales for the previous day for each row:

#add column that represents lag of sales column
df['sales_previous_day'] = df['sales'].shift(1)

#view updated DataFrame
print(df)

   day  sales  sales_previous_day
0    1     18                 NaN
1    2     10                18.0
2    3     14                10.0
3    4     13                14.0
4    5     19                13.0
5    6     24                19.0
6    7     25                24.0
7    8     29                25.0
8    9     15                29.0
9   10     18                15.0

Here’s how to interpret the output:

  • The first value in the lag column is NaN since there is no prior value in the sales column.
  • The second value in the lag column is 18 since this is the prior value in the sales column.
  • The third value in the lag column is 10 since this is the prior value in the sales column.

And so on.

Note that we can also add multiple lag columns to the DataFrame if we’d like:

#add two lag columns
df['sales_previous_day'] = df['sales'].shift(1)
df['sales_previous_day2'] = df['sales'].shift(2) 

#view updated DataFrame
print(df)

   day  sales  sales_previous_day  sales_previous_day2
0    1     18                 NaN                  NaN
1    2     10                18.0                  NaN
2    3     14                10.0                 18.0
3    4     13                14.0                 10.0
4    5     19                13.0                 14.0
5    6     24                19.0                 13.0
6    7     25                24.0                 19.0
7    8     29                25.0                 24.0
8    9     15                29.0                 25.0
9   10     18                15.0                 29.0

You can use the same general approach to add as many lag columns as you’d like.

Note: To create a lead column, simply use negative values in the shift() function.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Drop Columns in Pandas
How to Exclude Columns in Pandas
How to Apply a Function to Selected Columns in Pandas
How to Change the Order of Columns in Pandas DataFrame

Subscribe

- Never miss a story with notifications

- Gain full access to our premium content

- Browse free from up to 5 devices at once

Latest stories