Thursday, December 19, 2024

How to Slice Pandas DataFrame into Chunks

You can use the following basic syntax to slice a pandas DataFrame into smaller chunks:

# specify number of rows in each chunk
n = 3

# split DataFrame into a list of chunks, each a slice of n consecutive rows
list_df = [df[i:i+n] for i in range(0, len(df), n)]

You can then access each chunk by using the following syntax:

#access first chunk
list_df[0]
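
The same idea can be wrapped in a small reusable function. The sketch below is one possible way to do it, not the only one: the helper name split_into_chunks and the demo DataFrame are made up for illustration. It uses .iloc so that the slicing is explicitly by row position, whatever the index labels are.

```python
import pandas as pd

def split_into_chunks(df, n):
    """Return a list of consecutive row chunks with at most n rows each.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # .iloc slices by row position, regardless of the index labels
    return [df.iloc[i:i + n] for i in range(0, len(df), n)]

# small made-up DataFrame with non-numeric index labels
demo = pd.DataFrame({'value': [10, 20, 30, 40, 50, 60]},
                    index=['a', 'b', 'c', 'd', 'e', 'f'])

chunks = split_into_chunks(demo, 3)

print(len(chunks))   # number of chunks
print(chunks[0])     # first chunk: rows 'a', 'b', 'c'
```

Each chunk keeps the index labels of the original rows, so a row can still be traced back to the source DataFrame.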

The following example shows how to use this syntax in practice.

Example: Split Pandas DataFrame into Chunks

Suppose we have the following pandas DataFrame with nine rows that contain information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28, 23],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4, 11],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 10]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12
8    I      23       11        10

We can use the following syntax to split the DataFrame into chunks where each chunk has 3 rows:

# specify number of rows in each chunk
n = 3

# split DataFrame into a list of chunks, each a slice of n consecutive rows
list_df = [df[i:i+n] for i in range(0, len(df), n)]

We can then use the following syntax to access each chunk:

#view first chunk
print(list_df[0])

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10

#view second chunk
print(list_df[1])

  team  points  assists  rebounds
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5

#view third chunk
print(list_df[2])

  team  points  assists  rebounds
6    G      20        9         9
7    H      28        4        12
8    I      23       11        10

Notice that each chunk contains three rows, just as we specified.
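
The chunks come out evenly here only because nine is a multiple of three. When the row count is not a multiple of n, the last chunk simply holds the leftover rows. The sketch below illustrates this with a made-up ten-row DataFrame (df_uneven is a hypothetical name, not part of the example above) and also checks that concatenating the chunks gives back the original rows in order.

```python
import pandas as pd

# hypothetical 10-row DataFrame, so the row count is not a multiple of n
df_uneven = pd.DataFrame({'points': range(10)})

# specify number of rows in each chunk
n = 3

# split DataFrame into chunks using the same list comprehension as above
chunks_uneven = [df_uneven[i:i+n] for i in range(0, len(df_uneven), n)]

# the last chunk holds only the single leftover row
sizes = [len(chunk) for chunk in chunks_uneven]
print(sizes)  # [3, 3, 3, 1]

# putting the chunks back together restores the original rows in order
rebuilt = pd.concat(chunks_uneven)
same_rows = rebuilt['points'].tolist() == df_uneven['points'].tolist()
print(same_rows)  # True
```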

Note that we used a DataFrame with only nine rows to keep the example simple.

In practice, you’ll likely be working with a DataFrame with hundreds of thousands or even millions of rows.

You can use the same syntax that we used in this example to split your DataFrame into chunks of specific sizes.
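
For a large DataFrame it can be convenient to process one chunk at a time instead of building the whole list first. The sketch below is one way to do that with a generator. The function name iter_chunks, the chunk size of 250 and the stand-in data are all assumptions made for illustration.

```python
import pandas as pd

def iter_chunks(df, n):
    """Yield consecutive chunks of at most n rows, one at a time.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    for start in range(0, len(df), n):
        yield df.iloc[start:start + n]

# stand-in for a large DataFrame (made-up data)
big_df = pd.DataFrame({'value': range(1_000)})

total = 0
num_chunks = 0

# process each chunk as it is produced
for chunk in iter_chunks(big_df, 250):
    total += chunk['value'].sum()
    num_chunks += 1

print(num_chunks, total)
```

Because the generator yields chunks lazily, the loop can stop early or pass each chunk to another step without first creating a list of every chunk.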

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Split DataFrame By Column Value
Pandas: How to Split String Column into Multiple Columns
Pandas: How to Split a Column of Lists into Multiple Columns
