Thursday, December 19, 2024

How to Shuffle Rows in a Pandas DataFrame

You can use the following syntax to randomly shuffle the rows in a pandas DataFrame:

#shuffle entire DataFrame
df.sample(frac=1)

#shuffle entire DataFrame and reset index
df.sample(frac=1).reset_index(drop=True)

Here’s what each piece of the code does:

  • The sample() method draws a random sample of rows, without replacement by default. It returns a new DataFrame and leaves df unchanged.
  • The frac argument specifies the fraction of rows to return in the sample. A frac value of 1 specifies to use all rows.
  • The reset_index(drop=True) method resets the row index to 0, 1, 2, and so on. The drop=True argument discards the old index instead of adding it as a new column.
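
As a quick check of the points above, here is a minimal sketch on a small made-up DataFrame (the four rows and the names shuffled and half are illustrative assumptions, not part of the examples that follow). It shows that frac=1 returns every row exactly once and that a smaller frac returns fewer rows.

```python
import pandas as pd

#create a small DataFrame (illustrative data)
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D'],
                   'points': [77, 82, 86, 88]})

#shuffle all rows; sample() returns a new DataFrame
shuffled = df.sample(frac=1)

#every original index label appears exactly once in the shuffled result
print(sorted(shuffled.index) == df.index.tolist())

#the original DataFrame keeps its order
print(df.index.tolist())

#frac=0.5 returns half of the rows instead of all of them
half = df.sample(frac=0.5)
print(len(half))
```

Because sample() does not modify df in place, assign the result to a variable if you want to keep the shuffled order.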

The following examples show how to use this syntax in practice.

Example 1: Shuffle Entire DataFrame

The following code shows how to shuffle all rows in a pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'points': [77, 82, 86, 88, 80, 95],
                   'rebounds': [19, 22, 15, 28, 33, 29]})

#view DataFrame
df

	team	points	rebounds
0	A	77	19
1	A	82	22
2	A	86	15
3	B	88	28
4	B	80	33
5	C	95	29

#shuffle all rows of DataFrame
df.sample(frac=1)

	team	points	rebounds
1	A	82	22
3	B	88	28
2	A	86	15
5	C	95	29
4	B	80	33
0	A	77	19

Notice that the rows are shuffled and each row retained its original index value.

Also note that sample() shuffles randomly, so the order of the rows will generally differ each time you run it unless you fix the seed with the random_state argument.
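
If you need the same shuffled order on every run, for example to make an analysis repeatable, one option is the random_state argument of sample(). The sketch below reuses the DataFrame from this example; the seed value 0 and the names first and second are arbitrary choices for illustration.

```python
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'points': [77, 82, 86, 88, 80, 95],
                   'rebounds': [19, 22, 15, 28, 33, 29]})

#shuffle twice with the same seed (0 is an arbitrary choice)
first = df.sample(frac=1, random_state=0)
second = df.sample(frac=1, random_state=0)

#both calls return the rows in the same order
print(first.equals(second))
```

Any integer works as the seed; what matters is that the same value is used each time.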

Example 2: Shuffle Entire DataFrame & Reset Index

The following code shows how to shuffle all rows in a pandas DataFrame and reset the index values:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'points': [77, 82, 86, 88, 80, 95],
                   'rebounds': [19, 22, 15, 28, 33, 29]})

#view DataFrame
df

	team	points	rebounds
0	A	77	19
1	A	82	22
2	A	86	15
3	B	88	28
4	B	80	33
5	C	95	29

#shuffle all rows of DataFrame and reset index
df.sample(frac=1).reset_index(drop=True)

	team	points	rebounds
0	A	77	19
1	C	95	29
2	A	82	22
3	B	88	28
4	A	86	15
5	B	80	33

Notice that the rows are shuffled and the index is also reset so that the first row has an index value of 0, the second row has an index value of 1, and so on.
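
To see what drop=True changes, the following sketch compares reset_index() with and without it on a small made-up DataFrame (the data and the names kept and dropped are illustrative assumptions). Without drop=True, pandas keeps the old index values as a new column named index.

```python
import pandas as pd

#create a small DataFrame (illustrative data)
df = pd.DataFrame({'team': ['A', 'B', 'C'],
                   'points': [77, 82, 86]})

#reset index but keep the old index values as a column
kept = df.sample(frac=1).reset_index()

#reset index and discard the old index values
dropped = df.sample(frac=1).reset_index(drop=True)

print(kept.columns.tolist())
print(dropped.columns.tolist())
print(dropped.index.tolist())
```

Keeping the old index can be useful when you later need to map shuffled rows back to their original positions.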

Additional Resources

How to Change the Order of Columns in Pandas DataFrame
How to Get Row Numbers in a Pandas DataFrame
How to Get First Row of Pandas DataFrame
