How to Select Rows by Index in a Pandas DataFrame

Often you may want to select the rows of a pandas DataFrame based on their index value.

If you’d like to select rows by integer position, you can use the .iloc indexer.

If you’d like to select rows by index label, you can use the .loc indexer.

This tutorial provides an example of how to use each of these indexers in practice.

Example 1: Select Rows Based on Integer Indexing

The following code shows how to create a pandas DataFrame and use .iloc to select the row at integer position 4 (the fifth row):

import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(0)

#create DataFrame
df = pd.DataFrame(np.random.rand(6,2), index=range(0,18,3), columns=['A', 'B'])

#view DataFrame
df

	       A	       B
0	0.548814	0.715189
3	0.602763	0.544883
6	0.423655	0.645894
9	0.437587	0.891773
12	0.963663	0.383442
15	0.791725	0.528895

#select the 5th row of the DataFrame
df.iloc[[4]]

	       A	       B
12	0.963663	0.383442

We can use similar syntax to select multiple rows:

#select the 3rd, 4th, and 5th rows of the DataFrame
df.iloc[[2, 3, 4]]

	       A	       B
6	0.423655	0.645894
9	0.437587	0.891773
12	0.963663	0.383442

Or we could select all rows in a range:

#select the 3rd, 4th, and 5th rows (positions 2 to 4; the stop position 5 is excluded)
df.iloc[2:5]

	       A	       B
6	0.423655	0.645894
9	0.437587	0.891773
12	0.963663	0.383442
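
The examples above pass a list to .iloc, which always returns a DataFrame. As a small sketch of the alternative, the code below compares that with passing a single integer, which returns the row as a Series instead. The variable names row_as_series and row_as_frame are only for illustration:

```python
import numpy as np
import pandas as pd

# recreate the DataFrame from the example above
np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# a single integer returns the row as a Series
row_as_series = df.iloc[4]

# a list with one integer returns a one-row DataFrame
row_as_frame = df.iloc[[4]]

print(type(row_as_series).__name__)  # Series
print(type(row_as_frame).__name__)   # DataFrame
print(row_as_series.name)            # 12, the index label of the 5th row
```

Use the list form when later steps expect a DataFrame, and the single-integer form when you only need the values of one row.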

Example 2: Select Rows Based on Label Indexing

The following code shows how to create a pandas DataFrame and use .loc to select the row with an index label of 3:

import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(0)

#create DataFrame
df = pd.DataFrame(np.random.rand(6,2), index=range(0,18,3), columns=['A', 'B'])

#view DataFrame
df

	       A	       B
0	0.548814	0.715189
3	0.602763	0.544883
6	0.423655	0.645894
9	0.437587	0.891773
12	0.963663	0.383442
15	0.791725	0.528895

#select the row with index label 3
df.loc[[3]]

	       A	       B
3	0.602763	0.544883

We can use similar syntax to select multiple rows with different index labels:

#select the rows with index labels 3, 6, and 9
df.loc[[3, 6, 9]]

	       A	       B
3	0.602763	0.544883
6	0.423655	0.645894
9	0.437587	0.891773
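
.loc also accepts a slice of labels. One detail worth checking for yourself: unlike .iloc, a .loc slice includes the stop label. The sketch below assumes the same DataFrame as above; the variable names by_label and by_position are only for illustration:

```python
import numpy as np
import pandas as pd

# recreate the DataFrame from the example above
np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# label slice: both the start label 3 and the stop label 9 are included
by_label = df.loc[3:9]

# position slice: the stop position 4 is excluded
by_position = df.iloc[1:4]

print(list(by_label.index))     # [3, 6, 9]
print(list(by_position.index))  # [3, 6, 9]
```

Both slices return the same three rows here, but they get there differently: the first is written in terms of index labels, the second in terms of row positions.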

The Difference Between .iloc and .loc

The examples above illustrate the subtle difference between .iloc and .loc:

  • .iloc selects rows by integer position. So, if you want to select the 5th row in a DataFrame, you would use df.iloc[[4]], since the first row is at position 0, the second row is at position 1, and so on.
  • .loc selects rows by index label. So, if you want to select the row with an index label of 5, you would use df.loc[[5]], regardless of where that row sits in the DataFrame.
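
To make the difference concrete, the following sketch passes the same number to both indexers on the DataFrame used in this tutorial. It also shows what happens when a label does not exist: this DataFrame has no index label 5, so df.loc[[5]] raises a KeyError. The variable names are only for illustration:

```python
import numpy as np
import pandas as pd

# recreate the DataFrame from the examples above
np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# position 3 is the 4th row, which has index label 9
by_position = df.iloc[[3]]

# label 3 belongs to the 2nd row
by_label = df.loc[[3]]

print(list(by_position.index))  # [9]
print(list(by_label.index))     # [3]

# this DataFrame has no row labeled 5, so .loc raises a KeyError
try:
    df.loc[[5]]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

print(missing_label_raised)  # True
```

When the index happens to be the default 0, 1, 2, ... the two indexers return the same rows for a single value, which is why the difference is easy to miss until the index changes.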

Additional Resources

How to Get Row Numbers in a Pandas DataFrame
How to Drop Rows with NaN Values in Pandas
How to Drop the Index Column in Pandas
