15.1 C
London
Friday, July 5, 2024
HomePythonDescriptive Statistics in PythonHow to Calculate Percentiles in Python (With Examples)

How to Calculate Percentiles in Python (With Examples)

Related stories

Learn About Opening an Automobile Repair Shop in India

Starting a car repair shop is quite a good...

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

  Unlocking the Power: Embracing the Benefits of Tax-Free Investing For...

Income Splitting in Canada for 2023

  Income Splitting in Canada for 2023 The federal government’s expanded...

Can I Deduct Home Office Expenses on my Tax Return 2023?

Can I Deduct Home Office Expenses on my Tax...

Canadian Tax – Personal Tax Deadline 2022

  Canadian Tax – Personal Tax Deadline 2022 Resources and Tools...

The nth percentile of a dataset is the value that cuts off the first n percent of the data values when all of the values are sorted from least to greatest.

For example, the 90th percentile of a dataset is the value that cuts of the bottom 90% of the data values from the top 10% of data values.

We can quickly calculate percentiles in Python by using the numpy.percentile() function, which uses the following syntax:

numpy.percentile(a, q)

where:

  • a: Array of values
  • q: Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.

This tutorial explains how to use this function to calculate percentiles in Python.

How to Find Percentiles of an Array

The following code illustrates how to find various percentiles for a given array in Python:

import numpy as np

#make this example reproducible
np.random.seed(0)

#create array of 100 random integers distributed between 0 and 500
data = np.random.randint(0, 500, 100)

#find the 37th percentile of the array
np.percentile(data, 37)

173.26

#Find the quartiles (25th, 50th, and 75th percentiles) of the array
np.percentile(data, [25, 50, 75])

array([116.5, 243.5, 371.5])

How to Find Percentiles of a DataFrame Column

The following code shows how to find the 95th percentile value for a single pandas DataFrame column:

import numpy as np 
import pandas as pd

#create DataFrame
df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                   'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                   'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

#find 90th percentile of var1 column
np.percentile(df.var1, 95)

34.1

How to Find Percentiles of Several DataFrame Columns

The following code shows how to find the 95th percentile value for a several columns in a pandas DataFrame:

import numpy as np 
import pandas as pd

#create DataFrame
df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                   'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                   'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

#find 95th percentile of each column
df.quantile(.95)

var1    34.10
var2    14.55
var3    14.65

#find 95th percentile of just columns var1 and var2
df[['var1', 'var2']].quantile(.95)

var1    34.10
var2    14.55

Note that we were able to use the pandas quantile() function in the examples above to calculate percentiles.

Related: How to Calculate Percentiles in R (With Examples)

Subscribe

- Never miss a story with notifications

- Gain full access to our premium content

- Browse free from up to 5 devices at once

Latest stories