In statistics, deciles are numbers that split a dataset into ten groups of equal frequency.
The first decile is the value below which 10% of all data values lie. The second decile is the value below which 20% of all data values lie, and so on.
We can use the following syntax to calculate the deciles for a dataset in Python:
import numpy as np
np.percentile(var, np.arange(0, 100, 10))
The following example shows how to use this function in practice.
Example: Calculate Deciles in Python
The following code shows how to create a fake dataset with 20 values and then calculate the values for the deciles of the dataset:
import numpy as np
#create data
data = np.array([56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
                 89, 90, 91, 92, 93, 93, 94, 95, 97, 99])
#calculate deciles of data
np.percentile(data, np.arange(0, 100, 10))
array([56. , 63.4, 67.8, 76.5, 83.6, 88.5, 90.4, 92.3, 93.2, 95.2])
The way to interpret the deciles is as follows:
- 10% of all data values lie below 63.4.
- 20% of all data values lie below 67.8.
- 30% of all data values lie below 76.5.
- 40% of all data values lie below 83.6.
- 50% of all data values lie below 88.5.
- 60% of all data values lie below 90.4.
- 70% of all data values lie below 92.3.
- 80% of all data values lie below 93.2.
- 90% of all data values lie below 95.2.
Note that the first value in the output (56) is the 0th percentile, which is simply the minimum value in the dataset.
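To see where a value such as 63.4 comes from, the following sketch reproduces the deciles by hand, assuming the linear interpolation that np.percentile uses by default. The helper name decile_by_hand is hypothetical and exists only for illustration. The last line also shows that starting np.arange at 10 instead of 0 returns only the nine decile cut points, without the minimum:

```python
import numpy as np

data = np.array([56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
                 89, 90, 91, 92, 93, 93, 94, 95, 97, 99])

def decile_by_hand(values, k):
    """Hypothetical helper: k-th decile via linear interpolation."""
    ordered = np.sort(values)
    # fractional position of the k-th decile in the sorted data
    pos = (k / 10) * (len(ordered) - 1)
    lower = int(np.floor(pos))
    upper = min(lower + 1, len(ordered) - 1)
    # weight given to the upper neighbor
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

#first decile: position 1.9 lies between 58 and 64
first_decile = decile_by_hand(data, 1)

#only the nine decile cut points (10th through 90th percentile)
nine_deciles = np.percentile(data, np.arange(10, 100, 10))
```

For the first decile, the position is 0.1 * 19 = 1.9, so the result is 58 + 0.9 * (64 - 58) = 63.4, which matches the output of np.percentile.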
Example: Place Values into Deciles in Python
To place each data value into a decile, we can use the pandas qcut function.
Here’s how to use this function for the dataset we created in the previous example:
import pandas as pd
#create data frame
df = pd.DataFrame({'values': [56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
                              89, 90, 91, 92, 93, 93, 94, 95, 97, 99]})
#calculate decile of each value in data frame
df['Decile'] = pd.qcut(df['values'], 10, labels=False)
#display data frame
df
    values  Decile
0       56       0
1       58       0
2       64       1
3       67       1
4       68       2
5       73       2
6       78       3
7       83       3
8       84       4
9       88       4
10      89       5
11      90       5
12      91       6
13      92       6
14      93       7
15      93       7
16      94       8
17      95       8
18      97       9
19      99       9
The way to interpret the output is as follows:
- The data value 56 falls between the 0th and 10th percentiles, so it falls in decile 0.
- The data value 58 falls between the 0th and 10th percentiles, so it falls in decile 0.
- The data value 64 falls between the 10th and 20th percentiles, so it falls in decile 1.
- The data value 67 falls between the 10th and 20th percentiles, so it falls in decile 1.
- The data value 68 falls between the 20th and 30th percentiles, so it falls in decile 2.
And so on.
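Because labels=False returns 0-based codes, the deciles above run from 0 to 9. If you would rather number them 1 to 10, or want to see the bin edges that qcut used, a minimal sketch along the following lines may help. The variable names codes, edges, and counts are chosen here only for illustration:

```python
import pandas as pd

#create data frame
df = pd.DataFrame({'values': [56, 58, 64, 67, 68, 73, 78, 83, 84, 88,
                              89, 90, 91, 92, 93, 93, 94, 95, 97, 99]})

#retbins=True also returns the bin edges that qcut used
codes, edges = pd.qcut(df['values'], 10, labels=False, retbins=True)

#shift the 0-based codes so the deciles run from 1 to 10
df['Decile'] = codes + 1

#count how many values landed in each decile
counts = df['Decile'].value_counts().sort_index()
```

With this dataset, edges holds 11 values running from the minimum (56) to the maximum (99), and each decile contains two of the 20 values.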