Friday, December 20, 2024

How to Perform Quantile Normalization in R

In statistics, quantile normalization is a technique that makes two or more distributions identical in their statistical properties: after normalization, every column has exactly the same quantiles.
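The standard algorithm behind this idea can be written compactly. As a sketch, assume an n × p data matrix X with no ties inside any column (implementations differ in how they handle ties). Let x_{(k)j} be the k-th smallest value in column j and r_{ij} the rank of x_{ij} within column j. Each value is then replaced by the average of the values that share its rank across all columns:

```latex
\bar{x}_{(k)} = \frac{1}{p} \sum_{j=1}^{p} x_{(k)j}, \qquad k = 1, \dots, n,
\qquad\qquad
\tilde{x}_{ij} = \bar{x}_{(r_{ij})}
```

For example, with columns (5, 2, 3) and (4, 1, 6), the sorted columns are (2, 3, 5) and (1, 4, 6), so the rank means are 1.5, 3.5 and 5.5. The normalized columns become (5.5, 1.5, 3.5) and (3.5, 1.5, 5.5). Both now contain exactly the same values in the same within-column order as before, so every quantile matches.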

The following example shows how to perform quantile normalization in R.

Example: Quantile Normalization in R

Suppose we create the following data frame in R that contains two columns:

#make this example reproducible
set.seed(0)

#create data frame with two columns
df <- data.frame(x=rnorm(1000),
                 y=rnorm(1000))

#view first six rows of data frame
head(df)

           x           y
1  1.2629543 -0.28685156
2 -0.3262334  1.84110689
3  1.3297993 -0.15676431
4  1.2724293 -1.38980264
5  0.4146414 -1.47310399
6 -1.5399500 -0.06951893

We can use the sapply() and quantile() functions to calculate the quantiles for both x and y:

#calculate quantiles for x and y
sapply(df, function(x) quantile(x, probs = seq(0, 1, 1/4)))

               x           y
0%   -3.23638573 -3.04536393
25%  -0.70845589 -0.73331907
50%  -0.05887078 -0.03181533
75%   0.68763873  0.71755969
100%  3.26641452  3.03903341

Notice that x and y have similar, but not identical, quantiles.

For example, the 25th percentile is -0.708 for x and -0.733 for y.

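To perform quantile normalization, we can use the normalize.quantiles() function from the preprocessCore package, which is distributed through Bioconductor rather than CRAN:

#install once with: BiocManager::install("preprocessCore")
library(preprocessCore)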

#perform quantile normalization (normalize.quantiles() expects a numeric matrix)
df_norm <- data.frame(normalize.quantiles(as.matrix(df)))

#rename data frame columns
names(df_norm) <- c('x', 'y')

#view first six rows of new data frame
head(df_norm)

           x           y
1  1.2632137 -0.28520228
2 -0.3469744  1.82440519
3  1.3465807 -0.16471644
4  1.2692599 -1.34472394
5  0.4161133 -1.43717759
6 -1.6269731 -0.07906793

We can then use the following code to calculate the quantiles for both x and y again:

#calculate quantiles for x and y
sapply(df_norm, function(x) quantile(x, probs = seq(0, 1, 1/4)))

               x           y
0%   -3.14087483 -3.14087483
25%  -0.72088748 -0.72088748
50%  -0.04534305 -0.04534305
75%   0.70259921  0.70259921
100%  3.15272396  3.15272396

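Notice that the quantiles for x and y are now identical. Each one is the average of the corresponding original quantiles; for example, the new 25th percentile, -0.7209, is the mean of -0.7085 (x) and -0.7333 (y).

We would say that x and y have been quantile normalized: the two distributions now have identical statistical properties.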
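If installing a Bioconductor package is not an option, the same transformation can be sketched in base R. The helper quantile_normalize() below is hypothetical (it is not part of preprocessCore): it replaces each value by the mean of the values that share its rank across all columns, and it assumes there are no ties within a column, which holds for continuous data such as rnorm() output.

```r
#hypothetical base-R helper (not part of preprocessCore): quantile normalization
#by the mean-of-sorted-columns rule; assumes no ties within a column
quantile_normalize <- function(data) {
  mat <- as.matrix(data)
  #k-th entry = mean of the k-th smallest values across all columns
  sorted_means <- rowMeans(apply(mat, 2, sort))
  #replace every value by the mean that belongs to its rank within its column
  out <- apply(mat, 2, function(col) sorted_means[rank(col, ties.method = "first")])
  as.data.frame(out)
}

#same reproducible data as in the example
set.seed(0)
df <- data.frame(x = rnorm(1000), y = rnorm(1000))

df_norm_base <- quantile_normalize(df)

#quartiles of x and y should now be identical
sapply(df_norm_base, function(x) quantile(x, probs = seq(0, 1, 1/4)))

#check that the ordering within each column is unchanged
all(rank(df$x) == rank(df_norm_base$x))
```

Because the normalized quartiles in the example are each the average of the original x and y quartiles, which is what this mean-of-sorted-columns rule produces, we would expect these quartiles to agree with the normalize.quantiles() output. Treat that as something to check rather than a guarantee, especially for data with ties.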

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Normalize Data in R
How to Calculate Percentiles in R
How to Use the quantile() Function in R