How to Center Data in R (With Examples)

To center a dataset means to subtract the mean value from each individual observation in the dataset.

For example, suppose we have the following dataset:

4, 6, 9, 13, 14, 17, 18, 19, 19, 21

The mean value is 14. Thus, to center this dataset we subtract 14 from each individual observation:

-10, -8, -5, -1, 0, 3, 4, 5, 5, 7

Note that the mean value of the centered dataset is zero.
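The definition above can be checked directly with vector arithmetic, before reaching for any helper function. The following is a minimal sketch; the values in `data` are assumed for illustration and were chosen so that their mean is 14.

```r
# Illustrative values (an assumption), chosen so that the mean is 14
data <- c(4, 6, 9, 13, 14, 17, 18, 19, 19, 21)

# Center by subtracting the mean from every observation
centered <- data - mean(data)

# Inspect the centered values and confirm their mean
centered
mean(centered)
```

Because R recycles the single value `mean(data)` across the whole vector, no loop is needed.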

This tutorial provides several examples of how to center data in R.

Example 1: Center the Values of a Vector

The following code shows how to use the scale() function from base R to center the values in a vector:

#create vector
data <- c(4, 6, 9, 13, 14, 17, 18, 19, 19, 21)

#subtract the mean value from each observation in the vector
scale(data, scale=FALSE)

      [,1]
 [1,]  -10
 [2,]   -8
 [3,]   -5
 [4,]   -1
 [5,]    0
 [6,]    3
 [7,]    4
 [8,]    5
 [9,]    5
[10,]    7

attr(,"scaled:center")
[1] 14

The resulting values are the centered values of the dataset, returned as a one-column matrix. The "scaled:center" attribute at the bottom of the output also tells us that the mean value of the dataset is 14.
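If the centered values are needed as an ordinary vector, or the subtracted mean is needed as a number, both can be pulled out of the result. This is a small sketch with illustrative values for `data` (an assumption); the names `center_used` and `centered_vec` are hypothetical.

```r
# Illustrative values (an assumption), chosen so that the mean is 14
data <- c(4, 6, 9, 13, 14, 17, 18, 19, 19, 21)

centered <- scale(data, scale = FALSE)

# The mean that was subtracted is stored as an attribute on the result
center_used <- attr(centered, "scaled:center")

# as.numeric() drops the matrix dimensions and attributes,
# leaving a plain numeric vector
centered_vec <- as.numeric(centered)

center_used
centered_vec
```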

Note that the scale() function, by default, subtracts the mean from each individual observation and then divides by the standard deviation.

By specifying scale=FALSE, we tell R not to divide by the standard deviation.
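To see the difference between the two settings, the default behavior can be compared with a manual calculation. The sketch below uses illustrative values for `data` (an assumption) and hypothetical variable names.

```r
# Illustrative values (an assumption)
data <- c(4, 6, 9, 13, 14, 17, 18, 19, 19, 21)

# Default: subtract the mean, then divide by the standard deviation
standardized <- as.numeric(scale(data))

# The same calculation written out by hand
manual <- (data - mean(data)) / sd(data)

# Centering only: subtract the mean, leave the spread unchanged
centered_only <- as.numeric(scale(data, scale = FALSE))

# Check that scale() with defaults matches the manual calculation
all.equal(standardized, manual)

# Centering alone does not change the standard deviation
all.equal(sd(centered_only), sd(data))
```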

Example 2: Center the Columns in a Data Frame

The following code shows how to use the sapply() function and the scale() function from base R to center the values of each column of a data frame:

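#create data frame (example values consistent with the output shown below)
df <- data.frame(x = c(1, 4, 5, 6, 6, 8, 9),
                 y = c(7, 7, 8, 8, 8, 9, 12),
                 z = c(3, 3, 4, 4, 6, 7, 7))

#center each column in the data frame
df_new <- sapply(df, function(x) scale(x, scale=FALSE))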

#display data frame
df_new

              x          y          z
[1,] -4.5714286 -1.4285714 -1.8571429
[2,] -1.5714286 -1.4285714 -1.8571429
[3,] -0.5714286 -0.4285714 -0.8571429
[4,]  0.4285714 -0.4285714 -0.8571429
[5,]  0.4285714 -0.4285714  1.1428571
[6,]  2.4285714  0.5714286  2.1428571
[7,]  3.4285714  3.5714286  2.1428571

We can verify that the mean of each centered column is equal to zero by using the colMeans() function:

colMeans(df_new)

            x             y             z 
 2.537653e-16 -2.537653e-16  3.806479e-16 

The values are shown in scientific notation and are on the order of 1e-16, so each one is zero up to floating-point rounding error.
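Two practical follow-ups are keeping the result as a data frame and checking for zero with a tolerance instead of an exact comparison. The sketch below is one possible approach, not the only one; the values in `df` are illustrative (an assumption) and `df_centered` is a hypothetical name.

```r
# Illustrative data frame (values are an assumption)
df <- data.frame(x = c(1, 4, 5, 6, 6, 8, 9),
                 y = c(7, 7, 8, 8, 8, 9, 12),
                 z = c(3, 3, 4, 4, 6, 7, 7))

# scale() also accepts a numeric data frame and centers every column;
# as.data.frame() converts the resulting matrix back to a data frame
df_centered <- as.data.frame(scale(df, scale = FALSE))

# Compare the column means to zero using all.equal(), which allows
# for small floating-point differences
all.equal(c(0, 0, 0), unname(colMeans(df_centered)))
```

An exact test such as `colMeans(df_centered) == 0` can return FALSE because of rounding, which is why a tolerance-based comparison is used here.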

Additional Resources

How to Average Across Columns in R
How to Sum Specific Columns in R
How to Remove Outliers from Multiple Columns in R
