How to Calculate Sample & Population Variance in R

The variance is a way to measure how spread out data values are around the mean.

The formula to find the variance of a population is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\mu$ is the population mean, $x_i$ is the $i$th element from the population, $N$ is the population size, and $\Sigma$ is just a fancy symbol that means “sum.”

The formula to find the variance of a sample is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

where $\bar{x}$ is the sample mean, $x_i$ is the $i$th element in the sample, and $n$ is the sample size.
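To make the two formulas concrete, here is a minimal sketch that computes both by hand. The vector `x` and the variable names are hypothetical values chosen for illustration and are not part of the article's dataset.

```r
#hypothetical example values (not the article's dataset)
x <- c(1, 2, 3, 4, 5)

n <- length(x)
x_bar <- mean(x)

#sum of squared deviations from the mean
ss <- sum((x - x_bar)^2)

#population variance: divide by n
pop_var <- ss / n

#sample variance: divide by n - 1
samp_var <- ss / (n - 1)

pop_var
samp_var

#the built-in var() uses the n - 1 denominator, so this should be TRUE
all.equal(samp_var, var(x))
```

The only difference between the two results is the denominator, which is why the two variances can be converted into one another with a simple scaling factor.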

Example: Calculate Sample & Population Variance in R

Suppose we have the following dataset in R:

#define dataset
data <- c(2, 4, 4, 7, 8, 12, 14, 15, 19, 22)

We can calculate the sample variance by using the var() function in R:

#calculate sample variance
var(data)

[1] 46.01111

And we can calculate the population variance by simply multiplying the sample variance by (n-1)/n as follows:

#determine length of data
n <- length(data)

#calculate population variance
var(data) * (n-1)/n

[1] 41.41

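Note that the population variance will always be smaller than the sample variance of the same data (unless every value is identical, in which case both equal zero), because it divides by n rather than n-1.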

In practice, we typically calculate sample variances for datasets since it’s unusual to collect data for an entire population.
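If you need the population variance often, the (n-1)/n scaling can be wrapped in a small helper. This is only a sketch: `pop_var()` is a hypothetical name, not a base R function, and the vector `x` is made up for illustration.

```r
#pop_var() is a hypothetical helper, not part of base R
pop_var <- function(x) {
  n <- length(x)
  #rescale the sample variance from an n - 1 denominator to n
  var(x) * (n - 1) / n
}

#illustrative values (not the article's dataset)
x <- c(4, 8, 15, 16, 23, 42)

#sample variance
var(x)

#population variance
pop_var(x)

#ratio of the two is (n-1)/n, here 5/6
pop_var(x) / var(x)
```

The helper assumes the vector has no missing values; var() accepts an na.rm argument, but length() would still count any NA entries.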

Example: Calculate Sample Variance of Multiple Columns

Suppose we have the following data frame in R:

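#create data frame
data <- data.frame(a=c(1, 3, 4, 4, 6, 7, 8, 12),
                   b=c(2, 4, 4, 5, 5, 6, 7, 16),
                   c=c(6, 6, 7, 8, 8, 9, 9, 12))

#view data frame
data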

   a  b  c
1  1  2  6
2  3  4  6
3  4  4  7
4  4  5  8
5  6  5  8
6  7  6  9
7  8  7  9
8 12 16 12

We can use the sapply() function to calculate the sample variance of each column in the data frame:

#find sample variance of each column
sapply(data, var)

        a         b         c 
11.696429 18.125000  3.839286 
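The same sapply() approach can give the population variance of each column by passing a function that applies the (n-1)/n scaling. The sketch below rebuilds the data frame from the printed values so that it runs on its own; the name `pop_vars` is an assumption made for this example.

```r
#rebuild the data frame from the values printed in the article
data <- data.frame(a=c(1, 3, 4, 4, 6, 7, 8, 12),
                   b=c(2, 4, 4, 5, 5, 6, 7, 16),
                   c=c(6, 6, 7, 8, 8, 9, 9, 12))

#find population variance of each column
pop_vars <- sapply(data, function(x) var(x) * (length(x) - 1) / length(x))
pop_vars
```

With 8 rows per column, each result is 7/8 of the corresponding sample variance.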

And we can use the following code to calculate the sample standard deviation of each column, which is simply the square root of the sample variance:

#find sample standard deviation of each column
sapply(data, sd)

       a        b        c 
3.420004 4.257347 1.959410 
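As a quick check of the relationship between the two statistics, the following sketch confirms that sd() returns the square root of var() for every column. The data frame is rebuilt from the printed values so the snippet is self-contained; `sds` and `vars` are names chosen for this example.

```r
#rebuild the data frame from the values printed in the article
data <- data.frame(a=c(1, 3, 4, 4, 6, 7, 8, 12),
                   b=c(2, 4, 4, 5, 5, 6, 7, 16),
                   c=c(6, 6, 7, 8, 8, 9, 9, 12))

sds <- sapply(data, sd)
vars <- sapply(data, var)

#should be TRUE: standard deviation is the square root of variance
all.equal(sds, sqrt(vars))
```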
