How to Calculate Sample & Population Variance in R

The variance is a way to measure how spread out data values are around the mean.

The formula to find the variance of a population is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where μ is the population mean, x_i is the i-th element of the population, N is the population size, and Σ is just a fancy symbol that means “sum.”
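To see the population formula at work, here is a minimal sketch that translates it term by term into base R. The helper name `pop_var` and the example vector are hypothetical and only for illustration. R's own var() computes the sample version, covered next.

```r
# Hypothetical helper: population variance computed directly from the formula
# sigma^2 = sum((x_i - mu)^2) / N
pop_var <- function(x) {
  mu <- mean(x)          # population mean
  N <- length(x)         # population size
  sum((x - mu)^2) / N    # average squared deviation from the mean
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical example values (mean is 5)
pop_var(x)                       # 4
```

Here the squared deviations are 9, 1, 1, 1, 0, 0, 4 and 16, which sum to 32, and 32 / 8 = 4.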

The formula to find the variance of a sample is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where x̄ is the sample mean, x_i is the i-th element in the sample, and n is the sample size.
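The sample formula can be written out the same way. The sketch below is a hypothetical helper, `sample_var`, applied to the same illustrative vector. It shows that dividing by n − 1 reproduces what base R's var() returns.

```r
# Hypothetical helper: sample variance computed directly from the formula
# s^2 = sum((x_i - x_bar)^2) / (n - 1)
sample_var <- function(x) {
  x_bar <- mean(x)               # sample mean
  n <- length(x)                 # sample size
  sum((x - x_bar)^2) / (n - 1)   # divide by n - 1 instead of n
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical example values (mean is 5)
sample_var(x)                    # 32 / 7, about 4.571429
var(x)                           # base R's var() gives the same value
```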

Suppose we have the following dataset in R:

#define dataset
data

We can calculate the sample variance by using the var() function in R:

#calculate sample variance
var(data)

[1] 46.01111

Because the two formulas differ only in their denominators (n − 1 for a sample versus N for a population), we can calculate the population variance by multiplying the sample variance by (n-1)/n as follows:

#determine length of data
n <- length(data)

#calculate population variance
var(data) * (n-1)/n

[1] 41.41
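The rescaling works because both formulas share the same numerator, the sum of squared deviations from the mean. When the data are treated as the entire population, μ is simply x̄. Writing that shared sum as SS (a label introduced here for convenience), the conversion is:

$$s^2 = \frac{SS}{n - 1}, \qquad \sigma^2 = \frac{SS}{n} \quad\Longrightarrow\quad \sigma^2 = s^2 \cdot \frac{n - 1}{n}$$

Checking this against the output above, the ratio 41.41 / 46.01111 is 0.9 = 9/10, so the dataset has n = 10 values:

$$\sigma^2 = 46.01111 \times \frac{9}{10} \approx 41.41$$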

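Note that the population variance will always be smaller than the sample variance, since (n-1)/n is less than 1. The only exception is a dataset in which every value is identical, where both variances are zero.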
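A quick check of this note in R, using two hypothetical vectors: a constant one with no spread and one with some spread.

```r
# A constant vector has no spread, so both variances are zero
x_const <- c(5, 5, 5, 5)
n_const <- length(x_const)
var(x_const)                                # sample variance: 0
var(x_const) * (n_const - 1) / n_const      # population variance: 0

# With any spread, (n - 1)/n < 1 makes the population variance strictly smaller
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
var(x) * (n - 1) / n < var(x)               # TRUE
```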

In practice, we typically calculate sample variances for datasets since it’s unusual to collect data for an entire population.

Suppose we have the following data frame in R:

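#create data frame
data <- data.frame(a=c(1, 3, 4, 4, 6, 7, 8, 12),
                   b=c(2, 4, 4, 5, 5, 6, 7, 16),
                   c=c(6, 6, 7, 8, 8, 9, 9, 12))
#view data frame
data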

   a  b  c
1  1  2  6
2  3  4  6
3  4  4  7
4  4  5  8
5  6  5  8
6  7  6  9
7  8  7  9
8 12 16 12

We can use the sapply() function to calculate the sample variance of each column in the data frame:

#find sample variance of each column
sapply(data, var)

        a         b         c 
11.696429 18.125000  3.839286
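Combining sapply() with the (n-1)/n rescaling from earlier gives the population variance of each column. This is a minimal sketch that assumes each column is an entire population. It recreates the data frame from the printed values above so the block runs on its own.

```r
#create the data frame shown above
data <- data.frame(a=c(1, 3, 4, 4, 6, 7, 8, 12),
                   b=c(2, 4, 4, 5, 5, 6, 7, 16),
                   c=c(6, 6, 7, 8, 8, 9, 9, 12))

#find population variance of each column by rescaling the sample variance
pop_vars <- sapply(data, function(x) var(x) * (length(x) - 1) / length(x))
pop_vars
```

For column a, for example, this gives 11.696429 × 7/8 ≈ 10.234.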

And we can use the following code to calculate the sample standard deviation of each column, which is simply the square root of the sample variance:

#find sample standard deviation of each column
sapply(data, sd)

       a        b        c 
3.420004 4.257347 1.959410
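Because sd() is defined as the square root of var(), the two results can be checked against each other. The data frame is again recreated from the printed values above.

```r
#create the data frame shown above
data <- data.frame(a=c(1, 3, 4, 4, 6, 7, 8, 12),
                   b=c(2, 4, 4, 5, 5, 6, 7, 16),
                   c=c(6, 6, 7, 8, 8, 9, 9, 12))

#sd() is the square root of var(), so these two results should match
sample_sd <- sapply(data, sd)
sqrt_var  <- sqrt(sapply(data, var))
all.equal(sample_sd, sqrt_var)   # TRUE
```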
