Monday, July 14, 2025

Stratified Sampling in R (With Examples)

Researchers often take samples from a population and use the data from the sample to draw conclusions about the population as a whole.

One commonly used sampling method is stratified random sampling, in which a population is split into groups and a certain number of members from each group are randomly selected to be included in the sample.

This tutorial explains how to perform stratified random sampling in R.
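As a rough illustration of the definition above, the following base R sketch splits a toy population into groups and draws the same number of members at random from each group. The data frame `pop`, its columns, and the group labels are hypothetical and are used only for illustration:

```r
#hypothetical toy population: 12 units in 3 groups
set.seed(1)
pop <- data.frame(id = 1:12,
                  group = rep(c('A', 'B', 'C'), each=4))

#split row indices by group
idx_by_group <- split(seq_len(nrow(pop)), pop$group)

#draw 2 row indices at random (without replacement) from each group
picked <- unlist(lapply(idx_by_group, function(idx) idx[sample.int(length(idx), 2)]))

#keep the selected rows and count members per group
strat <- pop[picked, ]
table(strat$group)
```

Each group contributes exactly 2 rows here, so the sample has 6 rows in total.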

Example: Stratified Sampling in R

A high school is composed of 400 students who are either Freshmen, Sophomores, Juniors, or Seniors. Suppose we’d like to take a stratified sample of 40 students such that 10 students from each grade are included in the sample.

The following code shows how to generate a sample data frame of 400 students:

#make this example reproducible
set.seed(1)

#create data frame
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each=100),
                 gpa = rnorm(400, mean=85, sd=3))

#view first six rows of data frame
head(df)

     grade      gpa
1 Freshman 83.12064
2 Freshman 85.55093
3 Freshman 82.49311
4 Freshman 89.78584
5 Freshman 85.98852
6 Freshman 82.53859
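Before sampling, it can help to confirm how many students fall in each stratum. The sketch below rebuilds the example data frame so that it runs on its own. It assumes 100 students per grade, with grade labels taken from the output shown in this tutorial; `strata_sizes` is a name introduced here for illustration:

```r
#rebuild the example data frame (assumes 100 students per grade)
set.seed(1)
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each=100),
                 gpa = rnorm(400, mean=85, sd=3))

#count students in each grade
strata_sizes <- table(df$grade)
strata_sizes
```

Each grade should show a count of 100, so drawing 10 students per grade samples 10% of each stratum.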

Stratified Sampling Using Number of Rows

The following code shows how to use the group_by() and sample_n() functions from the dplyr package to obtain a stratified random sample of 40 total students with 10 students from each grade:

library(dplyr)

#obtain stratified sample
strat_sample <- df %>%
                  group_by(grade) %>%
                  sample_n(size=10)

#find frequency of students from each grade
table(strat_sample$grade)

 Freshman    Junior    Senior Sophomore 
       10        10        10        10 
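In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(). A minimal sketch of the same fixed-size draw, assuming dplyr 1.0.0 or newer is installed, is shown below. The data frame is rebuilt so the block runs on its own, and `strat_sample_n` is a name introduced here for illustration. Because the two functions need not pick the same rows for a given seed, only the per-grade counts should be expected to match:

```r
library(dplyr)

#rebuild the example data frame (assumes 100 students per grade)
set.seed(1)
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each=100),
                 gpa = rnorm(400, mean=85, sd=3))

#draw 10 students at random from each grade
strat_sample_n <- df %>%
                    group_by(grade) %>%
                    slice_sample(n=10)

#find frequency of students from each grade
table(strat_sample_n$grade)
```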

Stratified Sampling Using Fraction of Rows

The following code shows how to use the group_by() and sample_frac() functions from the dplyr package to obtain a stratified random sample in which we randomly select 15% of students from each grade:

library(dplyr)

#obtain stratified sample
strat_sample <- df %>%
                  group_by(grade) %>%
                  sample_frac(size=.15)

#find frequency of students from each grade
table(strat_sample$grade)

 Freshman    Junior    Senior Sophomore 
       15        15        15        15 
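Likewise, sample_frac() is superseded by slice_sample() with the prop argument in dplyr 1.0.0 and later. A minimal sketch, assuming dplyr 1.0.0 or newer is installed; the data frame is rebuilt so the block runs on its own, and `strat_sample_prop` is a name introduced here for illustration:

```r
library(dplyr)

#rebuild the example data frame (assumes 100 students per grade)
set.seed(1)
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each=100),
                 gpa = rnorm(400, mean=85, sd=3))

#draw 15% of students at random from each grade
strat_sample_prop <- df %>%
                       group_by(grade) %>%
                       slice_sample(prop=.15)

#find frequency of students from each grade
table(strat_sample_prop$grade)
```

With 100 students in every grade, each grade contributes 15 rows. If the grades had different sizes, the number drawn from each would scale with the size of the grade, and slice_sample() rounds the per-group count down to a whole number of rows.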

Additional Resources

Types of Sampling Methods
Cluster Sampling in R
Systematic Sampling in R
