How to Test for Normality in R (4 Methods)

Many statistical tests make the assumption that datasets are normally distributed.

There are four common ways to check this assumption in R:

1. (Visual Method) Create a histogram.

If the histogram is roughly “bell-shaped”, then the data is assumed to be normally distributed.

2. (Visual Method) Create a Q-Q plot.

If the points in the plot roughly fall along a straight diagonal line, then the data is assumed to be normally distributed.

3. (Formal Statistical Test) Perform a Shapiro-Wilk Test.

If the p-value of the test is greater than α = .05, then the data is assumed to be normally distributed.

4. (Formal Statistical Test) Perform a Kolmogorov-Smirnov Test.

If the p-value of the test is greater than α = .05, then the data is assumed to be normally distributed.

The following examples show how to use each of these methods in practice.

Method 1: Create a Histogram

The following code shows how to create a histogram for a normally distributed and non-normally distributed dataset in R:

#make this example reproducible
set.seed(0)

#create data that follows a normal distribution
normal_data #create data that follows an exponential distribution
non_normal_data #define plotting region
par(mfrow=c(1,2)) 

#create histogram for both datasets
hist(normal_data, col='steelblue', main='Normal')
hist(non_normal_data, col='steelblue', main='Non-normal')

The histogram on the left exhibits a dataset that is normally distributed (roughly a “bell-shape”) and the one on the right exhibits a dataset that is not normally distributed.

Method 2: Create a Q-Q plot

The following code shows how to create a Q-Q plot for a normally distributed and non-normally distributed dataset in R:

#make this example reproducible
set.seed(0)

#create data that follows a normal distribution
normal_data #create data that follows an exponential distribution
non_normal_data #define plotting region
par(mfrow=c(1,2)) 

#create Q-Q plot for both datasets
qqnorm(normal_data, main='Normal')
qqline(normal_data)

qqnorm(non_normal_data, main='Non-normal')
qqline(non_normal_data)

The Q-Q plot on the left exhibits a dataset that is normally distributed (the points fall along a straight diagonal line) and the Q-Q plot on the right exhibits a dataset that is not normally distributed.

Method 3: Perform a Shapiro-Wilk Test

The following code shows how to perform a Shapiro-Wilk test on a normally distributed and non-normally distributed dataset in R:

#make this example reproducible
set.seed(0)

#create data that follows a normal distribution
normal_data #perform shapiro-wilk test
shapiro.test(normal_data)

	Shapiro-Wilk normality test

data:  normal_data
W = 0.99248, p-value = 0.3952

#create data that follows an exponential distribution
non_normal_data #perform shapiro-wilk test
shapiro.test(non_normal_data)

	Shapiro-Wilk normality test

data:  non_normal_data
W = 0.84153, p-value = 1.698e-13

The p-value of the first test is not less than .05, which indicates that the data is normally distributed.

The p-value of the second test is less than .05, which indicates that the data is not normally distributed.

Method 4: Perform a Kolmogorov-Smirnov Test

The following code shows how to perform a Kolmogorov-Smirnov test on a normally distributed and non-normally distributed dataset in R:

#make this example reproducible
set.seed(0)

#create data that follows a normal distribution
normal_data #perform kolmogorov-smirnov test
ks.test(normal_data, 'pnorm')

	One-sample Kolmogorov-Smirnov test

data:  normal_data
D = 0.073535, p-value = 0.2296
alternative hypothesis: two-sided

#create data that follows an exponential distribution
non_normal_data #perform kolmogorov-smirnov test
ks.test(non_normal_data, 'pnorm') 
	One-sample Kolmogorov-Smirnov test

data:  non_normal_data
D = 0.50115, p-value

The p-value of the first test is not less than .05, which indicates that the data is normally distributed.

The p-value of the second test is less than .05, which indicates that the data is not normally distributed.

How to Handle Non-Normal Data

If a given dataset is not normally distributed, we can often perform one of the following transformations to make it more normally distributed:

1. Log Transformation: Transform the values from x to log(x).

2. Square Root Transformation: Transform the values from x to √x.

3. Cube Root Transformation: Transform the values from x to x^1/3.

By performing these transformations, the dataset typically becomes more normally distributed.

Read this tutorial to see how to perform these transformations in R.

Additional Resources

How to Create Histograms in R
How to Create & Interpret a Q-Q Plot in R
How to Perform a Shapiro-Wilk Test in R
How to Perform a Kolmogorov-Smirnov Test in R

Highlights of the 2023 Union Budget: Announcements for 15 Key Sectors

Gold Prices May Rise as Import Duty on Gold raised by 5%

Relief to MSMEs as Mandatory GST Registration waived for online sellers

GST Council Meet Highlights, Full List of Items to get Costlier

Highlights of the 2023 Union Budget: Announcements for 15 Key Sectors

Gold Prices May Rise as Import Duty on Gold raised by 5%

Relief to MSMEs as Mandatory GST Registration waived for online sellers

GST Council Meet Highlights, Full List of Items to get Costlier

Learn About Opening an Automobile Repair Shop in India

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

Income Splitting in Canada for 2023

Can I Deduct Home Office Expenses on my Tax Return 2023?

Canadian Tax – Personal Tax Deadline 2022

Method 1: Create a Histogram

Method 2: Create a Q-Q plot

Method 3: Perform a Shapiro-Wilk Test

Method 4: Perform a Kolmogorov-Smirnov Test

How to Handle Non-Normal Data

Additional Resources

Learn About Opening an Automobile Repair Shop in India

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

Income Splitting in Canada for 2023

Can I Deduct Home Office Expenses on my Tax Return 2023?

ABOUT US

Latest

Learn About Opening an Automobile Repair Shop in India

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

Income Splitting in Canada for 2023

Popular

How to Create a Stem-and-Leaf Plot in SPSS

How to Create a Correlation Matrix in SPSS

How to Add Target Line to Graph in Excel

Sitemap