Latin hypercube sampling is a method for drawing random samples that are spread evenly across a sample space.
It is widely used to generate what are known as controlled random samples, and it is often applied in Monte Carlo analysis because it can substantially reduce the number of simulation runs needed to reach a given level of accuracy.
Introductory Example
To wrap your head around the idea of Latin hypercube sampling, consider the following simple example:
Suppose we’d like to obtain a sample of 2 values from a normal distribution with a mean of 0 and a standard deviation of 1.
If we used simple random sampling to obtain this sample, it’s possible that both values could be greater than 0 or that both values could be less than 0.
However, if we used Latin hypercube sampling, one value would be guaranteed to fall above 0 and the other below 0. This is because we would partition the sample space into two equally likely regions, one with values above 0 and one with values below 0, and then select one random value from each region.
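The two-value example can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name stratified_pair is made up for this sketch, the standard library's NormalDist is used for the inverse CDF, and the small clamp is an implementation detail that keeps probabilities strictly between 0 and 1.

```python
import random
from statistics import NormalDist

def stratified_pair(rng):
    """Draw one standard normal value from each half of the distribution."""
    inv_cdf = NormalDist(mu=0, sigma=1).inv_cdf

    def clamp(p):
        # inv_cdf is only defined for probabilities strictly inside (0, 1).
        return min(max(p, 1e-12), 1.0 - 1e-12)

    p_low = clamp(0.5 * rng.random())          # cumulative probability in the lower half
    p_high = clamp(0.5 + 0.5 * rng.random())   # cumulative probability in the upper half
    return inv_cdf(p_low), inv_cdf(p_high)

rng = random.Random(1)  # seeded only so the sketch is reproducible
below, above = stratified_pair(rng)
print(below, above)
```

Because the median of this distribution is 0, drawing one probability from each half of the CDF is what places one value on each side of 0.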
One-Dimensional Latin Hypercube Sampling
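The idea behind one-dimensional Latin hypercube sampling is simple: Divide the range of a given CDF, which runs from 0 to 1, into n equal intervals so that each interval corresponds to a region of equal probability, then randomly choose one value from each region to obtain a sample of size n.
The benefit of this approach is that it ensures that exactly one value from each region is included in the sample, so no part of the distribution is left out or overrepresented.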
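A minimal sketch of the one-dimensional case follows. The helper name lhs_1d and its parameters are hypothetical, and the standard normal distribution is chosen only as an example; any distribution with an inverse CDF could be passed in instead.

```python
import random
from statistics import NormalDist

def lhs_1d(n, inv_cdf, rng):
    """Return n values, one from each of n equal-probability regions."""
    sample = []
    for i in range(n):
        # Region i covers cumulative probabilities from i/n to (i+1)/n.
        p = (i + rng.random()) / n
        # Keep p strictly inside (0, 1) so the inverse CDF is defined.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        sample.append(inv_cdf(p))
    rng.shuffle(sample)  # remove the ordering by region
    return sample

rng = random.Random(0)  # seeded only so the sketch is reproducible
values = lhs_1d(5, NormalDist(0, 1).inv_cdf, rng)
print(values)
```

Sorting the returned values and passing them through the CDF should place the i-th value in the i-th region, which is a quick way to check the stratification.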
Two-Dimensional Latin Hypercube Sampling
We can easily extend the idea of one-dimensional Latin hypercube sampling into two dimensions as well.
For two variables, x and y, we can divide the sample space of each variable into n equally probable regions, which forms an n-by-n grid. We then pick n points so that each region of x and each region of y contains exactly one point, which is done by pairing the x regions with the y regions at random.
It’s important to note that the two variables must be independent for this sampling technique to achieve the desired results, because pairing the regions at random will not reproduce any correlation between them.
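The two-dimensional case can be sketched on the unit square, where each coordinate is a cumulative probability that could later be mapped through an inverse CDF. The function name lhs_2d is hypothetical, and the code assumes the two variables are independent.

```python
import random

def lhs_2d(n, rng):
    """Return n points in the unit square with one point per row and per column of an n-by-n grid."""
    cols = list(range(n))
    rows = list(range(n))
    rng.shuffle(rows)  # random pairing of x regions with y regions
    points = []
    for c, r in zip(cols, rows):
        x = (c + rng.random()) / n  # random position inside column c
        y = (r + rng.random()) / n  # random position inside row r
        points.append((x, y))
    return points

rng = random.Random(0)  # seeded only so the sketch is reproducible
points = lhs_2d(4, rng)
for x, y in points:
    print(f"x={x:.3f} (column {int(x * 4)}), y={y:.3f} (row {int(y * 4)})")
```

Each column index and each row index should appear exactly once in the printed output, which is the defining property of a Latin square layout.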
N-Dimensional Latin Hypercube Sampling
To perform Latin hypercube sampling in higher dimensions, we can extend the idea of two-dimensional Latin hypercube sampling to more variables.
Each variable is split into n equally probable regions, and the regions are paired at random across variables so that every region of every variable contains exactly one of the n points in the controlled random sample.
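One way to sketch the general case is to shuffle the region order independently for each dimension and then combine the columns into points. The name latin_hypercube and its parameters n (number of points) and d (number of dimensions) are assumptions made for this illustration, and the points are generated in the unit hypercube rather than in the units of any particular distribution.

```python
import random

def latin_hypercube(n, d, rng):
    """Return n points in the unit hypercube; each dimension has one point per region."""
    columns = []
    for _ in range(d):
        regions = list(range(n))
        rng.shuffle(regions)  # independent random order for each dimension
        columns.append([(s + rng.random()) / n for s in regions])
    # Transpose the d columns into n points of d coordinates each.
    return list(zip(*columns))

points = latin_hypercube(n=6, d=3, rng=random.Random(0))
for point in points:
    print(tuple(round(coord, 3) for coord in point))
```

Note that the number of points stays at n regardless of d. Each coordinate can be passed through the inverse CDF of its variable to obtain values from the target distributions.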
Why Use Latin Hypercube Sampling?
The main advantage of Latin hypercube sampling is that it produces samples that reflect the underlying distribution more closely than simple random samples of the same size, so it tends to require much smaller sample sizes to reach the same accuracy.
This method of sampling can be particularly advantageous if you’re working with data that has a high number of dimensions, since the number of points does not have to grow with the number of dimensions in the way a full grid would, and each variable's range is still covered evenly.
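The sample-size advantage can be checked empirically in a simple case. The sketch below, which is only an illustration with hypothetical function names, estimates the mean of a standard normal distribution many times with each method and compares how much the estimates vary. The sample size of 10 and the 2,000 repetitions are arbitrary choices.

```python
import random
from statistics import NormalDist, mean, pstdev

def srs_mean(n, rng):
    """Mean of n values drawn by simple random sampling."""
    return mean(rng.gauss(0.0, 1.0) for _ in range(n))

def lhs_mean(n, rng, inv_cdf=NormalDist(0, 1).inv_cdf):
    """Mean of n values drawn with one value per equal-probability region."""
    probs = ((i + rng.random()) / n for i in range(n))
    # The clamp keeps probabilities strictly inside (0, 1) for inv_cdf.
    return mean(inv_cdf(min(max(p, 1e-12), 1.0 - 1e-12)) for p in probs)

rng = random.Random(42)  # seeded only so the sketch is reproducible
n, repeats = 10, 2000
srs_spread = pstdev(srs_mean(n, rng) for _ in range(repeats))
lhs_spread = pstdev(lhs_mean(n, rng) for _ in range(repeats))
print(f"Spread of the estimated mean, simple random sampling: {srs_spread:.4f}")
print(f"Spread of the estimated mean, Latin hypercube sampling: {lhs_spread:.4f}")
```

A smaller spread means the estimate is more stable at the same sample size, which is the sense in which fewer samples are needed. The size of the gain depends on the quantity being estimated, so this one-dimensional comparison should not be read as a general guarantee.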