This tutorial provides a simple explanation of the difference between a PDF (probability density function) and a CDF (cumulative distribution function) in statistics.
Random Variables
Before we can define a PDF or a CDF, we first need to understand random variables.
A random variable, usually denoted as X, is a variable whose values are numerical outcomes of some random process. There are two types of random variables: discrete and continuous.
Discrete Random Variables
A discrete random variable is one which can take on only a countable number of distinct values like 0, 1, 2, 3, 4, 5…100, 1 million, etc. Some examples of discrete random variables include:
- The number of times a coin lands on tails after being flipped 20 times.
- The number of times a dice lands on the number 4 after being rolled 100 times.
Continuous Random Variables
A continuous random variable is one which can take on an infinite number of possible values. Some examples of continuous random variables include:
- Height of a person
- Weight of an animal
- Time required to run a mile
For example, the height of a person could be 60.2 inches, 65.2344 inches, 70.431222 inches, etc. There are infinitely many possible values for height.
Rule of Thumb: If you can count the number of outcomes, then you are working with a discrete random variable (e.g. counting the number of times a coin lands on heads). But if you can measure the outcome, you are working with a continuous random variable (e.g. measuring height, weight, time, etc.)
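To make this distinction concrete, here is a minimal Python sketch that simulates one discrete outcome and one continuous outcome. It uses NumPy, and the mean and standard deviation chosen for height are made-up values purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Discrete: count how many of 20 fair coin flips land on heads (a countable outcome).
heads = rng.binomial(n=20, p=0.5)

# Continuous: draw a "height" in inches from a normal distribution
# (mean 67 and standard deviation 3 are made-up numbers for illustration).
height = rng.normal(loc=67, scale=3)

print(heads)   # always an integer between 0 and 20, e.g. 11
print(height)  # can be any real number in a range, e.g. 67.4218...
```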
Probability Density Functions
A probability density function (pdf) tells us the probability that a random variable takes on a certain value. (For a discrete random variable, this function is sometimes called a probability mass function.)
For example, suppose we roll a dice one time. If we let x denote the number that the dice lands on, then the probability density function for the outcome can be described as follows:
P(x < 1) : 0
P(x = 1) : 1/6
P(x = 2) : 1/6
P(x = 3) : 1/6
P(x = 4) : 1/6
P(x = 5) : 1/6
P(x = 6) : 1/6
P(x > 6) : 0
Note that this is an example of a discrete random variable, since x can only take on integer values.
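As a quick sketch in Python, the same probabilities can be written as a small dictionary; this is just a direct transcription of the table above:

```python
# Probability of each face of a fair six-sided dice; every other value has probability 0.
pmf = {x: 1/6 for x in range(1, 7)}

def p(x):
    """Probability that the dice lands on exactly x."""
    return pmf.get(x, 0.0)

print(p(3))               # 0.1666...
print(p(7))               # 0.0 -- the dice cannot land on 7
print(sum(pmf.values()))  # 1.0 (up to floating-point rounding) -- the probabilities sum to one
```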
For a continuous random variable, we cannot use a PDF to read off the probability of any single value, since the probability that x takes on an exact value is zero. Instead, probabilities for a continuous random variable correspond to areas under the PDF over a range of values.
For example, suppose we want to know the probability that a burger from a particular restaurant weighs a quarter-pound (0.25 lbs). Since weight is a continuous variable, it can take on an infinite number of values.
For example, a given burger might actually weigh 0.250001 pounds, or 0.24 pounds, or 0.2488 pounds. The probability that a given burger weighs exactly 0.25 pounds is essentially zero.
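A short Python sketch illustrates the point. It assumes, purely for illustration, that burger weights follow a normal distribution with mean 0.25 lbs and standard deviation 0.02 lbs (made-up numbers, not from any real data):

```python
from scipy.stats import norm

# Hypothetical model: burger weight in pounds ~ Normal(0.25, 0.02).
weight = norm(loc=0.25, scale=0.02)

# The probability of any single exact value is zero for a continuous variable...
print(weight.cdf(0.25) - weight.cdf(0.25))  # 0.0

# ...but the probability of falling in an interval is positive:
print(weight.cdf(0.26) - weight.cdf(0.24))  # about 0.38 -- chance the burger weighs between 0.24 and 0.26 lbs
```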
Cumulative Distribution Functions
A cumulative distribution function (cdf) tells us the probability that a random variable takes on a value less than or equal to x.
For example, suppose we roll a dice one time. If we let x denote the number that the dice lands on, then the cumulative distribution function for the outcome can be described as follows:
P(x ≤ 0) : 0
P(x ≤ 1) : 1/6
P(x ≤ 2) : 2/6
P(x ≤ 3) : 3/6
P(x ≤ 4) : 4/6
P(x ≤ 5) : 5/6
P(x ≤ 6) : 6/6
P(x > 6) : 0
Notice that the probability that x is less than or equal to 6 is 6/6, which is equal to 1. This is because the dice will land on either 1, 2, 3, 4, 5, or 6 with 100% probability.
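In Python, this table is just the running sum of the PMF from the earlier dice example, as a minimal sketch shows:

```python
import numpy as np

faces = np.arange(1, 7)   # the values 1 through 6
pmf = np.full(6, 1/6)     # each face has probability 1/6
cdf = np.cumsum(pmf)      # running total: 1/6, 2/6, ..., 6/6

for x, prob in zip(faces, cdf):
    print(f"P(x <= {x}) = {prob:.4f}")
# P(x <= 1) = 0.1667
# ...
# P(x <= 6) = 1.0000
```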
This example uses a discrete random variable, but a cumulative distribution function can also be used for a continuous random variable.
Cumulative distribution functions have the following properties:
- The probability that a random variable takes on a value less than the smallest possible value is zero. For example, the probability that a dice lands on a value less than 1 is zero.
- The probability that a random variable takes on a value less than or equal to the largest possible value is one. For example, the probability that a dice lands on a value of 1, 2, 3, 4, 5, or 6 is one. It must land on one of those numbers.
- The cdf is always non-decreasing. That is, the probability that a dice lands on a number less than or equal to 1 is 1/6, the probability that it lands on a number less than or equal to 2 is 2/6, the probability that it lands on a number less than or equal to 3 is 3/6, etc. The cumulative probabilities are always non-decreasing.
Related: You can use an ogive graph to visualize a cumulative distribution function.
The Relationship Between a CDF and a PDF
In technical terms, a probability density function (pdf) is the derivative of a cumulative distribution function (cdf).
Furthermore, the area under the curve of a pdf between negative infinity and x is equal to the value of the cdf at x.
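As a rough numerical check of both statements, here is a short sketch using the standard normal distribution from SciPy (any distribution would work; this one is just a convenient example):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.stats import norm

x = np.linspace(-4, 4, 2001)

# Differentiating the cdf (numerically) recovers the pdf...
pdf_from_cdf = np.gradient(norm.cdf(x), x)
print(np.allclose(pdf_from_cdf, norm.pdf(x), atol=1e-4))  # True

# ...and integrating the pdf up to x recovers the cdf.
# (Starting the integral at -4 instead of negative infinity leaves out only a tiny tail area.)
cdf_from_pdf = cumulative_trapezoid(norm.pdf(x), x, initial=0)
print(np.allclose(cdf_from_pdf, norm.cdf(x), atol=1e-4))  # True
```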
For an in-depth explanation of the relationship between a pdf and a cdf, along with the proof for why the pdf is the derivative of the cdf, refer to a statistics textbook.