Suppose you take a random sample of people and measure their heights. Together, the measurements form a distribution of heights. This kind of distribution is useful when you need to know which outcomes are most likely, how widely the potential values are spread, and how likely different outcomes are.
Types of Discrete Distributions
You can model different kinds of data using a variety of discrete probability distributions. The properties of your data determine which discrete distribution is appropriate. For example:
To model binary data, such as coin tosses, use the binomial distribution.
To model count data, such as the number of library book checkouts each hour, use the Poisson distribution.
To model several outcomes that all have the same probability, such as rolling a die, use the discrete uniform distribution.
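As a rough illustration of these three distributions, here is a minimal Python sketch of their probability mass functions using only the standard library. The function names and the example numbers (10 tosses, an average of 3 checkouts per hour) are assumptions chosen for illustration, not values from this post.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, rate):
    """P(exactly k events in an interval, given an average of `rate` events per interval)."""
    return exp(-rate) * rate**k / factorial(k)

def discrete_uniform_pmf(k, low, high):
    """P(k) when every integer from low to high (inclusive) is equally likely."""
    return 1 / (high - low + 1) if low <= k <= high else 0.0

# Binary data: exactly 5 heads in 10 fair coin tosses.
heads_5_of_10 = binomial_pmf(5, 10, 0.5)
# Count data: exactly 2 checkouts in an hour, assuming an average of 3 per hour.
two_checkouts = poisson_pmf(2, 3.0)
# Equal-probability outcomes: rolling a 4 on a six-sided die.
roll_four = discrete_uniform_pmf(4, 1, 6)

print(f"{heads_5_of_10:.4f} {two_checkouts:.4f} {roll_four:.4f}")
```

Each function returns the probability of one exact value, which is what makes these distributions discrete.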
You’ll learn about probability distributions for both discrete and continuous variables in this blog post. I’ll demonstrate how they function and provide examples of how to apply them.
General Probability Distribution Properties
Probability distributions show how likely an occurrence or outcome is. To express probabilities, statisticians use the following notation:
p(x) is the probability that a random variable will take a particular value of x.
The probabilities of all possible values must add up to one. In addition, the probability of any particular value or range of values must be between 0 and 1.
Probability distributions describe how the values of a random variable are dispersed. As a result, the kind of variable determines the type of probability distribution. For a single random variable, statisticians divide distributions into two types:
Discrete probability distributions for discrete variables
Probability density functions for continuous variables
A probability distribution can be represented using equations and tables containing variable values and probabilities. However, I like to use probability distribution plots to graph them. The distinctions between discrete and continuous probability distributions are immediately obvious in the examples that follow. You’ll see why these graphs are so appealing to me!
Discrete Probability Distributions
Discrete probability functions are also known as probability mass functions, and they can take on only a discrete number of values. Coin tosses and counts of events, for example, are discrete. These are discrete distributions because there are no in-between values. In a coin toss, for example, you can only get heads or tails.
In a discrete probability distribution, each possible value has a non-zero probability. Furthermore, the probabilities of all possible values must add up to one. Because the total probability is 1, one of the values must occur on every trial.
For example, the probability of rolling any given number on a die is 1/6. Across all six values, the total probability is one. When you roll a die, you always get one of the possible outcomes.
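The die example can be checked directly. The following is a small sketch that stores the probability mass function as a Python dictionary (the name `die_pmf` is just an illustrative choice) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

# Probability mass function for a fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Every individual probability lies between 0 and 1 ...
assert all(0 <= p <= 1 for p in die_pmf.values())

# ... and together they add up to exactly 1.
total = sum(die_pmf.values())
print(total)

# The probability of a range of values is the sum over that range,
# for example rolling a 5 or a 6.
p_five_or_six = die_pmf[5] + die_pmf[6]
print(p_five_or_six)
```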
Use of Discrete Probability Distributions in Practice
The examples in this post will demonstrate why I enjoy graphing probability distributions. The following example is from a blog post in which I present a statistical analysis of flu vaccine effectiveness. I use the binomial distribution to answer the question: how many times can I expect to catch the flu over 20 years, with and without annual vaccinations?
This example uses binary data because there are two possible outcomes each year: being infected with the flu or not being infected. According to numerous studies, the long-term annual risk of catching the flu is 0.07 for the unvaccinated and 0.019 for the vaccinated. Plugging these probabilities into the binomial distribution gives the pattern of outcomes for both scenarios over a twenty-year period, which the graph displays. Each bar represents the probability of getting the flu a certain number of times. I’ve coloured the bars red where they contribute to the chance of at least two flu infections in 20 years. The left panel shows the expected outcomes with no vaccinations, and the right panel shows the expected outcomes with annual vaccinations.
Figure: Probability distribution plot showing the probability of each number of flu infections over 20 years, without vaccination (left panel) and with annual vaccination (right panel).
You’ll notice a big difference, which demonstrates the power of probability distribution graphs! The tallest bar on the graph is in the right panel and represents zero cases of flu in 20 years with annual vaccination. If you get vaccinated every year, you have a 68% probability of not getting the flu at all in 20 years! In contrast, if you don’t get vaccinated, your chance of avoiding the flu is only 23%. The distribution in the left panel is also much wider than in the right panel. Without vaccinations, you have a 41% probability of getting the flu at least twice in 20 years, compared to 5% with annual vaccinations. Over that period, some unlucky unvaccinated people will get the flu four or five times!
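The percentages above can be reproduced with a few lines of Python. This is a minimal sketch, not the original analysis. It assumes that 0.07 and 0.019 are per-year probabilities and that each of the 20 years is an independent trial, which is what the binomial model requires. The helper `binomial_pmf` and the variable names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k infections in n years, each year with infection probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

YEARS = 20
# Assumed to be annual infection probabilities.
ANNUAL_RISK = {"unvaccinated": 0.07, "vaccinated": 0.019}

results = {}
for group, p in ANNUAL_RISK.items():
    p_zero = binomial_pmf(0, YEARS, p)      # no flu at all in 20 years
    p_one = binomial_pmf(1, YEARS, p)       # exactly one infection
    p_two_or_more = 1 - p_zero - p_one      # complement: at least two infections
    results[group] = (p_zero, p_two_or_more)
    print(f"{group}: P(0) = {p_zero:.0%}, P(2 or more) = {p_two_or_more:.0%}")
```

Under these assumptions the script prints 23% and 41% for the unvaccinated group and 68% and 5% for the vaccinated group, matching the figures quoted above.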
Continuous Probability Distributions
Continuous probability functions are also known as probability density functions. You have a continuous distribution if the variable can take an infinite number of values between any two values. Continuous variables, such as height, weight, and temperature, are often measurements on a scale.
Unlike discrete probability distributions, where each possible value has a non-zero probability, specific values in continuous distributions have a probability of zero. For example, the probability of measuring a temperature of exactly 32 degrees is zero.
Why? Consider that the temperature can be any of an infinite number of other values that are infinitesimally higher or lower than 32. Statisticians say that an individual value has an infinitesimally small probability that is equivalent to zero.
Finding Probabilities in Continuous Data
For continuous distributions, probabilities are calculated over ranges of values rather than single points. A probability indicates the likelihood that a value will fall within an interval. This property is easy to demonstrate using a probability distribution plot, which we’ll discuss shortly.
On a probability plot, the total area under the distribution curve equals 1. This is equivalent to the requirement for discrete distributions that all probabilities sum to one. The proportion of the area under the curve that falls within a range of values along the X-axis represents the probability that a value will fall within that range. Finally, a single value has no width and therefore no area under the curve, which is why the probability of a single value is zero.
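To make the area idea concrete, here is a short sketch using `statistics.NormalDist` from the Python standard library. The temperature model (a normal distribution with mean 32 and standard deviation 5) is purely hypothetical. The point is only that interval probabilities are differences of the cumulative distribution function and that they shrink to zero as the interval narrows to a single value.

```python
from statistics import NormalDist

# Hypothetical temperature model: normal with mean 32 and standard deviation 5.
temperature = NormalDist(mu=32, sigma=5)

def interval_probability(dist, low, high):
    """Area under the density curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)

# The probability of an interval is a positive area.
p_30_to_34 = interval_probability(temperature, 30, 34)
print(f"P(30 <= T <= 34) = {p_30_to_34:.4f}")

# Shrinking the interval around 32 drives the probability toward zero.
for half_width in (1, 0.1, 0.001, 0):
    p = interval_probability(temperature, 32 - half_width, 32 + half_width)
    print(f"half-width {half_width}: probability {p:.6f}")

# A range that covers practically the whole curve (10 standard deviations
# on each side of the mean) has an area of essentially 1.
total_area = interval_probability(temperature, 32 - 50, 32 + 50)
print(f"total area = {total_area:.6f}")
```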
Continuous Probability Distributions: Characteristics
Just as there are different types of discrete distributions for different kinds of discrete data, there are different probability distributions for continuous data. Each probability distribution has parameters that define its shape. Most distributions have between one and three parameters. Specifying these parameters completely determines the shape of the distribution and all of its probabilities. The parameters represent essential properties of the distribution, such as its central tendency and variability.
A variety of probability distributions are available for continuous data. Some of them are:
Weibull distribution: A very versatile distribution that analysts use in many settings. It can model left- and right-skewed data and approximate the normal distribution.
Lognormal distribution: Models right-skewed distributions, particularly in cases where growth rates are independent of size. It provides the best fit for my body fat percentage data.
Exponential distribution: Models variables in which small values occur more frequently than larger values. Use it to model the amount of time between independent events.
Gamma distribution: Models right-skewed distributions. Use it to model the time until the kth event, where k is the shape parameter.
Uniform distribution: Models symmetric, continuous data where all ranges of equal size have the same probability.
Beta distribution: Models variables whose values fall within a finite interval.
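If you want to get a feel for these shapes, one option is to draw random samples from each distribution and compare summary statistics. The sketch below uses Python’s standard `random` module. All parameter values are arbitrary choices for illustration, not values from this post.

```python
import random
from statistics import mean, median

rng = random.Random(42)  # fixed seed so the run is repeatable
N = 10_000

# Parameter values are arbitrary and for illustration only.
samplers = {
    "Weibull": lambda: rng.weibullvariate(1.0, 1.5),      # scale, shape
    "Lognormal": lambda: rng.lognormvariate(0.0, 0.5),    # mu, sigma of the underlying normal
    "Exponential": lambda: rng.expovariate(1.0),          # rate
    "Gamma": lambda: rng.gammavariate(3.0, 1.0),          # shape k, scale
    "Uniform": lambda: rng.uniform(0.0, 1.0),             # lower and upper bound
    "Beta": lambda: rng.betavariate(2.0, 5.0),            # two shape parameters
}

summary = {}
for name, draw in samplers.items():
    values = [draw() for _ in range(N)]
    summary[name] = (mean(values), median(values), min(values), max(values))
    m, med, lo, hi = summary[name]
    print(f"{name:12s} mean={m:.3f} median={med:.3f} min={lo:.3f} max={hi:.3f}")
```

In right-skewed distributions such as the exponential, lognormal, and gamma, the mean sits above the median. The uniform distribution is symmetric, so its mean and median nearly coincide, and the beta samples stay inside the interval from 0 to 1.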
Conclusion
In this post, I plot the probability density function for continuous distributions. However, instead of the probability density, you can plot the cumulative distribution function (CDF), which shows the percentile for each data value, that is, the proportion of values at or below it. I hope you can see why probability distributions are so important in statistics and why I think graphing them is a great way to communicate results!
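As a brief illustration of the CDF as a percentile lookup, here is a sketch that returns to the height example from the start of the post. The normal model with a mean of 170 cm and a standard deviation of 10 cm is an assumption made up for this example.

```python
from statistics import NormalDist

# Hypothetical height model in centimetres (mean and standard deviation are assumed).
heights = NormalDist(mu=170, sigma=10)

# CDF: the proportion of people at or below a given height.
percentile_of_180 = heights.cdf(180)

# Inverse CDF: the height that sits at a given percentile.
median_height = heights.inv_cdf(0.50)
height_at_95th = heights.inv_cdf(0.95)

print(f"180 cm is at about the {percentile_of_180:.0%} mark")
print(f"median height: {median_height:.1f} cm")
print(f"95th percentile: {height_at_95th:.1f} cm")
```

Reading the CDF in one direction gives the percentile for a value, and reading it in the other direction gives the value for a percentile.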