Statistics deals with many different possibilities at once, which can make it unintuitive to reason about. In this introduction, we discuss several concepts that are important for expressing knowledge about different possibilities, and show how we can draw information from them.
A very important concept is the random variable. Strictly speaking, random variables are neither random nor variables. Instead, they are conceptual objects that assign numbers to distinct real-world phenomena, outcomes, or properties that are in some way random and mutually exclusive. For instance, a random variable might map "a die roll of one" to 1, "a die roll of two" to 2, and so forth. A different random variable might map hair colors, from black to blond, to numbers between 0 and 1.
An important role of random variables is that they categorize and structure quantities of interest into a format to which we can later assign probabilities. These probabilities could represent the chance of an event occurring ("the chance of rolling a one") or the relative frequency of a phenomenon in a population ("the chance of having red hair").
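As a minimal sketch of this idea (all names here are illustrative, not taken from the text), a random variable can be represented in Python as a plain mapping from mutually exclusive outcomes to numbers:

```python
import random

# A random variable assigns a number to each mutually exclusive
# outcome; the randomness lies in which outcome occurs, not in
# the mapping itself.
die_roll = {"one": 1, "two": 2, "three": 3,
            "four": 4, "five": 5, "six": 6}

# A different random variable: hair colors mapped to numbers in [0, 1].
hair_color = {"black": 0.0, "brown": 1/3, "red": 2/3, "blond": 1.0}

# One realization: nature picks an outcome, the random variable
# reports the associated number.
outcome = random.choice(list(die_roll))
print(outcome, "->", die_roll[outcome])
```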
Joint probability distributions are an important ingredient of statistical inference. In a nutshell, these distributions describe the probabilities of all possible combinations of outcomes of two or more random variables. For discrete two-dimensional distributions, these probabilities are often represented as a so-called contingency table.
An example of this is provided in Figure 1, which shows the frequencies of different combinations of eye color and hair color for 592 students. The probability of a specific combination can be calculated by dividing the number of students in the corresponding cell by the total number of students:

P(hair = h, eye = e) = n(h, e) / 592,

where n(h, e) is the number of students with hair color h and eye color e.
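To make this concrete, here is a minimal sketch in Python. The counts assume the classic 592-student hair/eye color survey (the HairEyeColor dataset in R), which Figure 1 appears to be based on; if your table differs, substitute its cell counts:

```python
import numpy as np

# Contingency table of raw counts (rows: hair color, columns: eye color),
# assuming the classic 592-student dataset.
hair_colors = ["black", "brown", "red", "blond"]
eye_colors = ["brown", "blue", "hazel", "green"]
counts = np.array([
    [ 68, 20, 15,  5],   # black hair
    [119, 84, 54, 29],   # brown hair
    [ 26, 17, 14, 14],   # red hair
    [  7, 94, 10, 16],   # blond hair
])

# Joint probabilities: divide every cell by the total number of students.
joint = counts / counts.sum()   # counts.sum() == 592

# Example: P(hair = red, eye = green)
p = joint[hair_colors.index("red"), eye_colors.index("green")]
print(f"P(red hair, green eyes) = {p:.3f}")   # 14/592 ≈ 0.024
```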
Question: What is the probability of your combination of hair and eye color in this contingency table?
Marginal probability distributions remove the influence of one or more random variables (here: eye color or hair color) from a contingency table by summing over the dimensions we want to remove. For instance, if we are interested in the probability of having green eyes irrespective of hair color, we can calculate this probability by summing over all hair colors:

P(eye = green) = Σ_h P(hair = h, eye = green).
The name marginal probability distribution derives from the fact that these probability distributions are often visualized in the margins of the contingency table.
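Continuing the sketch above (reusing joint, hair_colors, and eye_colors), the marginal distributions fall out of the joint table by summing along one axis:

```python
# Sum out hair color (axis 0) to get the eye-color marginal,
# and sum out eye color (axis 1) to get the hair-color marginal.
p_eye = joint.sum(axis=0)    # P(eye), irrespective of hair color
p_hair = joint.sum(axis=1)   # P(hair), irrespective of eye color

# Example: P(eye = green) = sum over all hair colors of P(hair, green)
print(f"P(green eyes) = {p_eye[eye_colors.index('green')]:.3f}")   # 64/592 ≈ 0.108
```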
Question: What are the marginal probabilities of your hair color and your eye color?
Conditional probability distributions describe "if this, then what?" scenarios. For instance, if we assume that a person's eye color is green, what is the probability of them also having red hair?
More generally, estimating conditional distributions is highly important in research and industry. They allow us to study scenarios such as "If the global temperature rises by 5°C, what is the probability of the Netherlands flooding?", and they play an integral part in statistical inference, for example in questions such as "If we observe a water table rise of 1 m, how much has it rained?".
In a contingency table, conditional probabilities are calculated by dividing the joint probability of a table entry (here: a combination of green eyes and a specific hair color) by the marginal probability of the conditioning variable (here: green eyes):

P(hair = h | eye = green) = P(hair = h, eye = green) / P(eye = green).
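In the running sketch, this division is a single renormalization of one column of the joint table:

```python
# Condition on green eyes: take the green-eye column of the joint
# table and divide by the marginal probability of green eyes.
green = eye_colors.index("green")
p_hair_given_green = joint[:, green] / p_eye[green]

for hair, p in zip(hair_colors, p_hair_given_green):
    print(f"P({hair} hair | green eyes) = {p:.3f}")

# Sanity check: a conditional distribution still sums to 1.
print(p_hair_given_green.sum())
```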
Question: What are the conditional probabilities of (1) your hair color given your eye color, and (2) your eye color given your hair color? Which of the two is smaller?
So far, we have considered only discrete contingency tables with four different eye colors and four different hair colors (Figure 4, left). In practice, we are often free to choose the resolution at which we want to represent a system. For instance, we could resolve the contingency table into nine different eye colors and hair colors instead (Figure 4, center). Observe that as we increase the resolution of the random variables, we require increasingly many samples to retain the level of detail of the probability distribution underlying the contingency table.
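A quick simulation illustrates this trade-off. The bivariate normal distribution below is an arbitrary stand-in for the distribution underlying Figure 4; the point is only what happens when the same 592 samples are binned at two resolutions:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=592)

# Bin the same samples into a coarse and a fine contingency table.
for bins in (4, 16):
    table, _, _ = np.histogram2d(samples[:, 0], samples[:, 1],
                                 bins=bins, range=[[-3, 3], [-3, 3]])
    empty = int((table == 0).sum())
    print(f"{bins}x{bins} table: {empty} of {bins * bins} cells are empty")

# At 16x16, many cells contain few or no samples: the finer table is
# too sparse to resolve the underlying distribution at this sample size.
```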
As we take this increase in resolution to its extreme, the discrete contingency table gains an infinite number of rows and columns. Populating it would require an infinite number of samples, so we would never finish filling it in. Instead, we replace it with a continuous probability density function (pdf). Rather than storing the discrete probability of every possible combination of values, a pdf describes the shape of the underlying probability distribution directly (Figure 4, right). Integrating a pdf over its entire domain always yields 1, and integrating it over a finite volume of parameter space (for instance, one cell of a contingency table) returns a discrete probability.
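The sketch below checks both claims numerically for a simple two-dimensional pdf (an independent standard normal in each dimension, chosen purely for illustration):

```python
import numpy as np

# Joint pdf of two independent standard normal random variables.
def pdf(x, y):
    return np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

# Evaluate the pdf on a fine grid covering (effectively) its whole domain.
x = np.linspace(-6, 6, 601)
y = np.linspace(-6, 6, 601)
X, Y = np.meshgrid(x, y)
dx = dy = x[1] - x[0]

# Integrating over the whole domain yields 1 (up to grid error).
print(pdf(X, Y).sum() * dx * dy)

# Integrating over one "cell" of parameter space returns a discrete
# probability, just like a single entry of a contingency table.
cell = (X >= 0) & (X < 1) & (Y >= 0) & (Y < 1)
print(pdf(X, Y)[cell].sum() * dx * dy)   # P(0 <= x < 1, 0 <= y < 1)
```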
The marginal and conditional distributions above are still defined the same way, but now they are also continuous pdfs, and the summations above are replaced with integrals: the marginal density is p(x) = ∫ p(x, y) dy, and the conditional density is p(x | y) = p(x, y) / p(y).
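Reusing pdf, X, Y, x, and dy from the previous snippet, marginalization becomes a numerical integral along one axis, and conditioning is again a renormalized slice:

```python
# Marginal density of x: integrate the joint pdf over y.
p_x = pdf(X, Y).sum(axis=0) * dy      # one density value per x grid point

# Conditional density of y given x = 0: slice the joint pdf at the grid
# point closest to x = 0 and renormalize by the marginal density there.
i0 = np.argmin(np.abs(x))
p_y_given_x0 = pdf(X, Y)[:, i0] / p_x[i0]

# Sanity check: the conditional density integrates to 1 over y.
print(p_y_given_x0.sum() * dy)
```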