Mathematics

Multiple Random Variables

Traders and quantitative researchers routinely work with many assets at once, and the mathematics of multiple random variables is the framework for doing so. In this section we will generalise the concepts we have covered to an arbitrary number of random variables, ending with a key theorem in quantitative finance: the central limit theorem.

Consider a vector, $\textbf{x}$, of $n$ random variables $(X_1, X_2,...,X_n)$. In the case of discrete random variables, we can define a joint probability mass function $$ P_{\{X_1, X_2,...,X_n\}}(x_1, x_2, ..., x_n) = P(X_1=x_1, X_2=x_2, ..., X_n=x_n). $$ The random variables are independent if, for all values $x_1, x_2, ..., x_n$, $$ P(X_1=x_1, X_2=x_2, ..., X_n=x_n) = P(X_1=x_1) \cdot P(X_2=x_2) \cdot ... \cdot P(X_n=x_n). $$ Independence means that knowing the outcome of any one of the random variables tells us nothing about the outcomes of the others.

We can also define a joint probability density function for continuous random variables. The probability of the vector $\textbf{x}$ being in the set $A$ is then $$ P(\textbf{x} \in A) = \int...\int_A f_{\{X_1, X_2, ..., X_n\}}(x_1, x_2, ..., x_n) \hspace{1mm} \text{d}x_1 \text{d}x_2 \hspace{1mm} ... \hspace{1mm} \text{d}x_n. $$ Similarly, continuous random variables are independent if $$ f_{\{X_1, X_2, ..., X_n\}}(x_1, x_2, ..., x_n) = f_{X_1}(x_1) \cdot f_{X_2}(x_2) \cdot ... \cdot f_{X_n}(x_n). $$

What if, for example, we want to obtain the probability distribution of the random variable $X_1$ from this joint probability distribution? We can simply integrate over all of the random variables that we don't want to consider. $$ f_{X_1}(x_1) = \int_{-\infty}^{\infty}...\int_{-\infty}^{\infty} f_{\{X_1, X_2, ..., X_n\}}(x_1, x_2, ..., x_n) \hspace{1mm} \text{d}x_2 \hspace{1mm} ... \hspace{1mm} \text{d}x_n $$ The result is called the marginal distribution of $X_1$.

Random variables are independent and identically distributed (iid) if they are independent and all have the same probability distribution.

How can we calculate statistics of the sum of multiple random variables? Calculating the expectation is straightforward - we can simply use the linearity of expectation, which holds whether or not the random variables are independent. $$ \text{E}\left[\: \sum_{k=1}^n X_k \right] = \sum_{k=1}^n \text{E}[X_k] $$

What about variance? This is slightly trickier, because we have to take into account the covariance between the random variables. We begin with the relationship that $$ \text{Var} \left( \sum_{j=1}^n X_j \right) = \text{Cov}\left(\sum_{j=1}^n X_j, \sum_{j=1}^n X_j \right). $$
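
As a brief aside before expanding this covariance, the earlier definitions can be made concrete with a small discrete example. The sketch below assumes a made-up joint probability mass function for two random variables (the values, the table, and all variable names are illustrative, not from the text). It computes the marginals by summing out the unwanted variable (the discrete analogue of the integral above), checks independence against the product of the marginals, and confirms linearity of expectation.

```python
import numpy as np

# Hypothetical joint pmf of two discrete random variables X1 and X2.
# Rows index the values of X1, columns index the values of X2.
x1_values = np.array([0.0, 1.0])
x2_values = np.array([-1.0, 0.0, 1.0])
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.20, 0.10, 0.30],
])
assert np.isclose(joint.sum(), 1.0)  # a pmf must sum to one

# Marginals: sum over the variable we do not want to consider.
p_x1 = joint.sum(axis=1)  # P(X1 = x1) -> [0.4, 0.6]
p_x2 = joint.sum(axis=0)  # P(X2 = x2) -> [0.3, 0.3, 0.4]

# Independence holds only if the joint pmf equals the product of the marginals
# for every pair of values.
is_independent = np.allclose(joint, np.outer(p_x1, p_x2))

# Linearity of expectation: E[X1 + X2] = E[X1] + E[X2], independent or not.
e_x1 = x1_values @ p_x1
e_x2 = x2_values @ p_x2
sum_grid = x1_values[:, None] + x2_values[None, :]  # value of X1 + X2 per cell
e_sum = (sum_grid * joint).sum()

print("independent:", is_independent)
print("E[X1] + E[X2] =", e_x1 + e_x2, " E[X1 + X2] =", e_sum)
```

In this particular table the variables are not independent, yet the two expectations still agree, which is the point of linearity.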

To expand the right-hand side, we need the fact that covariance is additive in its first argument, $\text{Cov}(X+Y, Z) = \text{Cov}(X,Z) + \text{Cov}(Y,Z)$. To show this, let's first write $\text{Cov}(X+Y, Z)$ in terms of the expectations of the relevant quantities: $$ \begin{align*} \text{Cov}(X+Y, Z) &= \text{E} \left[ (X + Y - \mu_X - \mu_Y)(Z - \mu_Z) \right]\\ &= \text{E} \left[ (X - \mu_X)(Z - \mu_Z) + (Y - \mu_Y)(Z - \mu_Z) \right]. \end{align*} $$ Using linearity of expectation gives us the result: $$ \begin{align*} \text{Cov}(X+Y, Z) &= \text{E} \left[ (X - \mu_X)(Z - \mu_Z) \right] + \text{E} \left[ (Y - \mu_Y)(Z - \mu_Z) \right]\\ &= \text{Cov}(X,Z) + \text{Cov}(Y,Z). \end{align*} $$

This result can be generalised to an arbitrary number of random variables by applying it repeatedly to both arguments: $$ \text{Cov}\left(\sum_{i=1}^n X_i, \sum_{j=1}^n X_j \right) = \sum_{i=1}^n \sum_{j=1}^n \text{Cov} \left(X_i, X_j \right). $$ Here we are summing all of the elements of the covariance matrix. The diagonal elements are the variances, since $\text{Cov}(X_i, X_i) = \text{Var}(X_i)$, and covariance is symmetric, $$ \text{Cov}(X_i, X_j) = \text{Cov}(X_j, X_i), $$ so each off-diagonal term appears twice. This means that $$ \sum_{i=1}^n \sum_{j=1}^n \text{Cov} \left(X_i, X_j \right) = \sum_{i=1}^n \text{Var}(X_i) + 2 \cdot \sum_{i=1}^n \sum_{j < i} \text{Cov}(X_i, X_j). $$ Independent random variables have zero covariance, so this equation shows us that if we have a collection of random variables that are independent, then $$ \text{Var} \left( \: \sum_{j=1}^n X_j \right) = \sum_{j=1}^n \text{Var} \left( X_j \right). $$ This fact is worth remembering - it will underpin important results when we mathematically model asset prices later on in the course.
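
The variance decomposition above can be checked numerically. The following is a minimal sketch, assuming three correlated normal variables with a made-up covariance matrix (`true_cov`, the sample size, and the seed are illustrative choices). Because the identity is purely algebraic, it holds for sample variances and covariances too, so the three quantities below agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical covariance matrix for three correlated random variables.
true_cov = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 2.0, -0.3],
    [0.2, -0.3, 1.5],
])

# Each row is one joint draw of (X_1, X_2, X_3).
samples = rng.multivariate_normal(mean=np.zeros(3), cov=true_cov, size=10_000)

# Sample covariance matrix (columns are variables, ddof=1 by default).
sample_cov = np.cov(samples, rowvar=False)

# Left-hand side: variance of the sum X_1 + X_2 + X_3.
var_of_sum = samples.sum(axis=1).var(ddof=1)

# Right-hand side, version 1: sum of every element of the covariance matrix.
sum_of_all_entries = sample_cov.sum()

# Right-hand side, version 2: variances plus twice the covariances with j < i.
variances_plus_covariances = (
    np.trace(sample_cov) + 2.0 * np.tril(sample_cov, k=-1).sum()
)

print(var_of_sum, sum_of_all_entries, variances_plus_covariances)
```

Setting the off-diagonal entries of `true_cov` to zero makes the variables independent, and the variance of the sum then comes out close to the sum of the individual variances.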

Example: The Central Limit Theorem

Now that we have been introduced to the normal distribution, let's cover an extremely important theorem in statistics. Questions about the central limit theorem come up consistently in quantitative finance interviews. Consider a set of independent and identically distributed random variables, $\{ X_k \}$, each with mean $\mu$ and finite variance $\sigma^2$. Let $$ S_n = \sum_{k=1}^{n} X_k. $$ The central limit theorem states that $$ \frac{S_n - n \mu}{\sigma \sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \rightarrow \infty, $$ which is that, in the limit of $n \rightarrow \infty$, the distribution of the above expression converges to a normal distribution with mean $0$ and standard deviation $1$.

Why is this theorem so important? The simple reason is that the central limit theorem gives us a justification for modelling phenomena, like returns of financial assets, as samples from a normal probability distribution. The argument is that a very large number of factors can influence the price of an asset, so if their effects are roughly independent and add up, the total is approximately normally distributed. We will see in the next section that there are serious flaws with this argument. Nevertheless, the central limit theorem is the foundation for many of the seminal concepts in quantitative finance, such as mean-variance portfolio optimisation, risk modelling, and option pricing.
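
The theorem can be illustrated with a short simulation. The sketch below assumes iid exponential random variables with mean $1$ and standard deviation $1$, a deliberately skewed and non-normal choice (the distribution, `n`, `num_trials`, and the seed are all illustrative). It standardises the sums exactly as in the statement above and compares a few summary statistics with those of $N(0, 1)$.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

n = 200              # number of iid variables in each sum S_n
num_trials = 10_000  # number of independent realisations of S_n

# Exponential with scale 1 has mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0
draws = rng.exponential(scale=1.0, size=(num_trials, n))

# Standardise each sum: (S_n - n * mu) / (sigma * sqrt(n)).
s_n = draws.sum(axis=1)
z = (s_n - n * mu) / (sigma * np.sqrt(n))

# For a standard normal we expect mean 0, standard deviation 1,
# and roughly 95% of the mass inside (-1.96, 1.96).
z_mean = z.mean()
z_std = z.std(ddof=1)
frac_within_196 = np.mean(np.abs(z) < 1.96)

print(f"mean = {z_mean:.3f}, std = {z_std:.3f}, "
      f"P(|Z| < 1.96) = {frac_within_196:.3f}")
```

For a finite `n` the match is only approximate: some skewness from the exponential distribution remains, and it shrinks as `n` grows.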