We defined *convergence in probability* and *convergence in distribution* in the previous post; we now state two fundamental theorems which incorporate these concepts.

**(Weak) Law of Large Numbers:**

Let $\left\{X_i \right\}$ be a sequence of iid (independent and identically distributed) random variables with finite expectation $\mathbb{E}\left[X_i\right] = \mu \lt \infty$ and finite variance $\text{Var}\left(X_i \right) = \sigma^2 \lt \infty$. Now define $$S_n:=\sum_{i=1}^{n} X_i$$ Then $$\frac{S_n}{n}\rightarrow \mu$$ in probability as $n\rightarrow \infty$.

**Proof:** Since the random variables are all iid (so expectations add and, by independence, variances add too), $$\mathbb{E}\left[\frac{S_n}{n} \right] = \frac{1}{n} \mathbb{E}\left[S_n\right] = \frac{n \mu}{n} = \mu$$ Similarly $$\text{Var}\left( \frac{S_n}{n} \right) = \frac{1}{n^2} \left(\mathbb{E}[S_n^2]- \mathbb{E}[S_n]^2 \right)=\frac{n \sigma^2}{n^2} = \frac{\sigma^2}{n}$$ We now call upon Chebyshev's Inequality, which states that if $X$ is a random variable with finite expectation $\mu$ and non-zero variance $\sigma^2$ then $$P\left( \left|X-\mu \right| \ge k \sigma \right) \le \frac{1}{k^2}$$ It bounds the probability of the distribution taking values far from its mean purely in terms of the variance - it is completely distribution agnostic.
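Chebyshev's bound is easy to sanity-check numerically. The sketch below is illustrative only: the choice of an Exponential(1) distribution and the constants are ours, not part of the post, and the bound itself holds for any distribution with finite variance.

```python
import random

# Empirical check of Chebyshev's inequality: P(|X - mu| >= k*sigma) <= 1/k^2.
# We use Exponential(1) (mu = sigma = 1) purely as an example; the bound
# itself is distribution agnostic.
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(100_000)]
mu, sigma = 1.0, 1.0

for k in (2, 3, 4):
    # Fraction of samples at least k standard deviations from the mean.
    tail = sum(abs(x - mu) >= k * sigma for x in samples) / len(samples)
    print(f"k={k}: empirical tail {tail:.4f}, Chebyshev bound {1 / k**2:.4f}")
```

For Exponential(1) the true tail probabilities sit well below the bound, which illustrates that Chebyshev is loose but universal.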

We can now apply Chebyshev's Inequality to $X=\frac{S_n}{n}$, whose variance is $\frac{\sigma^2}{n}$, with threshold $\epsilon$:

$$P\left( \left|\frac{S_n}{n}-\mu \right| \ge \epsilon \right) \le \frac{\sigma^2}{n \epsilon^2}$$

Hence

$$ P\left( \left|\frac{S_n}{n}-\mu \right| \ge \epsilon \right) \rightarrow 0$$

as $n \rightarrow \infty$, which is precisely the definition of convergence in probability.
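The conclusion of the proof can also be seen empirically. The following is a minimal simulation sketch, assuming Uniform(0, 1) summands ($\mu = 0.5$) and an arbitrarily chosen $\epsilon$; it estimates the deviation probability for growing $n$:

```python
import random

# Weak LLN by simulation: the probability that the sample mean of n iid
# Uniform(0, 1) variables (mu = 0.5) deviates from mu by at least epsilon
# shrinks as n grows. The sample sizes and epsilon are illustrative choices.
random.seed(42)
mu, epsilon, trials = 0.5, 0.05, 500

for n in (10, 100, 10_000):
    exceed = 0
    for _ in range(trials):
        mean = sum(random.random() for _ in range(n)) / n
        if abs(mean - mu) >= epsilon:
            exceed += 1
    # Estimate of P(|S_n/n - mu| >= epsilon) over the repeated trials.
    print(f"n={n}: P(|S_n/n - mu| >= {epsilon}) ~ {exceed / trials:.3f}")
```

By $n = 10{,}000$ the estimated probability is essentially zero, matching the $\frac{\sigma^2}{n \epsilon^2}$ bound from the proof.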

**Central Limit Theorem (CLT):**

We shall state a slightly restricted version of the CLT, which is fine for illustrative purposes. Taking our $S_n$ from above, the CLT states:

Let $\left\{X_1, X_2, \ldots \right\}$ be a sequence of iid random variables with finite expectation $\mathbb{E}[X_i] = \mu < \infty$ and finite non-zero variance $\text{Var}\left(X_i\right) = \sigma^2 < \infty$. Then

$$ P \left( \frac{S_n-n \mu}{\sigma \sqrt{n}} \leq x \right) \rightarrow \Phi(x)$$

as $n \rightarrow \infty$ and the convergence is

*in distribution*and $\Phi(x)$ is the cumulative density function of of a Standard Normal variable. We shall not prove the CLT as only an understanding of what it says and how it can be applied is required.
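To see the CLT in action, the sketch below (our own illustrative setup, using Uniform(0, 1) summands with $\mu = 0.5$, $\sigma^2 = 1/12$, and arbitrary sample sizes) standardizes $S_n$ and compares the empirical CDF with $\Phi(x)$ at a few points:

```python
import math
import random

# CLT by simulation: standardize S_n for iid Uniform(0, 1) summands
# (mu = 0.5, sigma^2 = 1/12) and compare the empirical CDF of
# (S_n - n*mu) / (sigma * sqrt(n)) against Phi(x). Both n and the number
# of repetitions are illustrative choices.
random.seed(1)
n, reps = 30, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)

zs = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * math.sqrt(n))
      for _ in range(reps)]

def phi(x):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for x in (-1.0, 0.0, 1.0):
    emp = sum(z <= x for z in zs) / reps
    print(f"x={x:+.1f}: empirical CDF {emp:.3f}, Phi(x) {phi(x):.3f}")
```

Even at $n = 30$ the empirical CDF of the standardized sum agrees with $\Phi(x)$ to a couple of decimal places, despite the summands being uniform rather than Normal.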

This explains why Normal distributions are so common in modelling (and even in nature): under these mild conditions, and regardless of the underlying distribution of the $X_i$, the standardized sum above will tend to the Normal distribution for large enough $n$.