diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..e74fa55
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1 @@
+typst/** linguist-generated
diff --git a/src/posts/2025-02-16-probability-distributions.md b/src/posts/2025-02-16-probability-distributions.md
new file mode 100644
index 0000000..8cea2e2
--- /dev/null
+++ b/src/posts/2025-02-16-probability-distributions.md
@@ -0,0 +1,1329 @@
+---
+author: "Youwen Wu"
+authorTwitter: "@youwen"
+keywords: "probability, counting, math, expected value, distributions"
+lang: "en"
+title: "Random variables, distributions, and probability theory"
+desc: "An overview of discrete and continuous random variables and their distributions and moment generating functions"
+---
+
+These are some notes I've been collecting on random variables, their
+distributions, expected values, and moment generating functions. I
+thought I'd write them down somewhere useful.
+
+These are almost extracted verbatim from my in-class notes, which I take
+in real time using Typst. I simply wrote a tiny compatibility shim to
+allow Pandoc to render them to the web.
+
+---
+
+## Random variables
+
+First, some brief exposition on random variables. Quixotically, a random
+variable is actually a function.
+
+Standard notation, $\Omega$ is sample space, $\omega$ is an event.
+
+*Definition. *
+
+A **random variable** $X$ is a function
+$X:\Omega \rightarrow {\mathbb{R}}$ that gives the probability of an
+event $\omega \in \Omega$.
+
+*Definition. *
+
+The **state space** of a random variable $X$ is all of the values $X$
+can take.
+
+*Example. *
+
+Let $X$ be a random variable that takes on the values
+$\left\{ 0,1,2,3 \right\}$. Then the state space of $X$ is the set
+$\left\{ 0,1,2,3 \right\}$.
+
+### Discrete random variables
+
+A random variable $X$ is discrete if there is countable $A$ such that
+$P(X \in A) = 1$. $k$ is a possible value if $P(X = k) > 0$. We discuss
+continuous random variables later.
+
+The *probability distribution* of $X$ gives its important probabilistic
+information. The probability distribution is a description of the
+probabilities $P(X \in B)$ for subsets $B \in {\mathbb{R}}$. We describe
+the probability density function and the cumulative distribution
+function.
+
+A discrete random variable has probability distribution entirely
+determined by its probability mass function (hereafter abbreviated p.m.f
+or PMF) $p(k) = P(X = k)$. The p.m.f. is a function from the set of
+possible values of $X$ into $\lbrack 0,1\rbrack$. Labeling the p.m.f.
+with the random variable is done by $p_{X}(k)$.
+
+$$p_{X}:\text{ State space of }X \rightarrow \lbrack 0,1\rbrack$$
+
+By the axioms of probability,
+
+$$\sum_{k}p_{X}(k) = \sum_{k}P(X = k) = 1$$
+
+For a subset $B \subset {\mathbb{R}}$,
+
+$$P(X \in B) = \sum_{k \in B}p_{X}(k)$$
+
+### Continuous random variables
+
+Now as promised we introduce another major class of random variables.
+
+*Definition. *
+
+Let $X$ be a random variable. If $f$ satisfies
+
+$$P(X \leq b) = \int_{- \infty}^{b}f(x)dx$$
+
+for all $b \in {\mathbb{R}}$, then $f$ is the **probability density
+function** (hereafter abbreviated p.d.f. or PDF) of $X$.
+
+We immediately see that the p.d.f. is analogous to the c.d.f. of the
+discrete case.
+
+The probability that $X \in ( - \infty,b\rbrack$ is equal to the area
+under the graph of $f$ from $- \infty$ to $b$.
+
+A corollary is the following.
+
+*Fact. *
+
+$$P(X \in B) = \int_{B}f(x)dx$$
+
+for any $B \subset {\mathbb{R}}$ where integration makes sense.
+
+The set can be bounded or unbounded, or any collection of intervals.
+
+*Fact. *
+
+$$P(a \leq X \leq b) = \int_{a}^{b}f(x)dx$$
+$$P(X > a) = \int_{a}^{\infty}f(x)dx$$
+
+*Fact. *
+
+If a random variable $X$ has density function $f$ then individual point
+values have probability zero:
+
+$$P(X = c) = \int_{c}^{c}f(x)dx = 0,\forall c \in {\mathbb{R}}$$
+
+*Remark. *
+
+It follows a random variable with a density function is not discrete. An
+immediate corollary of this is that the probabilities of intervals are
+not changed by including or excluding endpoints. So $P(X \leq k)$ and
+$P(X < k)$ are equivalent.
+
+How to determine which functions are p.d.f.s? Since
+$P( - \infty < X < \infty) = 1$, a p.d.f. $f$ must satisfy
+
+$$\begin{array}{r}
+f(x) \geq 0\forall x \in {\mathbb{R}} \\
+\int_{- \infty}^{\infty}f(x)dx = 1
+\end{array}$$
+
+*Fact. *
+
+Random variables with density functions are called *continuous* random
+variables. This does not imply that the random variable is a continuous
+function on $\Omega$ but it is standard terminology.
+
+## Discrete distributions
+
+Recall that the *probability distribution* of $X$ gives its important
+probabilistic information. Let us discuss some of these distributions.
+
+In general we first consider the experiment's properties and theorize
+about the distribution that its random variable takes. We can then apply
+the distribution to find out various pieces of probabilistic
+information.
+
+### Bernoulli trials
+
+A Bernoulli trial is the original "experiment." It's simply a single
+trial with a binary "success" or "failure" outcome. Encode this T/F, 0
+or 1, or however you'd like. It becomes immediately useful in defining
+more complex distributions, so let's analyze its properties.
+
+The setup: the experiment has exactly two outcomes:
+
+-   Success -- $S$ or 1
+
+-   Failure -- $F$ or 0
+
+Additionally: $$\begin{array}{r}
+P(S) = p,(0 < p < 1) \\
+P(F) = 1 - p = q
+\end{array}$$
+
+Construct the probability mass function:
+
+$$\begin{array}{r}
+P(X = 1) = p \\
+P(X = 0) = 1 - p
+\end{array}$$
+
+Write it as:
+
+$$p_{x(k)} = p^{k}(1 - p)^{1 - k}$$
+
+for $k = 1$ and $k = 0$.
+
+### Binomial distribution
+
+The setup: very similar to Bernoulli, trials have exactly 2 outcomes. A
+bunch of Bernoulli trials in a row.
+
+Importantly: $p$ and $q$ are defined exactly the same in all trials.
+
+This ties the binomial distribution to the sampling with replacement
+model, since each trial does not affect the next.
+
+We conduct $n$ **independent** trials of this experiment. Example with
+coins: each flip independently has a $\frac{1}{2}$ chance of heads or
+tails (holds same for die, rigged coin, etc).
+
+$n$ is fixed, i.e. known ahead of time.
+
+#### Binomial random variable
+
+Let's consider the random variable characterized by the binomial
+distribution now.
+
+Let $X = \#$ of successes in $n$ independent trials. For any particular
+sequence of $n$ trials, it takes the form
+$\Omega = \left\{ \omega \right\}\text{ where }\omega = SFF\cdots F$ and
+is of length $n$.
+
+Then $X(\omega) = 0,1,2,\ldots,n$ can take $n + 1$ possible values. The
+probability of any particular sequence is given by the product of the
+individual trial probabilities.
+
+*Example. *
+
+$$\omega = SFFSF\cdots S = (pqqpq\cdots p)$$
+
+So $P(x = 0) = P(FFF\cdots F) = q \cdot q \cdot \cdots \cdot q = q^{n}$.
+
+And $$\begin{array}{r}
+P(X = 1) = P(SFF\cdots F) + P(FSFF\cdots F) + \cdots + P(FFF\cdots FS) \\
+ = \underset{\text{ possible outcomes}}{\underbrace{n}} \cdot p^{1} \cdot p^{n - 1} \\
+ = \begin{pmatrix}
+n \\
+1
+\end{pmatrix} \cdot p^{1} \cdot p^{n - 1} \\
+ = n \cdot p^{1} \cdot p^{n - 1}
+\end{array}$$
+
+Now we can generalize
+
+$$P(X = 2) = \begin{pmatrix}
+n \\
+2
+\end{pmatrix}p^{2}q^{n - 2}$$
+
+How about all successes?
+
+$$P(X = n) = P(SS\cdots S) = p^{n}$$
+
+We see that for all failures we have $q^{n}$ and all successes we have
+$p^{n}$. Otherwise we use our method above.
+
+In general, here is the probability mass function for the binomial
+random variable
+
+$$P(X = k) = \begin{pmatrix}
+n \\
+k
+\end{pmatrix}p^{k}q^{n - k},\text{ for }k = 0,1,2,\ldots,n$$
+
+Binomial distribution is very powerful. Choosing between two things,
+what are the probabilities?
+
+To summarize the characterization of the binomial random variable:
+
+-   $n$ independent trials
+
+-   each trial results in binary success or failure
+
+-   with probability of success $p$, identically across trials
+
+with $X = \#$ successes in **fixed** $n$ trials.
+
+$$X\sim\text{ Bin}(n,p)$$
+
+with probability mass function
+
+$$P(X = x) = \begin{pmatrix}
+n \\
+x
+\end{pmatrix}p^{x}(1 - p)^{n - x} = p(x)\text{ for }x = 0,1,2,\ldots,n$$
+
+We see this is in fact the binomial theorem!
+
+$$p(x) \geq 0,\sum_{x = 0}^{n}p(x) = \sum_{x = 0}^{n}\begin{pmatrix}
+n \\
+x
+\end{pmatrix}p^{x}q^{n - x} = (p + q)^{n}$$
+
+In fact, $$(p + q)^{n} = \left( p + (1 - p) \right)^{n} = 1$$
+
+*Example. *
+
+What is the probability of getting exactly three aces (1's) out of 10
+throws of a fair die?
+
+Seems a little trickier but we can still write this as well defined
+$S$/$F$. Let $S$ be getting an ace and $F$ being anything else.
+
+Then $p = \frac{1}{6}$ and $n = 10$. We want $P(X = 3)$. So
+
+$$\begin{array}{r}
+P(X = 3) = \begin{pmatrix}
+10 \\
+3
+\end{pmatrix}p^{3}q^{7} = \begin{pmatrix}
+10 \\
+3
+\end{pmatrix}\left( \frac{1}{6} \right)^{3}\left( \frac{5}{6} \right)^{7} \\
+ \approx 0.15505
+\end{array}$$
+
+#### With or without replacement?
+
+I place particular emphasis on the fact that the binomial distribution
+generally applies to cases where you're sampling with *replacement*.
+Consider the following: *Example. *
+
+Suppose we have two types of candy, red and black. Select $n$ candies.
+Let $X$ be the number of red candies among $n$ selected.
+
+2 cases.
+
+-   case 1: with replacement: Binomial Distribution, $n$,
+    $p = \frac{a}{a + b}$.
+
+$$P(X = 2) = \begin{pmatrix}
+n \\
+2
+\end{pmatrix}\left( \frac{a}{a + b} \right)^{2}\left( \frac{b}{a + b} \right)^{n - 2}$$
+
+-   case 2: without replacement: then use counting
+
+$$P(X = x) = \frac{\begin{pmatrix}
+a \\
+x
+\end{pmatrix}\begin{pmatrix}
+b \\
+n - x
+\end{pmatrix}}{\begin{pmatrix}
+a + b \\
+n
+\end{pmatrix}} = p(x)$$
+
+In case 2, we used the elementary counting techniques we are already
+familiar with. Immediately we see a distinct case similar to the
+binomial but when sampling without replacement. Let's formalize this as
+a random variable!
+
+### Hypergeometric distribution
+
+Let's introduce a random variable to represent a situation like case 2
+above.
+
+*Definition. *
+
+$$P(X = x) = \frac{\begin{pmatrix}
+a \\
+x
+\end{pmatrix}\begin{pmatrix}
+b \\
+n - x
+\end{pmatrix}}{\begin{pmatrix}
+a + b \\
+n
+\end{pmatrix}} = p(x)$$
+
+is known as a **Hypergeometric distribution**.
+
+Abbreviate this by:
+
+$$X\sim\text{ Hypergeom}\left( \#\text{ total},\#\text{ successes},\text{ sample size} \right)$$
+
+For example,
+
+$$X\sim\text{ Hypergeom}\left( N,N_{a},n \right)$$
+
+*Remark. *
+
+If $x$ is very small relative to $a + b$, then both cases give similar
+(approx. the same) answers.
+
+For instance, if we're sampling for blood types from UCSB, and we take a
+student out without replacement, we don't really change the sample size
+substantially. So both answers give a similar result.
+
+Suppose we have two types of items, type $A$ and type $B$. Let $N_{A}$
+be $\#$ type $A$, $N_{B}$ $\#$ type $B$. $N = N_{A} + N_{B}$ is the
+total number of objects.
+
+We sample $n$ items **without replacement** ($n \leq N$) with order not
+mattering. Denote by $X$ the number of type $A$ objects in our sample.
+
+*Definition. *
+
+Let $0 \leq N_{A} \leq N$ and $1 \leq n \leq N$ be integers. A random
+variable $X$ has the **hypergeometric distribution** with parameters
+$\left( N,N_{A},n \right)$ if $X$ takes values in the set
+$\left\{ 0,1,\ldots,n \right\}$ and has p.m.f.
+
+$$P(X = k) = \frac{\begin{pmatrix}
+N_{A} \\
+k
+\end{pmatrix}\begin{pmatrix}
+N - N_{A} \\
+n - k
+\end{pmatrix}}{\begin{pmatrix}
+N \\
+n
+\end{pmatrix}} = p(k)$$
+
+*Example. *
+
+Let $N_{A} = 10$ defectives. Let $N_{B} = 90$ non-defectives. We select
+$n = 5$ without replacement. What is the probability that 2 of the 5
+selected are defective?
+
+$$X\sim\text{ Hypergeom }\left( N = 100,N_{A} = 10,n = 5 \right)$$
+
+We want $P(X = 2)$.
+
+$$P(X = 2) = \frac{\begin{pmatrix}
+10 \\
+2
+\end{pmatrix}\begin{pmatrix}
+90 \\
+3
+\end{pmatrix}}{\begin{pmatrix}
+100 \\
+5
+\end{pmatrix}} \approx 0.0702$$
+
+*Remark. *
+
+Make sure you can distinguish when a problem is binomial or when it is
+hypergeometric. This is very important on exams.
+
+Recall that both ask about number of successes, in a fixed number of
+trials. But binomial is sample with replacement (each trial is
+independent) and sampling without replacement is hypergeometric.
+
+### Geometric distribution
+
+Consider an infinite sequence of independent trials. e.g. number of
+attempts until I make a basket.
+
+In fact we can think of this as a variation on the binomial
+distribution. But in this case we don't sample $n$ times and ask how
+many successes we have, we sample as many times as we need for *one*
+success. Later on we'll see this is really a specific case of another
+distribution, the *negative binomial*.
+
+Let $X_{i}$ denote the outcome of the $i^{\text{th}}$ trial, where
+success is 1 and failure is 0. Let $N$ be the number of trials needed to
+observe the first success in a sequence of independent trials with
+probability of success $p$. Then
+
+We fail $k - 1$ times and succeed on the $k^{\text{th}}$ try. Then:
+
+$$P(N = k) = P\left( X_{1} = 0,X_{2} = 0,\ldots,X_{k - 1} = 0,X_{k} = 1 \right) = (1 - p)^{k - 1}p$$
+
+This is the probability of failures raised to the amount of failures,
+times probability of success.
+
+The key characteristic in these trials, we keep going until we succeed.
+There's no $n$ choose $k$ in front like the binomial distribution
+because there's exactly one sequence that gives us success.
+
+*Definition. *
+
+Let $0 < p \leq 1$. A random variable $X$ has the geometric distribution
+with success parameter $p$ if the possible values of $X$ are
+$\left\{ 1,2,3,\ldots \right\}$ and $X$ satisfies
+
+$$P(X = k) = (1 - p)^{k - 1}p$$
+
+for positive integers $k$. Abbreviate this by $X\sim\text{ Geom}(p)$.
+
+*Example. *
+
+What is the probability it takes more than seven rolls of a fair die to
+roll a six?
+
+Let $X$ be the number of rolls of a fair die until the first six. Then
+$X\sim\text{ Geom}\left( \frac{1}{6} \right)$. Now we just want
+$P(X > 7)$.
+
+$$P(X > 7) = \sum_{k = 8}^{\infty}P(X = k) = \sum_{k = 8}^{\infty}\left( \frac{5}{6} \right)^{k - 1}\frac{1}{6}$$
+
+Re-indexing,
+
+$$\sum_{k = 8}^{\infty}\left( \frac{5}{6} \right)^{k - 1}\frac{1}{6} = \frac{1}{6}\left( \frac{5}{6} \right)^{7}\sum_{j = 0}^{\infty}\left( \frac{5}{6} \right)^{j}$$
+
+Now we calculate by standard methods:
+
+$$\frac{1}{6}\left( \frac{5}{6} \right)^{7}\sum_{j = 0}^{\infty}\left( \frac{5}{6} \right)^{j} = \frac{1}{6}\left( \frac{5}{6} \right)^{7} \cdot \frac{1}{1 - \frac{5}{6}} = \left( \frac{5}{6} \right)^{7}$$
+
+### Negative binomial
+
+As promised, here's the negative binomial.
+
+Consider a sequence of Bernoulli trials with the following
+characteristics:
+
+-   Each trial success or failure
+
+-   Prob. of success $p$ is same on each trial
+
+-   Trials are independent (notice they are not fixed to specific
+    number)
+
+-   Experiment continues until $r$ successes are observed, where $r$ is
+    a given parameter
+
+Then if $X$ is the number of trials necessary until $r$ successes are
+observed, we say $X$ is a **negative binomial** random variable.
+
+Immediately we see that the geometric distribution is just the negative
+binomial with $r = 1$.
+
+*Definition. *
+
+Let $k \in {\mathbb{Z}}^{+}$ and $0 < p \leq 1$. A random variable $X$
+has the negative binomial distribution with parameters
+$\left\{ k,p \right\}$ if the possible values of $X$ are the integers
+$\left\{ k,k + 1,k + 2,\ldots \right\}$ and the p.m.f. is
+
+$$P(X = n) = \begin{pmatrix}
+n - 1 \\
+k - 1
+\end{pmatrix}p^{k}(1 - p)^{n - k}\text{ for }n \geq k$$
+
+Abbreviate this by $X\sim\text{ Negbin}(k,p)$.
+
+*Example. *
+
+Steph Curry has a three point percentage of approx. $43\%$. What is the
+probability that Steph makes his third three-point basket on his
+$5^{\text{th}}$ attempt?
+
+Let $X$ be number of attempts required to observe the 3rd success. Then,
+
+$$X\sim\text{ Negbin}(k = 3,p = 0.43)$$
+
+So, $$\begin{aligned}
+P(X = 5) & = {\begin{pmatrix}
+5 - 1 \\
+3 - 1
+\end{pmatrix}(0.43)}^{3}(1 - 0.43)^{5 - 3} \\
+ & = \begin{pmatrix}
+4 \\
+2
+\end{pmatrix}(0.43)^{3}(0.57)^{2} \\
+ & \approx 0.155
+\end{aligned}$$
+
+### Poisson distribution
+
+This p.m.f. follows from the Taylor expansion
+
+$$e^{\lambda} = \sum_{k = 0}^{\infty}\frac{\lambda^{k}}{k!}$$
+
+which implies that
+
+$$\sum_{k = 0}^{\infty}e^{- \lambda}\frac{\lambda^{k}}{k!} = e^{- \lambda}e^{\lambda} = 1$$
+
+*Definition. *
+
+For an integer valued random variable $X$, we say
+$X\sim\text{ Poisson}(\lambda)$ if it has p.m.f.
+
+$$P(X = k) = e^{- \lambda}\frac{\lambda^{k}}{k!}$$
+
+for $k \in \left\{ 0,1,2,\ldots \right\}$ for $\lambda > 0$ and
+
+$$\sum_{k = 0}^{\infty}P(X = k) = 1$$
+
+The Poisson arises from the Binomial. It applies in the binomial context
+when $n$ is very large ($n \geq 100$) and $p$ is very small
+$p \leq 0.05$, such that $np$ is a moderate number ($np < 10$).
+
+Then $X$ follows a Poisson distribution with $\lambda = np$.
+
+$$P\left( \text{Bin}(n,p) = k \right) \approx P\left( \text{Poisson}(\lambda = np) = k \right)$$
+
+for $k = 0,1,\ldots,n$.
+
+The Poisson distribution is useful for finding the probabilities of rare
+events over a continuous interval of time. By knowing $\lambda = np$ for
+small $n$ and $p$, we can calculate many probabilities.
+
+*Example. *
+
+The number of typing errors in the page of a textbook.
+
+Let
+
+-   $n$ be the number of letters of symbols per page (large)
+
+-   $p$ be the probability of error, small enough such that
+
+-   $\lim\limits_{n \rightarrow \infty}\lim\limits_{p \rightarrow 0}np = \lambda = 0.1$
+
+What is the probability of exactly 1 error?
+
+We can approximate the distribution of $X$ with a
+$\text{Poisson}(\lambda = 0.1)$ distribution
+
+$$P(X = 1) = \frac{e^{- 0.1}(0.1)^{1}}{1!} = 0.09048$$
+
+## Continuous distributions
+
+All of the distributions we've been analyzing have been discrete, that
+is, they apply to random variables with a
+[countable](https://en.wikipedia.org/wiki/Countable_set) state space.
+Even when the state space is infinite, as in the negative binomial, it
+is countable. We can think of it as indexing each trial with a natural
+number $0,1,2,3,\ldots$.
+
+Now we turn our attention to continuous random variables that operate on
+uncountably infinite state spaces. For example, if we sample uniformly
+inside of the interval $\lbrack 0,1\rbrack$, there are an uncountably
+infinite number of possible values we could obtain. We cannot index
+these values by the natural numbers, by some theorems of set theory we
+in fact know that the interval $\lbrack 0,1\rbrack$ has a bijection to
+$\mathbb{R}$ and has cardinality $א_{1}$.
+
+Additionally we notice that asking for the probability that we pick a
+certain point in the interval $\lbrack 0,1\rbrack$ makes no sense, there
+are an infinite amount of sample points! Intuitively we should think
+that the probability of choosing any particular point is 0. However, we
+should be able to make statements about whether we can choose a point
+that lies within a subset, like $\lbrack 0,0.5\rbrack$.
+
+Let's formalize these ideas.
+
+*Definition. *
+
+Let $X$ be a random variable. If we have a function $f$ such that
+
+$$P(X \leq b) = \int_{- \infty}^{b}f(x)dx$$ for all
+$b \in {\mathbb{R}}$, then $f$ is the **probability density function**
+of $X$.
+
+The probability that the value of $X$ lies in $( - \infty,b\rbrack$
+equals the area under the curve of $f$ from $- \infty$ to $b$.
+
+If $f$ satisfies this definition, then for any $B \subset {\mathbb{R}}$
+for which integration makes sense,
+
+$$P(X \in B) = \int_{B}f(x)dx$$
+
+*Remark. *
+
+Recall from our previous discussion of random variables that the PDF is
+the analogue of the PMF for discrete random variables.
+
+Properties of a CDF:
+
+Any CDF $F(x) = P(X \leq x)$ satisfies
+
+1.  Integrates to unity: $F( - \infty) = 0$, $F(\infty) = 1$
+
+2.  $F(x)$ is non-decreasing in $x$ (monotonically increasing)
+
+$$s < t \Rightarrow F(s) \leq F(t)$$
+
+3.  $P(a < X \leq b) = P(X \leq b) - P(X \leq a) = F(b) - F(a)$
+
+Like we mentioned before, we can only ask about things like
+$P(X \leq k)$, but not $P(X = k)$. In fact $P(X = k) = 0$ for all $k$.
+An immediate corollary of this is that we can freely interchange $\leq$
+and $<$ and likewise for $\geq$ and $>$, since $P(X \leq k) = P(X < k)$
+if $P(X = k) = 0$.
+
+*Example. *
+
+Let $X$ be a continuous random variable with density (pdf)
+
+$$f(x) = \begin{cases}
+cx^{2} & \text{for }0 < x < 2 \\
+0 & \text{otherwise }
+\end{cases}$$
+
+1.  What is $c$?
+
+$c$ is such that
+$$1 = \int_{- \infty}^{\infty}f(x)dx = \int_{0}^{2}cx^{2}dx$$
+
+2.  Find the probability that $X$ is between 1 and 1.4.
+
+Integrate the curve between 1 and 1.4.
+
+$$\begin{array}{r}
+\int_{1}^{1.4}\frac{3}{8}x^{2}dx = \left( \frac{x^{3}}{8} \right)|_{1}^{1.4} \\
+ = 0.218
+\end{array}$$
+
+This is the probability that $X$ lies between 1 and 1.4.
+
+3.  Find the probability that $X$ is between 1 and 3.
+
+Idea: integrate between 1 and 3, be careful after 2.
+
+$$\int_{1}^{2}\frac{3}{8}x^{2}dx + \int_{2}^{3}0dx =$$
+
+4.  What is the CDF for $P(X \leq x)$? Integrate the curve to $x$.
+
+$$\begin{array}{r}
+F(x) = P(X \leq x) = \int_{- \infty}^{x}f(t)dt \\
+ = \int_{0}^{x}\frac{3}{8}t^{2}dt \\
+ = \frac{x^{3}}{8}
+\end{array}$$
+
+Important: include the range!
+
+$$F(x) = \begin{cases}
+0 & \text{for }x \leq 0 \\
+\frac{x^{3}}{8} & \text{for }0 < x < 2 \\
+1 & \text{for }x \geq 2
+\end{cases}$$
+
+5.  Find a point $a$ such that you integrate up to the point to find
+    exactly $\frac{1}{2}$
+
+the area.
+
+We want to find $\frac{1}{2} = P(X \leq a)$.
+
+$$\frac{1}{2} = P(X \leq a) = F(a) = \frac{a^{3}}{8} \Rightarrow a = \sqrt[3]{4}$$
+
+Now let us discuss some named continuous distributions.
+
+### The (continuous) uniform distribution
+
+The most simple and the best of the named distributions!
+
+*Definition. *
+
+Let $\lbrack a,b\rbrack$ be a bounded interval on the real line. A
+random variable $X$ has the uniform distribution on the interval
+$\lbrack a,b\rbrack$ if $X$ has the density function
+
+$$f(x) = \begin{cases}
+\frac{1}{b - a} & \text{for }x \in \lbrack a,b\rbrack \\
+0 & \text{for }x \notin \lbrack a,b\rbrack
+\end{cases}$$
+
+Abbreviate this by $X\sim\text{ Unif }\lbrack a,b\rbrack$.
+
+The graph of $\text{Unif }\lbrack a,b\rbrack$ is a constant line at
+height $\frac{1}{b - a}$ defined across $\lbrack a,b\rbrack$. The
+integral is just the area of a rectangle, and we can check it is 1.
+
+*Fact. *
+
+For $X\sim\text{ Unif }\lbrack a,b\rbrack$, its cumulative distribution
+function (CDF) is given by:
+
+$$F_{x}(x) = \begin{cases}
+0 & \text{for }x < a \\
+\frac{x - a}{b - a} & \text{for }x \in \lbrack a,b\rbrack \\
+1 & \text{for }x > b
+\end{cases}$$
+
+*Fact. *
+
+If $X\sim\text{ Unif }\lbrack a,b\rbrack$, and
+$\lbrack c,d\rbrack \subset \lbrack a,b\rbrack$, then
+$$P(c \leq X \leq d) = \int_{c}^{d}\frac{1}{b - a}dx = \frac{d - c}{b - a}$$
+
+*Example. *
+
+Let $Y$ be a uniform random variable on $\lbrack - 2,5\rbrack$. Find the
+probability that its absolute value is at least 1.
+
+$Y$ takes values in the interval $\lbrack - 2,5\rbrack$, so the absolute
+value is at least 1 iff.
+$Y \in \lbrack - 2,1\rbrack \cup \lbrack 1,5\rbrack$.
+
+The density function of $Y$ is
+$f(x) = \frac{1}{5 - ( - 2)} = \frac{1}{7}$ on $\lbrack - 2,5\rbrack$
+and 0 everywhere else.
+
+So,
+
+$$\begin{aligned}
+P\left( |Y| \geq 1 \right) & = P\left( Y \in \lbrack - 2, - 1\rbrack \cup \lbrack 1,5\rbrack \right) \\
+ & = P( - 2 \leq Y \leq - 1) + P(1 \leq Y \leq 5) \\
+ & = \frac{5}{7}
+\end{aligned}$$
+
+### The exponential distribution
+
+The geometric distribution can be viewed as modeling waiting times, in a
+discrete setting, i.e. we wait for $n - 1$ failures to arrive at the
+$n^{\text{th}}$ success.
+
+The exponential distribution is the continuous analogue to the geometric
+distribution, in that we often use it to model waiting times in the
+continuous sense. For example, the first custom to enter the barber
+shop.
+
+*Definition. *
+
+Let $0 < \lambda < \infty$. A random variable $X$ has the exponential
+distribution with parameter $\lambda$ if $X$ has PDF
+
+$$f(x) = \begin{cases}
+\lambda e^{- \lambda x} & \text{for }x \geq 0 \\
+0 & \text{for }x < 0
+\end{cases}$$
+
+Abbreviate this by $X\sim\text{ Exp}(\lambda)$, the exponential
+distribution with rate $\lambda$.
+
+The CDF of the $\text{Exp}(\lambda)$ distribution is given by:
+
+$$F(t) + \begin{cases}
+0 & \text{if }t < 0 \\
+1 - e^{- \lambda t} & \text{if }t \geq 0
+\end{cases}$$
+
+*Example. *
+
+Suppose the length of a phone call, in minutes, is well modeled by an
+exponential random variable with a rate $\lambda = \frac{1}{10}$.
+
+1.  What is the probability that a call takes more than 8 minutes?
+
+2.  What is the probability that a call takes between 8 and 22 minutes?
+
+Let $X$ be the length of the phone call, so that
+$X\sim\text{ Exp}\left( \frac{1}{10} \right)$. Then we can find the
+desired probability by:
+
+$$\begin{aligned}
+P(X > 8) & = 1 - P(X \leq 8) \\
+ & = 1 - F_{x}(8) \\
+ & = 1 - \left( 1 - e^{- \left( \frac{1}{10} \right) \cdot 8} \right) \\
+ & = e^{- \frac{8}{10}} \approx 0.4493
+\end{aligned}$$
+
+Now to find $P(8 < X < 22)$, we can take the difference in CDFs:
+
+$$\begin{aligned}
+ & P(X > 8) - P(X \geq 22) \\
+ & = e^{- \frac{8}{10}} - e^{- \frac{22}{10}} \\
+ & \approx 0.3385
+\end{aligned}$$
+
+*Fact (Memoryless property of the exponential distribution).*
+
+Suppose that $X\sim\text{ Exp}(\lambda)$. Then for any $s,t > 0$, we
+have $$P\left( X > t + s~|~X > t \right) = P(X > s)$$
+
+This is like saying if I've been waiting 5 minutes and then 3 minutes
+for the bus, what is the probability that I'm gonna wait more than 5 + 3
+minutes, given that I've already waited 5 minutes? And that's precisely
+equal to just the probability I'm gonna wait more than 3 minutes.
+
+*Proof. *
+
+$$\begin{array}{r}
+P\left( X > t + s~|~X > t \right) = \frac{P(X > t + s \cap X > t)}{P(X > t)} \\
+ = \frac{P(X > t + s)}{P(X > t)} = \frac{e^{- \lambda(t + s)}}{e^{- \lambda t}} = e^{- \lambda s} \\
+ \equiv P(X > s)
+\end{array}$$
+
+### Gamma distribution
+
+*Definition. *
+
+Let $r,\lambda > 0$. A random variable $X$ has the **gamma
+distribution** with parameters $(r,\lambda)$ if $X$ is nonnegative and
+has probability density function
+
+$$f(x) = \begin{cases}
+\frac{\lambda^{r}x^{r - 2}}{\Gamma(r)}e^{- \lambda x} & \text{for }x \geq 0 \\
+0 & \text{for }x < 0
+\end{cases}$$
+
+Abbreviate this by $X\sim\text{ Gamma}(r,\lambda)$.
+
+The gamma function $\Gamma(r)$ generalizes the factorial function and is
+defined as
+
+$$\Gamma(r) = \int_{0}^{\infty}x^{r - 1}e^{- x}dx,\text{ for }r > 0$$
+
+Special case: $\Gamma(n) = (n - 1)!$ if $n \in {\mathbb{Z}}^{+}$.
+
+*Remark. *
+
+The $\text{Exp}(\lambda)$ distribution is a special case of the gamma
+distribution, with parameter $r = 1$.
+
+## The normal distribution
+
+Also known as the Gaussian distribution, this is so important it gets
+its own section.
+
+*Definition. *
+
+A random variable $Z$ has the **standard normal distribution** if $Z$
+has density function
+
+$$\varphi(x) = \frac{1}{\sqrt{2\pi}}e^{- \frac{x^{2}}{2}}$$ on the real
+line. Abbreviate this by $Z\sim N(0,1)$.
+
+*Fact (CDF of a standard normal random variable).*
+
+Let $Z\sim N(0,1)$ be normally distributed. Then its CDF is given by
+$$\Phi(x) = \int_{- \infty}^{x}\varphi(s)ds = \int_{- \infty}^{x}\frac{1}{\sqrt{2\pi}}e^{\frac{- \left( - s^{2} \right)}{2}}ds$$
+
+The normal distribution is so important, instead of the standard
+$f_{Z(x)}$ and $F_{z(x)}$, we use the special $\varphi(x)$ and
+$\Phi(x)$.
+
+*Fact. *
+
+$$\int_{- \infty}^{\infty}e^{- \frac{s^{2}}{2}}ds = \sqrt{2\pi}$$
+
+No closed form of the standard normal CDF $\Phi$ exists, so we are left
+to either:
+
+-   approximate
+
+-   use technology (calculator)
+
+-   use the standard normal probability table in the textbook
+
+To evaluate negative values, we can use the symmetry of the normal
+distribution to apply the following identity:
+
+$$\Phi( - x) = 1 - \Phi(x)$$
+
+### General normal distributions
+
+We can compute any other parameters of the normal distribution using the
+standard normal.
+
+The general family of normal distributions is obtained by linear or
+affine transformations of $Z$. Let $\mu$ be real, and $\sigma > 0$, then
+
+$$X = \sigma Z + \mu$$ is also a normally distributed random variable
+with parameters $\left( \mu,\sigma^{2} \right)$. The CDF of $X$ in terms
+of $\Phi( \cdot )$ can be expressed as
+
+$$\begin{aligned}
+F_{X}(x) & = P(X \leq x) \\
+ & = P(\sigma Z + \mu \leq x) \\
+ & = P\left( Z \leq \frac{x - \mu}{\sigma} \right) \\
+ & = \Phi(\frac{x - \mu}{\sigma})
+\end{aligned}$$
+
+Also,
+
+$$f(x) = F\prime(x) = \frac{d}{dx}\left\lbrack \Phi(\frac{x - u}{\sigma}) \right\rbrack = \frac{1}{\sigma}\varphi(\frac{x - u}{\sigma}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{\frac{- \left( (x - \mu)^{2} \right)}{2\sigma^{2}}}$$
+
+*Definition. *
+
+Let $\mu$ be real and $\sigma > 0$. A random variable $X$ has the
+*normal distribution* with mean $\mu$ and variance $\sigma^{2}$ if $X$
+has density function
+
+$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{\frac{- \left( (x - \mu)^{2} \right)}{2\sigma^{2}}}$$
+
+on the real line. Abbreviate this by
+$X\sim N\left( \mu,\sigma^{2} \right)$.
+
+*Fact. *
+
+Let $X\sim N\left( \mu,\sigma^{2} \right)$ and $Y = aX + b$. Then
+$$Y\sim N\left( a\mu + b,a^{2}\sigma^{2} \right)$$
+
+That is, $Y$ is normally distributed with parameters
+$\left( a\mu + b,a^{2}\sigma^{2} \right)$. In particular,
+$$Z = \frac{X - \mu}{\sigma}\sim N(0,1)$$ is a standard normal variable.
+
+## Expectation
+
+Let's discuss the *expectation* of a random variable, which is a similar
+idea to the basic concept of *mean*.
+
+*Definition. *
+
+The expectation or mean of a discrete random variable $X$ is the
+weighted average, with weights assigned by the corresponding
+probabilities.
+
+$$E(X) = \sum_{\text{all }x_{i}}x_{i} \cdot p\left( x_{i} \right)$$
+
+*Example. *
+
+Find the expected value of a single roll of a fair die.
+
+-   $X = \frac{\text{ score }}{\text{ dots}}$
+
+-   $x = 1,2,3,4,5,6$
+
+-   $p(x) = \frac{1}{6},\frac{1}{6},\frac{1}{6},\frac{1}{6},\frac{1}{6},\frac{1}{6}$
+
+$$E\lbrack x\rbrack = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6}\ldots + 6 \cdot \frac{1}{6}$$
+
+### Binomial expected value
+
+$$E\lbrack x\rbrack = np$$
+
+### Bernoulli expected value
+
+Bernoulli is just binomial with one trial.
+
+Recall that $P(X = 1) = p$ and $P(X = 0) = 1 - p$.
+
+$$E\lbrack X\rbrack = 1 \cdot P(X = 1) + 0 \cdot P(X = 0) = p$$
+
+Let $A$ be an event on $\Omega$. Its *indicator random variable* $I_{A}$
+is defined for $\omega \in \Omega$ by
+
+$$I_{A}(\omega) = \begin{cases}
+1\text{, if  } & \omega \in A \\
+0\text{, if } & \omega \notin A
+\end{cases}$$
+
+$$E\left\lbrack I_{A} \right\rbrack = 1 \cdot P(A) = P(A)$$
+
+## Geometric expected value
+
+Let $p \in \lbrack 0,1\rbrack$ and $X\sim\text{ Geom}\lbrack p\rbrack$
+be a geometric RV with probability of success $p$. Recall that the
+p.m.f. is $pq^{k - 1}$, where prob. of failure is defined by
+$q ≔ 1 - p$.
+
+Then
+
+$$\begin{aligned}
+E\lbrack X\rbrack & = \sum_{k = 1}^{\infty}kpq^{k - 1} \\
+ & = p \cdot \sum_{k = 1}^{\infty}k \cdot q^{k - 1}
+\end{aligned}$$
+
+Now recall from calculus that you can differentiate a power series term
+by term inside its radius of convergence. So for $|t| < 1$,
+
+$$\begin{array}{r}
+\sum_{k = 1}^{\infty}kt^{k - 1} = \sum_{k = 1}^{\infty}\frac{d}{dt}t^{k} = \frac{d}{dt}\sum_{k = 1}^{\infty}t^{k} = \frac{d}{dt}\left( \frac{1}{1 - t} \right) = \frac{1}{(1 - t)^{2}} \\
+\therefore E\lbrack x\rbrack = \sum_{k = 1}^{\infty}kpq^{k - 1} = p\sum_{k = 1}^{\infty}kq^{k - 1} = p\left( \frac{1}{(1 - q)^{2}} \right) = \frac{1}{p}
+\end{array}$$
+
+### Expected value of a continuous RV
+
+*Definition. *
+
+The expectation or mean of a continuous random variable $X$ with density
+function $f$ is
+
+$$E\lbrack x\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx$$
+
+An alternative symbol is $\mu = E\lbrack x\rbrack$.
+
+$\mu$ is the "first moment" of $X$, analogous to physics, it's the
+"center of gravity" of $X$.
+
+*Remark. *
+
+In general when moving between discrete and continuous RV, replace sums
+with integrals, p.m.f. with p.d.f., and vice versa.
+
+*Example. *
+
+Suppose $X$ is a continuous RV with p.d.f.
+
+$$f_{X}(x) = \begin{cases}
+2x\text{,  } & 0 < x < 1 \\
+0\text{, } & \text{elsewhere}
+\end{cases}$$
+
+$$E\lbrack X\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{0}^{1}x \cdot 2xdx = \frac{2}{3}$$
+
+*Example (Uniform expectation).*
+
+Let $X$ be a uniform random variable on the interval
+$\lbrack a,b\rbrack$ with $X\sim\text{ Unif}\lbrack a,b\rbrack$. Find
+the expected value of $X$.
+
+$$\begin{array}{r}
+E\lbrack X\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{a}^{b}\frac{x}{b - a}dx \\
+ = \frac{1}{b - a}\int_{a}^{b}xdx = \frac{1}{b - a} \cdot \frac{b^{2} - a^{2}}{2} = \underset{\text{ midpoint formula}}{\underbrace{\frac{b + a}{2}}}
+\end{array}$$
+
+*Example (Exponential expectation).*
+
+Find the expected value of an exponential RV, with p.d.f.
+
+$$f_{X}(x) = \begin{cases}
+\lambda e^{- \lambda x}\text{,  } & x > 0 \\
+0\text{, } & \text{elsewhere}
+\end{cases}$$
+
+$$\begin{array}{r}
+E\lbrack x\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{0}^{\infty}x \cdot \lambda e^{- \lambda x}dx \\
+ = \lambda \cdot \int_{0}^{\infty}x \cdot e^{- \lambda x}dx \\
+ = \lambda \cdot \left\lbrack \left. -x\frac{1}{\lambda}e^{- \lambda x} \right|_{x = 0}^{x = \infty} - \int_{0}^{\infty} - \frac{1}{\lambda}e^{- \lambda x}dx \right\rbrack \\
+ = \frac{1}{\lambda}
+\end{array}$$
+
+*Example (Uniform dartboard).*
+
+Our dartboard is a disk of radius $r_{0}$ and the dart lands uniformly
+at random on the disk when thrown. Let $R$ be the distance of the dart
+from the center of the disk. Find $E\lbrack R\rbrack$ given density
+function
+
+$$f_{R}(t) = \begin{cases}
+\frac{2t}{r_{0}^{2}}\text{,  } & 0 \leq t \leq r_{0} \\
+0\text{,  } & t < 0\text{ or }t > r_{0}
+\end{cases}$$
+
+$$\begin{array}{r}
+E\lbrack R\rbrack = \int_{- \infty}^{\infty}tf_{R}(t)dt \\
+ = \int_{0}^{r_{0}}t \cdot \frac{2t}{r_{0}^{2}}dt \\
+ = \frac{2}{3}r_{0}
+\end{array}$$
+
+### Expectation of derived values
+
+If we can find the expected value of $X$, can we find the expected value
+of $X^{2}$? More precisely, can we find
+$E\left\lbrack X^{2} \right\rbrack$?
+
+If the distribution is easy to see, then this is trivial. Otherwise we
+have the following useful property:
+
+$$E\left\lbrack X^{2} \right\rbrack = \int_{\text{all }x}x^{2}f_{X}(x)dx$$
+
+(for continuous RVs).
+
+And in the discrete case,
+
+$$E\left\lbrack X^{2} \right\rbrack = \sum_{\text{all }x}x^{2}p_{X}(x)$$
+
+In fact $E\left\lbrack X^{2} \right\rbrack$ is so important that we call
+it the **mean square**.
+
+*Fact. *
+
+More generally, a real valued function $g(X)$ defined on the range of
+$X$ is itself a random variable (with its own distribution).
+
+We can find expected value of $g(X)$ by
+
+$$E\left\lbrack g(x) \right\rbrack = \int_{- \infty}^{\infty}g(x)f(x)dx$$
+
+or
+
+$$E\left\lbrack g(x) \right\rbrack = \sum_{\text{all }x}g(x)f(x)$$
+
+*Example. *
+
+You roll a fair die to determine the winnings (or losses) $W$ of a
+player as follows:
+
+$$W = \begin{cases}
+ - 1,\ if\ the\ roll\ is\ 1,\ 2,\ or\ 3 \\
+1,\ if\ the\ roll\ is\ a\ 4 \\
+3,\ if\ the\ roll\ is\ 5\ or\ 6
+\end{cases}$$
+
+What is the expected winnings/losses for the player during 1 roll of the
+die?
+
+Let $X$ denote the outcome of the roll of the die. Then we can define
+our random variable as $W = g(X)$ where the function $g$ is defined by
+$g(1) = g(2) = g(3) = - 1$ and so on.
+
+Note that $P(W = - 1) = P(X = 1 \cup X = 2 \cup X = 3) = \frac{1}{2}$.
+Likewise $P(W = 1) = P(X = 4) = \frac{1}{6}$, and
+$P(W = 3) = P(X = 5 \cup X = 6) = \frac{1}{3}$.
+
+Then $$\begin{array}{r}
+E\left\lbrack g(X) \right\rbrack = E\lbrack W\rbrack = ( - 1) \cdot P(W = - 1) + (1) \cdot P(W = 1) + (3) \cdot P(W = 3) \\
+ = - \frac{1}{2} + \frac{1}{6} + 1 = \frac{2}{3}
+\end{array}$$
+
+*Example. *
+
+A stick of length $l$ is broken at a uniformly chosen random location.
+What is the expected length of the longer piece?
+
+Idea: if you break it before the halfway point, then the longer piece
+has length given by $l - x$. If you break it after the halfway point,
+the longer piece has length $x$.
+
+Let the interval $\lbrack 0,l\rbrack$ represent the stick and let
+$X\sim\text{ Unif}\lbrack 0,l\rbrack$ be the location where the stick is
+broken. Then $X$ has density $f(x) = \frac{1}{l}$ on
+$\lbrack 0,l\rbrack$ and 0 elsewhere.
+
+Let $g(x)$ be the length of the longer piece when the stick is broken at
+$x$,
+
+$$g(x) = \begin{cases}
+1 - x\text{,  } & 0 \leq x < \frac{l}{2} \\
+x\text{,  } & \frac{1}{2} \leq x \leq l
+\end{cases}$$
+
+Then $$\begin{array}{r}
+E\left\lbrack g(X) \right\rbrack = \int_{- \infty}^{\infty}g(x)f(x)dx = \int_{0}^{\frac{l}{2}}\frac{l - x}{l}dx + \int_{\frac{l}{2}}^{l}\frac{x}{l}dx \\
+ = \frac{3}{4}l
+\end{array}$$
+
+So we expect the longer piece to be $\frac{3}{4}$ of the total length,
+which is a bit pathological.
+
+### Moments of a random variable
+
+We continue discussing expectation but we introduce new terminology.
+
+*Fact. *
+
+The $n^{\text{th}}$ moment (or $n^{\text{th}}$ raw moment) of a discrete
+random variable $X$ with p.m.f. $p_{X}(x)$ is the expectation
+
+$$E\left\lbrack X^{n} \right\rbrack = \sum_{k}k^{n}p_{X}(k) = \mu_{n}$$
+
+If $X$ is continuous, then we have analogously
+
+$$E\left\lbrack X^{n} \right\rbrack = \int_{- \infty}^{\infty}x^{n}f_{X}(x) = \mu_{n}$$
+
+The **deviation** is given by $\sigma$ and the **variance** is given by
+$\sigma^{2}$ and
+
+$$\sigma^{2} = \mu_{2} - \left( \mu_{1} \right)^{2}$$
+
+$\mu_{3}$ is used to measure "skewness" / asymmetry of a distribution.
+For example, the normal distribution is very symmetric.
+
+$\mu_{4}$ is used to measure kurtosis/peakedness of a distribution.
+
+### Central moments
+
+Previously we discussed "raw moments." Be careful not to confuse them
+with *central moments*.
+
+*Fact. *
+
+The $n^{\text{th}}$ central moment of a discrete random variable $X$
+with p.m.f. $p_{X}(x)$ is the expected value of the difference about the
+mean raised to the $n^{\text{th}}$ power
+
+$$E\left\lbrack (X - \mu)^{n} \right\rbrack = \sum_{k}(k - \mu)^{n}p_{X}(k) = \mu\prime_{n}$$
+
+And of course in the continuous case,
+
+$$E\left\lbrack (X - \mu)^{n} \right\rbrack = \int_{- \infty}^{\infty}(x - \mu)^{n}f_{X}(x) = \mu\prime_{n}$$
+
+In particular,
+
+$$\begin{array}{r}
+\mu\prime_{1} = E\left\lbrack (X - \mu)^{1} \right\rbrack = \int_{- \infty}^{\infty}(x - \mu)^{1}f_{X}(x)dx \\
+ = \int_{\infty}^{\infty}xf_{X}(x)dx = \int_{- \infty}^{\infty}\mu f_{X}(x)dx = \mu - \mu \cdot 1 = 0 \\
+\mu\prime_{2} = E\left\lbrack (X - \mu)^{2} \right\rbrack = \sigma_{X}^{2} = \text{ Var}(X)
+\end{array}$$
+
+*Example. *
+
+Let $Y$ be a uniformly chosen integer from
+$\left\{ 0,1,2,\ldots,m \right\}$. Find the first and second moment of
+$Y$.
+
+The p.m.f. of $Y$ is $p_{Y}(k) = \frac{1}{m + 1}$ for
+$k \in \lbrack 0,m\rbrack$. Thus,
+
+$$\begin{array}{r}
+E\lbrack Y\rbrack = \sum_{k = 0}^{m}k\frac{1}{m + 1} = \frac{1}{m + 1}\sum_{k = 0}^{m}k \\
+ = \frac{m}{2}
+\end{array}$$
+
+Then,
+
+$$E\left\lbrack Y^{2} \right\rbrack = \sum_{k = 0}^{m}k^{2}\frac{1}{m + 1} = \frac{1}{m + 1} = \frac{m(2m + 1)}{6}$$
+
+*Example. *
+
+Let $c > 0$ and let $U$ be a uniform random variable on the interval
+$\lbrack 0,c\rbrack$. Find the $n^{\text{th}}$ moment for $U$ for all
+positive integers $n$.
+
+The density function of $U$ is
+
+$$f(x) = \begin{cases}
+\frac{1}{c}\text{, if } & x \in \lbrack 0,c\rbrack \\
+0\text{,  } & \text{otherwise}
+\end{cases}$$
+
+Therefore the $n^{\text{th}}$ moment of $U$ is,
+
+$$E\left\lbrack U^{n} \right\rbrack = \int_{- \infty}^{\infty}x^{n}f(x)dx$$
+
+*Example. *
+
+Suppose the random variable $X\sim\text{ Exp}(\lambda)$. Find the second
+moment of $X$.
+
+$$\begin{array}{r}
+E\left\lbrack X^{2} \right\rbrack = \int_{0}^{\infty}x^{2}\lambda e^{- \lambda x}dx \\
+ = \frac{1}{\lambda^{2}}\int_{0}^{\infty}u^{2}e^{- u}du \\
+ = \frac{1}{\lambda^{2}}\Gamma(2 + 1) = \frac{2!}{\lambda^{2}}
+\end{array}$$
+
+*Fact. *
+
+In general, to find teh $n^{\text{th}}$ moment of
+$X\sim\text{ Exp}(\lambda)$,
+$$E\left\lbrack X^{n} \right\rbrack = \int_{0}^{\infty}x^{n}\lambda e^{- \lambda x}dx = \frac{n!}{\lambda^{n}}$$
+
+### Median and quartiles
+
+When a random variable has rare (abnormal) values, its expectation may
+be a bad indicator of where the center of the distribution lies.
+
+*Definition. *
+
+The **median** of a random variable $X$ is any real value $m$ that
+satisfies
+
+$$P(X \geq m) \geq \frac{1}{2}\text{ and }P(X \leq m) \geq \frac{1}{2}$$
+
+With half the probability on both $\left\{ X \leq m \right\}$ and
+$\left\{ X \geq m \right\}$, the median is representative of the
+midpoint of the distribution. We say that the median is more *robust*
+because it is less affected by outliers. It is not necessarily unique.
+
+*Example. *
+
+Let $X$ be discretely uniformly distributed in the set
+$\left\{ - 100,1,2,,3,\ldots,9 \right\}$ so $X$ has probability mass
+function $$p_{X}( - 100) = p_{X}(1) = \cdots = p_{X}(9)$$
+
+Find the expected value and median of $X$.
+
+$$E\lbrack X\rbrack = ( - 100) \cdot \frac{1}{10} + (1) \cdot \frac{1}{10} + \cdots + (9) \cdot \frac{1}{10} = - 5.5$$
+
+While the median is any number $m \in \lbrack 4,5\rbrack$.
+
+The median reflects the fact that 90% of the values and probability is
+in the range $1,2,\ldots,9$ while the mean is heavily influenced by the
+$- 100$ value.
diff --git a/typst/2025-02-16-probability-distributions.typ b/typst/2025-02-16-probability-distributions.typ
new file mode 100644
index 0000000..9a0dda7
--- /dev/null
+++ b/typst/2025-02-16-probability-distributions.typ
@@ -0,0 +1,1304 @@
+#let callout = type => (title: "", content) => [
+  #if title != "" [
+    _#type (#title)._
+  ] else [
+    _#type. _
+  ]
+  #content
+]
+
+#let fact = callout("Fact")
+#let thm = callout("Theorem")
+#let proof = callout("Proof")
+#let remark = callout("Remark")
+#let definition = callout("Definition")
+#let abuse = callout("Abuse of notation")
+#let exercise = callout("Exercise")
+#let example = callout("Example")
+
+These are some notes I've been collecting on random variables, their
+distributions, expected values, and moment generating functions. I thought I'd
+write them down somewhere useful.
+
+These are almost extracted verbatim from my in-class notes, which I take in real
+time using Typst. I simply wrote a tiny compatibility shim to allow Pandoc to
+render them.
+
+== Random variables
+
+First, some brief exposition on random variables. Quixotically, a random
+variable is actually a function.
+
+Standard notation, $Omega$ is sample space, $omega$ is an event.
+
+#definition[
+  A *random variable* $X$ is a function $X : Omega -> RR$ that gives the
+  probability of an event $omega in Omega$.
+]
+
+#definition[
+  The *state space* of a random variable $X$ is all of the values $X$ can take.
+]
+
+#example[
+  Let $X$ be a random variable that takes on the values ${0,1,2,3}$. Then the
+  state space of $X$ is the set ${0,1,2,3}$.
+]
+
+=== Discrete random variables
+
+A random variable $X$ is discrete if there is countable $A$ such that $P(X in
+A) = 1$. $k$ is a possible value if $P(X = k) > 0$. We discuss continuous
+random variables later.
+
+The _probability distribution_ of $X$ gives its important probabilistic
+information. The probability distribution is a description of the probabilities
+$P(X in B)$ for subsets $B in RR$. We describe the probability density function
+and the cumulative distribution function.
+
+A discrete random variable has probability distribution entirely determined by
+its probability mass function (hereafter abbreviated p.m.f or PMF) $p(k) = P(X
+= k)$. The p.m.f. is a function from the set of possible values of $X$ into
+$[0,1]$. Labeling the p.m.f. with the random variable is done by $p_X (k)$.
+
+$
+  p_X : "State space of" X -> [0,1]
+$
+
+By the axioms of probability,
+
+$
+  sum_k p_X (k) = sum_k P(X=k) = 1
+$
+
+For a subset $B subset RR$,
+
+$
+  P(X in B) = sum_(k in B) p_X (k)
+$
+
+=== Continuous random variables
+
+Now as promised we introduce another major class of random variables.
+
+#definition[
+  Let $X$ be a random variable. If $f$ satisfies
+
+  $
+    P(X <= b) = integral^b_(-infinity) f(x) dif x
+  $
+
+  for all $b in RR$, then $f$ is the *probability density function* (hereafter
+  abbreviated p.d.f. or PDF) of $X$.
+]
+
+We immediately see that the p.d.f. is analogous to the c.d.f. of the discrete case.
+
+The probability that $X in (-infinity, b]$ is equal to the area under the graph
+of $f$ from $-infinity$ to $b$.
+
+A corollary is the following.
+
+#fact[
+  $ P(X in B) = integral_B f(x) dif x $
+]
+
+for any $B subset RR$ where integration makes sense.
+
+The set can be bounded or unbounded, or any collection of intervals.
+
+#fact[
+  $ P(a <= X <= b) = integral_a^b f(x) dif x $
+  $ P(X > a) = integral_a^infinity f(x) dif x $
+]
+
+#fact[
+  If a random variable $X$ has density function $f$ then individual point
+  values have probability zero:
+
+  $ P(X = c) = integral_c^c f(x) dif x = 0, forall c in RR $
+]
+
+#remark[
+  It follows a random variable with a density function is not discrete. An
+  immediate corollary of this is that the probabilities of intervals are not
+  changed by including or excluding endpoints. So $P(X <= k)$ and $P(X < k)$ are equivalent.
+]
+
+How to determine which functions are p.d.f.s? Since $P(-infinity < X <
+infinity) = 1$, a p.d.f. $f$ must satisfy
+
+$
+  f(x) >= 0 forall x in RR \
+  integral^infinity_(-infinity) f(x) dif x = 1
+$
+
+#fact[
+  Random variables with density functions are called _continuous_ random
+  variables. This does not imply that the random variable is a continuous
+  function on $Omega$ but it is standard terminology.
+]
+
+== Discrete distributions
+
+Recall that the _probability distribution_ of $X$ gives its important probabilistic
+information. Let us discuss some of these distributions.
+
+In general we first consider the experiment's properties and theorize about the
+distribution that its random variable takes. We can then apply the distribution
+to find out various pieces of probabilistic information.
+
+=== Bernoulli trials
+
+A Bernoulli trial is the original "experiment." It's simply a single trial with
+a binary "success" or "failure" outcome. Encode this T/F, 0 or 1, or however
+you'd like. It becomes immediately useful in defining more complex
+distributions, so let's analyze its properties.
+
+The setup: the experiment has exactly two outcomes:
+- Success -- $S$ or 1
+- Failure -- $F$ or 0
+
+Additionally:
+$
+  P(S) = p, (0 < p < 1) \
+  P(F) = 1 - p = q
+$
+
+Construct the probability mass function:
+
+$
+  P(X = 1) = p \
+  P(X = 0) = 1 - p
+$
+
+Write it as:
+
+$ p_x(k) = p^k (1-p)^(1-k) $
+
+for $k = 1$ and $k = 0$.
+
+=== Binomial distribution
+
+The setup: very similar to Bernoulli, trials have exactly 2 outcomes. A bunch
+of Bernoulli trials in a row.
+
+Importantly: $p$ and $q$ are defined exactly the same in all trials.
+
+This ties the binomial distribution to the sampling with replacement model,
+since each trial does not affect the next.
+
+We conduct $n$ *independent* trials of this experiment. Example with coins: each
+flip independently has a $1/2$ chance of heads or tails (holds same for die,
+rigged coin, etc).
+
+$n$ is fixed, i.e. known ahead of time.
+
+==== Binomial random variable
+
+Let's consider the random variable characterized by the binomial distribution now.
+
+Let $X = hash$ of successes in $n$ independent trials. For any particular
+sequence of $n$ trials, it takes the form $Omega = {omega} "where" omega = S
+  F F dots.c F$ and is of length $n$.
+
+Then $X(omega) = 0,1,2,...,n$ can take $n + 1$ possible values. The
+probability of any particular sequence is given by the product of the
+individual trial probabilities.
+
+#example[
+  $ omega = S F F S F dots.c S = (p q q p q dots.c p) $
+]
+
+So $P(x = 0) = P(F F F dots.c F) = q dot q dot dots.c dot q = q^n$.
+
+And
+$
+  P(X = 1) = P(S F F dots.c F) + P(F S F F dots.c F) + dots.c + P(F F F dots.c F S) \
+  = underbrace(n, "possible outcomes") dot p^1 dot p^(n-1) \
+  = vec(n, 1) dot p^1 dot p^(n-1) \
+  = n dot p^1 dot p^(n-1)
+$
+
+Now we can generalize
+
+$
+  P(X = 2) = vec(n,2) p^2 q^(n-2)
+$
+
+How about all successes?
+
+$
+  P(X = n) = P(S S dots.c S) = p^n
+$
+
+We see that for all failures we have $q^n$ and all successes we have $p^n$.
+Otherwise we use our method above.
+
+In general, here is the probability mass function for the binomial random variable
+
+$
+  P(X = k) = vec(n, k) p^k q^(n-k), "for" k = 0,1,2,...,n
+$
+
+
+Binomial distribution is very powerful. Choosing between two things, what are the probabilities?
+
+To summarize the characterization of the binomial random variable:
+
+- $n$ independent trials
+- each trial results in binary success or failure
+- with probability of success $p$, identically across trials
+
+with $X = hash$ successes in *fixed* $n$ trials.
+
+$ X ~ "Bin"(n,p) $
+
+with probability mass function
+
+$
+  P(X = x) = vec(n,x) p^x (1 - p)^(n-x) = p(x) "for" x = 0,1,2,...,n
+$
+
+We see this is in fact the binomial theorem!
+
+$
+  p(x) >= 0, sum^n_(x=0) p(x) = sum^n_(x=0) vec(n,x) p^x q^(n-x) = (p + q)^n
+$
+
+In fact,
+$
+  (p + q)^n = (p + (1 - p))^n = 1
+$
+
+#example[
+  What is the probability of getting exactly three aces (1's) out of 10 throws
+  of a fair die?
+
+  Seems a little trickier but we can still write this as well defined $S$/$F$.
+  Let $S$ be getting an ace and $F$ being anything else.
+
+  Then $p = 1/6$ and $n = 10$. We want $P(X=3)$. So
+
+  $
+    P(X=3) = vec(10,3) p^3 q^7 = vec(10,3) (1 / 6)^3 (5 / 6)^7 \
+    approx 0.15505
+  $
+]
+
+==== With or without replacement?
+
+I place particular emphasis on the fact that the binomial distribution
+generally applies to cases where you're sampling with _replacement_. Consider
+the following:
+#example[
+  Suppose we have two types of candy, red and black. Select $n$ candies. Let $X$
+  be the number of red candies among $n$ selected.
+
+  2 cases.
+
+  - case 1: with replacement: Binomial Distribution, $n$, $p = a/(a + b)$.
+  $ P(X = 2) = vec(n,2) (a / (a+b))^2 (b / (a+b))^(n-2) $
+  - case 2: without replacement: then use counting
+  $ P(X = x) = (vec(a,x) vec(b,n-x)) / vec(a+b,n) = p(x) $
+]
+
+In case 2, we used the elementary counting techniques we are already familiar
+with. Immediately we see a distinct case similar to the binomial but when
+sampling without replacement. Let's formalize this as a random variable!
+
+=== Hypergeometric distribution
+
+Let's introduce a random variable to represent a situation like case 2 above.
+
+#definition[
+  $ P(X = x) = (vec(a,x) vec(b,n-x)) / vec(a+b,n) = p(x) $
+
+  is known as a *Hypergeometric distribution*.
+]
+
+Abbreviate this by:
+
+$ X ~ "Hypergeom"(hash "total", hash "successes", "sample size") $
+
+For example,
+
+$ X ~ "Hypergeom"(N, N_a, n) $
+
+#remark[
+  If $x$ is very small relative to $a + b$, then both cases give similar (approx.
+  the same) answers.
+]
+
+For instance, if we're sampling for blood types from UCSB, and we take a
+student out without replacement, we don't really change the sample size
+substantially. So both answers give a similar result.
+
+Suppose we have two types of items, type $A$ and type $B$. Let $N_A$ be $hash$
+type $A$, $N_B$ $hash$ type $B$. $N = N_A + N_B$ is the total number of
+objects.
+
+We sample $n$ items *without replacement* ($n <= N$) with order not mattering.
+Denote by $X$ the number of type $A$ objects in our sample.
+
+#definition[
+  Let $0 <= N_A <= N$ and $1 <= n <= N$ be integers. A random variable $X$ has the *hypergeometric distribution* with parameters $(N, N_A, n)$ if $X$ takes values in the set ${0,1,...,n}$ and has p.m.f.
+
+  $ P(X = k) = (vec(N_A,k) vec(N-N_A,n-k)) / vec(N,n) = p(k) $
+]
+
+#example[
+  Let $N_A = 10$ defectives. Let $N_B = 90$ non-defectives. We select $n=5$ without replacement. What is the probability that 2 of the 5 selected are defective?
+
+  $
+    X ~ "Hypergeom" (N = 100, N_A = 10, n = 5)
+  $
+
+  We want $P(X=2)$.
+
+  $
+    P(X=2) = (vec(10,2) vec(90,3)) / vec(100,5) approx 0.0702
+  $
+]
+
+#remark[
+  Make sure you can distinguish when a problem is binomial or when it is
+  hypergeometric. This is very important on exams.
+
+  Recall that both ask about number of successes, in a fixed number of trials.
+  But binomial is sample with replacement (each trial is independent) and
+  sampling without replacement is hypergeometric.
+]
+
+=== Geometric distribution
+
+Consider an infinite sequence of independent trials. e.g. number of attempts until I make a basket.
+
+In fact we can think of this as a variation on the binomial distribution.
+But in this case we don't sample $n$ times and ask how many successes we have,
+we sample as many times as we need for _one_ success. Later on we'll see this
+is really a specific case of another distribution, the _negative binomial_.
+
+Let $X_i$ denote the outcome of the $i^"th"$ trial, where success is 1 and
+failure is 0. Let $N$ be the number of trials needed to observe the first
+success in a sequence of independent trials with probability of success $p$.
+Then
+
+We fail $k-1$ times and succeed on the $k^"th"$ try. Then:
+
+$
+  P(N = k) = P(X_1 = 0, X_2 = 0, ..., X_(k-1) = 0, X_k = 1) = (1 - p)^(k-1) p
+$
+
+This is the probability of failures raised to the amount of failures, times
+probability of success.
+
+The key characteristic in these trials, we keep going until we succeed. There's
+no $n$ choose $k$ in front like the binomial distribution because there's
+exactly one sequence that gives us success.
+
+#definition[
+  Let $0 < p <= 1$. A random variable $X$ has the geometric distribution with
+  success parameter $p$ if the possible values of $X$ are ${1,2,3,...}$ and $X$
+  satisfies
+
+  $
+    P(X=k) = (1-p)^(k-1) p
+  $
+
+  for positive integers $k$. Abbreviate this by $X ~ "Geom"(p)$.
+]
+
+#example[
+  What is the probability it takes more than seven rolls of a fair die to roll a
+  six?
+
+  Let $X$ be the number of rolls of a fair die until the first six. Then $X ~
+  "Geom"(1/6)$. Now we just want $P(X > 7)$.
+
+  $
+    P(X > 7) = sum^infinity_(k=8) P(X=k) = sum^infinity_(k=8) (5 / 6)^(k-1) 1 / 6
+  $
+
+  Re-indexing,
+
+  $
+    sum^infinity_(k=8) (5 / 6)^(k-1) 1 / 6 = 1 / 6 (5 / 6)^7 sum^infinity_(j=0) (5 / 6)^j
+  $
+
+  Now we calculate by standard methods:
+
+  $
+    1 / 6 (5 / 6)^7 sum^infinity_(j=0) (5 / 6)^j = 1 / 6 (5 / 6)^7 dot 1 / (1-5 / 6) =
+    (5 / 6)^7
+  $
+]
+
+=== Negative binomial
+
+As promised, here's the negative binomial.
+
+Consider a sequence of Bernoulli trials with the following characteristics:
+
+- Each trial success or failure
+- Prob. of success $p$ is same on each trial
+- Trials are independent (notice they are not fixed to specific number)
+- Experiment continues until $r$ successes are observed, where $r$ is a given parameter
+
+Then if $X$ is the number of trials necessary until $r$ successes are observed,
+we say $X$ is a *negative binomial* random variable.
+
+Immediately we see that the geometric distribution is just the negative binomial with $r = 1$.
+
+#definition[
+  Let $k in ZZ^+$ and $0 < p <= 1$. A random variable $X$ has the negative
+  binomial distribution with parameters ${k,p}$ if the possible values of $X$
+  are the integers ${k,k+1, k+2, ...}$ and the p.m.f. is
+
+  $
+    P(X = n) = vec(n-1, k-1) p^k (1-p)^(n-k) "for" n >= k
+  $
+
+  Abbreviate this by $X ~ "Negbin"(k,p)$.
+]
+
+#example[
+  Steph Curry has a three point percentage of approx. $43%$. What is the
+  probability that Steph makes his third three-point basket on his $5^"th"$
+  attempt?
+
+  Let $X$ be number of attempts required to observe the 3rd success. Then,
+
+  $
+    X ~ "Negbin"(k = 3, p = 0.43)
+  $
+
+  So,
+  $
+    P(X = 5) &= vec(5-1,3-1)(0.43)^3 (1 - 0.43)^(5-3) \
+    &= vec(4,2) (0.43)^3 (0.57)^2 \
+    &approx 0.155
+  $
+]
+
+=== Poisson distribution
+
+This p.m.f. follows from the Taylor expansion
+
+$
+  e^lambda = sum_(k=0)^infinity lambda^k / k!
+$
+
+which implies that
+
+$
+  sum_(k=0)^infinity e^(-lambda) lambda^k / k! = e^(-lambda) e^lambda = 1
+$
+
+#definition[
+  For an integer valued random variable $X$, we say $X ~ "Poisson"(lambda)$ if it has p.m.f.
+
+  $ P(X = k) = e^(-lambda) lambda^k / k! $
+
+  for $k in {0,1,2,...}$ for $lambda > 0$ and
+
+  $
+    sum_(k = 0)^infinity P(X=k) = 1
+  $
+]
+
+The Poisson arises from the Binomial. It applies in the binomial context when
+$n$ is very large ($n >= 100$) and $p$ is very small $p <= 0.05$, such that $n
+p$ is a moderate number ($n p < 10$).
+
+Then $X$ follows a Poisson distribution with $lambda = n p$.
+
+$
+  P("Bin"(n,p) = k) approx P("Poisson"(lambda = n p) = k)
+$
+
+for $k = 0,1,...,n$.
+
+The Poisson distribution is useful for finding the probabilities of rare events
+over a continuous interval of time. By knowing $lambda = n p$ for small $n$ and
+$p$, we can calculate many probabilities.
+
+#example[
+  The number of typing errors in the page of a textbook.
+
+  Let
+
+  - $n$ be the number of letters of symbols per page (large)
+  - $p$ be the probability of error, small enough such that
+  - $lim_(n -> infinity) lim_(p -> 0) n p = lambda = 0.1$
+
+  What is the probability of exactly 1 error?
+
+  We can approximate the distribution of $X$ with a $"Poisson"(lambda = 0.1)$
+  distribution
+
+  $
+    P(X = 1) = (e^(-0.1) (0.1)^1) / 1! = 0.09048
+  $
+]
+
+== Continuous distributions
+
+All of the distributions we've been analyzing have been discrete, that is, they
+apply to random variables with a
+#link("https://en.wikipedia.org/wiki/Countable_set")[countable] state space.
+Even when the state space is infinite, as in the negative binomial, it is
+countable. We can think of it as indexing each trial with a natural number
+$0,1,2,3,...$.
+
+Now we turn our attention to continuous random variables that operate on
+uncountably infinite state spaces. For example, if we sample uniformly inside
+of the interval $[0,1]$, there are an uncountably infinite number of possible
+values we could obtain. We cannot index these values by the natural numbers, by
+some theorems of set theory we in fact know that the interval $[0,1]$ has a
+bijection to $RR$ and has cardinality $aleph_1$.
+
+Additionally we notice that asking for the probability that we pick a certain
+point in the interval $[0,1]$ makes no sense, there are an infinite amount of
+sample points! Intuitively we should think that the probability of choosing any
+particular point is 0. However, we should be able to make statements about
+whether we can choose a point that lies within a subset, like $[0,0.5]$.
+
+Let's formalize these ideas.
+
+#definition[
+  Let $X$ be a random variable. If we have a function $f$ such that
+
+  $
+    P(X <= b) = integral^b_(-infinity) f(x) dif x
+  $
+  for all $b in RR$, then $f$ is the *probability density function* of $X$.
+]
+
+The probability that the value of $X$ lies in $(-infinity, b]$ equals the area
+under the curve of $f$ from $-infinity$ to $b$.
+
+If $f$ satisfies this definition, then for any $B subset RR$ for which integration makes sense,
+
+$
+  P(X in B) = integral_B f(x) dif x
+$
+
+#remark[
+  Recall from our previous discussion of random variables that the PDF is the
+  analogue of the PMF for discrete random variables.
+]
+
+Properties of a CDF:
+
+Any CDF $F(x) = P(X <= x)$ satisfies
+
+1. Integrates to unity: $F(-infinity) = 0$, $F(infinity) = 1$
+2. $F(x)$ is non-decreasing in $x$ (monotonically increasing)
+$ s < t => F(s) <= F(t) $
+3. $P(a < X <= b) = P(X <= b) - P(X <= a) = F(b) - F(a)$
+
+Like we mentioned before, we can only ask about things like $P(X <= k)$, but
+not $P(X = k)$. In fact $P(X = k) = 0$ for all $k$. An immediate corollary of
+this is that we can freely interchange $<=$ and $<$ and likewise for $>=$ and
+$>$, since $P(X <= k) = P(X < k)$ if $P(X = k) = 0$.
+
+#example[
+  Let $X$ be a continuous random variable with density (pdf)
+
+  $
+    f(x) = cases(
+c x^2 &"for" 0 < x < 2,
+0 &"otherwise"
+)
+  $
+
+  1. What is $c$?
+
+  $c$ is such that
+  $
+    1 = integral^infinity_(-infinity) f(x) dif x = integral_0^2 c x^2 dif x
+  $
+
+  2. Find the probability that $X$ is between 1 and 1.4.
+
+  Integrate the curve between 1 and 1.4.
+
+  $
+    integral_1^1.4 3 / 8 x^2 dif x = (x^3 / 8) |_1^1.4 \
+    = 0.218
+  $
+
+  This is the probability that $X$ lies between 1 and 1.4.
+
+  3. Find the probability that $X$ is between 1 and 3.
+
+  Idea: integrate between 1 and 3, be careful after 2.
+
+  $ integral^2_1 3 / 8 x^2 dif x + integral_2^3 0 dif x = $
+
+  4. What is the CDF for $P(X <= x)$? Integrate the curve to $x$.
+
+  $
+    F(x) = P(X <= x) = integral_(-infinity)^x f(t) dif t \
+    = integral_0^x 3 / 8 t^2 dif t \
+    = x^3 / 8
+  $
+
+  Important: include the range!
+
+  $
+    F(x) = cases(
+  0 &"for" x <= 0,
+  x^3/8 &"for" 0 < x < 2,
+  1 &"for" x >= 2
+  )
+  $
+
+  5. Find a point $a$ such that you integrate up to the point to find exactly $1/2$
+  the area.
+
+  We want to find $1/2 = P(X <= a)$.
+
+  $ 1 / 2 = P(X <= a) = F(a) = a^3 / 8 => a = root(3, 4) $
+]
+
+Now let us discuss some named continuous distributions.
+
+=== The (continuous) uniform distribution
+
+The most simple and the best of the named distributions!
+
+#definition[
+  Let $[a,b]$ be a bounded interval on the real line. A random variable $X$ has the uniform distribution on the interval $[a,b]$ if $X$ has the density function
+
+  $
+    f(x) = cases(
+1/(b-a) &"for" x in [a,b],
+0 &"for" x in.not [a,b]
+)
+  $
+
+  Abbreviate this by $X ~ "Unif" [a,b]$.
+]<continuous-uniform>
+
+The graph of $"Unif" [a,b]$ is a constant line at height $1/(b-a)$ defined
+across $[a,b]$. The integral is just the area of a rectangle, and we can check
+it is 1.
+
+#fact[
+  For $X ~ "Unif" [a,b]$, its cumulative distribution function (CDF) is given by:
+
+  $
+    F_x (x) = cases(
+0 &"for" x < a,
+(x-a)/(b-a) &"for" x in [a,b],
+1 &"for" x > b
+)
+  $
+]
+
+#fact[
+  If $X ~ "Unif" [a,b]$, and $[c,d] subset [a,b]$, then
+  $
+    P(c <= X <= d) = integral_c^d 1 / (b-a) dif x = (d-c) / (b-a)
+  $
+]
+
+#example[
+  Let $Y$ be a uniform random variable on $[-2,5]$. Find the probability that its
+  absolute value is at least 1.
+
+  $Y$ takes values in the interval $[-2,5]$, so the absolute value is at least 1 iff. $Y in [-2,1] union [1,5]$.
+
+  The density function of $Y$ is $f(x) = 1/(5- (-2)) = 1/7$ on $[-2,5]$ and 0 everywhere else.
+
+  So,
+
+  $
+    P(|Y| >= 1) &= P(Y in [-2,-1] union [1,5]) \
+    &= P(-2 <= Y <= -1) + P(1 <= Y <= 5) \
+    &= 5 / 7
+  $
+]
+
+=== The exponential distribution
+
+The geometric distribution can be viewed as modeling waiting times, in a discrete setting, i.e. we wait for $n - 1$ failures to arrive at the $n^"th"$ success.
+
+The exponential distribution is the continuous analogue to the geometric
+distribution, in that we often use it to model waiting times in the continuous
+sense. For example, the first custom to enter the barber shop.
+
+#definition[
+  Let $0 < lambda < infinity$. A random variable $X$ has the exponential distribution with parameter $lambda$ if $X$ has PDF
+
+  $
+    f(x) = cases(
+  lambda e^(-lambda x) &"for" x >= 0,
+  0 &"for" x < 0
+  )
+  $
+
+  Abbreviate this by $X ~ "Exp"(lambda)$, the exponential distribution with rate $lambda$.
+
+  The CDF of the $"Exp"(lambda)$ distribution is given by:
+
+  $
+    F(t) + cases(
+  0 &"if" t <0,
+  1 - e^(-lambda t) &"if" t>= 0
+  )
+  $
+]
+
+#example[
+  Suppose the length of a phone call, in minutes, is well modeled by an exponential random variable with a rate $lambda = 1/10$.
+
+  1. What is the probability that a call takes more than 8 minutes?
+  2. What is the probability that a call takes between 8 and 22 minutes?
+
+  Let $X$ be the length of the phone call, so that $X ~ "Exp"(1/10)$. Then we can find the desired probability by:
+
+  $
+    P(X > 8) &= 1 - P(X <= 8) \
+    &= 1 - F_x (8) \
+    &= 1 - (1 - e^(-(1 / 10) dot 8)) \
+    &= e^(-8 / 10) approx 0.4493
+  $
+
+  Now to find $P(8 < X < 22)$, we can take the difference in CDFs:
+
+  $
+    &P(X > 8) - P(X >= 22) \
+    &= e^(-8 / 10) - e^(-22 / 10) \
+    &approx 0.3385
+  $
+]
+
+#fact(title: "Memoryless property of the exponential distribution")[
+  Suppose that $X ~ "Exp"(lambda)$. Then for any $s,t > 0$, we have
+  $
+    P(X > t + s | X > t) = P(X > s)
+  $
+]<memoryless>
+
+This is like saying if I've been waiting 5 minutes and then 3 minutes for the
+bus, what is the probability that I'm gonna wait more than 5 + 3 minutes, given
+that I've already waited 5 minutes? And that's precisely equal to just the
+probability I'm gonna wait more than 3 minutes.
+
+#proof[
+  $
+    P(X > t + s | X > t) = (P(X > t + s sect X > t)) / (P(X > t)) \
+    = P(X > t + s) / P(X > t)
+    = e^(-lambda (t+ s)) / (e^(-lambda t)) = e^(-lambda s) \
+    equiv P(X > s)
+  $
+]
+
+=== Gamma distribution
+
+#definition[
+  Let $r, lambda > 0$. A random variable $X$ has the *gamma distribution* with parameters $(r, lambda)$ if $X$ is nonnegative and has probability density function
+
+  $
+    f(x) = cases(
+(lambda^r x^(r-2))/(Gamma(r)) e^(-lambda x) &"for" x >= 0,
+0 &"for" x < 0
+)
+  $
+
+  Abbreviate this by $X ~ "Gamma"(r, lambda)$.
+]
+
+The gamma function $Gamma(r)$ generalizes the factorial function and is defined as
+
+$
+  Gamma(r) = integral_0^infinity x^(r-1) e^(-x) dif x, "for" r > 0
+$
+
+Special case: $Gamma(n) = (n - 1)!$ if $n in ZZ^+$.
+
+#remark[
+  The $"Exp"(lambda)$ distribution is a special case of the gamma distribution,
+  with parameter $r = 1$.
+]
+
+== The normal distribution
+
+Also known as the Gaussian distribution, this is so important it gets its own section.
+
+#definition[
+  A random variable $Z$ has the *standard normal distribution* if $Z$ has
+  density function
+
+  $
+    phi(x) = 1 / sqrt(2 pi) e^(-x^2 / 2)
+  $
+  on the real line. Abbreviate this by $Z ~ N(0,1)$.
+]<normal-dist>
+
+#fact(title: "CDF of a standard normal random variable")[
+  Let $Z~N(0,1)$ be normally distributed. Then its CDF is given by
+  $
+    Phi(x) = integral_(-infinity)^x phi(s) dif s = integral_(-infinity)^x 1 / sqrt(2 pi) e^(-(-s^2) / 2) dif s
+  $
+]
+
+The normal distribution is so important, instead of the standard $f_Z(x)$ and
+$F_z(x)$, we use the special $phi(x)$ and $Phi(x)$.
+
+#fact[
+  $
+    integral_(-infinity)^infinity e^(-s^2 / 2) dif s = sqrt(2 pi)
+  $
+
+  No closed form of the standard normal CDF $Phi$ exists, so we are left to either:
+  - approximate
+  - use technology (calculator)
+  - use the standard normal probability table in the textbook
+]
+
+To evaluate negative values, we can use the symmetry of the normal distribution
+to apply the following identity:
+
+$
+  Phi(-x) = 1 - Phi(x)
+$
+
+=== General normal distributions
+
+We can compute any other parameters of the normal distribution using the
+standard normal.
+
+The general family of normal distributions is obtained by linear or affine
+transformations of $Z$. Let $mu$ be real, and $sigma > 0$, then
+
+$
+  X = sigma Z + mu
+$
+is also a normally distributed random variable with parameters $(mu, sigma^2)$.
+The CDF of $X$ in terms of $Phi(dot)$ can be expressed as
+
+$
+  F_X (x) &= P(X <= x) \
+  &= P(sigma Z + mu <= x) \
+  &= P(Z <= (x - mu) / sigma) \
+  &= Phi((x-mu)/sigma)
+$
+
+Also,
+
+$
+  f(x) = F'(x) = dif / (dif x) [Phi((x-u)/sigma)] = 1 / sigma phi((x-u)/sigma) = 1 / sqrt(2 pi sigma^2) e^(-((x-mu)^2) / (2sigma^2))
+$
+
+#definition[
+  Let $mu$ be real and $sigma > 0$. A random variable $X$ has the _normal distribution_ with mean $mu$ and variance $sigma^2$ if $X$ has density function
+
+  $
+    f(x) = 1 / sqrt(2 pi sigma^2) e^(-((x-mu)^2) / (2sigma^2))
+  $
+
+  on the real line. Abbreviate this by $X ~ N(mu, sigma^2)$.
+]
+
+#fact[
+  Let $X ~ N(mu, sigma^2)$ and $Y = a X + b$. Then
+  $
+    Y ~ N(a mu + b, a^2 sigma^2)
+  $
+
+  That is, $Y$ is normally distributed with parameters $(a mu + b, a^2 sigma^2)$.
+  In particular,
+  $
+    Z = (X - mu) / sigma ~ N(0,1)
+  $
+  is a standard normal variable.
+]
+
+== Expectation
+
+Let's discuss the _expectation_ of a random variable, which is a similar idea
+to the basic concept of _mean_.
+
+#definition[
+  The expectation or mean of a discrete random variable $X$ is the weighted
+  average, with weights assigned by the corresponding probabilities.
+
+  $
+    E(X) = sum_("all" x_i) x_i dot p(x_i)
+  $
+]
+
+#example[
+  Find the expected value of a single roll of a fair die.
+
+  - $X = "score" / "dots"$
+  - $x = 1,2,3,4,5,6$
+  - $p(x) = 1 / 6, 1 / 6,1 / 6,1 / 6,1 / 6,1 / 6$
+
+  $
+    E[x] = 1 dot 1 / 6 + 2 dot 1 / 6 ... + 6 dot 1 / 6
+  $
+]
+
+=== Binomial expected value
+
+$
+  E[x] = n p
+$
+
+=== Bernoulli expected value
+
+Bernoulli is just binomial with one trial.
+
+Recall that $P(X=1) = p$ and $P(X=0) = 1 - p$.
+
+$
+  E[X] = 1 dot P(X=1) + 0 dot P(X=0) = p
+$
+
+Let $A$ be an event on $Omega$. Its _indicator random variable_ $I_A$ is defined
+for $omega in Omega$ by
+
+$
+  I_A (omega) = cases(1", if " &omega in A, 0", if" &omega in.not A)
+$
+
+$
+  E[I_A] = 1 dot P(A) = P(A)
+$
+
+== Geometric expected value
+
+Let $p in [0,1]$ and $X ~ "Geom"[ p ]$ be a geometric RV with probability of
+success $p$. Recall that the p.m.f. is $p q^(k-1)$, where prob. of failure is defined by $q := 1-p$.
+
+Then
+
+$
+  E[X] &= sum_(k=1)^infinity k p q^(k-1) \
+  &= p dot sum_(k=1)^infinity k dot q^(k-1)
+$
+
+Now recall from calculus that you can differentiate a power series term by term inside its radius of convergence. So for $|t| < 1$,
+
+$
+  sum_(k=1)^infinity k t^(k-1) =
+  sum_(k=1)^infinity dif / (dif t) t^k = dif / (dif t) sum_(k=1)^infinity t^k = dif / (dif t) (1 / (1-t)) = 1 / (1-t)^2 \
+  therefore E[x] = sum^infinity_(k=1) k p q^(k-1) = p sum^infinity_(k=1) k q^(k-1) = p (1 / (1 - q)^2) = 1 / p
+$
+
+=== Expected value of a continuous RV
+
+#definition[
+  The expectation or mean of a continuous random variable $X$ with density
+  function $f$ is
+
+  $
+    E[x] = integral_(-infinity)^infinity x dot f(x) dif x
+  $
+
+  An alternative symbol is $mu = E[x]$.
+]
+
+$mu$ is the "first moment" of $X$, analogous to physics, it's the "center of
+gravity" of $X$.
+
+#remark[
+  In general when moving between discrete and continuous RV, replace sums with
+  integrals, p.m.f. with p.d.f., and vice versa.
+]
+
+#example[
+  Suppose $X$ is a continuous RV with p.d.f.
+
+  $
+    f_X (x) = cases(2x", " &0 < x < 1, 0"," &"elsewhere")
+  $
+
+  $
+    E[X] = integral_(-infinity)^infinity x dot f(x) dif x = integral^1_0 x dot 2x dif x = 2 / 3
+  $
+]
+
+#example(title: "Uniform expectation")[
+  Let $X$ be a uniform random variable on the interval $[a,b]$ with $X ~
+  "Unif"[a,b]$. Find the expected value of $X$.
+
+  $
+    E[X] = integral^infinity_(-infinity) x dot f(x) dif x = integral_a^b x / (b-a) dif x \
+    = 1 / (b-a) integral_a^b x dif x = 1 / (b-a) dot (b^2 - a^2) / 2 = underbrace((b+a) / 2, "midpoint formula")
+  $
+]
+
+#example(title: "Exponential expectation")[
+  Find the expected value of an exponential RV, with p.d.f.
+
+  $
+    f_X (x) = cases(lambda e^(-lambda x)", " &x > 0, 0"," &"elsewhere")
+  $
+
+  $
+    E[x] = integral_(-infinity)^infinity x dot f(x) dif x = integral_0^infinity x dot lambda e^(-lambda x) dif x \
+    = lambda dot integral_0^infinity x dot e^(-lambda x) dif x \
+    = lambda dot [lr(-x 1 / lambda e^(-lambda x) |)_(x=0)^(x=infinity) - integral_0^infinity -1 / lambda e^(-lambda x) dif x] \
+    = 1 / lambda
+  $
+]
+
+#example(title: "Uniform dartboard")[
+  Our dartboard is a disk of radius $r_0$ and the dart lands uniformly at
+  random on the disk when thrown. Let $R$ be the distance of the dart from the
+  center of the disk. Find $E[R]$ given density function
+
+  $
+    f_R (t) = cases((2t)/(r_0 ^2)", " &0 <= t <= r_0, 0", " &t < 0 "or" t > r_0)
+  $
+
+  $
+    E[R] = integral_(-infinity)^infinity t f_R (t) dif t \
+    = integral^(r_0)_0 t dot (2t) / (r_0^2) dif t \
+    = 2 / 3 r_0
+  $
+]
+
+=== Expectation of derived values
+
+If we can find the expected value of $X$, can we find the expected value of
+$X^2$? More precisely, can we find $E[X^2]$?
+
+If the distribution is easy to see, then this is trivial. Otherwise we have the
+following useful property:
+
+$
+  E[X^2] = integral_("all" x) x^2 f_X (x) dif x
+$
+
+(for continuous RVs).
+
+And in the discrete case,
+
+$
+  E[X^2] = sum_("all" x) x^2 p_X (x)
+$
+
+In fact $E[X^2]$ is so important that we call it the *mean square*.
+
+#fact[
+  More generally, a real valued function $g(X)$ defined on the range of $X$ is
+  itself a random variable (with its own distribution).
+]
+
+We can find expected value of $g(X)$ by
+
+$
+  E[g(x)] = integral_(-infinity)^infinity g(x) f(x) dif x
+$
+
+or
+
+$
+  E[g(x)] = sum_("all" x) g(x) f(x)
+$
+
+#example[
+  You roll a fair die to determine the winnings (or losses) $W$ of a player as
+  follows:
+
+  $
+    W = cases(-1", if the roll is 1, 2, or 3", 1", if the roll is a 4", 3", if the roll is 5 or 6")
+  $
+
+  What is the expected winnings/losses for the player during 1 roll of the die?
+
+  Let $X$ denote the outcome of the roll of the die. Then we can define our
+  random variable as $W = g(X)$ where the function $g$ is defined by $g(1) =
+  g(2) = g(3) = -1$ and so on.
+
+  Note that $P(W = -1) = P(X = 1 union X = 2 union X = 3) = 1/2$. Likewise $P(W=1)
+  = P(X=4) = 1/6$, and $P(W=3) = P(X=5 union X=6) = 1/3$.
+
+  Then
+  $
+    E[g(X)] = E[W] = (-1) dot P(W=-1) + (1) dot P(W=1) + (3) dot P(W=3) \
+    = -1 / 2 + 1 / 6 + 1 = 2 / 3
+  $
+]
+
+#example[
+  A stick of length $l$ is broken at a uniformly chosen random location. What is
+  the expected length of the longer piece?
+
+  Idea: if you break it before the halfway point, then the longer piece has length
+  given by $l - x$. If you break it after the halfway point, the longer piece
+  has length $x$.
+
+  Let the interval $[0,l]$ represent the stick and let $X ~ "Unif"[0,l]$ be the
+  location where the stick is broken. Then $X$ has density $f(x) = 1/l$ on
+  $[0,l]$ and 0 elsewhere.
+
+  Let $g(x)$ be the length of the longer piece when the stick is broken at $x$,
+
+  $
+    g(x) = cases(1-x", " &0 <= x < l/2, x", " &1/2 <= x <= l)
+  $
+
+  Then
+  $
+    E[g(X)] = integral_(-infinity)^infinity g(x) f(x) dif x = integral_0^(l / 2) (l-x) / l dif x + integral_(l / 2)^l x / l dif x \
+    = 3 / 4 l
+  $
+
+  So we expect the longer piece to be $3/4$ of the total length, which is a bit
+  pathological.
+]
+
+=== Moments of a random variable
+
+We continue discussing expectation but we introduce new terminology.
+
+#fact[
+  The $n^"th"$ moment (or $n^"th"$ raw moment) of a discrete random variable $X$
+  with p.m.f. $p_X (x)$ is the expectation
+
+  $
+    E[X^n] = sum_k k^n p_X (k) = mu_n
+  $
+
+  If $X$ is continuous, then we have analogously
+
+  $
+    E[X^n] = integral_(-infinity)^infinity x^n f_X (x) = mu_n
+  $
+]
+
+The *deviation* is given by $sigma$ and the *variance* is given by $sigma^2$ and
+
+$
+  sigma^2 = mu_2 - (mu_1)^2
+$
+
+$mu_3$ is used to measure "skewness" / asymmetry of a distribution. For
+example, the normal distribution is very symmetric.
+
+$mu_4$ is used to measure kurtosis/peakedness of a distribution.
+
+=== Central moments
+
+Previously we discussed "raw moments." Be careful not to confuse them with
+_central moments_.
+
+#fact[
+  The $n^"th"$ central moment of a discrete random variable $X$ with p.m.f. $p_X
+  (x)$ is the expected value of the difference about the mean raised to the
+  $n^"th"$ power
+
+  $
+    E[(X-mu)^n] = sum_k (k - mu)^n p_X (k) = mu'_n
+  $
+
+  And of course in the continuous case,
+
+  $
+    E[(X-mu)^n] = integral_(-infinity)^infinity (x - mu)^n f_X (x) = mu'_n
+  $
+]
+
+In particular,
+
+$
+  mu'_1 = E[(X-mu)^1] = integral_(-infinity)^infinity (x-mu)^1 f_X (x) dif x \
+  = integral_(infinity)^infinity x f_X (x) dif x = integral_(-infinity)^infinity mu f_X (x) dif x = mu - mu dot 1 = 0 \
+  mu'_2 = E[(X-mu)^2] = sigma^2_X = "Var"(X)
+$
+
+#example[
+  Let $Y$ be a uniformly chosen integer from ${0,1,2,...,m}$. Find the first and
+  second moment of $Y$.
+
+  The p.m.f. of $Y$ is $p_Y (k) = 1/(m+1)$ for $k in [0,m]$. Thus,
+
+  $
+    E[Y] = sum_(k=0)^m k 1 / (m+1) = 1 / (m+1) sum_(k=0)^m k \
+    = m / 2
+  $
+
+  Then,
+
+  $
+    E[Y^2] = sum_(k=0)^m k^2 1 / (m+1) = 1 / (m+1) = (m(2m+1)) / 6
+  $
+]
+
+#example[
+  Let $c > 0$ and let $U$ be a uniform random variable on the interval $[0,c]$.
+  Find the $n^"th"$ moment for $U$ for all positive integers $n$.
+
+  The density function of $U$ is
+
+  $
+    f(x) = cases(1/c", if" &x in [0,c], 0", " &"otherwise")
+  $
+
+  Therefore the $n^"th"$ moment of $U$ is,
+
+  $
+    E[U^n] = integral_(-infinity)^infinity x^n f(x) dif x
+  $
+]
+
+#example[
+  Suppose the random variable $X ~ "Exp"(lambda)$. Find the second moment of $X$.
+
+  $
+    E[X^2] = integral_0^infinity x^2 lambda e^(-lambda x) dif x \
+    = 1 / (lambda^2) integral_0^infinity u^2 e^(-u) dif u \
+    = 1 / (lambda^2) Gamma(2 + 1) = 2! / lambda^2
+  $
+]
+
+#fact[
+  In general, to find teh $n^"th"$ moment of $X ~ "Exp"(lambda)$,
+  $
+    E[X^n] = integral^infinity_0 x^n lambda e^(-lambda x) dif x = n! / lambda^n
+  $
+]
+
+=== Median and quartiles
+
+When a random variable has rare (abnormal) values, its expectation may be a bad
+indicator of where the center of the distribution lies.
+
+#definition[
+  The *median* of a random variable $X$ is any real value $m$ that satisfies
+
+  $
+    P(X >= m) >= 1 / 2 "and" P(X <= m) >= 1 / 2
+  $
+
+  With half the probability on both ${X <= m}$ and ${X >= m}$, the median is
+  representative of the midpoint of the distribution. We say that the median is
+  more _robust_ because it is less affected by outliers. It is not necessarily
+  unique.
+]
+
+#example[
+  Let $X$ be discretely uniformly distributed in the set ${-100, 1, 2, ,3, ..., 9}$ so $X$ has probability mass function
+  $
+    p_X (-100) = p_X (1) = dots.c = p_X (9)
+  $
+
+  Find the expected value and median of $X$.
+
+  $
+    E[X] = (-100) dot 1 / 10 + (1) dot 1 / 10 + dots.c + (9) dot 1 / 10 = -5.5
+  $
+
+  While the median is any number $m in [4,5]$.
+
+  The median reflects the fact that 90% of the values and probability is in the
+  range $1,2,...,9$ while the mean is heavily influenced by the $-100$ value.
+]