#import "@youwen/zen:0.1.0": *
|
|
|
|
#show: zen.with(
|
|
title: "PSTAT120A Course Notes",
|
|
author: "Youwen Wu",
|
|
date: "Winter 2025",
|
|
subtitle: "Taught by Brian Wainwright",
|
|
)
|
|
|
|
#outline()
|
|
|
|
= Introduction
|
|
|
|
These are lecture notes from when PSTAT120A (Probability and Statistics) was
|
|
taught in Winter 2025 by Dr. Wainwright. Any errors contained within are the
|
|
scribe's, not the instructor's.
|
|
|
|
= Lecture #datetime(day: 6, month: 1, year: 2025).display()
|
|
|
|
== Preliminaries
|
|
|
|
#definition[
|
|
Statistics is the science dealing with the collection, summarization,
|
|
analysis, and interpretation of data.
|
|
]
|
|
|
|
== Set theory for dummies
|
|
|
|
A terse introduction to elementary naive set theory and the basic operations
|
|
upon them.
|
|
|
|
#remark[
|
|
Keep in mind that without $cal(Z F C)$ or another model of set theory that
|
|
resolves fundamental issues, our set theory is subject to paradoxes like
|
|
Russell's. Whoops, the universe doesn't exist.
|
|
]
|
|
|
|
#definition[
|
|
A *set* is a collection of elements.
|
|
]
|
|
|
|
#example[Examples of sets][
|
|
+ Trivial set: ${1}$
|
|
+ Empty set: $emptyset$
|
|
+ $A = {a,b,c}$
|
|
]
|
|
|
|
We can construct sets using set-builder notation (also sometimes called set
|
|
comprehension).
|
|
|
|
$ {"expression with" x | "conditions on" x} $
|
|
|
|
#example("Set builder notation")[
|
|
+ The set of all even integers: ${2n | n in ZZ}$
|
|
+ The set of all perfect squares: ${x^2 | x in NN}$
|
|
]
|
|
|
|
We also have notation for working with sets:
|
|
|
|
With arbitrary sets $A$, $B$:
|
|
|
|
+ $a in A$ ($a$ is a member of the set $A$)
|
|
+ $a in.not A$ ($a$ is not a member of the set $A$)
|
|
+ $A subset.eq B$ (Set theory: $A$ is a subset of $B$) (Stats: $A$ is an event in the sample space $B$)
|
|
+ $A subset B$ (Proper subset: $A != B$)
|
|
+ $A^c$ or $A'$ (read "complement of $A$," and introduced later)
|
|
+ $A union B$ (Union of $A$ and $B$. Gives a set with both the elements of $A$ and $B$)
|
|
+ $A sect B$ (Intersection of $A$ and $B$. Gives a set consisting of the elements in *both* $A$ and $B$)
|
|
+ $A \\ B$ (Set difference. The set of all elements of $A$ that are not also in $B$)
|
|
+ $A times B$ (Cartesian product. Ordered pairs of $(a,b)$ $forall a in A$, $forall b in B$)
|
|
|
|
We can also write a few of these operations precisely as set comprehensions.
|
|
|
|
+ $A subset.eq B <=> forall a in A, a in B$ (every element of $A$ is also an element of $B$)
|
|
+ $A union B = {x | x in A or x in B}$ (here $or$ is the logical OR)
|
|
+ $A sect B = {x | x in A and x in B}$ (here $and$ is the logical AND)
|
|
+ $A \\ B = {a | a in A and a in.not B}$
|
|
+ $A times B = {(a,b) | a in A, b in B}$
|
|
|
|
Take a moment and convince yourself that these definitions are equivalent to
|
|
the previous ones.
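
As a quick sanity check (scribe's addition, not from lecture), these operations map directly onto Python's built-in `set` type:

```python
from itertools import product

A = {1, 2, 3}
B = {3, 4}

print(A | B)               # union: {1, 2, 3, 4}
print(A & B)               # intersection: {3}
print(A - B)               # set difference A \ B: {1, 2}
print(set(product(A, B)))  # Cartesian product: all ordered pairs (a, b)
print(A <= {1, 2, 3, 4})   # subset test: True
```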
|
|
|
|
#definition[
|
|
The universal set $Omega$ is the set of all objects in a given set
|
|
theoretical universe.
|
|
]
|
|
|
|
With the above definition, we can now introduce the set complement.
|
|
|
|
#definition[
|
|
The set complement $A'$ is given by
|
|
$
|
|
A' = Omega \\ A
|
|
$
|
|
where $Omega$ is the _universal set_.
|
|
]
|
|
|
|
#example[The real plane][
|
|
The real plane $RR^2$ can be defined as a Cartesian product of $RR$ with
|
|
itself.
|
|
|
|
$ RR^2 = RR times RR $
|
|
]
|
|
|
|
Check your intuition that this makes sense. Why do you think $RR^n$ was chosen
|
|
as the notation for $n$ dimensional spaces in $RR$?
|
|
|
|
#definition[Disjoint sets][
|
|
If $A sect B$ = $emptyset$, then we say that $A$ and $B$ are *disjoint*.
|
|
]
|
|
|
|
#fact[
|
|
For any sets $A$ and $B$, we have DeMorgan's Laws:
|
|
+ $(A union B)' = A' sect B'$
|
|
+ $(A sect B)' = A' union B'$
|
|
]
|
|
|
|
#fact[Generalized DeMorgan's][
|
|
+ $(union.big_i A_i)' = sect.big_i A_i '$
|
|
+ $(sect.big_i A_i)' = union.big_i A_i '$
|
|
]
|
|
|
|
== Sizes of infinity
|
|
|
|
#definition[
|
|
Let $N(A)$ be the number of elements in $A$. $N(A)$ is called the _cardinality_ of $A$.
|
|
]
|
|
|
|
We say a set is finite if it has finite cardinality, or infinite if it has an
|
|
infinite cardinality.
|
|
|
|
Infinite sets can be either _countably infinite_ or _uncountably infinite_.
|
|
|
|
When a set is countably infinite, its cardinality is $aleph_0$ (here $aleph$ is
|
|
the Hebrew letter aleph and read "aleph null").
|
|
|
|
When a set is uncountably infinite, its cardinality is greater than $aleph_0$.
|
|
|
|
#example("Countable sets")[
|
|
+ The natural numbers $NN$.
|
|
+ The rationals $QQ$.
|
|
+ The integers $ZZ$.
|
|
+ The set of all logical tautologies.
|
|
]
|
|
|
|
#example("Uncountable sets")[
|
|
+ The real numbers $RR$.
|
|
+ The real numbers in the interval $[0,1]$.
|
|
+ The _power set_ of $ZZ$, which is the set of all subsets of $ZZ$.
|
|
]
|
|
|
|
#remark[
|
|
All the uncountable sets above have cardinality $2^(aleph_0)$, also written
$frak(c)$ or $beth_1$ (read "beth 1"). This is the _cardinality of the
continuum_; it equals $aleph_1$ only if one assumes the continuum hypothesis.
|
|
|
|
However, in general uncountably infinite sets do not have the same
|
|
cardinality.
|
|
]
|
|
|
|
#fact[
|
|
If a set is countably infinite, then it has a bijection with $ZZ$. This means
|
|
every set with cardinality $aleph_0$ has a bijection to $ZZ$. More generally,
|
|
any sets with the same cardinality have a bijection between them.
|
|
]
|
|
|
|
This gives us the following equivalent statement:
|
|
|
|
#fact[
|
|
Two sets have the same cardinality if and only if there exists a bijective
|
|
function between them. In symbols,
|
|
|
|
$ N(A) = N(B) <==> exists F : A <-> B $
|
|
]
|
|
|
|
= Lecture #datetime(day: 8, month: 1, year: 2025).display()
|
|
|
|
== Probability
|
|
|
|
#definition[
|
|
A *random experiment* is one in which the set of all possible outcomes is known in advance, but one can't predict which outcome will occur on a given trial of the experiment.
|
|
]
|
|
|
|
#example("Finite sample spaces")[
|
|
Toss a coin:
|
|
$ Omega = {H,T} $
|
|
|
|
Roll a pair of dice:
|
|
$ Omega = {1,2,3,4,5,6} times {1,2,3,4,5,6} $
|
|
]
|
|
|
|
#example("Countably infinite sample spaces")[
|
|
Shoot a basket until you make one:
|
|
$ Omega = {M, F M, F F M, F F F M, dots} $
|
|
]
|
|
|
|
#example("Uncountably infinite sample space")[
|
|
Waiting time for a bus:
|
|
$ Omega = {t : t >= 0} $
|
|
]
|
|
|
|
#fact[
|
|
Elements of $Omega$ are called sample points.
|
|
]
|
|
|
|
#definition[
|
|
Any properly defined subset of $Omega$ is called an *event*.
|
|
]
|
|
|
|
#example[Dice][
|
|
Rolling a fair die twice, let $A$ be the event that the combined score of both dice is 10.
|
|
|
|
$ A = {(4,6), (5,5), (6,4)} $
|
|
]
|
|
|
|
Probabilistic concepts in the parlance of set theory:
|
|
|
|
- Universal set ($Omega$) $<->$ sample space
|
|
- Element $<->$ outcome / sample point ($omega$)
|
|
- Disjoint sets $<->$ mutually exclusive events
|
|
|
|
== Classical approach
|
|
|
|
Classical approach:
|
|
|
|
$ P(A) = (hash A) / (hash Omega) $
|
|
|
|
Requires equally likely outcomes and finite sample spaces.
|
|
|
|
#remark[
|
|
With an infinite sample space, this ratio assigns probability 0 to every finite event, which is often wrong.
|
|
]
|
|
|
|
#example("Dice again")[
|
|
Rolling a fair die twice, let $A$ be the event that the combined score of both dice is 10.
|
|
|
|
$
|
|
A &= {(4,6), (5,5), (6,4)} \
|
|
P(A) &= 3 / 36 = 1 / 12
|
|
$
|
|
]
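
A brute-force verification of this count (scribe's addition, not from lecture): enumerate all 36 equally likely outcomes and apply $P(A) = (hash A) / (hash Omega)$.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # all 36 ordered rolls
A = [w for w in omega if sum(w) == 10]        # combined score of 10

print(A)                              # [(4, 6), (5, 5), (6, 4)]
print(Fraction(len(A), len(omega)))   # 1/12
```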
|
|
|
|
== Relative frequency approach
|
|
|
|
An approach done commonly by applied statisticians who work in the disgusting
|
|
real world. This is where we are generally concerned with irrelevant concerns
|
|
like accurate sampling and $p$-values and such.
|
|
$
|
|
P(A) = (hash "of times" A "occurs in large number of trials") / (hash "of trials")
|
|
$
|
|
|
|
#example[
|
|
Flipping a coin to determine the probability of it landing heads.
|
|
]
|
|
|
|
== Subjective approach
|
|
|
|
Personal definition of probability. Not "real" probability, merely co-opting
|
|
its parlance to lend credibility to subjective judgements of confidence.
|
|
|
|
== Axiomatic approach
|
|
|
|
Consider a random experiment. Then:
|
|
|
|
#definition[
|
|
The *sample space* $Omega$ is the set of all possible outcomes of the
|
|
experiment.
|
|
]
|
|
|
|
#definition[
|
|
Elements of $Omega$ are called *sample points*.
|
|
]
|
|
|
|
#definition[
|
|
Subsets of $Omega$ are called *events*. The collection of events (in other
|
|
terms, the power set of $Omega$) in $Omega$ is denoted by $cal(F)$.
|
|
]
|
|
|
|
#definition[
|
|
The *probability measure* (also called the probability distribution, or simply the probability) is a function $P$.
|
|
|
|
Let $P : cal(F) -> RR$ be a function satisfying the following axioms (properties).
|
|
|
|
+ $P(A) >= 0, forall A$
|
|
+ $P(Omega) = 1$
|
|
+ If $A_i sect A_j = emptyset, forall i != j$, then
|
|
$ P(union.big_(i=1)^infinity A_i) = sum_(i=1)^infinity P(A_i) $
|
|
]
|
|
|
|
The 3-tuple $(Omega, cal(F), P)$ is called a *probability space*.
|
|
|
|
#remark[
|
|
In more advanced texts you will see $cal(F)$ introduced as a so-called
$sigma$-algebra. A $sigma$-algebra on a set $Omega$ is a nonempty collection
$Sigma$ of subsets of $Omega$ that is closed under set complement and countable
unions, and, as a corollary, under countable intersections.
|
|
]
|
|
|
|
Now let us show various results with $P$.
|
|
|
|
#proposition[
|
|
$ P(emptyset) = 0 $
|
|
]
|
|
|
|
#proof[
|
|
Apply axiom 3 with $A_i = emptyset$ for every $i$. These sets are pairwise disjoint and their union is $emptyset$, so

$
P(emptyset) = P(union.big_(i=1)^infinity A_i) = sum^infinity_(i=1) P(A_i) = sum^infinity_(i=1) P(emptyset)
$

Suppose $P(emptyset) != 0$. Then $P(emptyset) > 0$ by axiom 1, so the right-hand sum diverges to infinity, while the left-hand side is at most $P(Omega) = 1$ by axiom 2. This is a contradiction, so $P(emptyset) = 0$.
|
|
]
|
|
|
|
#proposition[
|
|
If $A_1, A_2, ..., A_n$ are disjoint, then
|
|
$ P(union.big^n_(i=1) A_i) = sum^n_(i= 1) P(A_i) $
|
|
]
|
|
|
|
This is mostly a formal manipulation to derive the obviously true proposition from our axioms.
|
|
|
|
#proof[
|
|
Write any finite collection $(A_1, A_2, ..., A_n)$ as an infinite sequence $(A_1, A_2, ..., A_n, emptyset, emptyset, ...)$. Then by axiom 3,
|
|
$
|
|
P(union.big_(i=1)^infinity A_i) = sum^n_(i=1) P(A_i) + sum^infinity_(i=n+1) P(emptyset) = sum^n_(i=1) P(A_i)
|
|
$
|
|
And because all of the elements after $A_n$ are $emptyset$, their union adds no additional elements to the resultant union set of all $A_i$, so
|
|
$
|
|
P(union.big_(i=1)^infinity A_i) = P(union.big_(i=1)^n A_i) = sum_(i=1)^n P(A_i)
|
|
$
|
|
]
|
|
|
|
#proposition[Complement][
|
|
$ P(A') = 1 - P(A) $
|
|
]
|
|
|
|
#proof[
|
|
$
|
|
A' union A &= Omega \
|
|
A' sect A &= emptyset \
|
|
P(A' union A) &= P(A') + P(A) &"(by axiom 3)"\
|
|
P(A' union A) &= P(Omega) = 1 &"(by axiom 2)" \
|
|
therefore P(A') &= 1 - P(A)
|
|
$
|
|
]
|
|
|
|
#proposition[
|
|
$ A subset.eq B => P(A) <= P(B) $
|
|
]
|
|
|
|
#proof[
|
|
$ B = A union (A' sect B) $
|
|
|
|
but $A$ and ($A' sect B$) are disjoint, so
|
|
|
|
$
|
|
P(B) &= P(A union (A' sect B)) \
|
|
&= P(A) + P(A' sect B) \
|
|
&therefore P(B) >= P(A) "since" P(A' sect B) >= 0
|
|
$
|
|
]
|
|
|
|
#proposition("Inclusion-exclusion principle")[
|
|
$ P(A union B) = P(A) + P(B) - P(A sect B) $
|
|
]<inclusion-exclusion>
|
|
|
|
#proof[
Decompose $A$ and $B$ into disjoint pieces:
$
A = (A sect B) union (A sect B') => P(A) = P(A sect B) + P(A sect B') \
B = (A sect B) union (A' sect B) => P(B) = P(A sect B) + P(A' sect B) \
=> P(A) + P(B) - P(A sect B) = P(A sect B) + P(A sect B') + P(A' sect B)
$
The three events on the right are pairwise disjoint and their union is $A union B$, so by axiom 3 the right-hand side equals $P(A union B)$.
]
|
|
|
|
#remark[
|
|
This is a stronger result of axiom 3, which generalizes for all sets $A$ and $B$ regardless of whether they're disjoint.
|
|
]
|
|
|
|
#remark[
|
|
These are mostly intuitively true statements (think about the probabilistic
|
|
concepts represented by the sets) in classical probability that we derive
|
|
rigorously from our axiomatic probability function $P$.
|
|
]
|
|
|
|
#example[
|
|
Now let us consider some trivial concepts in classical probability written in
|
|
the parlance of combinatorial probability.
|
|
|
|
Select one card from a deck of 52 cards.
|
|
Then the following is true:
|
|
|
|
$
|
|
Omega = {1,2,...,52} \
|
|
A = "card is a heart" = {H 2, H 3, H 4, ..., H"Ace"} \
|
|
B = "card is an Ace" = {H"Ace", C"Ace", D"Ace", S"Ace"} \
|
|
C = "card is black" = {C 2, C 3, ..., C"Ace", S 2, S 3, ..., S"Ace"} \
|
|
P(A) = 13 / 52,
|
|
P(B) = 4 / 52,
|
|
P(C) = 26 / 52 \
|
|
P(A sect B) = 1 / 52 \
|
|
P(A sect C) = 0 \
|
|
P(B sect C) = 2 / 52 \
|
|
P(A union B) = P(A) + P(B) - P(A sect B) = 16 / 52 \
|
|
P(B') = 1 - P(B) = 48 / 52 \
|
|
P(A sect B') = P(A) - P(A sect B) = 13 / 52 - 1 / 52 = 12 / 52 \
|
|
P((A sect B') union (A' sect B)) = P(A sect B') + P(A' sect B) = 15 / 52 \
|
|
P(A' sect B') = P((A union B)') = 1 - P(A union B) = 36 / 52
|
|
$
|
|
]
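
All of these card probabilities can be verified by enumeration. A small Python sketch (scribe's addition; the deck encoding below is made up just for the check):

```python
from fractions import Fraction
from itertools import product

suits = ["H", "C", "D", "S"]  # hearts, clubs, diamonds, spades
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "Ace"]
deck = list(product(suits, ranks))  # 52 cards

def P(event):
    return Fraction(len(event), len(deck))

A = [c for c in deck if c[0] == "H"]          # card is a heart
B = [c for c in deck if c[1] == "Ace"]        # card is an ace
C = [c for c in deck if c[0] in ("C", "S")]   # card is black

print(P(A), P(B), P(C))                           # 1/4 1/13 1/2
print(P([c for c in deck if c in A and c in B]))  # P(A intersect B) = 1/52
print(P(A) + P(B) - Fraction(1, 52))              # P(A union B) = 16/52 = 4/13
```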
|
|
|
|
== Countable sample spaces
|
|
|
|
#definition[
|
|
A sample space $Omega$ is said to be *countable* if it's finite or countably infinite.
|
|
]
|
|
|
|
In such a case, one can list the elements of $Omega$.
|
|
|
|
$ Omega = {omega_1, omega_2, omega_3, ...} $
|
|
with associated probabilities, $p_1, p_2, p_3,...$, where
|
|
$
|
|
p_i = P(omega_i) >= 0 \
|
|
1 = P(Omega) = sum P(omega_i)
|
|
$
|
|
|
|
#example[Fair die, again][
|
|
All outcomes are equally likely,
|
|
$ p_1 = p_2 = ... = p_6 = 1 / 6 $
|
|
Let $A$ be the event that the score is odd = ${1,3,5}$
|
|
$ P(A) = 3 / 6 $
|
|
]
|
|
|
|
#example[Loaded die][
|
|
Consider a die where the probabilities of rolling odd sides is double the probability of rolling an even side.
|
|
$
|
|
p_2 = p_4 = p_6, p_1 = p_3 = p_5 = 2p_2 \
|
|
6p_2 + 3p_2 = 9p_2 = 1 \
|
|
p_2 = 1 / 9, p_1 = 2 / 9
|
|
$
|
|
]
|
|
|
|
#example[Coins][
|
|
Toss a fair coin until you get the first head.
|
|
$
|
|
Omega = {H, T H, T T H, ...} "(countably infinite)" \
|
|
P(H) = 1 / 2 \
|
|
P(T T H) = (1 / 2)^3 \
|
|
P(Omega) = sum_(n=1)^infinity (1 / 2)^n = 1 / (1 - 1 / 2) - 1 = 1
|
|
$
|
|
]
|
|
|
|
#example[Birthdays][
|
|
What is the probability two people share the same birthday?
|
|
|
|
$
|
|
Omega = {1, 2, ..., 365} times {1, 2, ..., 365} \
|
|
P(A) = 365 / 365^2 = 1 / 365
|
|
$
|
|
]
|
|
|
|
== Continuous sample spaces
|
|
|
|
#definition[
|
|
A *continuous sample space* contains an interval in $RR$ and is uncountably infinite.
|
|
]
|
|
|
|
#definition[
|
|
A probability density function (#smallcaps[pdf]) $f$ assigns a probability _density_ $f(s)$ to each point
$s$; probabilities of events are obtained by integrating $f$ over them.
|
|
]
|
|
|
|
Properties of the #smallcaps[pdf]:
|
|
|
|
- $f(s) >= 0$ for all $s$
- $integral_S f(s) dif s = 1$, where the integral runs over the whole sample space
|
|
|
|
#example[
|
|
Waiting time for bus: $Omega = {s : s >= 0}$.
|
|
]
|
|
|
|
= Notes on counting
|
|
|
|
The cardinality of $A$ is given by $hash A$. Let us develop methods for finding
|
|
$hash A$ from a description of the set $A$ (in other words, methods for
|
|
counting).
|
|
|
|
== General multiplication principle
|
|
|
|
#fact[
|
|
Let $A$ and $B$ be finite sets, $k in ZZ^+$. Then let $f : A -> B$ be a
|
|
function such that each element in $B$ is the image of exactly $k$ elements
|
|
in $A$ (such a function is called _$k$-to-one_). Then $hash A = k dot hash
|
|
B$.
|
|
]<ktoone>
|
|
|
|
#example[
|
|
Four fully loaded 10-seater vans transported people to the picnic. How many
|
|
people were transported?
|
|
|
|
Here $A$ is the set of people, $B$ is the set of vans, and $f : A -> B$ maps a person to the van they ride in. Then $f$ is a 10-to-one function with $hash B = 4$, so by @ktoone the answer is $hash A = 10 dot 4 = 40$.
|
|
]
|
|
|
|
#definition[
|
|
An $n$-tuple is an ordered sequence of $n$ elements.
|
|
]
|
|
|
|
Many of our methods in probability rely on multiplying together multiple
|
|
outcomes to obtain their combined amount of outcomes. We make this explicit below in @tuplemultiplication.
|
|
|
|
#fact[
|
|
Suppose a set of $n$-tuples $(a_1, ..., a_n)$ obeys these rules:
|
|
|
|
+ There are $r_1$ choices for the first entry $a_1$.
|
|
+ Once the first $k$ entries $a_1, ..., a_k$ have been chosen, the number of alternatives for the next entry $a_(k+1)$ is $r_(k+1)$, regardless of the previous choices.
|
|
|
|
Then the total number of $n$-tuples is the product $r_1 dot r_2 dot dots dot r_n$.
|
|
]<tuplemultiplication>
|
|
|
|
#proof[
|
|
It is trivially true for $n = 1$ since you have $r_1$ choices of $a_1$ for a
|
|
1-tuple $(a_1)$.
|
|
|
|
Let $A$ be the set of all possible $n$-tuples and $B$ be the set of all
|
|
possible $(n+1)$-tuples. Now let us assume the statement is true for $A$.
|
|
Proceed by induction, noting that for each $n$-tuple $(a_1,
..., a_n)$ in $A$, there are $r_(n+1)$ corresponding $(n+1)$-tuples in $B$.
|
|
|
|
Let $f : B -> A$ be a function which takes each $(n+1)$-tuple and truncates the $a_(n+1)$ term, leaving us with just an $n$-tuple of the form $(a_1, a_2, ..., a_n)$.
|
|
$ f((a_1, ..., a_n, a_(n + 1))) = (a_1, ..., a_n) $
|
|
Now notice that $f$ is precisely a $r_(n+1)$-to-one function! Recall by
|
|
our assumption that @tuplemultiplication is true for $n$-tuples, so $A$ has $r_1 dot
|
|
r_2 dot ... dot r_n$ elements, or $hash A = r_1 dot ... dot r_n$. Then by
|
|
@ktoone, we have $hash B = hash A dot r_(n+1) = r_1 dot r_2 dot
|
|
... dot r_(n+1)$. Our induction is complete and we have proved @tuplemultiplication.
|
|
]
|
|
|
|
@tuplemultiplication is sometimes called the _general multiplication principle_.
|
|
|
|
We can use @tuplemultiplication to derive counting formulas for various
|
|
situations. Let $A_1, A_2, ..., A_n$ be finite sets. Then as a corollary of
@tuplemultiplication, we can count the number of $n$-tuples in the finite
Cartesian product of $A_1, A_2, ..., A_n$.
|
|
|
|
#fact[
|
|
Let $A_1, A_2, ..., A_n$ be finite sets. Then
|
|
|
|
$
|
|
hash (A_1 times A_2 times ... times A_n) = (hash A_1) dot (hash A_2) dot ... dot (hash A_n) = product^n_(i=1) (hash A_i)
|
|
$
|
|
]
|
|
|
|
#example[
|
|
How many distinct subsets does a set of size $n$ have?
|
|
|
|
The answer is $2^n$. Each subset can be encoded as an $n$-tuple with entries 0
|
|
or 1, where the $i$th entry is 1 if the $i$th element of the set is in the
|
|
subset and 0 if it is not.
|
|
|
|
Thus the number of subsets is the same as the cardinality of
|
|
$ {0,1} times ... times {0,1} = {0,1}^n $
|
|
which is $2^n$.
|
|
|
|
This is why given a set $X$ with cardinality $aleph$, we write the
|
|
cardinality of the power set of $X$ as $2^aleph$.
|
|
]
|
|
|
|
== Permutations
|
|
|
|
Now we can use the multiplication principle to count permutations.
|
|
|
|
#fact[
|
|
Consider all $k$-tuples $(a_1, ..., a_k)$ that can be constructed from a set $A$ of size $n, n>= k$ without repetition. The total number of these $k$-tuples is
|
|
$ (n)_k = n dot (n - 1) ... (n - k + 1) = n! / (n-k)! $
|
|
|
|
In particular, with $k=n$, each $n$-tuple is an ordering or _permutation_ of $A$. So the total number of permutations of a set of $n$ elements is $n!$.
|
|
]<permutation>
|
|
|
|
#proof[
|
|
We construct the $k$-tuples sequentially. For the first element, we choose
|
|
one element from $A$ with $n$ alternatives. The next element has $n - 1$
|
|
alternatives. In general, after $j$ elements are chosen, there are $n - j +
|
|
1$ alternatives.
|
|
|
|
Then clearly after choosing $k$ elements for our $k$-tuple we have by
|
|
@tuplemultiplication the number of $k$-tuples being $n dot (n - 1) dot ...
|
|
dot (n - k + 1) = (n)_k$.
|
|
]
|
|
|
|
#example[
|
|
Consider a round table with 8 seats.
|
|
|
|
+ In how many ways can we seat 8 guests around the table?
|
|
+ In how many ways can we do this if we do not differentiate between seating arrangements that are rotations of each other?
|
|
|
|
For (1), we easily see that we're simply asking for permutations of an
|
|
8-tuple, so $8!$ is the answer.
|
|
|
|
For (2), we number each person and each seat from 1-8, then always place person 1 in seat 1, and count the permutations of the other 7 people in the other 7 seats. Then the answer is $7!$.
|
|
|
|
Alternatively, notice that each arrangement has 8 equivalent arrangements under rotation. So the answer is $8!/8 = 7!$.
|
|
]
|
|
|
|
== Counting from sets
|
|
|
|
We turn our attention to sets, which unlike tuples are unordered collections.
|
|
|
|
#fact[
|
|
Let $n,k in NN$ with $0 <= k <= n$. The number of distinct subsets of size $k$ that a set of size $n$ has is given by the *binomial coefficient*
|
|
$ vec(n,k) = n! / (k! (n-k)!) $
|
|
]
|
|
|
|
#proof[
|
|
Let $A$ be a set of size $n$. By @permutation, $n!/(n-k)!$ unique ordered
|
|
$k$-tuples can be constructed from elements of $A$. Each subset of $A$ of
|
|
size $k$ has exactly $k!$ different orderings, and hence appears exactly $k!$
|
|
times among the ordered $k$-tuples. Thus the number of subsets of size $k$ is
|
|
$n! / (k! (n-k)!)$.
|
|
]
|
|
|
|
#example[
|
|
In a class there are 12 boys and 14 girls. How many different teams of 7 pupils
|
|
with 3 boys and 4 girls can be created?
|
|
|
|
First let us compute how many subsets of size 3 we can choose from the 12 boys and how many subsets of size 4 we can choose from the 14 girls.
|
|
|
|
$
|
|
"boys" &= vec(12,3) \
|
|
"girls" &= vec(14,4)
|
|
$
|
|
|
|
Then let us consider the entire team as a 2-tuple of (boys, girls). Then
|
|
there are $vec(12,3)$ alternatives for the choice of boys, and $vec(14,4)$ alternatives for
|
|
the choice of girls, so by the multiplication principle, we have the total being
|
|
|
|
$ vec(12,3) vec(14,4) $
|
|
]
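
For reference, `math.comb` evaluates these binomial coefficients directly (scribe's check, not from lecture):

```python
import math

boys = math.comb(12, 3)   # ways to choose 3 of the 12 boys
girls = math.comb(14, 4)  # ways to choose 4 of the 14 girls

print(boys, girls, boys * girls)  # 220 1001 220220
```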
|
|
|
|
#example[
|
|
Let $A = {1,2,3,4,5,6}$. Color the numbers 1, 2 red, the numbers 3, 4 green, and
the numbers 5, 6 yellow. How many different two-element subsets of $A$ are there
that have two different colors?
|
|
|
|
First choose 2 colors, $vec(3,2) = 3$. Then from each color, choose one. Altogether it's
|
|
$ vec(3,2) vec(2,1) vec(2,1) = 3 dot 2 dot 2 = 12 $
|
|
]
|
|
|
|
One way to view $vec(n,k)$ is as the number of ways of painting $n$ elements
|
|
with two colors, red and yellow, with $k$ red and $n - k$ yellow elements. Let
|
|
us generalize to more than two colors.
|
|
|
|
#fact[
|
|
Let $n$ and $r$ be positive integers and $k_1, ..., k_r$ nonnegative integers
|
|
such that $k_1 + dots.c + k_r = n$. The number of ways of assigning labels
|
|
$1,2, ..., r$ to $n$ items so that for each $i = 1, 2, ..., r$, exactly $k_i$
|
|
items receive label $i$, is the *multinomial coefficient*
|
|
|
|
$ vec(n, (k_1, k_2, ..., k_r)) = n! / (k_1 ! k_2 ! dots.c k_r !) $
|
|
]<multinomial-coefficient>
|
|
|
|
#proof[
|
|
Order the $n$ integers in some manner, and assign labels like this: for the
|
|
first $k_1$ integers, assign the label 1, then for the next $k_2$ integers,
|
|
assign the label 2, and so on. The $i$th label will be assigned to all the
|
|
integers between positions $k_1 + dots.c + k_(i-1) + 1$ and $k_1 + dots.c +
|
|
k_i$.
|
|
|
|
Then notice that all possible orderings (permutations) of the integers gives
|
|
every possible way to label the integers. However, we overcount by some
|
|
amount. How much? The order of the integers with a given label don't matter,
|
|
so we need to deduplicate those.
|
|
|
|
Each set of labels is duplicated once for each way we can order all of the
|
|
elements with the same label. For label $i$, there are $k_i$ elements with
|
|
that label, so $k_i !$ ways to order those. By @tuplemultiplication, we know
|
|
that we can express the combined amount of ways each group of $k_1, ..., k_i$
|
|
numbers are labeled as $k_1 ! k_2 ! k_3 ! dots.c k_r !$.
|
|
|
|
So by @ktoone, we can account for the duplicates and the answer is
|
|
$ n! / (k_1 ! k_2 ! k_3 ! dots.c k_r !) $
|
|
]
|
|
|
|
#remark[
|
|
@multinomial-coefficient gives us a way to count how many ways there are to
|
|
fit $n$ distinguishable objects into $r$ distinguishable containers of
|
|
varying capacity.
|
|
|
|
To find the amount of ways to fit $n$ distinguishable objects into $k$
|
|
indistinguishable containers of _any_ capacity, use the "ball-and-urn"
|
|
technique.
|
|
]
|
|
|
|
#example[
|
|
How many different ways can six people be divided into three pairs?
|
|
|
|
First we use the multinomial coefficient to count the number of ways to assign specific labels to pairs of elements:
|
|
$ vec(6, (2,2,2)) $
|
|
But notice that the actual labels themselves are irrelevant. Our multinomial
|
|
coefficient counts how many ways there are to assign 3 distinguishable
|
|
labels, say Pair 1, Pair 2, Pair 3, to our 6 elements.
|
|
|
|
To make this more explicit, say we had a 3-tuple where the position encoded
|
|
the label, where position 1 corresponds to Pair 1, and so on. Then the values
|
|
are the actual pairs of people (numbered 1-6). For instance
|
|
$ ((1,2), (3,4), (5,6)) $
|
|
corresponds to assigning the label Pair 1 to (1,2), Pair 2 to (3,4) and Pair
|
|
3 to (5,6). What our multinomial coefficient is doing is counting this,
|
|
as well as any other orderings of this tuple. For instance
|
|
$ ((3,4), (1,2), (5,6)) $
|
|
is also counted. However since in our case the actual labels are irrelevant,
|
|
the two examples shown above should really be counted only once.
|
|
|
|
How many extra times is each case counted? It turns out that we can think of
|
|
our multinomial coefficient as permuting the labels across our pairs. So in
|
|
this case it's permuting all the ways we can order 3 labels, which is $3! =
|
|
6$. That means by @ktoone our answer is
|
|
|
|
$ vec(6, (2,2,2)) / 3! = 15 $
|
|
]
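
The answer 15 is small enough to confirm by brute force. The following sketch (scribe's addition) enumerates every ordering of the 6 people, collapses each into an unlabeled set of pairs, and compares against the formula:

```python
from itertools import permutations
from math import factorial

people = range(6)
pairings = set()
for order in permutations(people):
    # pair up consecutive elements, then forget both the order within pairs
    # and the order (labels) of the pairs themselves
    pairs = frozenset(frozenset(order[i:i + 2]) for i in range(0, 6, 2))
    pairings.add(pairs)

formula = factorial(6) // (factorial(2) ** 3) // factorial(3)
print(len(pairings), formula)  # 15 15
```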
|
|
|
|
#example("Poker")[
|
|
How many poker hands are in the category _one pair_?
|
|
|
|
A one pair is a hand with two cards of the same rank and three cards with ranks
|
|
different from each other and the pair.
|
|
|
|
We can count in two ways: we count all the ordered hands, then divide by $5!$
|
|
to remove overcounting, or we can build the unordered hands directly.
|
|
|
|
When finding the ordered hands, the key is to figure out how we can encode
|
|
our information in a tuple of the form described in @tuplemultiplication, and
|
|
then use @tuplemultiplication to compute the solution.
|
|
|
|
In this case, the first element encodes the two slots in the hand of 5 our
|
|
pair occupies, the second element encodes the first card of the pair, the
|
|
third element encodes the second card of the pair, and the fourth, fifth, and
|
|
sixth elements represent the 3 cards that are not of the same rank.
|
|
|
|
Now it is clear that the number of alternatives in each position of the
|
|
6-tuple does not depend on any of the others, so @tuplemultiplication
|
|
applies. Then we can determine the amount of alternatives for each position
|
|
in the 6-tuple and multiply them to determine the total amount of ways the
|
|
6-tuple can be constructed, giving us the total amount of ways to construct
|
|
ordered poker hands with one pair.
|
|
|
|
First we choose 2 slots out of 5 positions (in the hand) so there are
|
|
$vec(5,2)$ alternatives. Then we choose any of the 52 cards for our first
|
|
pair card, so there are 52 alternatives. Then we choose any card with the
|
|
same rank for the second card in the pair, where there are 3 possible
|
|
alternatives. Then we choose the third card which must not be the same rank
|
|
as the first two, where there are 48 alternatives. The fourth card must not
|
|
be the same rank as the others, so there are 44 alternatives. Likewise, the
|
|
final card has 40 alternatives.
|
|
|
|
So the final answer is, remembering to divide by $5!$ because we don't care
|
|
about order,
|
|
$ (vec(5,2) dot 52 dot 3 dot 48 dot 44 dot 40) / 5! $
|
|
|
|
Alternatively, we can find a way to build an unordered hand with the
|
|
requirements. First we choose the rank of the pair, then we choose two suits
|
|
for that rank, then we choose the remaining 3 different ranks, and finally a
|
|
suit for each of the ranks. Then, noting that we will now omit constructing
|
|
the tuple and explicitly listing alternatives for brevity, we have
|
|
$ 13 dot vec(4,2) dot vec(12, 3) dot 4^3 $
|
|
|
|
Both approaches give the same answer.
|
|
]
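
Both counts are easy to check numerically (scribe's addition, not from lecture):

```python
import math

ordered = math.comb(5, 2) * 52 * 3 * 48 * 44 * 40   # ordered one-pair hands
unordered = ordered // math.factorial(5)             # remove the 5! orderings

direct = 13 * math.comb(4, 2) * math.comb(12, 3) * 4 ** 3  # build unordered hands

print(unordered, direct)    # 1098240 1098240
print(unordered == direct)  # True
```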
|
|
|
|
= Bayes' theorem and conditional probability
|
|
|
|
== Conditional probability, partitions, law of total probability
|
|
|
|
Sometimes we want to analyze the probability of events in a sample space given
|
|
that we already know another event has occurred. Ergo, we want the probability
|
|
of an event $A subset.eq Omega$ conditional on another event $B subset.eq Omega$.
|
|
|
|
#definition[
|
|
For two events $A, B subset.eq Omega$, the probability of $A$ given $B$ is written
|
|
$
|
|
P(A | B)
|
|
$
|
|
]
|
|
|
|
#fact[
|
|
To calculate the conditional probability, use the following formula:
|
|
$
|
|
P(A | B) = (P(A sect B)) / (P(B)), quad "provided" P(B) > 0
|
|
$
|
|
]
|
|
|
|
Oftentimes we don't know $P(B)$, but we do know $P(B)$ given some events in
|
|
$Omega$. That is, we know the probability of $B$ conditional on some events.
|
|
For example, if we have a 50% chance of choosing a rigged (6-sided) die and a
|
|
50% chance of choosing a fair die, we know the probability of getting side $n$
|
|
given that we have the rigged die, and the probability of side $n$ given that
|
|
we have the fair die. Also note that we know the probability of both events
|
|
we're conditioning on (50% each), and they're disjoint events.
|
|
|
|
In these situations, the following law is useful:
|
|
|
|
#theorem[Law of total probability][
|
|
Given a _partition_ of $Omega$ into pairwise disjoint events $A_1, A_2, A_3, ..., A_n subset.eq Omega$, such that
$
union.big_(i=1)^n A_i = Omega \
A_i sect A_j = emptyset "for all" i != j
$
|
|
The probability of an event $B subset.eq Omega$ is given by
|
|
$
|
|
P(B) = P(B | A_1) P(A_1) + P(B | A_2) P(A_2) + dots.c + P(B | A_n) P(A_n)
|
|
$
|
|
]<law-total-prob>
|
|
|
|
#proof[
|
|
Write $B$ as the disjoint union $B = union.big_(i=1)^n (B sect A_i)$. Then, by axiom 3 and the definition of conditional probability,
$
P(B) = sum_(i=1)^n P(B sect A_i) = sum_(i=1)^n P(B | A_i) P(A_i)
$
|
|
]
|
|
|
|
== Bayes' theorem
|
|
|
|
Finally let's discuss a rule for inverting conditional probabilities, that is,
|
|
getting $P(B | A)$ from $P(A | B)$.
|
|
|
|
#theorem[Bayes' theorem][
|
|
Given two events $A, B subset.eq Omega$,
|
|
$
|
|
P(A | B) = (P(B | A)P(A)) / (P(B | A)P(A) + P(B | A^c)P(A^c))
|
|
$
|
|
]
|
|
|
|
#proof[
|
|
Apply the definition of conditional probability, then apply @law-total-prob
|
|
noting that $A$ and $A^c$ are a partitioning of $Omega$.
|
|
]
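
To make the fair/rigged die story from earlier concrete, here is a numerical sketch (scribe's addition). The rigged die's chance of rolling a six is a hypothetical number chosen only to illustrate the two formulas:

```python
# P(rigged) = P(fair) = 0.5; suppose (hypothetically) the rigged die rolls a
# six with probability 0.5, while the fair die does so with probability 1/6.
p_rigged, p_fair = 0.5, 0.5
p_six_given_rigged = 0.5
p_six_given_fair = 1 / 6

# law of total probability
p_six = p_six_given_rigged * p_rigged + p_six_given_fair * p_fair

# Bayes' theorem: probability we hold the rigged die, given we rolled a six
p_rigged_given_six = p_six_given_rigged * p_rigged / p_six

print(round(p_six, 4), round(p_rigged_given_six, 4))  # 0.3333 0.75
```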
|
|
|
|
= Lecture #datetime(day: 23, month: 1, year: 2025).display()
|
|
|
|
== Independence
|
|
#definition("Independence")[
|
|
Two events $A subset Omega$ and $B subset Omega$ are independent if and only if
|
|
$ P(B sect A) = P(B)P(A) $
|
|
"Joint probability is equal to product of their marginal probabilities."
|
|
]
|
|
|
|
#fact[
|
|
This definition must be used to show the independence of two events.
|
|
]
|
|
|
|
#fact[
|
|
If $A$ and $B$ are independent, then,
|
|
$
|
|
P(A | B) = underbrace((P(A sect B)) / P(B), "conditional probability") = (P(A) P(B)) / P(B) = P(A)
|
|
$
|
|
]
|
|
|
|
#example[
|
|
Flip a fair coin 3 times. Let the events:
|
|
|
|
- $A$ = we have exactly one tails among the first 2 flips
|
|
- $B$ = we have exactly one tails among the last 2 flips
|
|
- $D$ = we get exactly one tails among all 3 flips
|
|
|
|
Show that $A$ and $B$ are independent.
|
|
What about $B$ and $D$?
|
|
|
|
Compute all of the possible events, then we see that
|
|
|
|
$
|
|
P(A sect B) = (hash (A sect B)) / (hash Omega) = 2 / 8 = 4 / 8 dot 4 / 8 = P(A) P(B)
|
|
$
|
|
|
|
So they are independent.
|
|
|
|
Repeat the same reasoning for $B$ and $D$, we see that they are not independent.
|
|
]
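
Enumerating the 8 equally likely outcomes makes this check mechanical (scribe's addition):

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))

def P(event):
    return Fraction(len(event), len(omega))

A = [w for w in omega if w[:2].count("T") == 1]  # one tails in first two flips
B = [w for w in omega if w[1:].count("T") == 1]  # one tails in last two flips
D = [w for w in omega if w.count("T") == 1]      # one tails among all three

AB = [w for w in A if w in B]
BD = [w for w in B if w in D]

print(P(AB), P(A) * P(B))  # 1/4 1/4  -> A, B independent
print(P(BD), P(B) * P(D))  # 1/4 3/16 -> B, D not independent
```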
|
|
|
|
#example[
|
|
Suppose we have 4 red and 7 green balls in an urn. We choose two balls with replacement. Let
|
|
|
|
- $A$ = the first ball is red
|
|
- $B$ = the second ball is green
|
|
|
|
Are $A$ and $B$ independent?
|
|
|
|
$
|
|
hash Omega = 11 times 11 = 121 \
|
|
hash A = 4 dot 11 = 44 \
|
|
hash B = 11 dot 7 = 77 \
|
|
hash (A sect B) = 4 dot 7 = 28
$

So $P(A sect B) = 28 / 121 = 44 / 121 dot 77 / 121 = P(A) P(B)$, and therefore $A$ and $B$ are independent.
]
|
|
|
|
#definition[
|
|
Events $A_1, ..., A_n$ are independent (mutually independent) if for every collection $A_i_1, ..., A_i_k$, where $2 <= k <= n$ and $1 <= i_1 < i_2 < dots.c < i_k <= n$,
|
|
|
|
$
|
|
P(A_i_1 sect A_i_2 sect dots.c sect A_i_k) = P(A_i_1) P(A_i_2) dots.c P(A_i_k)
|
|
$
|
|
]
|
|
|
|
#definition[
|
|
We say that the events $A_1, ..., A_n$ are *pairwise independent* if any two
|
|
different events $A_i$ and $A_j$ are independent for any $i != j$.
|
|
]
|
|
|
|
= A bit of review on random variables
|
|
|
|
== Random variables, discrete random variables
|
|
|
|
First, some brief exposition on random variables. Quixotically, a random
|
|
variable is actually a function.
|
|
|
|
Standard notation: $Omega$ is a sample space, $omega in Omega$ is an outcome (sample point).
|
|
|
|
#definition[
|
|
A *random variable* $X$ is a function $X : Omega -> RR$ that takes the set of
|
|
possible outcomes in a sample space, and maps it to a
|
|
#link("https://en.wikipedia.org/wiki/Measurable_space")[measurable space],
|
|
typically (as in our case) a subset of $RR$.
|
|
]
|
|
|
|
#definition[
|
|
The *state space* or *support* of a random variable $X$ is all of the values $X$ can take.
|
|
]
|
|
|
|
#example[
|
|
Let $X$ be a random variable that takes on the values ${0,1,2,3}$. Then the
|
|
state space of $X$ is the set ${0,1,2,3}$.
|
|
]
|
|
|
|
The probability distribution of $X$ carries its important probabilistic
information: it describes the probabilities $P(X in B)$ for subsets $B subset.eq RR$. We
describe it using the probability mass (or density) function and the cumulative
distribution function.
|
|
|
|
A random variable $X$ is discrete if there is a countable set $A$ such that $P(X in
|
|
A) = 1$. $k$ is a possible value if $P(X = k) > 0$.
|
|
|
|
A discrete random variable has its probability distribution entirely determined by
its p.m.f. $p(k) = P(X = k)$. The p.m.f. is a function from the set of possible
values of $X$ into $[0,1]$. When we need to indicate which random variable the
p.m.f. belongs to, we write $p_X (k)$.
|
|
|
|
By the axioms of probability,
|
|
|
|
$
|
|
sum_k p_X (k) = sum_k P(X=k) = 1
|
|
$
|
|
|
|
For a subset $B subset RR$,
|
|
|
|
$
|
|
P(X in B) = sum_(k in B) p_X (k)
|
|
$
|
|
|
|
== Continuous random variables
|
|
|
|
Now we introduce another major class of random variables.
|
|
|
|
#definition[
|
|
Let $X$ be a random variable. If $f$ satisfies
|
|
|
|
$
|
|
P(X <= b) = integral^b_(-infinity) f(x) dif x
|
|
$
|
|
|
|
for all $b in RR$, then $f$ is the *probability density function* of $X$.
|
|
]
|
|
|
|
The probability that $X in (-infinity, b]$ is equal to the area under the graph
|
|
of $f$ from $-infinity$ to $b$.
|
|
|
|
A corollary is the following.
|
|
|
|
#fact[
|
|
$ P(X in B) = integral_B f(x) dif x $
for any $B subset RR$ where integration makes sense.
]
|
|
|
|
The set can be bounded or unbounded, or any collection of intervals.
|
|
|
|
#fact[
|
|
$ P(a <= X <= b) = integral_a^b f(x) dif x $
|
|
$ P(X > a) = integral_a^infinity f(x) dif x $
|
|
]
|
|
|
|
#fact[
|
|
If a random variable $X$ has density function $f$ then individual point
|
|
values have probability zero:
|
|
|
|
$ P(X = c) = integral_c^c f(x) dif x = 0, forall c in RR $
|
|
]
|
|
|
|
#remark[
|
|
It follows that a random variable with a density function is not discrete. Also
|
|
the probabilities of intervals are not changed by including or excluding
|
|
endpoints.
|
|
]
|
|
|
|
How to determine which functions are p.d.f.s? Since $P(-infinity < X <
|
|
infinity) = 1$, a p.d.f. $f$ must satisfy
|
|
|
|
$
|
|
f(x) >= 0 forall x in RR \
|
|
integral^infinity_(-infinity) f(x) dif x = 1
|
|
$
|
|
|
|
#fact[
|
|
Random variables with density functions are called _continuous_ random
|
|
variables. This does not imply that the random variable is a continuous
|
|
function on $Omega$ but it is standard terminology.
|
|
]
|
|
|
|
Named distributions of continuous random variables are introduced in the
|
|
following chapters.
|
|
|
|
= Lecture #datetime(day: 27, year: 2025, month: 1).display()
|
|
|
|
== Bernoulli trials
|
|
|
|
The setup: the experiment has exactly two outcomes:
|
|
- Success -- $S$ or 1
|
|
- Failure -- $F$ or 0
|
|
|
|
Additionally:
|
|
$
|
|
P(S) = p, (0 < p < 1) \
|
|
P(F) = 1 - p = q
|
|
$
|
|
|
|
Construct the probability mass function:
|
|
|
|
$
|
|
P(X = 1) = p \
|
|
P(X = 0) = 1 - p
|
|
$
|
|
|
|
Write it as:
|
|
|
|
$ p_X (k) = p^k (1-p)^(1-k) $
|
|
|
|
for $k = 1$ and $k = 0$.
|
|
|
|
== Binomial distribution
|
|
|
|
The setup: very similar to Bernoulli, trials have exactly 2 outcomes. A bunch
|
|
of Bernoulli trials in a row.
|
|
|
|
Importantly: $p$ and $q$ are defined exactly the same in all trials.
|
|
|
|
This ties the binomial distribution to the sampling with replacement model,
|
|
since each trial does not affect the next.
|
|
|
|
We conduct $n$ *independent* trials of this experiment. Example with coins: each
|
|
flip independently has a $1/2$ chance of heads or tails (holds same for die,
|
|
rigged coin, etc).
|
|
|
|
$n$ is fixed, i.e. known ahead of time.
|
|
|
|
== Binomial random variable
|
|
|
|
Let $X = hash$ of successes in $n$ independent trials. The sample space consists
of all sequences $omega$ of successes and failures of the form $omega = S
F F dots.c F$, each of length $n$.
|
|
|
|
Then $X(omega) = 0,1,2,...,n$ can take $n + 1$ possible values. The
|
|
probability of any particular sequence is given by the product of the
|
|
individual trial probabilities.
|
|
|
|
#example[
|
|
$ omega = S F F S F dots.c S => P({omega}) = p q q p q dots.c p $
|
|
]
|
|
|
|
So $P(X = 0) = P(F F F dots.c F) = q dot q dot dots.c dot q = q^n$.
|
|
|
|
And
|
|
$
|
|
P(X = 1) = P(S F F dots.c F) + P(F S F F dots.c F) + dots.c + P(F F F dots.c F S) \
|
|
= underbrace(n, "possible outcomes") dot p^1 dot q^(n-1) \
= vec(n, 1) dot p^1 dot q^(n-1) \
= n dot p^1 dot q^(n-1)
|
|
$
|
|
|
|
Now we can generalize
|
|
|
|
$
|
|
P(X = 2) = vec(n,2) p^2 q^(n-2)
|
|
$
|
|
|
|
How about all successes?
|
|
|
|
$
|
|
P(X = n) = P(S S dots.c S) = p^n
|
|
$
|
|
|
|
We see that for all failures we have $q^n$ and all successes we have $p^n$.
|
|
Otherwise we use our method above.
|
|
|
|
In general, here is the probability mass function for the binomial random variable
|
|
|
|
$
|
|
P(X = k) = vec(n, k) p^k q^(n-k), "for" k = 0,1,2,...,n
|
|
$
|
|
|
|
|
|
Binomial distribution is very powerful. Choosing between two things, what are the probabilities?
|
|
|
|
To summarize the characterization of the binomial random variable:
|
|
|
|
- $n$ independent trials
|
|
- each trial results in binary success or failure
|
|
- with probability of success $p$, identically across trials
|
|
|
|
with $X = hash$ successes in *fixed* $n$ trials.
|
|
|
|
$ X ~ "Bin"(n,p) $
|
|
|
|
with probability mass function
|
|
|
|
$
|
|
P(X = x) = vec(n,x) p^x (1 - p)^(n-x) = p(x) "for" x = 0,1,2,...,n
|
|
$
|
|
|
|
We see this is in fact the binomial theorem!
|
|
|
|
$
|
|
p(x) >= 0, sum^n_(x=0) p(x) = sum^n_(x=0) vec(n,x) p^x q^(n-x) = (p + q)^n
|
|
$
|
|
|
|
In fact,
|
|
$
|
|
(p + q)^n = (p + (1 - p))^n = 1
|
|
$
|
|
|
|
#example[
|
|
A family has 5 children. What is the probability that exactly 2 are male, if we
assume births are independent and the probability of a male is 0.5?
|
|
|
|
First we check binomial criteria: $n$ independent trials, well formed
|
|
$S$/$F$, probability the same across trials. Let's say male is $S$ and
|
|
otherwise $F$.
|
|
|
|
We have $n=5$ and $p = 0.5$. We just need $P(X = 2)$.
|
|
|
|
$
|
|
P(X = 2) = vec(5,2) (0.5)^2 (0.5)^3 \
|
|
= (5 dot 4) / (2 dot 1) (1 / 2)^5 = 10 / 32
|
|
$
|
|
]
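
The same computation with `math.comb` (scribe's check, not from lecture):

```python
import math

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binom_pmf(2, 5, 0.5))  # 0.3125 == 10/32
```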
|
|
|
|
#example[
|
|
What is the probability of getting exactly three aces (1's) out of 10 throws
|
|
of a fair die?
|
|
|
|
Seems a little trickier but we can still write this as well defined $S$/$F$.
|
|
Let $S$ be getting an ace and $F$ being anything else.
|
|
|
|
Then $p = 1/6$ and $n = 10$. We want $P(X=3)$. So
|
|
|
|
$
|
|
P(X=3) = vec(10,3) p^3 q^7 = vec(10,3) (1 / 6)^3 (5 / 6)^7 \
|
|
approx 0.15505
|
|
$
|
|
]
|
|
|
|
#example[
|
|
Suppose we have two types of candy, red and black: $a$ red and $b$ black. Select $n$ candies. Let $X$
be the number of red candies among the $n$ selected.
|
|
|
|
2 cases.
|
|
|
|
- case 1: with replacement: Binomial Distribution, $n$, $p = a/(a + b)$.
|
|
$ P(X = 2) = vec(n,2) (a / (a+b))^2 (b / (a+b))^(n-2) $
|
|
- case 2: without replacement: then use counting
|
|
$ P(X = x) = (vec(a,x) vec(b,n-x)) / vec(a+b,n) = p(x) $
|
|
]
|
|
|
|
We've done case 2 before, but now we introduce a random variable to represent
|
|
it.
|
|
|
|
$ P(X = x) = (vec(a,x) vec(b,n-x)) / vec(a+b,n) = p(x) $
|
|
|
|
is known as a *Hypergeometric distribution*.
|
|
|
|
== Hypergeometric distribution
|
|
|
|
There are different characterizations of the parameters, but
|
|
|
|
$ X ~ "Hypergeom"(hash "total", hash "successes", "sample size") $
|
|
|
|
For example,
|
|
$ X ~ "Hypergeom"(N, a, n) "where" N = a+b $
|
|
|
|
In the textbook, it's
|
|
$ X ~ "Hypergeom"(N, N_a, n) $
|
|
|
|
#remark[
|
|
If the sample size $n$ is very small relative to $a + b$, then both cases give similar (approx.
the same) answers.
|
|
]
|
|
|
|
For instance, if we're sampling for blood types from UCSB, and we take a
|
|
student out without replacement, we don't really change the remaining population
|
|
substantially. So both answers give a similar result.
|
|
|
|
Suppose we have two types of items, type $A$ and type $B$. Let $N_A$ be $hash$
|
|
type $A$, $N_B$ $hash$ type $B$. $N = N_A + N_B$ is the total number of
|
|
objects.
|
|
|
|
We sample $n$ items *without replacement* ($n <= N$) with order not mattering.
|
|
Denote by $X$ the number of type $A$ objects in our sample.
|
|
|
|
#definition[
|
|
Let $0 <= N_A <= N$ and $1 <= n <= N$ be integers. A random variable $X$ has the *hypergeometric distribution* with parameters $(N, N_A, n)$ if $X$ takes values in the set ${0,1,...,n}$ and has p.m.f.
|
|
|
|
$ P(X = k) = (vec(N_A,k) vec(N-N_A,n-k)) / vec(N,n) = p(k) $
|
|
]
|
|
|
|
#example[
|
|
Let $N_A = 10$ defectives. Let $N_B = 90$ non-defectives. We select $n=5$ without replacement. What is the probability that 2 of the 5 selected are defective?
|
|
|
|
$
|
|
X ~ "Hypergeom" (N = 100, N_A = 10, n = 5)
|
|
$
|
|
|
|
We want $P(X=2)$.
|
|
|
|
$
|
|
P(X=2) = (vec(10,2) vec(90,3)) / vec(100,5) approx 0.0702
|
|
$
|
|
]
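
A direct evaluation of the hypergeometric p.m.f. (scribe's check):

```python
import math

def hypergeom_pmf(k, N, N_A, n):
    # P(X = k) for X ~ Hypergeom(N, N_A, n)
    return math.comb(N_A, k) * math.comb(N - N_A, n - k) / math.comb(N, n)

print(round(hypergeom_pmf(2, 100, 10, 5), 4))  # ~0.0702
```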
|
|
|
|
#remark[
|
|
Make sure you can distinguish when a problem is binomial or when it is
|
|
hypergeometric. This is very important on exams.
|
|
|
|
Recall that both ask about number of successes, in a fixed number of trials.
|
|
But binomial is sample with replacement (each trial is independent) and
|
|
sampling without replacement is hypergeometric.
|
|
]
|
|
|
|
#example[
|
|
A cat gives birth to 6 kittens: 2 are male and 4 are female. Your neighbor comes and picks up 3 kittens randomly to take home with them.
|
|
|
|
How to define random variable? What is p.m.f.?
|
|
|
|
Let $X$ be the number of male cats in the neighbor's selection.
|
|
|
|
$ X ~ "Hypergeom"(N = 6, N_A = 2, n = 3) $
|
|
and $X$ takes values in ${0,1,2}$. Find the p.m.f. by finding probabilities for these values.
|
|
|
|
$
|
|
&P(X = 0) = (vec(2,0) vec(4,3)) / vec(6,3) = 4 / 20 \
|
|
&P(X = 1) = (vec(2,1) vec(4,2)) / vec(6,3) = 12 / 20 \
|
|
&P(X = 2) = (vec(2,2) vec(4,1)) / vec(6,3) = 4 / 20 \
|
|
&P(X = 3) = (vec(2,3) vec(4,0)) / vec(6,3) = 0
|
|
$
|
|
|
|
Note that for $P(X=3)$, we are asking for 3 successes (drawing males) where
|
|
there are only 2 males, so it must be 0.
|
|
]
|
|
|
|
== Geometric distribution
|
|
|
|
Consider an infinite sequence of independent trials. e.g. number of attempts until I make a basket.
|
|
|
|
Let $X_i$ denote the outcome of the $i^"th"$ trial, where success is 1 and failure is 0. Let $N$ be the number of trials needed to observe the first success in a sequence of independent trials with probability of success $p$.
|
|
|
|
We fail $k-1$ times and succeed on the $k^"th"$ try. Then:
|
|
|
|
$
|
|
P(N = k) = P(X_1 = 0, X_2 = 0, ..., X_(k-1) = 0, X_k = 1) = (1 - p)^(k-1) p
|
|
$
|
|
|
|
This is the probability of failure raised to the number of failures, times the
probability of success.
|
|
|
|
The key characteristic of these trials is that we keep going until we succeed. There's
|
|
no $n$ choose $k$ in front like the binomial distribution because there's
|
|
exactly one sequence that gives us success.
|
|
|
|
#definition[
|
|
Let $0 < p <= 1$. A random variable $X$ has the geometric distribution with
|
|
success parameter $p$ if the possible values of $X$ are ${1,2,3,...}$ and $X$
|
|
satisfies
|
|
|
|
$
|
|
P(X=k) = (1-p)^(k-1) p
|
|
$
|
|
|
|
for positive integers $k$. Abbreviate this by $X ~ "Geom"(p)$.
|
|
]
|
|
|
|
#example[
|
|
What is the probability it takes more than seven rolls of a fair die to roll a
|
|
six?
|
|
|
|
Let $X$ be the number of rolls of a fair die until the first six. Then $X ~
|
|
"Geom"(1/6)$. Now we just want $P(X > 7)$.
|
|
|
|
$
|
|
P(X > 7) = sum^infinity_(k=8) P(X=k) = sum^infinity_(k=8) (5 / 6)^(k-1) 1 / 6
|
|
$
|
|
|
|
Re-indexing,
|
|
|
|
$
|
|
sum^infinity_(k=8) (5 / 6)^(k-1) 1 / 6 = 1 / 6 (5 / 6)^7 sum^infinity_(j=0) (5 / 6)^j
|
|
$
|
|
|
|
Now we calculate by standard methods:
|
|
|
|
$
|
|
1 / 6 (5 / 6)^7 sum^infinity_(j=0) (5 / 6)^j = 1 / 6 (5 / 6)^7 dot 1 / (1-5 / 6) =
|
|
(5 / 6)^7
|
|
$
|
|
]
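
Numerically (scribe's check), a truncated partial sum of the p.m.f. agrees with the closed form $(5/6)^7$:

```python
p, q = 1 / 6, 5 / 6

closed_form = q ** 7
partial_sum = sum(q ** (k - 1) * p for k in range(8, 2000))  # truncate the tail

print(round(closed_form, 6), round(partial_sum, 6))  # both ~0.279082
```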
|
|
|
|
= Some more discrete distributions
|
|
|
|
== Negative binomial
|
|
|
|
Consider a sequence of Bernoulli trials with the following characteristics:
|
|
|
|
- Each trial success or failure
|
|
- Prob. of success $p$ is same on each trial
|
|
- Trials are independent (notice they are not fixed to specific number)
|
|
- Experiment continues until $r$ successes are observed, where $r$ is a given parameter
|
|
|
|
Then if $X$ is the number of trials necessary until $r$ successes are observed,
|
|
we say $X$ is a *negative binomial* random variable.
|
|
|
|
#definition[
|
|
Let $k in ZZ^+$ and $0 < p <= 1$. A random variable $X$ has the negative
|
|
binomial distribution with parameters ${k,p}$ if the possible values of $X$
|
|
are the integers ${k,k+1, k+2, ...}$ and the p.m.f. is
|
|
|
|
$
|
|
P(X = n) = vec(n-1, k-1) p^k (1-p)^(n-k) "for" n >= k
|
|
$
|
|
|
|
Abbreviate this by $X ~ "Negbin"(k,p)$.
|
|
]
|
|
|
|
#example[
|
|
Steph Curry has a three point percentage of approx. $43%$. What is the
|
|
probability that Steph makes his third three-point basket on his $5^"th"$
|
|
attempt?
|
|
|
|
Let $X$ be number of attempts required to observe the 3rd success. Then,
|
|
|
|
$
|
|
X ~ "Negbin"(k = 3, p = 0.43)
|
|
$
|
|
|
|
So,
|
|
$
|
|
P(X = 5) &= vec(5-1,3-1)(0.43)^3 (1 - 0.43)^(5-3) \
|
|
&= vec(4,2) (0.43)^3 (0.57)^2 \
|
|
&approx 0.155
|
|
$
|
|
]
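
Checking the arithmetic (scribe's addition):

```python
import math

def negbin_pmf(n, k, p):
    # probability that the k-th success occurs on trial n
    return math.comb(n - 1, k - 1) * p ** k * (1 - p) ** (n - k)

print(round(negbin_pmf(5, 3, 0.43), 3))  # ~0.155
```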
|
|
|
|
== Poisson distribution
|
|
|
|
This p.m.f. follows from the Taylor expansion
|
|
|
|
$
|
|
e^lambda = sum_(k=0)^infinity lambda^k / k!
|
|
$
|
|
|
|
which implies that
|
|
|
|
$
|
|
sum_(k=0)^infinity e^(-lambda) lambda^k / k! = e^(-lambda) e^lambda = 1
|
|
$
|
|
|
|
#definition[
|
|
For an integer valued random variable $X$, we say $X ~ "Poisson"(lambda)$ if it has p.m.f.
|
|
|
|
$ P(X = k) = e^(-lambda) lambda^k / k! $
|
|
|
|
for $k in {0,1,2,...}$ for $lambda > 0$ and
|
|
|
|
$
|
|
sum_(k = 0)^infinity P(X=k) = 1
|
|
$
|
|
]
|
|
|
|
The Poisson arises from the Binomial. It applies in the binomial context when
|
|
$n$ is very large ($n >= 100$) and $p$ is very small $p <= 0.05$, such that $n
|
|
p$ is a moderate number ($n p < 10$).
|
|
|
|
Then $X$ follows a Poisson distribution with $lambda = n p$.
|
|
|
|
$
|
|
P("Bin"(n,p) = k) approx P("Poisson"(lambda = n p) = k)
|
|
$
|
|
|
|
for $k = 0,1,...,n$.
|
|
|
|
The Poisson distribution is useful for finding the probabilities of rare events
|
|
over a continuous interval of time. By knowing $lambda = n p$ (for large $n$ and
small $p$), we can calculate many probabilities.
|
|
|
|
#example[
|
|
The number of typing errors in the page of a textbook.
|
|
|
|
Let
|
|
|
|
- $n$ be the number of letters or symbols per page (large)
|
|
- $p$ be the probability of error, small enough such that
|
|
- $lim_(n -> infinity) lim_(p -> 0) n p = lambda = 0.1$
|
|
|
|
What is the probability of exactly 1 error?
|
|
|
|
We can approximate the distribution of $X$ with a $"Poisson"(lambda = 0.1)$
|
|
distribution
|
|
|
|
$
|
|
P(X = 1) = (e^(-0.1) (0.1)^1) / 1! = 0.09048
|
|
$
|
|
]
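
To see the binomial-to-Poisson approximation numerically, compare the two p.m.f.s for a hypothetical page with $n = 1000$ symbols and error probability $p = 0.0001$ (so $n p = 0.1$); these particular numbers are the scribe's, not the lecture's:

```python
import math

n, p = 1000, 0.0001  # hypothetical page: np = 0.1
lam = n * p

binom = math.comb(n, 1) * p ** 1 * (1 - p) ** (n - 1)
poisson = math.exp(-lam) * lam ** 1 / math.factorial(1)

print(round(binom, 5), round(poisson, 5))  # ~0.09049 vs ~0.09048
```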
|
|
|
|
#example[
|
|
The number of reported auto accidents in a big city on any given day
|
|
|
|
Let
|
|
|
|
- $n$ be the number of autos on the road
|
|
- $p$ be the probability of an accident for any individual is small such that
|
|
$lim_(n->infinity) lim_(p->0) n p = lambda = 2$
|
|
|
|
What is the probability of no accidents today?
|
|
|
|
We can approximate $X$ by $"Poisson"(lambda = 2)$
|
|
|
|
$
|
|
P(X = 0) = (e^(-2) (2)^0) / 0! = 0.1353
|
|
$
|
|
]
|
|
|
|
A discrete example:
|
|
|
|
#example[
|
|
Suppose we have an election with candidates $B$ and $W$. A total of 10,000
|
|
ballots were cast such that
|
|
|
|
$
|
|
10,000 "votes" cases(5005 space B, 4995 space W)
|
|
$
|
|
|
|
But 15 ballots had irregularities and were disqualified. What is the
|
|
probability that the election results will change?
|
|
|
|
There are three combinations of disqualified ballots that would result in a
|
|
different election outcome: 13 $B$ and 2 $W$, 14 $B$ and 1 $W$, and 15 $B$
|
|
and 0 $W$. What is the probability of these?
|
|
]
|
|
|
|
= Lecture #datetime(day: 3, month: 2, year: 2025).display()
|
|
|
|
== CDFs, PMFs, PDFs
|
|
|
|
#definition[
|
|
Let $X$ be a random variable. If we have a function $f$ such that
|
|
|
|
$
|
|
P(X <= b) = integral^b_(-infinity) f(x) dif x
|
|
$
|
|
for all $b in RR$, then $f$ is the *probability density function* of $X$.
|
|
]
|
|
|
|
The probability that the value of $X$ lies in $(-infinity, b]$ equals the area
|
|
under the curve of $f$ from $-infinity$ to $b$.
|
|
|
|
If $f$ satisfies this definition, then for any $B subset RR$ for which integration makes sense,
|
|
|
|
$
|
|
P(X in B) = integral_B f(x) dif x
|
|
$
|
|
|
|
Properties of a CDF:
|
|
|
|
Any CDF $F(x) = P(X <= x)$ satisfies
|
|
|
|
1. $F(-infinity) = 0$, $F(infinity) = 1$
|
|
2. $F(x)$ is non-decreasing in $x$ (monotonically non-decreasing)
|
|
$ s < t => F(s) <= F(t) $
|
|
3. $P(a < X <= b) = P(X <= b) - P(X <= a) = F(b) - F(a)$
|
|
|
|
#example[
|
|
Let $X$ be a continuous random variable with density (pdf)
|
|
|
|
$
|
|
f(x) = cases(
|
|
c x^2 &"for" 0 < x < 2,
|
|
0 &"otherwise"
|
|
)
|
|
$
|
|
|
|
1. What is $c$?
|
|
|
|
$c$ is such that
|
|
$
|
|
1 = integral^infinity_(-infinity) f(x) dif x = integral_0^2 c x^2 dif x = (8 c) / 3, quad "so" c = 3 / 8
|
|
$
|
|
|
|
2. Find the probability that $X$ is between 1 and 1.4.
|
|
|
|
Integrate the curve between 1 and 1.4.
|
|
|
|
$
|
|
integral_1^1.4 3 / 8 x^2 dif x = (x^3 / 8) |_1^1.4 \
|
|
= 0.218
|
|
$
|
|
|
|
This is the probability that $X$ lies between 1 and 1.4.
|
|
|
|
3. Find the probability that $X$ is between 1 and 3.
|
|
|
|
Idea: integrate between 1 and 3, be careful after 2.
|
|
|
|
$ integral^2_1 3 / 8 x^2 dif x + integral_2^3 0 dif x = 7 / 8 $
|
|
|
|
4. What is the CDF for $P(X <= x)$? Integrate the curve to $x$.
|
|
|
|
$
|
|
F(x) = P(X <= x) = integral_(-infinity)^x f(t) dif t \
|
|
= integral_0^x 3 / 8 t^2 dif t \
|
|
= x^3 / 8
|
|
$
|
|
|
|
Important: include the range!
|
|
|
|
$
|
|
F(x) = cases(
|
|
0 &"for" x <= 0,
|
|
x^3/8 &"for" 0 < x < 2,
|
|
1 &"for" x >= 2
|
|
)
|
|
$
|
|
|
|
5. Find a point $a$ such that you integrate up to the point to find exactly $1/2$
|
|
the area.
|
|
|
|
We want to find $1/2 = P(X <= a)$.
|
|
|
|
$ 1 / 2 = P(X <= a) = F(a) = a^3 / 8 => a = root(3, 4) $
|
|
]
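
A crude numerical cross-check of these answers (scribe's addition), using a midpoint Riemann sum in place of exact integration:

```python
def f(x):
    return 3 / 8 * x ** 2 if 0 < x < 2 else 0.0

def integrate(a, b, steps=10_000):
    # simple midpoint Riemann sum, accurate enough for a sanity check
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

print(round(integrate(0, 2), 4))          # total mass ~1.0, so c = 3/8 works
print(round(integrate(1, 1.4), 4))        # ~0.218
print(round(integrate(1, 3), 4))          # ~0.875 = 7/8
print(round((4 ** (1 / 3)) ** 3 / 8, 4))  # F(a) at a = 4^(1/3) is 0.5
```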
|
|
|
|
== The (continuous) uniform distribution
|
|
|
|
The simplest and the best of the named distributions!
|
|
|
|
#definition[
|
|
Let $[a,b]$ be a bounded interval on the real line. A random variable $X$ has the uniform distribution on the interval $[a,b]$ if $X$ has the density function
|
|
|
|
$
|
|
f(x) = cases(
|
|
1/(b-a) &"for" x in [a,b],
|
|
0 &"for" x in.not [a,b]
|
|
)
|
|
$
|
|
|
|
Abbreviate this by $X ~ "Unif" [a,b]$.
|
|
]<continuous-uniform>
|
|
|
|
The graph of $"Unif" [a,b]$ is a constant line at height $1/(b-a)$ defined
|
|
across $[a,b]$. The integral is just the area of a rectangle, and we can check
|
|
it is 1.
|
|
|
|
#fact[
|
|
For $X ~ "Unif" [a,b]$, its cumulative distribution function (CDF) is given by:
|
|
|
|
$
|
|
F_x (x) = cases(
|
|
0 &"for" x < a,
|
|
(x-a)/(b-a) &"for" x in [a,b],
|
|
1 &"for" x > b
|
|
)
|
|
$
|
|
]
|
|
|
|
#fact[
|
|
If $X ~ "Unif" [a,b]$, and $[c,d] subset [a,b]$, then
|
|
$
|
|
P(c <= X <= d) = integral_c^d 1 / (b-a) dif x = (d-c) / (b-a)
|
|
$
|
|
]
|
|
|
|
#example[
|
|
Let $Y$ be a uniform random variable on $[-2,5]$. Find the probability that its
|
|
absolute value is at least 1.
|
|
|
|
$Y$ takes values in the interval $[-2,5]$, so the absolute value is at least 1 iff. $Y in [-2,-1] union [1,5]$.
|
|
|
|
The density function of $Y$ is $f(x) = 1/(5- (-2)) = 1/7$ on $[-2,5]$ and 0 everywhere else.
|
|
|
|
So,
|
|
|
|
$
|
|
P(|Y| >= 1) &= P(Y in [-2,-1] union [1,5]) \
|
|
&= P(-2 <= Y <= -1) + P(1 <= Y <= 5) \
|
|
&= 5 / 7
|
|
$
|
|
]
|
|
|
|
== The exponential distribution
|
|
|
|
The geometric distribution can be viewed as modeling waiting times in a discrete setting: we wait through $n - 1$ failures to arrive at the first success on the $n^"th"$ trial.
|
|
|
|
The exponential distribution is the continuous analogue to the geometric
|
|
distribution, in that we often use it to model waiting times in the continuous
|
|
sense. For example, the waiting time until the first customer enters the barber shop.
|
|
|
|
#definition[
|
|
Let $0 < lambda < infinity$. A random variable $X$ has the exponential distribution with parameter $lambda$ if $X$ has PDF
|
|
|
|
$
|
|
f(x) = cases(
|
|
lambda e^(-lambda x) &"for" x >= 0,
|
|
0 &"for" x < 0
|
|
)
|
|
$
|
|
|
|
Abbreviate this by $X ~ "Exp"(lambda)$, the exponential distribution with rate $lambda$.
|
|
|
|
The CDF of the $"Exp"(lambda)$ distribution is given by:
|
|
|
|
$
|
|
F(t) = cases(
|
|
0 &"if" t <0,
|
|
1 - e^(-lambda t) &"if" t>= 0
|
|
)
|
|
$
|
|
]
|
|
|
|
#example[
|
|
Suppose the length of a phone call, in minutes, is well modeled by an exponential random variable with a rate $lambda = 1/10$.
|
|
|
|
1. What is the probability that a call takes more than 8 minutes?
|
|
2. What is the probability that a call takes between 8 and 22 minutes?
|
|
|
|
Let $X$ be the length of the phone call, so that $X ~ "Exp"(1/10)$. Then we can find the desired probability by:
|
|
|
|
$
|
|
P(X > 8) &= 1 - P(X <= 8) \
|
|
&= 1 - F_x (8) \
|
|
&= 1 - (1 - e^(-(1 / 10) dot 8)) \
|
|
&= e^(-8 / 10) approx 0.4493
|
|
$
|
|
|
|
Now to find $P(8 < X < 22)$, we can take the difference in CDFs:
|
|
|
|
$
|
|
&P(X > 8) - P(X >= 22) \
|
|
&= e^(-8 / 10) - e^(-22 / 10) \
|
|
&approx 0.3385
|
|
$
|
|
]
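
The same two probabilities via the closed-form CDF (scribe's check):

```python
import math

lam = 1 / 10

def exp_cdf(t, lam):
    # CDF of Exp(lam): 1 - e^(-lam t) for t >= 0
    return 1 - math.exp(-lam * t) if t >= 0 else 0.0

print(round(1 - exp_cdf(8, lam), 4))                 # P(X > 8)      ~0.4493
print(round(exp_cdf(22, lam) - exp_cdf(8, lam), 4))  # P(8 < X < 22) ~0.3385
```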
|
|
|
|
#fact("Memoryless property of the exponential distribution")[
|
|
Suppose that $X ~ "Exp"(lambda)$. Then for any $s,t > 0$, we have
|
|
$
|
|
P(X > t + s | X > t) = P(X > s)
|
|
$
|
|
]<memoryless>
|
|
|
|
This is like saying if I've been waiting 5 minutes and then 3 minutes for the
|
|
bus, what is the probability that I'm gonna wait more than 5 + 3 minutes, given
|
|
that I've already waited 5 minutes? And that's precisely equal to just the
|
|
probability I'm gonna wait more than 3 minutes.
|
|
|
|
#proof[
|
|
$
|
|
P(X > t + s | X > t) = (P(X > t + s sect X > t)) / (P(X > t)) \
|
|
= P(X > t + s) / P(X > t)
|
|
= e^(-lambda (t+ s)) / (e^(-lambda t)) = e^(-lambda s) \
|
|
equiv P(X > s)
|
|
$
|
|
]
|
|
|
|
== Gamma distribution
|
|
|
|
#definition[
|
|
Let $r, lambda > 0$. A random variable $X$ has the *gamma distribution* with parameters $(r, lambda)$ if $X$ is nonnegative and has probability density function
|
|
|
|
$
|
|
f(x) = cases(
|
|
(lambda^r x^(r-1))/(Gamma(r)) e^(-lambda x) &"for" x >= 0,
|
|
0 &"for" x < 0
|
|
)
|
|
$
|
|
|
|
Abbreviate this by $X ~ "Gamma"(r, lambda)$.
|
|
]
|
|
|
|
The gamma function $Gamma(r)$ generalizes the factorial function and is defined as
|
|
|
|
$
|
|
Gamma(r) = integral_0^infinity x^(r-1) e^(-x) dif x, "for" r > 0
|
|
$
|
|
|
|
Special case: $Gamma(n) = (n - 1)!$ if $n in ZZ^+$.
|
|
|
|
#remark[
|
|
The $"Exp"(lambda)$ distribution is a special case of the gamma distribution,
|
|
with parameter $r = 1$.
|
|
]
|
|
|
|
== The normal (Gaussian) distribution
|
|
|
|
#definition[
|
|
A random variable $Z$ has the *standard normal distribution* if $Z$ has
|
|
density function
|
|
|
|
$
|
|
phi(x) = 1 / sqrt(2 pi) e^(-x^2 / 2)
|
|
$
|
|
on the real line. Abbreviate this by $Z ~ N(0,1)$.
|
|
]<normal-dist>
|
|
|
|
#fact("CDF of a standard normal random variable")[
|
|
Let $Z~N(0,1)$ be normally distributed. Then its CDF is given by
|
|
$
|
|
Phi(x) = integral_(-infinity)^x phi(s) dif s = integral_(-infinity)^x 1 / sqrt(2 pi) e^(-s^2 / 2) dif s
|
|
$
|
|
]
|
|
|
|
The normal distribution is so important that, instead of the standard $f_Z (x)$ and
$F_Z (x)$, we use the special notation $phi(x)$ and $Phi(x)$.
|
|
|
|
#fact[
|
|
$
|
|
integral_(-infinity)^infinity e^(-s^2 / 2) dif s = sqrt(2 pi)
|
|
$
|
|
|
|
No closed form of the standard normal CDF $Phi$ exists, so we are left to either:
|
|
- approximate
|
|
- use technology (calculator)
|
|
- use the standard normal probability table in the textbook
|
|
]
|
|
|
|
To evaluate negative values, we can use the symmetry of the normal distribution
|
|
to apply the following identity:
|
|
|
|
$
|
|
Phi(-x) = 1 - Phi(x)
|
|
$
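Although $Phi$ has no elementary closed form, it can be written in terms of the error function as $Phi(x) = (1 + "erf"(x / sqrt(2))) / 2$, which is essentially what calculators evaluate. A small sketch (an aside, standard library only) that also checks the symmetry identity:

```python
from math import erf, sqrt

def Phi(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

print(Phi(1.0))                  # ~0.8413, matches the normal table
print(Phi(-1.0), 1 - Phi(1.0))   # symmetry: Phi(-x) = 1 - Phi(x)
```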
== General normal distributions

The general family of normal distributions is obtained by linear or affine
transformations of $Z$. Let $mu$ be real, and $sigma > 0$, then

$
X = sigma Z + mu
$
is also a normally distributed random variable with parameters $(mu, sigma^2)$.
The CDF of $X$ in terms of $Phi(dot)$ can be expressed as

$
F_X (x) &= P(X <= x) \
&= P(sigma Z + mu <= x) \
&= P(Z <= (x - mu) / sigma) \
&= Phi((x-mu)/sigma)
$

Also,

$
f(x) = F'(x) = dif / (dif x) [Phi((x-mu)/sigma)] = 1 / sigma phi((x-mu)/sigma) = 1 / sqrt(2 pi sigma^2) e^(-((x-mu)^2) / (2sigma^2))
$

#definition[
Let $mu$ be real and $sigma > 0$. A random variable $X$ has the _normal distribution_ with mean $mu$ and variance $sigma^2$ if $X$ has density function

$
f(x) = 1 / sqrt(2 pi sigma^2) e^(-((x-mu)^2) / (2sigma^2))
$

on the real line. Abbreviate this by $X ~ N(mu, sigma^2)$.
]

#fact[
Let $X ~ N(mu, sigma^2)$ and $Y = a X + b$. Then
$
Y ~ N(a mu + b, a^2 sigma^2)
$

That is, $Y$ is normally distributed with parameters $(a mu + b, a^2 sigma^2)$.
In particular,
$
Z = (X - mu) / sigma ~ N(0,1)
$
is a standard normal variable.
]
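A brief numerical illustration (an aside, assuming SciPy) of the standardization identity $F_X (x) = Phi((x-mu)/sigma)$:

```python
from scipy.stats import norm

mu, sigma = 3.0, 2.0
X = norm(loc=mu, scale=sigma)   # X ~ N(mu, sigma^2)
Z = norm()                      # Z ~ N(0, 1)

x = 4.5
print(X.cdf(x))                 # P(X <= x)
print(Z.cdf((x - mu) / sigma))  # Phi((x - mu) / sigma), same value
```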
= Lecture #datetime(day: 11, year: 2025, month: 2).display()

== Expectation

#definition[
The expectation or mean of a discrete random variable $X$ is the weighted
average, with weights assigned by the corresponding probabilities.

$
E(X) = sum_("all" x_i) x_i dot p(x_i)
$
]

#example[
Find the expected value of a single roll of a fair die.

- $X = "score" / "dots"$
- $x = 1,2,3,4,5,6$
- $p(x) = 1 / 6, 1 / 6,1 / 6,1 / 6,1 / 6,1 / 6$

$
E[X] = 1 dot 1 / 6 + 2 dot 1 / 6 + dots.c + 6 dot 1 / 6 = 21 / 6 = 3.5
$
]
== Binomial expected value

$
E[X] = n p
$

== Bernoulli expected value

Bernoulli is just binomial with one trial.

Recall that $P(X=1) = p$ and $P(X=0) = 1 - p$.

$
E[X] = 1 dot P(X=1) + 0 dot P(X=0) = p
$

Let $A$ be an event on $Omega$. Its _indicator random variable_ $I_A$ is defined
for $omega in Omega$ by

$
I_A (omega) = cases(1", if " &omega in A, 0", if" &omega in.not A)
$

$
E[I_A] = 1 dot P(A) = P(A)
$
== Geometric expected value

Let $p in [0,1]$ and $X ~ "Geom"(p)$ be a geometric RV with probability of
success $p$. Recall that the p.m.f. is $p q^(k-1)$, where prob. of failure is defined by $q := 1-p$.

Then

$
E[X] &= sum_(k=1)^infinity k p q^(k-1) \
&= p dot sum_(k=1)^infinity k dot q^(k-1)
$

Now recall from calculus that you can differentiate a power series term by term inside its radius of convergence. So for $|t| < 1$,

$
sum_(k=1)^infinity k t^(k-1) =
sum_(k=1)^infinity dif / (dif t) t^k = dif / (dif t) sum_(k=1)^infinity t^k = dif / (dif t) (t / (1-t)) = 1 / (1-t)^2 \
therefore E[X] = sum^infinity_(k=1) k p q^(k-1) = p sum^infinity_(k=1) k q^(k-1) = p (1 / (1 - q)^2) = 1 / p
$
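A quick numerical check of $E[X] = 1 / p$ (an aside; the infinite series is truncated at a large cutoff):

```python
p = 0.3
q = 1 - p

# truncate sum_{k >= 1} k * p * q^(k-1) at a large cutoff
approx = sum(k * p * q ** (k - 1) for k in range(1, 10_000))
print(approx, 1 / p)  # both ~3.3333
```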
== Expected value of a continuous RV

#definition[
The expectation or mean of a continuous random variable $X$ with density
function $f$ is

$
E[X] = integral_(-infinity)^infinity x dot f(x) dif x
$

An alternative symbol is $mu = E[X]$.
]

$mu$ is the "first moment" of $X$; by analogy with physics, it is the "center of
gravity" of $X$.

#remark[
In general when moving between discrete and continuous RV, replace sums with
integrals, p.m.f. with p.d.f., and vice versa.
]
#example[
Suppose $X$ is a continuous RV with p.d.f.

$
f_X (x) = cases(2x", " &0 < x < 1, 0"," &"elsewhere")
$

$
E[X] = integral_(-infinity)^infinity x dot f(x) dif x = integral^1_0 x dot 2x dif x = 2 / 3
$
]

#example("Uniform expectation")[
Let $X$ be a uniform random variable on the interval $[a,b]$ with $X ~
"Unif"[a,b]$. Find the expected value of $X$.

$
E[X] = integral^infinity_(-infinity) x dot f(x) dif x = integral_a^b x / (b-a) dif x \
= 1 / (b-a) integral_a^b x dif x = 1 / (b-a) dot (b^2 - a^2) / 2 = underbrace((b+a) / 2, "midpoint formula")
$
]
#example("Exponential expectation")[
Find the expected value of an exponential RV, with p.d.f.

$
f_X (x) = cases(lambda e^(-lambda x)", " &x > 0, 0"," &"elsewhere")
$

$
E[X] = integral_(-infinity)^infinity x dot f(x) dif x = integral_0^infinity x dot lambda e^(-lambda x) dif x \
= lambda dot integral_0^infinity x dot e^(-lambda x) dif x \
= lambda dot [lr(-x 1 / lambda e^(-lambda x) |)_(x=0)^(x=infinity) - integral_0^infinity -1 / lambda e^(-lambda x) dif x] \
= 1 / lambda
$
]

#example("Uniform dartboard")[
Our dartboard is a disk of radius $r_0$ and the dart lands uniformly at
random on the disk when thrown. Let $R$ be the distance of the dart from the
center of the disk. Find $E[R]$ given density function

$
f_R (t) = cases((2t)/(r_0 ^2)", " &0 <= t <= r_0, 0", " &t < 0 "or" t > r_0)
$

$
E[R] = integral_(-infinity)^infinity t f_R (t) dif t \
= integral^(r_0)_0 t dot (2t) / (r_0^2) dif t \
= 2 / 3 r_0
$
]
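As an aside, the dartboard answer $E[R] = 2 / 3 r_0$ is easy to confirm by simulation: sample points uniformly on the disk (by rejection from the enclosing square) and average their distances from the center. A sketch with NumPy, taking $r_0 = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
r0 = 1.0

# rejection sampling: uniform points in the square, keep those inside the disk
pts = rng.uniform(-r0, r0, size=(2_000_000, 2))
dist = np.hypot(pts[:, 0], pts[:, 1])
inside = dist[dist <= r0]

print(inside.mean())  # ~2/3
```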
== Expectation of derived values

If we can find the expected value of $X$, can we find the expected value of
$X^2$? More precisely, can we find $E[X^2]$?

If the distribution is easy to see, then this is trivial. Otherwise we have the
following useful property:

$
E[X^2] = integral_("all" x) x^2 f_X (x) dif x
$

(for continuous RVs).

And in the discrete case,

$
E[X^2] = sum_("all" x) x^2 p_X (x)
$

In fact $E[X^2]$ is so important that we call it the *mean square*.

#fact[
More generally, a real valued function $g(X)$ defined on the range of $X$ is
itself a random variable (with its own distribution).
]

We can find the expected value of $g(X)$ by

$
E[g(X)] = integral_(-infinity)^infinity g(x) f(x) dif x
$

or

$
E[g(X)] = sum_("all" x) g(x) p(x)
$
#example[
You roll a fair die to determine the winnings (or losses) $W$ of a player as
follows:

$
W = cases(-1", if the roll is 1, 2, or 3", 1", if the roll is a 4", 3", if the roll is 5 or 6")
$

What is the expected winnings/losses for the player during 1 roll of the die?

Let $X$ denote the outcome of the roll of the die. Then we can define our
random variable as $W = g(X)$ where the function $g$ is defined by $g(1) =
g(2) = g(3) = -1$ and so on.

Note that $P(W = -1) = P(X = 1 union X = 2 union X = 3) = 1/2$. Likewise $P(W=1)
= P(X=4) = 1/6$, and $P(W=3) = P(X=5 union X=6) = 1/3$.

Then
$
E[g(X)] = E[W] = (-1) dot P(W=-1) + (1) dot P(W=1) + (3) dot P(W=3) \
= -1 / 2 + 1 / 6 + 1 = 2 / 3
$
]
#example[
A stick of length $l$ is broken at a uniformly chosen random location. What is
the expected length of the longer piece?

Idea: if you break it before the halfway point, then the longer piece has length
given by $l - x$. If you break it after the halfway point, the longer piece
has length $x$.

Let the interval $[0,l]$ represent the stick and let $X ~ "Unif"[0,l]$ be the
location where the stick is broken. Then $X$ has density $f(x) = 1/l$ on
$[0,l]$ and 0 elsewhere.

Let $g(x)$ be the length of the longer piece when the stick is broken at $x$,

$
g(x) = cases(l-x", " &0 <= x < l/2, x", " &l/2 <= x <= l)
$

Then
$
E[g(X)] = integral_(-infinity)^infinity g(x) f(x) dif x = integral_0^(l / 2) (l-x) / l dif x + integral_(l / 2)^l x / l dif x \
= 3 / 4 l
$

So we expect the longer piece to be $3/4$ of the total length, which is perhaps
a little surprising at first.
]
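A simulation sketch (an aside, with NumPy) confirming the $3/4$ answer for a stick of length $l = 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
l = 1.0
breaks = rng.uniform(0, l, size=1_000_000)

longer = np.maximum(breaks, l - breaks)  # length of the longer piece
print(longer.mean())  # ~0.75 = (3/4) * l
```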
== Moments of a random variable

We continue discussing expectation but we introduce new terminology.

#fact[
The $n^"th"$ moment (or $n^"th"$ raw moment) of a discrete random variable $X$
with p.m.f. $p_X (x)$ is the expectation

$
E[X^n] = sum_k k^n p_X (k) = mu_n
$

If $X$ is continuous, then we have analogously

$
E[X^n] = integral_(-infinity)^infinity x^n f_X (x) dif x = mu_n
$
]

The *standard deviation* is given by $sigma$ and the *variance* is given by $sigma^2$ and

$
sigma^2 = mu_2 - (mu_1)^2
$

$mu_3$ is used to measure "skewness" / asymmetry of a distribution. For
example, the normal distribution is very symmetric.

$mu_4$ is used to measure kurtosis/peakedness of a distribution.
== Central moments

Previously we discussed "raw moments." Be careful not to confuse them with
_central moments_.

#fact[
The $n^"th"$ central moment of a discrete random variable $X$ with p.m.f. $p_X
(x)$ is the expected value of the difference about the mean raised to the
$n^"th"$ power

$
E[(X-mu)^n] = sum_k (k - mu)^n p_X (k) = mu'_n
$

And of course in the continuous case,

$
E[(X-mu)^n] = integral_(-infinity)^infinity (x - mu)^n f_X (x) dif x = mu'_n
$
]

In particular,

$
mu'_1 = E[(X-mu)^1] = integral_(-infinity)^infinity (x-mu)^1 f_X (x) dif x \
= integral_(-infinity)^infinity x f_X (x) dif x - integral_(-infinity)^infinity mu f_X (x) dif x = mu - mu dot 1 = 0 \
mu'_2 = E[(X-mu)^2] = sigma^2_X = "Var"(X)
$

Effectively we're centering our distribution first.
#example[
Let $Y$ be a uniformly chosen integer from ${0,1,2,...,m}$. Find the first and
second moment of $Y$.

The p.m.f. of $Y$ is $p_Y (k) = 1/(m+1)$ for $k in {0,1,...,m}$. Thus,

$
E[Y] = sum_(k=0)^m k 1 / (m+1) = 1 / (m+1) sum_(k=0)^m k \
= m / 2
$

Then,

$
E[Y^2] = sum_(k=0)^m k^2 1 / (m+1) = 1 / (m+1) dot (m(m+1)(2m+1)) / 6 = (m(2m+1)) / 6
$
]
#example[
Let $c > 0$ and let $U$ be a uniform random variable on the interval $[0,c]$.
Find the $n^"th"$ moment for $U$ for all positive integers $n$.

The density function of $U$ is

$
f(x) = cases(1/c", if" &x in [0,c], 0", " &"otherwise")
$

Therefore the $n^"th"$ moment of $U$ is,

$
E[U^n] = integral_(-infinity)^infinity x^n f(x) dif x = integral_0^c x^n / c dif x = c^n / (n+1)
$
]
#example[
Suppose the random variable $X ~ "Exp"(lambda)$. Find the second moment of $X$.

$
E[X^2] = integral_0^infinity x^2 lambda e^(-lambda x) dif x \
= 1 / (lambda^2) integral_0^infinity u^2 e^(-u) dif u \
= 1 / (lambda^2) Gamma(2 + 1) = 2! / lambda^2
$
]

#fact[
In general, to find the $n^"th"$ moment of $X ~ "Exp"(lambda)$,
$
E[X^n] = integral^infinity_0 x^n lambda e^(-lambda x) dif x = n! / lambda^n
$
]
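A numerical aside (a sketch assuming SciPy) that checks $E[X^n] = n! / lambda^n$ by integrating the defining formula directly:

```python
import math
import numpy as np
from scipy.integrate import quad

lam = 2.0
for n in (1, 2, 3):
    # integrate x^n * lam * exp(-lam x) over [0, infinity)
    integral, _ = quad(lambda x: x**n * lam * np.exp(-lam * x), 0, np.inf)
    print(integral, math.factorial(n) / lam**n)
```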
== Median and quartiles

When a random variable has rare (abnormal) values, its expectation may be a bad
indicator of where the center of the distribution lies.

#definition[
The *median* of a random variable $X$ is any real value $m$ that satisfies

$
P(X >= m) >= 1 / 2 "and" P(X <= m) >= 1 / 2
$

With half the probability on both ${X <= m}$ and ${X >= m}$, the median is
representative of the midpoint of the distribution. We say that the median is
more _robust_ because it is less affected by outliers. It is not necessarily
unique.
]
#example[
Let $X$ be uniformly distributed on the set ${-100, 1, 2, 3, ..., 9}$, so $X$ has probability mass function
$
p_X (-100) = p_X (1) = dots.c = p_X (9) = 1 / 10
$

Find the expected value and median of $X$.

$
E[X] = (-100) dot 1 / 10 + (1) dot 1 / 10 + dots.c + (9) dot 1 / 10 = -5.5
$

While the median is any number $m in [4,5]$.

The median reflects the fact that 90% of the values (and of the probability)
lie in the range $1, 2, ..., 9$, while the mean is heavily influenced by the
$-100$ value.
]
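As an aside, this robustness is easy to see numerically (a sketch with NumPy; `np.median` returns the midpoint $4.5$ of the two central values, which lies in the interval $[4,5]$ above):

```python
import numpy as np

values = np.array([-100, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(values.mean())      # -5.5, dragged down by the outlier
print(np.median(values))  # 4.5, unaffected by how extreme -100 is
```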
= President's Day lecture

...

= Lecture #datetime(day: 19, month: 2, year: 2025).display()
== Moment generating functions

Like the CDF, the moment generating function also completely characterizes the
distribution. That is, if you can find the MGF, it tells you all of the
information about the distribution. So it is an alternative way to characterize
a random variable.

They are "easy" to use for finding the distributions of:

- sums of independent random variables
- the distribution of the limit of a sequence of random variables

#definition[
Let $X$ be a random variable with all finite moments
$
E[X^k] = mu_k, k = 1,2,...
$
Then the *moment generating function* of a random variable $X$ is defined by
$M_X (t) = E[e^(t X)]$, for the real variable $t$.
]
All of the moments must be defined for the MGF to exist. The MGF looks like

$
sum_("all" x) e^(t x) p(x)
$
in the discrete case, and
$
integral^infinity_(-infinity) e^(t x) f(x) dif x
$
in the continuous case.

#proposition[
It holds that the $n^"th"$ derivative of $M$ evaluated at 0 gives the
$n^"th"$ moment.
$
M_X^((n)) (0) = E[X^n]
$
]

#proof[
$
M_X (t) &equiv E[e^(t X)] = E[1 + (t X) + (t X)^2 / 2! + dots.c] \
&= E[1] + E[t X] + E[(t^2 X^2) / 2!] + dots.c \
&= E[1] + t E[X] + t^2 / 2! E[X^2] + dots.c \
&= 1 + t / 1! mu_1 + t^2 / 2! mu_2 + dots.c
$

The coefficient of $t^k/k!$ in the Taylor series expansion of $M_X (t)$ is the
$k^"th"$ moment. So an alternative way to get $mu_k$ is

$
mu_k = lr(((dif^k M(t))/(dif t^k)) |)_(t=0) = "coefficient of" t^k / k!
$
]
#example[Binomial][

Let $X ~ "Bin"(n,p)$. Then the MGF of $X$ is given by

$
sum_(k=0)^n e^(t k) vec(n,k) p^k q^(n-k) = sum_(k=0)^n vec(n,k) underbrace((p e^t)^k, a^k) underbrace(q^(n-k), b^(n-k))
$

Applying the binomial theorem

$
(a + b)^n = sum_(k=0)^n vec(n,k) a^k b^(n-k)
$

So we have

$
(q + p e^t)^n
$

Let's find the first moment

$
mu_1 = lr((dif M(t))/(dif t) |)_(t=0) \
= n p
$

The second moment:

$
mu_2 = lr((dif^2 M(t))/(dif t^2) |)_(t=0) \
= n(n-1) p^2 + n p
$

For example, if $X$ has MGF $(1/3 + 2/3 e^t)^(10)$, then $X ~ "Bin"(10,2/3)$.
]
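The derivative computation here is mechanical, so it is a natural spot to let a computer algebra system do the work. A sketch (an aside, assuming SymPy) differentiating $M_X (t) = (q + p e^t)^n$ at $t = 0$:

```python
import sympy as sp

t, n, p = sp.symbols("t n p", positive=True)
q = 1 - p
M = (q + p * sp.exp(t)) ** n  # MGF of Bin(n, p)

mu1 = sp.simplify(sp.diff(M, t, 1).subs(t, 0))
mu2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))
print(mu1)  # n*p
print(mu2)  # equal to n*(n-1)*p**2 + n*p (printed form may differ)
```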
#example[Poisson][
Let $X ~ "Pois"(lambda)$. Then the MGF of $X$ is given by
$
M_X (t) = E[e^(t X)] \
= sum^infinity_(x=0) e^(t x) e^(-lambda) lambda^x / x! \
= e^(-lambda) sum^infinity_(x=0) (lambda e^t)^x / x!
$
Note: $e^a = sum_(x=0) ^infinity a^x / x!$
$
= e^(-lambda) e^(lambda e^t) \
= e^(-lambda (1 - e^t))
$
Then, the first moment can be found by,
$
mu_1 = lr(e^(-lambda (1 - e^t)) (-lambda) (-e^t) |)_(t=0) = lambda
$
]
#example[Exponential][
Let $X ~"Exp"(lambda)$ with PDF
$
f(x) = cases(lambda e^(-lambda x) &"for" x > 0, 0 &"otherwise")
$
Find the MGF of $X$.

$
M_X (t) &= integral^infinity_(-infinity) e^(t x) dot lambda e^(-lambda x) dif x \
&= lambda integral_0^infinity e^((t-lambda) x) dif x \
&= lambda lim_(b->infinity) integral_0^b e^((t - lambda) x) dif x
$
This integral depends on $t$, so we should consider three cases. If $t =
lambda$, then the integral diverges.

If $t != lambda$,
$
E[e^(t X)] = lambda lim_(b->infinity) integral_0^b e^((t - lambda) x) dif x = lambda lim_(b -> infinity) [(e^((t - lambda) x) - 1) / (t - lambda)]^(x=b)_(x=0) \
lambda lim_(b -> infinity) (e^((t - lambda) b) - 1) / (t - lambda) = cases(infinity &"if" t > lambda, lambda/(lambda - t) &"if" t < lambda)
$
Combining with the $t = lambda$ case,

$
M_X (t) = cases(infinity &"if" t >= lambda, lambda/(lambda - t) &"if" t < lambda)
$
]
#example[Alternative parameterization of the exponential][
Consider $X ~ "Exp"(beta)$ with PDF
$
f(x) = cases(1/beta e^(-x/beta) &"for" x > 0, 0 &"otherwise")
$
and proceed as usual (for $t < 1 / beta$)
$
M_X (t) = integral_(-infinity)^infinity e^(t x) dot 1 / beta e^(-x / beta) dif x = 1 / beta lim_(b->infinity) [e^((t - 1 / beta) x) / (t - 1 / beta)]_(x=0)^(x=b) = 1 / (1 - beta t)
$
So it's a geometric series
$
1 + beta t + (beta t)^2 + dots.c
$
Multiply and divide each $n^"th"$ term by $n!$
$
= 1 + beta t + 2 beta^2 (t^2 / 2!) + 6 beta^3 (t^3 / 3!) + dots.c
$
Recall that the coefficient of $t^k / k!$ is $mu_k$. So
- $E[X] = beta$
- $E[X^2] = 2 beta^2$
- $E[X^3] = 6 beta^3$
$
"Var"(X) = E[X^2] - (E[X])^2 = beta^2
$
]
#example[Uniform on $[0,1]$][
Let $X ~ U(0,1)$, then
$
M_X (t) &= integral_0^1 e^(t x) dot 1 dif x \
&= lr(e^(t x)/t |)_(x=0)^(x=1) = (e^t - 1) / t \
&= (cancel(1) + t + t^2 / 2! + t^3 / 3! + dots.c - cancel(1)) / t \
&= 1 + t / 2! + t^2 / 3! + t^3 / 4! + dots.c \
&= 1 + 1 / 2 t + 1 / 3 (t^2 / 2!) + 1 / 4(t^3 / 3!) + dots.c
$
So
- $E[X] = 1 / 2$
- $E[X^2] = 1 / 3$
- $E[X^n] = 1 / (n + 1)$
]
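These moments agree with the earlier direct computation $E[U^n] = c^n / (n+1)$ with $c = 1$; a brief numerical aside (a sketch with NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, size=2_000_000)

for n in (1, 2, 5):
    print(n, (x**n).mean(), 1 / (n + 1))  # sample moment vs 1/(n+1)
```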
== Properties of the MGF

#definition[
Random variables $X$ and $Y$ are equal in distribution if $P(X in B) = P(Y in
B)$ for all subsets $B$ of $RR$.
]

#abuse[
Abbreviate this by $X eq.delta Y$.
]
#example[Normal distribution][
Let $Z ~ N(0,1)$. Then
$
E[e^(t Z)] = 1 / sqrt(2 pi) integral^infinity_(-infinity) e^(-1 / 2 z^2 + t z -1 / 2 t^2 + 1 / 2 t^2) dif z \
= e^(t^2 / 2) 1 / sqrt(2 pi) integral_(-infinity)^infinity e^(-1 / 2 (z-t)^2) dif z = e^(t^2 / 2)
$

To get the MGF for a general normal RV $X ~ N(mu, sigma^2)$, write
$
X = sigma Z + mu
$
so that we get
$
E[e^(t (sigma Z + mu))] = e^(t mu) E[e^(t sigma Z)] = e^(t mu) dot e^((t^2 sigma^2) / 2) = exp(mu t + (sigma^2 t^2) / 2)
$
]
== Joint distributions of RV

Looking at multiple random variables jointly. If $X$ and $Y$ are both random
variables defined on $Omega$, treat them as coordinates of a 2 dimensional
random vector. It's a vector valued function on $Omega$,

$
Omega -> RR^2
$

This holds for both discrete and continuous random variables.

#example[
$
(X,Y)
$
1. Poker hand: $X$ is the number of face cards, $Y$ is the number of red cards.
2. Demographic info: $X$ = height, $Y$ = weight
]
In general, we may consider $n$ random variables jointly, where

$
X_1, X_2, ..., X_n
$
defined on $Omega$ are the coordinates of an $n$-dimensional random vector that
maps outcomes to $RR^n$.

The probability distribution of $(X_1, dots.c, X_n)$ is now $P((X_1, dots.c,
X_n) in B)$ where $B$ ranges over subsets of $RR^n$ (elements of the power set of $RR^n$).

The probability distribution of the random vector is called the _joint
distribution_.
#fact[
Let $X$ and $Y$ both be discrete random variables defined on the same $Omega$.
Then, the joint PMF is
$
P(X = x, Y = y) = p_(X,Y) (x,y)
$
where $p_(X,Y) (x,y) >= 0$ for all possible values $x,y$ of $X$ and $Y$
respectively, and

$
sum_(x in X) sum_(y in Y) p_(X,Y) (x,y) = 1
$
]
#definition[
Let $X_1, X_2, ..., X_n$ be discrete random variables defined on $Omega$;
their joint PMF is given by

$
p(k_1, k_2, ..., k_n) = P(X_1 = k_1, X_2 = k_2, ..., X_n = k_n)
$
for all possible values $k_1, ..., k_n$ of $X_1, ..., X_n$.
]

#fact[
The joint probability in set notation:
$
P(X_1 = k_1, X_2 = k_2, ..., X_n = k_n) = P({X_1 = k_1} sect dots.c sect {X_n = k_n})
$
The joint PMF has the same properties as the single variable PMF
$
p_(X_1,X_2,...,X_n) (k_1,k_2,...,k_n) >= 0
$
]