#import "@youwen/zen:0.1.0": *
#import "@preview/ctheorems:1.1.3": *
#show: zen.with(
title: "PSTAT120A Course Notes",
author: "Youwen Wu",
date: "Winter 2025",
subtitle: "Taught by Brian Wainwright",
)
#outline()
= Introduction
PSTAT 120A is an introductory course on probability and statistics. However, it
is a theoretical course rather than an applied statistics course. You will not learn
how to read or conduct real-world statistical studies. Leave your $p$-values at
home, this ain't your momma's AP Stats.
= Lecture #datetime(day: 6, month: 1, year: 2025).display()
== Preliminaries
#definition[
Statistics is the science dealing with the collection, summarization,
analysis, and interpretation of data.
]
== Set theory for dummies
A terse introduction to elementary naive set theory and the basic operations
on sets.
#remark[
Keep in mind that without $cal(Z F C)$ or another axiomatization of set theory
that resolves fundamental issues, our set theory is subject to paradoxes like
Russell's. Whoops, the universe doesn't exist.
]
#definition[
A *set* is a collection of elements.
]
#example[Examples of sets][
+ Trivial set: ${1}$
+ Empty set: $emptyset$
+ $A = {a,b,c}$
]
We can construct sets using set-builder notation (also sometimes called set
comprehension).
$ {"expression with" x | "conditions on" x} $
#example("Set builder notation")[
+ The set of all even integers: ${2n | n in ZZ}$
+ The set of all perfect squares in $RR$: ${x^2 | x in NN}$
]
We also have notation for working with sets:
With arbitrary sets $A$, $B$:
+ $a in A$ ($a$ is a member of the set $A$)
+ $a in.not A$ ($a$ is not a member of the set $A$)
+ $A subset.eq B$ (Set theory: $A$ is a subset of $B$) (Stats: $A$ is an event in the sample space $B$)
+ $A subset B$ (Proper subset: $A subset.eq B$ and $A != B$)
+ $A^c$ or $A'$ (read "complement of $A$," and introduced later)
+ $A union B$ (Union of $A$ and $B$. Gives a set with both the elements of $A$ and $B$)
+ $A sect B$ (Intersection of $A$ and $B$. Gives a set consisting of the elements in *both* $A$ and $B$)
+ $A \\ B$ (Set difference. The set of all elements of $A$ that are not also in $B$)
+ $A times B$ (Cartesian product. The set of all ordered pairs $(a,b)$ with $a in A$ and $b in B$)
We can also write a few of these operations precisely as set comprehensions.
+ $A subset.eq B <==> A = {a | a in A and a in B}$
+ $A union B = {x | x in A or x in B}$ (here $or$ is the logical OR)
+ $A sect B = {x | x in A and x in B}$ (here $and$ is the logical AND)
+ $A \\ B = {a | a in A and a in.not B}$
+ $A times B = {(a,b) | a in A and b in B}$
Take a moment and convince yourself that these definitions are equivalent to
the previous ones.
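One way to convince yourself is to check the comprehensions on small finite sets. The following is a minimal sketch in Python; the sets `A` and `B` and the helper name `universe` are made up for illustration and are not part of the course material.

```python
from itertools import product

# Two small illustrative sets (arbitrary choices).
A = {1, 2, 3}
B = {3, 4}
# Something for the comprehensions to range over.
universe = A | B

union = {x for x in universe if x in A or x in B}
intersection = {x for x in universe if x in A and x in B}
difference = {a for a in A if a not in B}
cartesian = {(a, b) for a in A for b in B}

# Compare against Python's built-in set operations.
assert union == A | B
assert intersection == A & B
assert difference == A - B
assert cartesian == set(product(A, B))
```

Python's set comprehensions mirror set-builder notation closely, which is why the translation is almost word for word.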
#definition[
The universal set $Omega$ is the set of all objects in a given set
theoretical universe.
]
With the above definition, we can now introduce the set complement.
#definition[
The set complement $A'$ is given by
$
A' = Omega \\ A
$
where $Omega$ is the _universal set_.
]
#example[The real plane][
The real plane $RR^2$ can be defined as a Cartesian product of $RR$ with
itself.
$ RR^2 = RR times RR $
]
Check your intuition that this makes sense. Why do you think $RR^n$ was chosen
as the notation for $n$-dimensional real space?
#definition[Disjoint sets][
If $A sect B = emptyset$, then we say that $A$ and $B$ are *disjoint*.
]
#fact[
For any sets $A$ and $B$, we have DeMorgan's Laws:
+ $(A union B)' = A' sect B'$
+ $(A sect B)' = A' union B'$
]
#fact[Generalized DeMorgan's][
+ $(union.big_i A_i)' = sect.big_i A_i '$
+ $(sect.big_i A_i)' = union.big_i A_i '$
]
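DeMorgan's Laws are easy to sanity-check on a finite universe. The sketch below is only an illustration: the universal set `omega`, the sets `A` and `B`, and the helper `complement` are hypothetical names chosen here, and a check on one example is of course not a proof.

```python
# A small finite universal set and two overlapping subsets (arbitrary choices).
omega = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def complement(s):
    """Complement relative to the universal set omega."""
    return omega - s

# (A union B)' = A' intersect B'
assert complement(A | B) == complement(A) & complement(B)
# (A intersect B)' = A' union B'
assert complement(A & B) == complement(A) | complement(B)
```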
== Sizes of infinity
#definition[
Let $N(A)$ be the number of elements in $A$. $N(A)$ is called the _cardinality_ of $A$.
]
We say a set is finite if it has finite cardinality, or infinite if it has an
infinite cardinality.
Infinite sets can be either _countably infinite_ or _uncountably infinite_.
When a set is countably infinite, its cardinality is $aleph_0$ (here $aleph$ is
the Hebrew letter aleph and read "aleph null").
When a set is uncountably infinite, its cardinality is greater than $aleph_0$.
#example("Countable sets")[
+ The natural numbers $NN$.
+ The rationals $QQ$.
+ The integers $ZZ$.
+ The set of all logical tautologies.
]
#example("Uncountable sets")[
+ The real numbers $RR$.
+ The real numbers in the interval $[0,1]$.
+ The _power set_ of $ZZ$, which is the set of all subsets of $ZZ$.
]
#remark[
All the uncountable sets above have cardinality $2^(aleph_0)$, also written
$frak(c)$ or $beth_1$. This is the _cardinality of the continuum_, read
"beth 1". Whether it equals $aleph_1$ ("aleph 1") is the continuum hypothesis,
which cannot be settled within $cal(Z F C)$.
However, in general uncountably infinite sets do not have the same
cardinality.
]
#fact[
If a set is countably infinite, then it has a bijection with $ZZ$. This means
every set with cardinality $aleph_0$ has a bijection to $ZZ$. More generally,
any sets with the same cardinality have a bijection between them.
]
This gives us the following equivalent statement:
#fact[
Two sets have the same cardinality if and only if there exists a bijective
function between them. In symbols,
$ N(A) = N(B) <==> exists F : A <-> B $
]
= Lecture #datetime(day: 8, month: 1, year: 2025).display()
== Probability
#definition[
A *random experiment* is one in which the set of all possible outcomes is known in advance, but one can't predict which outcome will occur on a given trial of the experiment.
]
#example("Finite sample spaces")[
Toss a coin:
$ Omega = {H,T} $
Roll a pair of dice:
$ Omega = {1,2,3,4,5,6} times {1,2,3,4,5,6} $
]
#example("Countably infinite sample spaces")[
Shoot a basket until you make one:
$ Omega = {M, F M, F F M, F F F M, dots} $
]
#example("Uncountably infinite sample space")[
Waiting time for a bus:
$ Omega = {t : t >= 0} $
]
#fact[
Elements of $Omega$ are called sample points.
]
#definition[
Any properly defined subset of $Omega$ is called an *event*.
]
#example[Dice][
Rolling a fair die twice, let $A$ be the event that the combined score of both dice is 10.
$ A = {(4,6), (5,5),(6,4)} $
]
Probabilistic concepts in the parlance of set theory:
- Universal set ($Omega$) $<->$ sample space
- Element $<->$ outcome / sample point ($omega$)
- Disjoint sets $<->$ mutually exclusive events
== Classical approach
Classical approach:
$ P(A) = (hash A) / (hash Omega) $
Requires equally likely outcomes and finite sample spaces.
#remark[
With an infinite sample space, $hash Omega$ is infinite, so this formula assigns probability 0 to every finite event, which is often wrong.
]
#example("Dice again")[
Rolling a fair die twice, let $A$ be the event that the combined score of both dice is 10.
$
A &= {(4,6), (5,5),(6,4)} \
P(A) &= 3 / 36 = 1 / 12
$
]
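The classical formula can be checked by brute-force enumeration. This is a minimal sketch, assuming two fair six-sided dice; the variable names `omega`, `A`, and `p_A` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs of faces from two rolls.
omega = list(product(range(1, 7), repeat=2))

# Event A: the combined score is 10.
A = [(i, j) for (i, j) in omega if i + j == 10]

# Classical probability: favourable outcomes over total outcomes.
p_A = Fraction(len(A), len(omega))

print(A)    # [(4, 6), (5, 5), (6, 4)]
print(p_A)  # 1/12
```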
== Relative frequency approach
An approach done commonly by applied statisticians who work in the disgusting
real world. This is where we are generally concerned with irrelevant concerns
like accurate sampling and $p$-values and such. I am told this is covered in
PSTAT 120B, so hopefully I can avoid ever taking that class (as a pure math
major).
$
P(A) = (hash "of times" A "occurs in large number of trials") / (hash "of trials")
$
#example[
Flipping a coin to determine the probability of it landing heads.
]
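The coin example can be simulated. The sketch below assumes a fair coin modelled by Python's pseudo-random generator; the function name `relative_frequency` and the fixed `seed` are illustrative choices, not something from the lecture.

```python
import random

def relative_frequency(trials, seed=0):
    """Fraction of simulated fair-coin flips that land heads."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# The relative frequency should settle near 1/2 as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With few trials the estimate can be far from $1 / 2$, which is why the definition asks for a large number of trials.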
== Subjective approach
Personal definition of probability. Not "real" probability, merely co-opting
its parlance to lend credibility to subjective judgements of confidence.
== Axiomatic approach
Consider a random experiment. Then:
#definition[
The *sample space* $Omega$ is the set of all possible outcomes of the
experiment.
]
#definition[
Elements of $Omega$ are called *sample points*.
]
#definition[
Subsets of $Omega$ are called *events*. The collection of all events (in other
terms, the power set of $Omega$) is denoted by $cal(F)$.
]
#definition[
The *probability measure*, or probability distribution, or simply probability, is a
function $P : cal(F) -> RR$ satisfying the following axioms (properties).
+ $P(A) >= 0, forall A$
+ $P(Omega) = 1$
+ If $A_i sect A_j = emptyset, forall i != j$, then
$ P(union.big_(i=1)^infinity A_i) = sum_(i=1)^infinity P(A_i) $
]
The 3-tuple $(Omega, cal(F), P)$ is called a *probability space*.
#remark[
In more advanced texts you will see $cal(F)$ introduced as a so-called
$sigma$-algebra. A $sigma$-algebra on a set $Omega$ is a nonempty collection
$Sigma$ of subsets of $Omega$ that is closed under set complement, countable
unions, and as a corollary, countable intersections.
]
Now let us show various results with $P$.
#proposition[
$ P(emptyset) = 0 $
]
#proof[
Take $A_i = emptyset$ for every $i$. These sets are pairwise disjoint and their union is $emptyset$, so by axiom 3,
$
P(emptyset) = sum^infinity_(i=1) P(A_i) = sum^infinity_(i=1) P(emptyset)
$
Suppose $P(emptyset) != 0$. Then $P(emptyset) > 0$ by axiom 1, so the sum on the right diverges to $infinity$, while the left side $P(emptyset)$ is a finite real number. This is a contradiction, so $P(emptyset) = 0$.
]
#proposition[
If $A_1, A_2, ..., A_n$ are disjoint, then
$ P(union.big^n_(i=1) A_i) = sum^n_(i= 1) P(A_i) $
]
This is mostly a formal manipulation to derive the obviously true proposition from our axioms.
#proof[
Write any finite sequence $(A_1, A_2, ..., A_n)$ of disjoint sets as an infinite sequence $(A_1, A_2, ..., A_n, emptyset, emptyset, ...)$, which is still pairwise disjoint. Then by axiom 3,
$
P(union.big_(i=1)^infinity A_i) = sum^n_(i=1) P(A_i) + sum^infinity_(i=n+1) P(emptyset) = sum^n_(i=1) P(A_i)
$
And because all of the elements after $A_n$ are $emptyset$, their union adds no additional elements to the resultant union set of all $A_i$, so
$
P(union.big_(i=1)^infinity A_i) = P(union.big_(i=1)^n A_i) = sum_(i=1)^n P(A_i)
$
]
#proposition[Complement][
$ P(A') = 1 - P(A) $
]
#proof[
$
A' union A &= Omega \
A' sect A &= emptyset \
P(A' union A) &= P(A') + P(A) &"(by axiom 3)"\
P(A' union A) &= P(Omega) = 1 &"(by axiom 2)" \
therefore P(A') &= 1 - P(A)
$
]
#proposition[
$ A subset.eq B => P(A) <= P(B) $
]
#proof[
$ B = A union (A' sect B) $
but $A$ and ($A' sect B$) are disjoint, so
$
P(B) &= P(A union (A' sect B)) \
&= P(A) + P(A' sect B) \
&therefore P(B) >= P(A)
$
]
#proposition[
$ P(A union B) = P(A) + P(B) - P(A sect B) $
]
#proof[
$
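A = (A sect B) union (A sect B') \
=> P(A) = P(A sect B) + P(A sect B') \
=> P(B) = P(B sect A) + P(B sect A') \
P(A) + P(B) = P(A sect B) + P(A sect B) + P(A sect B') + P(A' sect B) \
=> P(A) + P(B) - P(A sect B) = P(A sect B) + P(A sect B') + P(A' sect B)
$
The three sets on the right are pairwise disjoint and their union is $A union B$, so by finite additivity the right-hand side equals $P(A union B)$.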
]
#remark[
This is a stronger result of axiom 3, which generalizes for all sets $A$ and $B$ regardless of whether they're disjoint.
]
#remark[
These are mostly intuitively true statements (think about the probabilistic
concepts represented by the sets) in classical probability that we derive
rigorously from our axiomatic probability function $P$.
]
#example[
Now let us consider some trivial concepts in classical probability written in
the parlance of combinatorial probability.
Select one card from a deck of 52 cards.
Then the following is true:
$
Omega = {1,2,...,52} \
A = "card is a heart" = {H 2, H 3, H 4, ..., H"Ace"} \
B = "card is an Ace" = {H"Ace", C"Ace", D"Ace", S"Ace"} \
C = "card is black" = {C 2, C 3, ..., C"Ace", S 2, S 3, ..., S"Ace"} \
P(A) = 13 / 52,
P(B) = 4 / 52,
P(C) = 26 / 52 \
P(A sect B) = 1 / 52 \
P(A sect C) = 0 \
P(B sect C) = 2 / 52 \
P(A union B) = P(A) + P(B) - P(A sect B) = 16 / 52 \
P(B') = 1 - P(B) = 48 / 52 \
P(A sect B') = P(A) - P(A sect B) = 13 / 52 - 1 / 52 = 12 / 52 \
P((A sect B') union (A' sect B)) = P(A sect B') + P(A' sect B) = 15 / 52 \
P(A' sect B') = P((A union B)') = 1 - P(A union B) = 36 / 52
$
]
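The card computations above can be reproduced with explicit sets. This is a sketch under the assumption of a standard 52-card deck with equally likely cards; the encoding of a card as a `(suit, rank)` pair and the helper `P` are illustrative choices made here.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
         "Jack", "Queen", "King", "Ace"]
suits = "HCDS"  # hearts, clubs, diamonds, spades

# Sample space: every (suit, rank) pair.
omega = set(product(suits, ranks))

def P(event):
    """Classical probability of an event (a subset of omega)."""
    return Fraction(len(event), len(omega))

A = {c for c in omega if c[0] == "H"}    # card is a heart
B = {c for c in omega if c[1] == "Ace"}  # card is an Ace
C = {c for c in omega if c[0] in "CS"}   # card is black

assert P(A | B) == P(A) + P(B) - P(A & B) == Fraction(16, 52)
assert P(omega - B) == 1 - P(B)
assert P((A - B) | (B - A)) == Fraction(15, 52)
assert P(omega - (A | B)) == Fraction(36, 52)
```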
== Countable sample spaces
#definition[
A sample space $Omega$ is said to be *countable* if it's finite or countably infinite.
]
In such a case, one can list the elements of $Omega$.
$ Omega = {omega_1, omega_2, omega_3, ...} $
with associated probabilities, $p_1, p_2, p_3,...$, where
$
p_i = P(omega_i) >= 0 \
1 = P(Omega) = sum P(omega_i)
$
#example[Fair die, again][
All outcomes are equally likely,
$ p_1 = p_2 = ... = p_6 = 1 / 6 $
Let $A$ be the event that the score is odd = ${1,3,5}$
$ P(A) = 3 / 6 $
]
#example[Loaded die][
Consider a die where the probability of rolling each odd side is double the probability of rolling each even side.
$
p_2 = p_4 = p_6, p_1 = p_3 = p_5 = 2p_2 \
6p_2 + 3p_2 = 9p_2 = 1 \
p_2 = 1 / 9, p_1 = 2 / 9
$
]
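The same normalization argument can be written as a short computation: assign relative weights, then divide by their total so the probabilities sum to 1. The names `weights` and `p` below are illustrative.

```python
from fractions import Fraction

# Relative weights: odd faces are twice as likely as even faces.
weights = {face: (2 if face % 2 else 1) for face in range(1, 7)}

# Normalize so that the probabilities sum to 1 (axiom 2).
total = sum(weights.values())
p = {face: Fraction(w, total) for face, w in weights.items()}

assert sum(p.values()) == 1
print(p[2], p[1])  # 1/9 2/9
```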
#example[Coins][
Toss a fair coin until you get the first head.
$
Omega = {H, T H, T T H, ...} "(countably infinite)" \
P(H) = 1 / 2 \
P(T T H) = (1 / 2)^3 \
P(Omega) = sum_(n=1)^infinity (1 / 2)^n = 1 / (1 - 1 / 2) - 1 = 1
$
]
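To see the geometric series approach 1, one can compute its partial sums exactly. This is only a numerical illustration of the limit, with the helper name `partial_sum` chosen here.

```python
from fractions import Fraction

def partial_sum(N):
    """Probability that the first head appears within the first N tosses."""
    return sum(Fraction(1, 2**n) for n in range(1, N + 1))

for N in (1, 2, 5, 10):
    print(N, partial_sum(N))

# Each partial sum falls short of 1 by exactly (1/2)^N.
assert partial_sum(10) == 1 - Fraction(1, 2**10)
```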
#example[Birthdays][
What is the probability two people share the same birthday, assuming all 365 days are equally likely?
$
Omega = {1, 2, ..., 365} times {1, 2, ..., 365} \
P(A) = 365 / 365^2 = 1 / 365
$
]
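The count in the birthday example can be confirmed by enumerating all ordered pairs of birthdays. The sketch assumes 365 equally likely days and ignores leap years; `days`, `matches`, and `p` are names introduced here.

```python
from fractions import Fraction

days = range(1, 366)  # days 1 through 365

# Count the pairs (a, b) in which both people have the same birthday.
matches = sum(1 for a in days for b in days if a == b)

p = Fraction(matches, len(days) ** 2)
assert p == Fraction(1, 365)
```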
== Continuous sample spaces
#definition[
A *continuous sample space* contains an interval in $RR$ and is uncountably infinite.
]
#definition[
A probability density function (#smallcaps[pdf]) $f$ gives the probability
_density_ at the point $s$. The probability of an event $A$ is $integral_A f(s) dif s$.
]
Properties of the #smallcaps[pdf]:
- $f(s) >= 0, forall s in Omega$
- $integral_Omega f(s) dif s = 1$
#example[
Waiting time for bus: $Omega = {s : s >= 0}$.
]
= Notes on counting
The cardinality of $A$ is given by $hash A$. Let us develop methods for finding
$hash A$ from a description of the set $A$ (in other words, methods for
counting).
== General multiplication principle
#fact[
Let $A$ and $B$ be finite sets, $k in ZZ^+$. Then let $f : A -> B$ be a
function such that each element in $B$ is the image of exactly $k$ elements
in $A$ (such a function is called _$k$-to-one_). Then $hash A = k dot hash
B$.
]<ktoone>
#example[
Four fully loaded 10-seater vans transported people to the picnic. How many
people were transported?
To apply @ktoone, let $A$ be the set of people, $B$ be the set of vans, and $f : A -> B$ map each person to the van they ride in. Then $f$ is a 10-to-one function and $hash B = 4$, so $hash A = 10 dot 4 = 40$.
]
#definition[
An $n$-tuple is an ordered sequence of $n$ elements.
]
Many of our methods in probability rely on multiplying together the numbers of
outcomes of several choices to obtain the number of combined outcomes. We make this explicit below in @tuplemultiplication.
#fact[
Suppose a set of $n$-tuples $(a_1, ..., a_n)$ obeys these rules:
+ There are $r_1$ choices for the first entry $a_1$.
+ Once the first $k$ entries $a_1, ..., a_k$ have been chosen, the number of alternatives for the next entry $a_(k+1)$ is $r_(k+1)$, regardless of the previous choices.
Then the total number of $n$-tuples is the product $r_1 dot r_2 dot r_3 dot dots dot r_n$.
]<tuplemultiplication>
#proof[
It is trivially true for $n = 1$ since you have $r_1$ choices of $a_1$ for a
1-tuple $(a_1)$.
Let $A$ be the set of all possible $n$-tuples and $B$ be the set of all
possible $(n+1)$-tuples. Assume the statement is true for $A$ (the induction
hypothesis). For the inductive step, note that each $n$-tuple $(a_1, ..., a_n)$
in $A$ extends to exactly $r_(n+1)$ tuples in $B$.
Let $f : B -> A$ be a function which takes each $(n+1)$-tuple and truncates the $a_(n+1)$ term, leaving us with just an $n$-tuple of the form $(a_1, a_2, ..., a_n)$.
$ f((a_1, ..., a_n, a_(n + 1))) = (a_1, ..., a_n) $
Now notice that $f$ is precisely a $r_(n+1)$-to-one function! Recall by
our assumption that @tuplemultiplication is true for $n$-tuples, so $A$ has $r_1 dot
r_2 dot ... dot r_n$ elements, or $hash A = r_1 dot ... dot r_n$. Then by
@ktoone, we have $hash B = hash A dot r_(n+1) = r_1 dot r_2 dot
... dot r_(n+1)$. Our induction is complete and we have proved @tuplemultiplication.
]
@tuplemultiplication is sometimes called the _general multiplication principle_.
We can use @tuplemultiplication to derive counting formulas for various
situations. Let $A_1, A_2, ..., A_n$ be finite sets. Then as a corollary of
@tuplemultiplication, we can count the number of $n$-tuples in a finite
Cartesian product of $A_1, A_2, ..., A_n$.
#fact[
Let $A_1, A_2, ..., A_n$ be finite sets. Then
$
hash (A_1 times A_2 times ... times A_n) = (hash A_1) dot (hash A_2) dot ... dot (hash A_n) = product^n_(i=1) (hash A_i)
$
]
#example[
How many distinct subsets does a set of size $n$ have?
The answer is $2^n$. Each subset can be encoded as an $n$-tuple with entries 0
or 1, where the $i$th entry is 1 if the $i$th element of the set is in the
subset and 0 if it is not.
Thus the number of subsets is the same as the cardinality of
$ {0,1} times ... times {0,1} = {0,1}^n $
which is $2^n$.
This is why given a set $X$ with cardinality $aleph$, we write the
cardinality of the power set of $X$ as $2^aleph$.
]
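The encoding of subsets as tuples of 0s and 1s can be made concrete. The following is a sketch for a small finite set; the generator name `subsets` and the example set `"abc"` are illustrative.

```python
from itertools import product

def subsets(elements):
    """Yield every subset of `elements`, one per 0/1 tuple of length n."""
    elements = list(elements)
    for bits in product((0, 1), repeat=len(elements)):
        # Entry i is 1 exactly when the i-th element is in the subset.
        yield {x for x, bit in zip(elements, bits) if bit}

all_subsets = list(subsets("abc"))
assert len(all_subsets) == 2**3
```

Each 0/1 tuple produces a different subset, so counting tuples and counting subsets give the same answer.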
== Permutations
Now we can use the multiplication principle to count permutations.
#fact[
Consider all $k$-tuples $(a_1, ..., a_k)$ that can be constructed from a set $A$ of size $n, n>= k$ without repetition. The total number of these $k$-tuples is
$ (n)_k = n dot (n - 1) ... (n - k + 1) = n! / (n-k)! $
In particular, with $k=n$, each $n$-tuple is an ordering or _permutation_ of $A$. So the total number of permutations of a set of $n$ elements is $n!$.
]
#proof[
We construct the $k$-tuples sequentially. For the first element, we choose
one element from $A$ with $n$ alternatives. The next element has $n - 1$
alternatives. In general, after $j$ elements are chosen, there are $n - j$
alternatives for the next one, so the $k$th entry has $n - k + 1$ alternatives.
Then clearly after choosing $k$ elements for our $k$-tuple we have by
@tuplemultiplication the number of $k$-tuples being $n dot (n - 1) dot ...
dot (n - k + 1) = (n)_k$.
]
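The permutation count can be compared against direct enumeration. This is a minimal sketch using Python's standard library, with the values `n = 5` and `k = 3` picked arbitrarily.

```python
from itertools import permutations
from math import factorial, perm

n, k = 5, 3

# All k-tuples without repetition drawn from a set of size n.
tuples = list(permutations(range(n), k))

# Enumeration, math.perm, and the factorial formula all agree.
assert len(tuples) == perm(n, k) == factorial(n) // factorial(n - k) == 60

# With k = n we get the n! orderings of the whole set.
assert len(list(permutations(range(n)))) == factorial(n)
```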