#import "./dvd.typ": *
#import "@preview/ctheorems:1.1.3": *

#show: dvdtyp.with(
  title: "PSTAT120A Course Notes",
  author: "Youwen Wu",
  date: "Winter 2025",
  subtitle: "Taught by Brian Wainwright",
)

#outline()

= Lecture #datetime(day: 6, month: 1, year: 2025).display()

== Preliminaries

#definition[
  Statistics is the science dealing with the collection, summarization,
  analysis, and interpretation of data.
]

== Set theory for dummies

A terse introduction to elementary naive set theory and the basic operations
upon them.

#remark[
  Keep in mind that without $cal(Z F C)$ or another model of set theory that
  resolves fundamental issues, our set theory is subject to paradoxes like
  Russell's. Whoops, the universe doesn't exist.
]

#definition[
  A *set* is a collection of elements.
]

#example[Examples of sets][
  + Trivial set: ${1}$
  + Empty set: $emptyset$
  + $A = {a,b,c}$
]

We can construct sets using set-builder notation (also sometimes called set
comprehension).

$ {"expression with" x | "conditions on" x} $

#example("Set builder notation")[
  + The set of all even integers: ${2n | n in ZZ}$
  + The set of all perfect squares in $RR$: ${x^2 | x in NN}$
]

We also have notation for working with sets:

With arbitrary sets $A$, $B$:

+ $a in A$ ($a$ is a member of the set $A$)
+ $a in.not A$ ($a$ is not a member of the set $A$)
+ $A subset.eq B$ (Set theory: $A$ is a subset of $B$) (Stats: $A$ is a sample space in $B$)
+ $A subset B$ (Proper subset: $A != B$)
+ $A^c$ or $A'$ (read "complement of $A$," and introduced later)
+ $A union B$ (Union of $A$ and $B$. Gives a set with both the elements of $A$ and $B$)
+ $A sect B$ (Intersection of $A$ and $B$. Gives a set consisting of the elements in *both* $A$ and $B$)
+ $A \\ B$ (Set difference. The set of all elements of $A$ that are not also in $B$)
+ $A times B$ (Cartesian product. Ordered pairs of $(a,b)$ $forall a in A$, $forall b in B$)

We can also write a few of these operations precisely as set comprehensions.

+ $A subset B => A = {a | a in B, forall a in A}$
+ $A union B = {x | x in A or x in B}$ (here $or$ is the logical OR)
+ $A sect B = {x | x in A and x in B}$ (here $and$ is the logical AND)
+ $A \\ B = {a | a in A and a in.not B}$
+ $A times B = {(a,b) | forall a in A, forall b in B}$

Take a moment and convince yourself that these definitions are equivalent to
the previous ones.

#definition[
  The universal set $Omega$ is the set of all objects in a given set
  theoretical universe.
]

With the above definition, we can now introduce the set complement.

#definition[
  The set complement $A'$ is given by
  $
    A' = Omega \\ A
  $
  where $Omega$ is the _universal set_.
]

#example[The real plane][
  The real plane $RR^2$ can be defined as a Cartesian product of $RR$ with
  itself.

  $ RR^2 = RR times RR $
]

Check your intuition that this makes sense. Why do you think $RR^n$ was chosen
as the notation for $n$ dimensional spaces in $RR$?

#definition[Disjoint sets][
  If $A sect B$ = $emptyset$, then we say that $A$ and $B$ are *disjoint*.
]

#fact[
  For any sets $A$ and $B$, we have DeMorgan's Laws:
  + $(A union B)' = A' sect B'$
  + $(A sect B)' = A' union B'$
]

#fact[Generalized DeMorgan's][
  + $(union.big_i A_i)' = sect.big_i A_i '$
  + $(sect.big_i A_i)' = union.big_i A_i '$
]

== Sizes of infinity

#definition[
  Let $N(A)$ be the number of elements in $A$. $N(A)$ is called the _cardinality_ of $A$.
]

We say a set is finite if it has finite cardinality, or infinite if it has an
infinite cardinality.

Infinite sets can be either _countably infinite_ or _uncountably infinite_.

When a set is countably infinite, its cardinality is $aleph_0$ (here $aleph$ is
the Hebrew letter aleph and read "aleph null").

When a set is uncountably infinite, its cardinality is greater than $aleph_0$.

#example("Countable sets")[
  + The natural numbers $NN$.
  + The rationals $QQ$.
  + The natural numbers $ZZ$.
  + The set of all logical tautologies.
]

#example("Uncountable sets")[
  + The real numbers $RR$.
  + The real numbers in the interval $[0,1]$.
  + The _power set_ of $ZZ$, which is the set of all subsets of $ZZ$.
]

#remark[
  All the uncountable sets above have cardinality $2^(aleph_0)$ or $aleph_1$ or
  $frak(c)$ or $beth_1$. This is the _cardinality of the continuum_, also
  called "aleph 1" or "beth 1".

  However, in general uncountably infinite sets do not have the same
  cardinality.
]

#fact[
  If a set is countably infinite, then it has a bijection with $ZZ$. This means
  every set with cardinality $aleph_0$ has a bijection to $ZZ$. More generally,
  any sets with the same cardinality have a bijection between them.
]

This gives us the following equivalent statement:

#fact[
  Two sets have the same cardinality if and only if there exists a bijective
  function between them. In symbols,

  $ N(A) = N(B) <==> exists F : A <-> B $
]

= Lecture #datetime(day: 8, month: 1, year: 2025).display()

== Probability

#definition[
  A *random experiment* is one in which the set of all possible outcomes is known in advance, but one can't predict which outcome will occur on a given trial of the experiment.
]

#example("Finite sample spaces")[
  Toss a coin:
  $ Omega = {H,T} $

  Roll a pair of dice:
  $ Omega = {1,2,3,4,5,6} times {1,2,3,4,5,6} $
]

#example("Countably infinite sample spaces")[
  Shoot a basket until you make one:
  $ Omega = {M, F M, F F M, F F F M, dots} $
]

#example("Uncountably infinite sample space")[
  Waiting time for a bus:
  $ Omega = {T : t >= 0} $
]

#fact[
  Elements of $Omega$ are called sample points.
]

#definition[
  Any properly defined subset of $Omega$ is called an *event*.
]

#example[Dice][
  Rolling a fair die twice, let $A$ be the event that the combined score of both dice is 10.

  $ A = {(4,6,), (5,5),(6,4)} $
]

Probabilistic concepts in the parlance of set theory:

- Superset ($Omega$) $<->$ sample space
- Element $<->$ outcome / sample point ($omega$)
- Disjoint sets $<->$ mutually exclusive events

== Classical approach

Classical approach:

$ P(a) = (hash A) / (hash Omega) $

Requires equally likely outcomes and finite sample spaces.

#remark[
  With an infinite sample space, the probability becomes 0, which is often wrong.
]

#example("Dice again")[
  Rolling a fair die twice, let $A$ be the event that the combined score of both dice is 10.

  $
    A &= {(4,6,), (5,5),(6,4)} \
    P(A) &= 3 / 36 = 1 / 12
  $
]

== Relative frequency approach

$
  P(A) = (hash "of times" A "occurs in large number of trials") / (hash "of trials")
$

#example[
  Flipping a coin to determine the probability of it landing heads.
]

== Subjective approach

Personal definition of probability. Not "real" probability, merely co-opting
its parlance to lend credibility to subjective judgements of confidence.

== Axiomatic approach

Our focus in PSTAT 120A. It seems rather silly to call this approach axiomatic
given we are essentially just defining a function with a few given properties
and deriving theorems from it while working atop our pre-existing (shaky,
non-rigorous) "axioms" of set theory, but this is the terminology that the
course uses.

#definition[
  Let $P : X -> RR$ be a function satisfying the following axioms (properties).

  + $P(A) >= 0, forall A$
  + $P(Omega) = 1$
  + If $A_i sect A_j = emptyset, forall i != j$, then
    $ P(union.big_(i=1)^infinity A_i) = sum_(i=1)^infinity P(A_i) $
]

Now let us show various results with $P$.

#proposition[
  $ P(emptyset) = 0 $
]

#proof[
  By axiom 3,

  $
    A_1 = emptyset, A_2 = emptyset, A_3 = emptyset \
    P(emptyset) = sum^infinity_(i=1) P(A_i) = sum^infinity_(i=1) P(emptyset)
  $
  Suppose $P(emptyset) != 0$. Then $P >= 0$ by axiom 1 but then $P -> infinity$ in the sum, which implies $Omega > 1$, which is disallowed by axiom 2. So $P(emptyset) = 0$.
]

#proposition[
  If $A_1, A_2, ..., A_n$ are disjoint, then
  $ P(union.big^n_(i=1) A_i) = sum^n_(i= 1) P(A_i) $
]

This is mostly a formal manipulation to derive the obviously true proposition from our axioms.

#proof[
  Write any finite set $(A_1, A_2, ..., A_n)$ as an infinite set $(A_1, A_2, ..., A_n, emptyset, emptyset, ...)$. Then
  $
    P(union.big_(i=1)^infinity A_i) = sum^n_(i=1) P(A_i) + sum^infinity_(i=n+1) P(emptyset) = sum^n_(i=1) P(A_i)
  $
  And because all of the elements after $A_n$ are $emptyset$, their union adds no additional elements to the resultant union set of all $A_i$, so
  $
    P(union.big_(i=1)^infinity A_i) = P(union.big_(i=1)^n A_i) = sum_(i=1)^n P(A_i)
  $
]

#proposition[Complement][
  $ P(A') = 1 - P(A) $
]

#proof[
  $
    A' union A &= Omega \
    A' sect A &= emptyset \
    P(A' union A) &= P(A') + P(A) &"(by axiom 3)"\
    = P(Omega) &= 1 &"(by axiom 2)" \
    therefore P(A') &= 1 - P(A)
  $
]

#proposition[
  $ A subset.eq B => P(A) <= P(B) $
]

#proof[
  $ B = A union (A' sect B) $

  but $A$ and ($A' sect B$) are disjoint, so

  $
    P(B) &= P(A union (A' sect B)) \
    &= P(A) + P(A' sect B) \
    &therefore P(B) >= P(A)
  $
]

#proposition[
  $ P(A union B) = P(A) + P(B) - P(A sect B) $
]

#proof[
  $
    A = (A sect B) union (A sect B') \
    => P(A) = P(A sect B) + P(A sect B') \
    => P(B) = P(B sect A) + P(B sect A') \
    P(A) + P(B) = P(A sect B) + P(A sect B) + P(A sect B') + P(A' sect B) \
    => P(A) + P(B) - P(A sect B) = P(A sect B) + P(A sect B') + P(A' sect B) \
  $
]

#remark[
  This is a stronger result of axiom 3, which generalizes for all sets $A$ and $B$ regardless of whether they're disjoint.
]

#remark[
  These are mostly intuitively true statements (think about the probabilistic
  concepts represented by the sets) in classical probability that we derive
  rigorously from our axiomatic probability function $P$.
]

#example[
  Now let us consider some trivial concepts in classical probability written in
  the parlance of combinatorial probability.

  Select one card from a deck of 52 cards.
  Then the following is true:

  $
    Omega = {1,2,...,52} \
    A = "card is a heart" = {H 2, H 3, H 4, ..., H"Ace"} \
    B = "card is an Ace" = {H"Ace", C"Ace", D"Ace", S"Ace"} \
    C = "card is black" = {C 2, C 3, ..., C"Ace", S 2, S 3, ..., S"Ace"} \
    P(A) = 13 / 52,
    P(B) = 4 / 52,
    P(C) = 26 / 52 \
    P(A sect B) = 1 / 52 \
    P(A sect C) = 0 \
    P(B sect C) = 2 / 52 \
    P(A union B) = P(A) + P(B) - P(A sect B) = 16 / 52 \
    P(B') = 1 - P(B) = 48 / 52 \
    P(A sect B') = P(A) - P(A sect B) = 13 / 52 - 1 / 52 = 12 / 52 \
    P((A sect B') union (A' sect B)) = P(A sect B') + P(A' sect B) = 15 / 52 \
    P(A' sect B') = P(A union B)' = 1 - P(A union B) = 36 / 52
  $
]

== Countable sample spaces

#definition[
  A sample space $Omega$ is said to be *countable* if it's finite or countably infinite.
]

In such a case, one can list the elements of $Omega$.

$ Omega = {omega_1, omega_2, omega_3, ...} $
with associated probabilities, $p_1, p_2, p_3,...$, where
$
  p_i = P(omega_i) >= 0 \
  1 = P(Omega) = sum P(omega_i)
$

#example[Fair die, again][
  All outcomes are equally likely,
  $ p_1 = p_2 = ... = p_6 = 1 / 6 $
  Let $A$ be the event that the score is odd = ${1,3,5}$
  $ P(A) = 3 / 6 $
]

#example[Loaded die][
  Consider a die where the probabilities of rolling odd sides is double the probability of rolling an even side.
  $
    p_2 = p_4 = p_6, p_1 = p_3 = p_5 = 2p_2 \
    6p_2 + 3p_2 = 9p_2 = 1 \
    p_2 = 1 / 9, p_1 = 2 / 9
  $
]

#example[Coins][
  Toss a fair coin until you get the first head.
  $
    Omega = {H, T H, T T H, ...} "(countably infinite)" \
    P(H) = 1 / 2 \
    P(T T H) = (1 / 2)^3 \
    P(Omega) = sum_(n=1)^infinity (1 / 2)^n = 1 / (1 - 1 / 2) - 1 = 1
  $
]

#example[Birthdays][
  What is the probability two people share the same birthday?

  $
    Omega = [1,365] times [1,365] \
    P(A) = 365 / 365^2 = 1 / 365
  $
]

== Continuous sample spaces

#definition[
  A *continuous sample space* contains an interval in $RR$ and is uncountably infinite.
]

#definition[
  A probability density function (#smallcaps[pdf]) gives the probability at the point
  $s$.
]

Properties of the #smallcaps[pdf]:

- $f(s) >= 0, forall p_i >= 0$
- $integral_S f(s) dif s = 1, forall p_i >= 0$

#example[
  Waiting time for bus: $Omega = {s : s >= 0}$.
]