#import "./dvd.typ": * #import "@preview/ctheorems:1.1.3": * #show: dvdtyp.with( title: "PSTAT120A Course Notes", author: "Youwen Wu", date: "Winter 2025", subtitle: "Taught by Brian Wainwright", ) #outline() = Lecture #datetime(day: 6, month: 1, year: 2025).display() == Preliminaries #definition[ Statistics is the science dealing with the collection, summarization, analysis, and interpretation of data. ] == Set theory for dummies A terse introduction to elementary naive set theory and the basic operations upon them. #remark[ Keep in mind that without $cal(Z F C)$ or another model of set theory that resolves fundamental issues, our set theory is subject to paradoxes like Russell's. Whoops, the universe doesn't exist. ] #definition[ A *set* is a collection of elements. ] #example[Examples of sets][ + Trivial set: ${1}$ + Empty set: $emptyset$ + $A = {a,b,c}$ ] We can construct sets using set-builder notation (also sometimes called set comprehension). $ {"expression with" x | "conditions on" x} $ #example("Set builder notation")[ + The set of all even integers: ${2n | n in ZZ}$ + The set of all perfect squares in $RR$: ${x^2 | x in NN}$ ] We also have notation for working with sets: With arbitrary sets $A$, $B$: + $a in A$ ($a$ is a member of the set $A$) + $a in.not A$ ($a$ is not a member of the set $A$) + $A subset.eq B$ (Set theory: $A$ is a subset of $B$) (Stats: $A$ is a sample space in $B$) + $A subset B$ (Proper subset: $A != B$) + $A^c$ or $A'$ (read "complement of $A$," and introduced later) + $A union B$ (Union of $A$ and $B$. Gives a set with both the elements of $A$ and $B$) + $A sect B$ (Intersection of $A$ and $B$. Gives a set consisting of the elements in *both* $A$ and $B$) + $A \\ B$ (Set difference. The set of all elements of $A$ that are not also in $B$) + $A times B$ (Cartesian product. Ordered pairs of $(a,b)$ $forall a in A$, $forall b in B$) We can also write a few of these operations precisely as set comprehensions. + $A subset B => A = {a | a in B, forall a in A}$ + $A union B = {x | x in A or x in B}$ (here $or$ is the logical OR) + $A sect B = {x | x in A and x in B}$ (here $and$ is the logical AND) + $A \\ B = {a | a in A and a in.not B}$ + $A times B = {(a,b) | forall a in A, forall b in B}$ Take a moment and convince yourself that these definitions are equivalent to the previous ones. #definition[ The universal set $Omega$ is the set of all objects in a given set theoretical universe. ] With the above definition, we can now introduce the set complement. #definition[ The set complement $A'$ is given by $ A' = Omega \\ A $ where $Omega$ is the _universal set_. ] #example[The real plane][ The real plane $RR^2$ can be defined as a Cartesian product of $RR$ with itself. $ RR^2 = RR times RR $ ] Check your intuition that this makes sense. Why do you think $RR^n$ was chosen as the notation for $n$ dimensional spaces in $RR$? #definition[Disjoint sets][ If $A sect B$ = $emptyset$, then we say that $A$ and $B$ are *disjoint*. ] #fact[ For any sets $A$ and $B$, we have DeMorgan's Laws: + $(A union B)' = A' sect B'$ + $(A sect B)' = A' union B'$ ] #fact[Generalized DeMorgan's][ + $(union.big_i A_i)' = sect.big_i A_i '$ + $(sect.big_i A_i)' = union.big_i A_i '$ ] == Sizes of infinity #definition[ Let $N(A)$ be the number of elements in $A$. $N(A)$ is called the _cardinality_ of $A$. ] We say a set is finite if it has finite cardinality, or infinite if it has an infinite cardinality. Infinite sets can be either _countably infinite_ or _uncountably infinite_. When a set is countably infinite, its cardinality is $aleph_0$ (here $aleph$ is the Hebrew letter aleph and read "aleph null"). When a set is uncountably infinite, its cardinality is greater than $aleph_0$. #example("Countable sets")[ + The natural numbers $NN$. + The rationals $QQ$. + The natural numbers $ZZ$. + The set of all logical tautologies. ] #example("Uncountable sets")[ + The real numbers $RR$. + The real numbers in the interval $[0,1]$. + The _power set_ of $ZZ$, which is the set of all subsets of $ZZ$. ] #remark[ All the uncountable sets above have cardinality $2^(aleph_0)$ or $aleph_1$ or $frak(c)$ or $beth_1$. This is the _cardinality of the continuum_, also called "aleph 1" or "beth 1". However, in general uncountably infinite sets do not have the same cardinality. ] #fact[ If a set is countably infinite, then it has a bijection with $ZZ$. This means every set with cardinality $aleph_0$ has a bijection to $ZZ$. More generally, any sets with the same cardinality have a bijection between them. ] This gives us the following equivalent statement: #fact[ Two sets have the same cardinality if and only if there exists a bijective function between them. In symbols, $ N(A) = N(B) <==> exists F : A <-> B $ ] = Lecture #datetime(day: 8, month: 1, year: 2025).display() == Probability #definition[ A *random experiment* is one in which the set of all possible outcomes is known in advance, but one can't predict which outcome will occur on a given trial of the experiment. ] #example("Finite sample spaces")[ Toss a coin: $ Omega = {H,T} $ Roll a pair of dice: $ Omega = {1,2,3,4,5,6} times {1,2,3,4,5,6} $ ] #example("Countably infinite sample spaces")[ Shoot a basket until you make one: $ Omega = {M, F M, F F M, F F F M, dots} $ ] #example("Uncountably infinite sample space")[ Waiting time for a bus: $ Omega = {T : t >= 0} $ ] #fact[ Elements of $Omega$ are called sample points. ] #definition[ Any properly defined subset of $Omega$ is called an *event*. ] #example[Dice][ Rolling a fair die twice, let $A$ be the event that the combined score of both dice is 10. $ A = {(4,6,), (5,5),(6,4)} $ ] Probabilistic concepts in the parlance of set theory: - Superset ($Omega$) $<->$ sample space - Element $<->$ outcome / sample point ($omega$) - Disjoint sets $<->$ mutually exclusive events == Classical approach Classical approach: $ P(a) = (hash A) / (hash Omega) $ Requires equally likely outcomes and finite sample spaces. #remark[ With an infinite sample space, the probability becomes 0, which is often wrong. ] #example("Dice again")[ Rolling a fair die twice, let $A$ be the event that the combined score of both dice is 10. $ A &= {(4,6,), (5,5),(6,4)} \ P(A) &= 3 / 36 = 1 / 12 $ ] == Relative frequency approach $ P(A) = (hash "of times" A "occurs in large number of trials") / (hash "of trials") $ #example[ Flipping a coin to determine the probability of it landing heads. ] == Subjective approach Personal definition of probability. Not "real" probability, merely co-opting its parlance to lend credibility to subjective judgements of confidence. == Axiomatic approach Our focus in PSTAT 120A. It seems rather silly to call this approach axiomatic given we are essentially just defining a function with a few given properties and deriving theorems from it while working atop our pre-existing (shaky, non-rigorous) "axioms" of set theory, but this is the terminology that the course uses. #definition[ Let $P : X -> RR$ be a function satisfying the following axioms (properties). + $P(A) >= 0, forall A$ + $P(Omega) = 1$ + If $A_i sect A_j = emptyset, forall i != j$, then $ P(union.big_(i=1)^infinity A_i) = sum_(i=1)^infinity P(A_i) $ ] Now let us show various results with $P$. #proposition[ $ P(emptyset) = 0 $ ] #proof[ By axiom 3, $ A_1 = emptyset, A_2 = emptyset, A_3 = emptyset \ P(emptyset) = sum^infinity_(i=1) P(A_i) = sum^infinity_(i=1) P(emptyset) $ Suppose $P(emptyset) != 0$. Then $P >= 0$ by axiom 1 but then $P -> infinity$ in the sum, which implies $Omega > 1$, which is disallowed by axiom 2. So $P(emptyset) = 0$. ] #proposition[ If $A_1, A_2, ..., A_n$ are disjoint, then $ P(union.big^n_(i=1) A_i) = sum^n_(i= 1) P(A_i) $ ] This is mostly a formal manipulation to derive the obviously true proposition from our axioms. #proof[ Write any finite set $(A_1, A_2, ..., A_n)$ as an infinite set $(A_1, A_2, ..., A_n, emptyset, emptyset, ...)$. Then $ P(union.big_(i=1)^infinity A_i) = sum^n_(i=1) P(A_i) + sum^infinity_(i=n+1) P(emptyset) = sum^n_(i=1) P(A_i) $ And because all of the elements after $A_n$ are $emptyset$, their union adds no additional elements to the resultant union set of all $A_i$, so $ P(union.big_(i=1)^infinity A_i) = P(union.big_(i=1)^n A_i) = sum_(i=1)^n P(A_i) $ ] #proposition[Complement][ $ P(A') = 1 - P(A) $ ] #proof[ $ A' union A &= Omega \ A' sect A &= emptyset \ P(A' union A) &= P(A') + P(A) &"(by axiom 3)"\ = P(Omega) &= 1 &"(by axiom 2)" \ therefore P(A') &= 1 - P(A) $ ] #proposition[ $ A subset.eq B => P(A) <= P(B) $ ] #proof[ $ B = A union (A' sect B) $ but $A$ and ($A' sect B$) are disjoint, so $ P(B) &= P(A union (A' sect B)) \ &= P(A) + P(A' sect B) \ &therefore P(B) >= P(A) $ ] #proposition[ $ P(A union B) = P(A) + P(B) - P(A sect B) $ ] #proof[ $ A = (A sect B) union (A sect B') \ => P(A) = P(A sect B) + P(A sect B') \ => P(B) = P(B sect A) + P(B sect A') \ P(A) + P(B) = P(A sect B) + P(A sect B) + P(A sect B') + P(A' sect B) \ => P(A) + P(B) - P(A sect B) = P(A sect B) + P(A sect B') + P(A' sect B) \ $ ] #remark[ This is a stronger result of axiom 3, which generalizes for all sets $A$ and $B$ regardless of whether they're disjoint. ] #remark[ These are mostly intuitively true statements (think about the probabilistic concepts represented by the sets) in classical probability that we derive rigorously from our axiomatic probability function $P$. ] #example[ Now let us consider some trivial concepts in classical probability written in the parlance of combinatorial probability. Select one card from a deck of 52 cards. Then the following is true: $ Omega = {1,2,...,52} \ A = "card is a heart" = {H 2, H 3, H 4, ..., H"Ace"} \ B = "card is an Ace" = {H"Ace", C"Ace", D"Ace", S"Ace"} \ C = "card is black" = {C 2, C 3, ..., C"Ace", S 2, S 3, ..., S"Ace"} \ P(A) = 13 / 52, P(B) = 4 / 52, P(C) = 26 / 52 \ P(A sect B) = 1 / 52 \ P(A sect C) = 0 \ P(B sect C) = 2 / 52 \ P(A union B) = P(A) + P(B) - P(A sect B) = 16 / 52 \ P(B') = 1 - P(B) = 48 / 52 \ P(A sect B') = P(A) - P(A sect B) = 13 / 52 - 1 / 52 = 12 / 52 \ P((A sect B') union (A' sect B)) = P(A sect B') + P(A' sect B) = 15 / 52 \ P(A' sect B') = P(A union B)' = 1 - P(A union B) = 36 / 52 $ ] == Countable sample spaces #definition[ A sample space $Omega$ is said to be *countable* if it's finite or countably infinite. ] In such a case, one can list the elements of $Omega$. $ Omega = {omega_1, omega_2, omega_3, ...} $ with associated probabilities, $p_1, p_2, p_3,...$, where $ p_i = P(omega_i) >= 0 \ 1 = P(Omega) = sum P(omega_i) $ #example[Fair die, again][ All outcomes are equally likely, $ p_1 = p_2 = ... = p_6 = 1 / 6 $ Let $A$ be the event that the score is odd = ${1,3,5}$ $ P(A) = 3 / 6 $ ] #example[Loaded die][ Consider a die where the probabilities of rolling odd sides is double the probability of rolling an even side. $ p_2 = p_4 = p_6, p_1 = p_3 = p_5 = 2p_2 \ 6p_2 + 3p_2 = 9p_2 = 1 \ p_2 = 1 / 9, p_1 = 2 / 9 $ ] #example[Coins][ Toss a fair coin until you get the first head. $ Omega = {H, T H, T T H, ...} "(countably infinite)" \ P(H) = 1 / 2 \ P(T T H) = (1 / 2)^3 \ P(Omega) = sum_(n=1)^infinity (1 / 2)^n = 1 / (1 - 1 / 2) - 1 = 1 $ ] #example[Birthdays][ What is the probability two people share the same birthday? $ Omega = [1,365] times [1,365] \ P(A) = 365 / 365^2 = 1 / 365 $ ] == Continuous sample spaces #definition[ A *continuous sample space* contains an interval in $RR$ and is uncountably infinite. ] #definition[ A probability density function (#smallcaps[pdf]) gives the probability at the point $s$. ] Properties of the #smallcaps[pdf]: - $f(s) >= 0, forall p_i >= 0$ - $integral_S f(s) dif s = 1, forall p_i >= 0$ #example[ Waiting time for bus: $Omega = {s : s >= 0}$. ]