#import "@youwen/zen:0.1.0": *
|
|
#import "@preview/ctheorems:1.1.3": *
|
|
|
|
#show: zen.with(
|
|
title: "PSTAT120A Course Notes",
|
|
author: "Youwen Wu",
|
|
date: "Winter 2025",
|
|
subtitle: "Taught by Brian Wainwright",
|
|
)
|
|
|
|
#outline()

= Introduction

PSTAT 120A is an introductory course on probability and statistics. However, it
is a theoretical course rather than an applied statistics course. You will not
learn how to read or conduct real-world statistical studies. Leave your
$p$-values at home, this ain't your momma's AP Stats.

= Lecture #datetime(day: 6, month: 1, year: 2025).display()

== Preliminaries

#definition[
Statistics is the science dealing with the collection, summarization,
analysis, and interpretation of data.
]

== Set theory for dummies

A terse introduction to elementary naive set theory and the basic operations
upon them.

#remark[
Keep in mind that without $cal(Z F C)$ or another model of set theory that
resolves fundamental issues, our set theory is subject to paradoxes like
Russell's. Whoops, the universe doesn't exist.
]

#definition[
A *set* is a collection of elements.
]

#example[Examples of sets][
+ Trivial set: ${1}$
+ Empty set: $emptyset$
+ $A = {a,b,c}$
]

We can construct sets using set-builder notation (also sometimes called set
comprehension).

$ {"expression with" x | "conditions on" x} $

#example("Set builder notation")[
+ The set of all even integers: ${2n | n in ZZ}$
+ The set of all perfect squares: ${x^2 | x in NN}$
]

We also have notation for working with sets:

With arbitrary sets $A$, $B$:

+ $a in A$ ($a$ is a member of the set $A$)
+ $a in.not A$ ($a$ is not a member of the set $A$)
+ $A subset.eq B$ (Set theory: $A$ is a subset of $B$) (Stats: $A$ is an event in the sample space $B$)
+ $A subset B$ (Proper subset: $A subset.eq B$ and $A != B$)
+ $A^c$ or $A'$ (read "complement of $A$," and introduced later)
+ $A union B$ (Union of $A$ and $B$. Gives a set with both the elements of $A$ and $B$)
+ $A sect B$ (Intersection of $A$ and $B$. Gives a set consisting of the elements in *both* $A$ and $B$)
+ $A \\ B$ (Set difference. The set of all elements of $A$ that are not also in $B$)
+ $A times B$ (Cartesian product. The set of all ordered pairs $(a,b)$ with $a in A$ and $b in B$)

We can also write a few of these operations precisely as set comprehensions.

+ $A subset.eq B <==> (forall a in A, a in B)$
+ $A union B = {x | x in A or x in B}$ (here $or$ is the logical OR)
+ $A sect B = {x | x in A and x in B}$ (here $and$ is the logical AND)
+ $A \\ B = {a | a in A and a in.not B}$
+ $A times B = {(a,b) | a in A, b in B}$

Take a moment and convince yourself that these definitions are equivalent to
the previous ones.

#definition[
The universal set $Omega$ is the set of all objects in a given set
theoretical universe.
]

With the above definition, we can now introduce the set complement.

#definition[
The set complement $A'$ is given by
$
A' = Omega \\ A
$
where $Omega$ is the _universal set_.
]

#example[The real plane][
The real plane $RR^2$ can be defined as a Cartesian product of $RR$ with
itself.

$ RR^2 = RR times RR $
]

Check your intuition that this makes sense. Why do you think $RR^n$ was chosen
as the notation for $n$ dimensional spaces in $RR$?

#definition[Disjoint sets][
If $A sect B = emptyset$, then we say that $A$ and $B$ are *disjoint*.
]

#fact[
For any sets $A$ and $B$, we have DeMorgan's Laws:
+ $(A union B)' = A' sect B'$
+ $(A sect B)' = A' union B'$
]

#fact[Generalized DeMorgan's][
+ $(union.big_i A_i)' = sect.big_i A_i '$
+ $(sect.big_i A_i)' = union.big_i A_i '$
]
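
These identities are easy to sanity-check by brute force. A quick illustration
(mine, not from lecture) on a small finite universe, using Python's built-in
set type:

```python
# Verify DeMorgan's laws on a small finite universe.
omega = set(range(10))            # universal set
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

complement = lambda s: omega - s  # set difference against omega

# (A ∪ B)' == A' ∩ B'
assert complement(a | b) == complement(a) & complement(b)
# (A ∩ B)' == A' ∪ B'
assert complement(a & b) == complement(a) | complement(b)
print("DeMorgan's laws hold on this universe")
```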

== Sizes of infinity

#definition[
Let $N(A)$ be the number of elements in $A$. $N(A)$ is called the _cardinality_ of $A$.
]

We say a set is finite if it has finite cardinality, or infinite if it has an
infinite cardinality.

Infinite sets can be either _countably infinite_ or _uncountably infinite_.

When a set is countably infinite, its cardinality is $aleph_0$ (here $aleph$ is
the Hebrew letter aleph and read "aleph null").

When a set is uncountably infinite, its cardinality is greater than $aleph_0$.

#example("Countable sets")[
+ The natural numbers $NN$.
+ The rationals $QQ$.
+ The integers $ZZ$.
+ The set of all logical tautologies.
]

#example("Uncountable sets")[
+ The real numbers $RR$.
+ The real numbers in the interval $[0,1]$.
+ The _power set_ of $ZZ$, which is the set of all subsets of $ZZ$.
]

#remark[
All the uncountable sets above have cardinality $2^(aleph_0)$, also written
$frak(c)$ or $beth_1$ (read "beth 1"). This is the _cardinality of the
continuum_. Whether it equals $aleph_1$, the smallest cardinality above
$aleph_0$, is precisely the continuum hypothesis.

However, in general uncountably infinite sets do not have the same
cardinality.
]

#fact[
If a set is countably infinite, then it has a bijection with $ZZ$. This means
every set with cardinality $aleph_0$ has a bijection to $ZZ$. More generally,
any sets with the same cardinality have a bijection between them.
]

This gives us the following equivalent statement:

#fact[
Two sets have the same cardinality if and only if there exists a bijective
function between them. In symbols,

$ N(A) = N(B) <==> exists F : A <-> B $
]
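
For a concrete instance of such a bijection (a standard textbook example, not
from lecture), interleave the negative and nonnegative integers:

$ f : NN -> ZZ, quad f(n) = cases(n / 2 "if" n "is even", -(n+1) / 2 "if" n "is odd") $

so $0, 1, 2, 3, 4, dots |-> 0, -1, 1, -2, 2, dots$, witnessing $N(NN) = N(ZZ)$.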

= Lecture #datetime(day: 8, month: 1, year: 2025).display()

== Probability

#definition[
A *random experiment* is one in which the set of all possible outcomes is known in advance, but one can't predict which outcome will occur on a given trial of the experiment.
]

#example("Finite sample spaces")[
Toss a coin:
$ Omega = {H,T} $

Roll a pair of dice:
$ Omega = {1,2,3,4,5,6} times {1,2,3,4,5,6} $
]

#example("Countably infinite sample spaces")[
Shoot a basket until you make one:
$ Omega = {M, F M, F F M, F F F M, dots} $
]

#example("Uncountably infinite sample space")[
Waiting time for a bus:
$ Omega = {t : t >= 0} $
]

#fact[
Elements of $Omega$ are called sample points.
]

#definition[
Any properly defined subset of $Omega$ is called an *event*.
]

#example[Dice][
Rolling a fair die twice, let $A$ be the event that the combined score of both dice is 10.

$ A = {(4,6), (5,5), (6,4)} $
]

Probabilistic concepts in the parlance of set theory:

- Superset ($Omega$) $<->$ sample space
- Element $<->$ outcome / sample point ($omega$)
- Disjoint sets $<->$ mutually exclusive events

== Classical approach

Classical approach:

$ P(A) = (hash A) / (hash Omega) $

Requires equally likely outcomes and finite sample spaces.

#remark[
With an infinite sample space, $hash Omega$ is infinite and this formula
assigns probability 0 to every finite event, which is often wrong.
]

#example("Dice again")[
Rolling a fair die twice, let $A$ be the event that the combined score of both dice is 10.

$
A &= {(4,6), (5,5), (6,4)} \
P(A) &= 3 / 36 = 1 / 12
$
]
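
This classical computation is small enough to verify by exhaustive
enumeration; a throwaway Python check (illustrative only):

```python
from itertools import product
from fractions import Fraction

# Sample space: all ordered pairs of two die rolls.
omega = list(product(range(1, 7), repeat=2))
# Event A: the combined score is 10.
a = [w for w in omega if sum(w) == 10]

print(a)                             # [(4, 6), (5, 5), (6, 4)]
print(Fraction(len(a), len(omega)))  # 1/12
```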

== Relative frequency approach

An approach done commonly by applied statisticians who work in the disgusting
real world. This is where we are generally concerned with irrelevant concerns
like accurate sampling and $p$-values and such. I am told this is covered in
PSTAT 120B, so hopefully I can avoid ever taking that class (as a pure math
major).

$
P(A) = (hash "of times" A "occurs in large number of trials") / (hash "of trials")
$

#example[
Flipping a coin to determine the probability of it landing heads.
]

== Subjective approach

Personal definition of probability. Not "real" probability, merely co-opting
its parlance to lend credibility to subjective judgements of confidence.

== Axiomatic approach

Consider a random experiment. Then:

#definition[
The *sample space* $Omega$ is the set of all possible outcomes of the
experiment.
]

#definition[
Elements of $Omega$ are called *sample points*.
]

#definition[
Subsets of $Omega$ are called *events*. The collection of events (in other
terms, the power set of $Omega$) in $Omega$ is denoted by $cal(F)$.
]

#definition[
The *probability measure*, or probability distribution, or simply probability, is a function $P$.

Let $P : cal(F) -> RR$ be a function satisfying the following axioms (properties).

+ $P(A) >= 0, forall A$
+ $P(Omega) = 1$
+ If $A_i sect A_j = emptyset, forall i != j$, then
  $ P(union.big_(i=1)^infinity A_i) = sum_(i=1)^infinity P(A_i) $
]

The 3-tuple $(Omega, cal(F), P)$ is called a *probability space*.

#remark[
In more advanced texts you will see $cal(F)$ introduced as a so-called
$sigma$-algebra. A $sigma$-algebra on a set $Omega$ is a nonempty collection
$Sigma$ of subsets of $Omega$ that is closed under set complement, countable
unions, and as a corollary, countable intersections.
]

Now let us show various results with $P$.

#proposition[
$ P(emptyset) = 0 $
]

#proof[
Apply axiom 3 to the disjoint sequence

$
A_1 = emptyset, A_2 = emptyset, A_3 = emptyset, dots \
P(emptyset) = sum^infinity_(i=1) P(A_i) = sum^infinity_(i=1) P(emptyset)
$

Suppose $P(emptyset) != 0$. Then $P(emptyset) > 0$ by axiom 1, so the sum on the right diverges to $infinity$. But $P(emptyset) <= P(Omega) = 1$ by axiom 2, a contradiction. So $P(emptyset) = 0$.
]

#proposition[
If $A_1, A_2, ..., A_n$ are disjoint, then
$ P(union.big^n_(i=1) A_i) = sum^n_(i= 1) P(A_i) $
]

This is mostly a formal manipulation to derive the obviously true proposition from our axioms.

#proof[
Write any finite sequence $(A_1, A_2, ..., A_n)$ as an infinite sequence $(A_1, A_2, ..., A_n, emptyset, emptyset, ...)$, which is still disjoint. Then
$
P(union.big_(i=1)^infinity A_i) = sum^n_(i=1) P(A_i) + sum^infinity_(i=n+1) P(emptyset) = sum^n_(i=1) P(A_i)
$
And because all of the elements after $A_n$ are $emptyset$, their union adds no additional elements to the resultant union set of all $A_i$, so
$
P(union.big_(i=1)^infinity A_i) = P(union.big_(i=1)^n A_i) = sum_(i=1)^n P(A_i)
$
]

#proposition[Complement][
$ P(A') = 1 - P(A) $
]

#proof[
$
A' union A &= Omega \
A' sect A &= emptyset \
P(A' union A) &= P(A') + P(A) &"(by axiom 3)" \
P(A' union A) = P(Omega) &= 1 &"(by axiom 2)" \
therefore P(A') &= 1 - P(A)
$
]

#proposition[
$ A subset.eq B => P(A) <= P(B) $
]

#proof[
$ B = A union (A' sect B) $

but $A$ and ($A' sect B$) are disjoint, so

$
P(B) &= P(A union (A' sect B)) \
&= P(A) + P(A' sect B) \
&>= P(A) &"(since" P(A' sect B) >= 0 "by axiom 1)"
$
]

#proposition[
$ P(A union B) = P(A) + P(B) - P(A sect B) $
]

#proof[
Decompose $A$, $B$, and $A union B$ into disjoint pieces:
$
P(A) = P(A sect B) + P(A sect B') \
P(B) = P(A sect B) + P(A' sect B) \
P(A union B) = P(A sect B) + P(A sect B') + P(A' sect B)
$
Adding the first two lines and subtracting $P(A sect B)$,
$
P(A) + P(B) - P(A sect B) = P(A sect B) + P(A sect B') + P(A' sect B) = P(A union B)
$
]

#remark[
This is a stronger version of axiom 3 (for two events): it holds for all sets $A$ and $B$, regardless of whether they're disjoint.
]

#remark[
These are mostly intuitively true statements (think about the probabilistic
concepts represented by the sets) in classical probability that we derive
rigorously from our axiomatic probability function $P$.
]

#example[
Now let us consider some trivial concepts in classical probability written in
the parlance of combinatorial probability.

Select one card from a deck of 52 cards.
Then the following is true:

$
Omega = {1,2,...,52} \
A = "card is a heart" = {H 2, H 3, H 4, ..., H"Ace"} \
B = "card is an Ace" = {H"Ace", C"Ace", D"Ace", S"Ace"} \
C = "card is black" = {C 2, C 3, ..., C"Ace", S 2, S 3, ..., S"Ace"} \
P(A) = 13 / 52,
P(B) = 4 / 52,
P(C) = 26 / 52 \
P(A sect B) = 1 / 52 \
P(A sect C) = 0 \
P(B sect C) = 2 / 52 \
P(A union B) = P(A) + P(B) - P(A sect B) = 16 / 52 \
P(B') = 1 - P(B) = 48 / 52 \
P(A sect B') = P(A) - P(A sect B) = 13 / 52 - 1 / 52 = 12 / 52 \
P((A sect B') union (A' sect B)) = P(A sect B') + P(A' sect B) = 15 / 52 \
P(A' sect B') = P((A union B)') = 1 - P(A union B) = 36 / 52
$
]
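
All of these identities can be confirmed by enumerating the deck directly. A
sketch (my own encoding of the deck, not the lecture's), with `Fraction` used
to keep the arithmetic exact:

```python
from fractions import Fraction
from itertools import product

# Deck as (rank, suit) pairs; ranks 2..14 with 14 standing for Ace.
deck = list(product(range(2, 15), "HCDS"))

hearts = {c for c in deck if c[1] == "H"}  # event A
aces = {c for c in deck if c[0] == 14}     # event B

p = lambda event: Fraction(len(event), len(deck))  # classical probability

assert p(hearts & aces) == Fraction(1, 52)
assert p(hearts | aces) == p(hearts) + p(aces) - p(hearts & aces) == Fraction(16, 52)
assert p(set(deck) - aces) == 1 - p(aces) == Fraction(48, 52)
print("all card identities check out")
```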

== Countable sample spaces

#definition[
A sample space $Omega$ is said to be *countable* if it's finite or countably infinite.
]

In such a case, one can list the elements of $Omega$.

$ Omega = {omega_1, omega_2, omega_3, ...} $
with associated probabilities, $p_1, p_2, p_3,...$, where
$
p_i = P(omega_i) >= 0 \
1 = P(Omega) = sum P(omega_i)
$

#example[Fair die, again][
All outcomes are equally likely,
$ p_1 = p_2 = ... = p_6 = 1 / 6 $
Let $A$ be the event that the score is odd, so $A = {1,3,5}$ and
$ P(A) = 3 / 6 $
]

#example[Loaded die][
Consider a die where the probability of rolling each odd side is double the probability of rolling each even side.
$
p_2 = p_4 = p_6, quad p_1 = p_3 = p_5 = 2 p_2 \
6 p_2 + 3 p_2 = 9 p_2 = 1 \
p_2 = 1 / 9, quad p_1 = 2 / 9
$
]

#example[Coins][
Toss a fair coin until you get the first head.
$
Omega = {H, T H, T T H, ...} "(countably infinite)" \
P(H) = 1 / 2 \
P(T T H) = (1 / 2)^3 \
P(Omega) = sum_(n=1)^infinity (1 / 2)^n = 1 / (1 - 1 / 2) - 1 = 1
$
]

#example[Birthdays][
What is the probability two people share the same birthday?

$
Omega = {1, ..., 365} times {1, ..., 365} \
P(A) = 365 / 365^2 = 1 / 365
$
]

== Continuous sample spaces

#definition[
A *continuous sample space* contains an interval in $RR$ and is uncountably infinite.
]

#definition[
A probability density function (#smallcaps[pdf]) $f$ assigns a density $f(s)$
to each point $s$; probabilities are obtained by integrating $f$ over regions
of the sample space (the probability of any single point is 0).
]

Properties of the #smallcaps[pdf]:

- $f(s) >= 0$ for all $s$
- $integral_Omega f(s) dif s = 1$

#example[
Waiting time for bus: $Omega = {s : s >= 0}$.
]

= Notes on counting

The cardinality of $A$ is given by $hash A$. Let us develop methods for finding
$hash A$ from a description of the set $A$ (in other words, methods for
counting).

== General multiplication principle

#fact[
Let $A$ and $B$ be finite sets, $k in ZZ^+$. Then let $f : A -> B$ be a
function such that each element in $B$ is the image of exactly $k$ elements
in $A$ (such a function is called _$k$-to-one_). Then $hash A = k dot hash B$.
]<ktoone>

#example[
Four fully loaded 10-seater vans transported people to the picnic. How many
people were transported?

By @ktoone, take $A$ to be the set of people, $B$ the set of vans, and $f : A -> B$ the map sending a person to the van they ride in. Then $f$ is a 10-to-one function and $hash B = 4$, so $hash A = 10 dot 4 = 40$.
]

#definition[
An $n$-tuple is an ordered sequence of $n$ elements.
]

Many of our methods in probability rely on multiplying together multiple
outcomes to obtain their combined amount of outcomes. We make this explicit below in @tuplemultiplication.

#fact[
Suppose a set of $n$-tuples $(a_1, ..., a_n)$ obeys these rules:

+ There are $r_1$ choices for the first entry $a_1$.
+ Once the first $k$ entries $a_1, ..., a_k$ have been chosen, the number of alternatives for the next entry $a_(k+1)$ is $r_(k+1)$, regardless of the previous choices.

Then the total number of $n$-tuples is the product $r_1 dot r_2 dot dots.c dot r_n$.
]<tuplemultiplication>

#proof[
It is trivially true for $n = 1$ since you have $r_1$ choices of $a_1$ for a
1-tuple $(a_1)$.

Let $A$ be the set of all possible $n$-tuples and $B$ be the set of all
possible $(n+1)$-tuples. Now let us assume the statement is true for $A$.
Proceed by induction on $B$, noting that for each $n$-tuple in $A$, $(a_1,
..., a_n)$, we have $r_(n+1)$ tuples in $B$.

Let $f : B -> A$ be a function which takes each $(n+1)$-tuple and truncates the $a_(n+1)$ term, leaving us with just an $n$-tuple of the form $(a_1, a_2, ..., a_n)$.
$ f((a_1, ..., a_n, a_(n + 1))) = (a_1, ..., a_n) $
Now notice that $f$ is precisely a $r_(n+1)$-to-one function! Recall by
our assumption that @tuplemultiplication is true for $n$-tuples, so $A$ has $r_1 dot
r_2 dot ... dot r_n$ elements, or $hash A = r_1 dot ... dot r_n$. Then by
@ktoone, we have $hash B = hash A dot r_(n+1) = r_1 dot r_2 dot
... dot r_(n+1)$. Our induction is complete and we have proved @tuplemultiplication.
]

@tuplemultiplication is sometimes called the _general multiplication principle_.

We can use @tuplemultiplication to derive counting formulas for various
situations. Let $A_1, A_2, ..., A_n$ be finite sets. Then as a corollary of
@tuplemultiplication, we can count the number of $n$-tuples in a finite
Cartesian product of $A_1, A_2, ..., A_n$.

#fact[
Let $A_1, A_2, ..., A_n$ be finite sets. Then

$
hash (A_1 times A_2 times ... times A_n) = (hash A_1) dot (hash A_2) dot ... dot (hash A_n) = product^n_(i=1) (hash A_i)
$
]

#example[
How many distinct subsets does a set of size $n$ have?

The answer is $2^n$. Each subset can be encoded as an $n$-tuple with entries 0
or 1, where the $i$th entry is 1 if the $i$th element of the set is in the
subset and 0 if it is not.

Thus the number of subsets is the same as the cardinality of
$ {0,1} times ... times {0,1} = {0,1}^n $
which is $2^n$.

This is why given a set $X$ with cardinality $kappa$, we write the
cardinality of the power set of $X$ as $2^kappa$.
]
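
The 0/1-tuple encoding is exactly how subsets are enumerated as bitmasks in
code; a small Python sketch of the idea (not from lecture):

```python
def subsets(xs):
    """Enumerate all subsets of xs via the 0/1-tuple (bitmask) encoding."""
    n = len(xs)
    for mask in range(2 ** n):  # each mask is an n-tuple of bits
        yield {xs[i] for i in range(n) if mask >> i & 1}

s = ["a", "b", "c"]
all_subs = list(subsets(s))
print(len(all_subs))  # 8 == 2**3
```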

== Permutations

Now we can use the multiplication principle to count permutations.

#fact[
Consider all $k$-tuples $(a_1, ..., a_k)$ that can be constructed from a set $A$ of size $n, n >= k$ without repetition. The total number of these $k$-tuples is
$ (n)_k = n dot (n - 1) ... (n - k + 1) = n! / (n-k)! $

In particular, with $k=n$, each $n$-tuple is an ordering or _permutation_ of $A$. So the total number of permutations of a set of $n$ elements is $n!$.
]<permutation>

#proof[
We construct the $k$-tuples sequentially. For the first element, we choose
one element from $A$ with $n$ alternatives. The next element has $n - 1$
alternatives. In general, when choosing the $j$th element, $j - 1$ elements
have already been used, so there are $n - j + 1$ alternatives.

Then clearly after choosing $k$ elements for our $k$-tuple we have by
@tuplemultiplication the number of $k$-tuples being $n dot (n - 1) dot ...
dot (n - k + 1) = (n)_k$.
]

#example[
Consider a round table with 8 seats.

+ In how many ways can we seat 8 guests around the table?
+ In how many ways can we do this if we do not differentiate between seating arrangements that are rotations of each other?

For (1), we easily see that we're simply asking for permutations of an
8-tuple, so $8!$ is the answer.

For (2), we number each person and each seat from 1-8, then always place person 1 in seat 1, and count the permutations of the other 7 people in the other 7 seats. Then the answer is $7!$.

Alternatively, notice that each arrangement has 8 equivalent arrangements under rotation. So the answer is $8!/8 = 7!$.
]
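
Part (2) is also checkable by brute force: generate all $8!$ seatings and
collapse each orbit under rotation to a canonical representative. A quick
sketch (mine, illustrative only):

```python
from itertools import permutations

def canonical(seating):
    """Represent a circular arrangement by its lexicographically least rotation."""
    n = len(seating)
    return min(seating[i:] + seating[:i] for i in range(n))

distinct = {canonical(p) for p in permutations(range(8))}
print(len(distinct))  # 5040 == 7!
```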

== Counting from sets

We turn our attention to sets, which unlike tuples are unordered collections.

#fact[
Let $n,k in NN$ with $0 <= k <= n$. The number of distinct subsets of size $k$ that a set of size $n$ has is given by the *binomial coefficient*
$ vec(n,k) = n! / (k! (n-k)!) $
]

#proof[
Let $A$ be a set of size $n$. By @permutation, $n!/(n-k)!$ unique ordered
$k$-tuples can be constructed from elements of $A$. Each subset of $A$ of
size $k$ has exactly $k!$ different orderings, and hence appears exactly $k!$
times among the ordered $k$-tuples. Thus the number of subsets of size $k$ is
$n! / (k! (n-k)!)$.
]

#example[
In a class there are 12 boys and 14 girls. How many different teams of 7 pupils
with 3 boys and 4 girls can be created?

First let us compute how many subsets of size 3 we can choose from the 12 boys and how many subsets of size 4 we can choose from the 14 girls.

$
"boys" &= vec(12,3) \
"girls" &= vec(14,4)
$

Then let us consider the entire team as a 2-tuple of (boys, girls). Then
there are $vec(12,3)$ alternatives for the choice of boys, and $vec(14,4)$ alternatives for
the choice of girls, so by the multiplication principle, we have the total being

$ vec(12,3) vec(14,4) $
]
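
Numerically, using Python's `math.comb` (a sanity check, not part of the
notes):

```python
from math import comb

boys = comb(12, 3)   # 220 ways to choose the boys
girls = comb(14, 4)  # 1001 ways to choose the girls
print(boys * girls)  # 220220 possible teams
```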

#example[
Let $A = {1,2,3,4,5,6}$. Color the numbers 1, 2 red, the numbers 3, 4 green, and the numbers 5, 6
yellow. How many different two-element subsets of $A$ are there that have two
different colors?

First choose 2 colors, $vec(3,2) = 3$. Then from each color, choose one. Altogether it's
$ vec(3,2) vec(2,1) vec(2,1) = 3 dot 2 dot 2 = 12 $
]

One way to view $vec(n,k)$ is as the number of ways of painting $n$ elements
with two colors, red and yellow, with $k$ red and $n - k$ yellow elements. Let
us generalize to more than two colors.

#fact[
Let $n$ and $r$ be positive integers and $k_1, ..., k_r$ nonnegative integers
such that $k_1 + dots.c + k_r = n$. The number of ways of assigning labels
$1,2, ..., r$ to $n$ items so that for each $i = 1, 2, ..., r$, exactly $k_i$
items receive label $i$, is the *multinomial coefficient*

$ vec(n, (k_1, k_2, ..., k_r)) = n! / (k_1 ! k_2 ! dots.c k_r !) $
]<multinomial-coefficient>

#proof[
Order the $n$ items in some manner, and assign labels like this: for the
first $k_1$ items, assign the label 1, then for the next $k_2$ items,
assign the label 2, and so on. The $i$th label will be assigned to all the
items between positions $k_1 + dots.c + k_(i-1) + 1$ and $k_1 + dots.c + k_i$.

Then notice that all possible orderings (permutations) of the items give
every possible way to label the items. However, we overcount by some
amount. How much? The order of the items with a given label doesn't matter,
so we need to deduplicate those.

Each set of labels is duplicated once for each way we can order all of the
elements with the same label. For label $i$, there are $k_i$ elements with
that label, so $k_i !$ ways to order those. By @tuplemultiplication, we know
that we can express the combined amount of ways each group of $k_1, ..., k_r$
items are labeled as $k_1 ! k_2 ! k_3 ! dots.c k_r !$.

So by @ktoone, we can account for the duplicates and the answer is
$ n! / (k_1 ! k_2 ! k_3 ! dots.c k_r !) $
]

#remark[
@multinomial-coefficient gives us a way to count how many ways there are to
fit $n$ distinguishable objects into $r$ distinguishable containers of
varying capacity.

To find the amount of ways to fit $n$ indistinguishable objects into $k$
distinguishable containers of _any_ capacity, use the "stars and bars"
(ball-and-urn) technique.
]

#example[
How many different ways can six people be divided into three pairs?

First we use the multinomial coefficient to count the amount of ways to assign specific labels to pairs of elements:
$ vec(6, (2,2,2)) $
But notice that the actual labels themselves are irrelevant. Our multinomial
coefficient counts how many ways there are to assign 3 distinguishable
labels, say Pair 1, Pair 2, Pair 3, to our 6 elements.

To make this more explicit, say we had a 3-tuple where the position encoded
the label, where position 1 corresponds to Pair 1, and so on. Then the values
are the actual pairs of people (numbered 1-6). For instance
$ ((1,2), (3,4), (5,6)) $
corresponds to assigning the label Pair 1 to (1,2), Pair 2 to (3,4) and Pair
3 to (5,6). What our multinomial coefficient is doing is it's counting this,
as well as any other orderings of this tuple. For instance
$ ((3,4), (1,2), (5,6)) $
is also counted. However since in our case the actual labels are irrelevant,
the two examples shown above should really be counted only once.

How many extra times is each case counted? It turns out that we can think of
our multinomial coefficient as permuting the labels across our pairs. So in
this case it's permuting all the ways we can order 3 labels, which is $3! =
6$. That means by @ktoone our answer is

$ vec(6, (2,2,2)) / 3! = 15 $
]
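
The answer 15 is small enough to confirm by enumeration: generate all
orderings of the six people and quotient out both kinds of irrelevant order.
A throwaway sketch (my encoding):

```python
from itertools import permutations

# Count distinct ways to split {0,...,5} into three unordered pairs.
pairings = set()
for p in permutations(range(6)):
    pairs = [tuple(sorted(p[i:i + 2])) for i in (0, 2, 4)]  # kill order within a pair
    pairings.add(frozenset(pairs))                          # kill order of the pairs
print(len(pairings))  # 15
```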

#example("Poker")[
How many poker hands are in the category _one pair_?

A one pair is a hand with two cards of the same rank and three cards with ranks
different from each other and the pair.

We can count in two ways: we count all the ordered hands, then divide by $5!$
to remove overcounting, or we can build the unordered hands directly.

When finding the ordered hands, the key is to figure out how we can encode
our information in a tuple of the form described in @tuplemultiplication, and
then use @tuplemultiplication to compute the solution.

In this case, the first element encodes the two slots in the hand of 5 our
pair occupies, the second element encodes the first card of the pair, the
third element encodes the second card of the pair, and the fourth, fifth, and
sixth elements represent the 3 cards that are not of the same rank.

Now it is clear that the number of alternatives in each position of the
6-tuple does not depend on any of the others, so @tuplemultiplication
applies. Then we can determine the amount of alternatives for each position
in the 6-tuple and multiply them to determine the total amount of ways the
6-tuple can be constructed, giving us the total amount of ways to construct
ordered poker hands with one pair.

First we choose 2 slots out of 5 positions (in the hand) so there are
$vec(5,2)$ alternatives. Then we choose any of the 52 cards for our first
pair card, so there are 52 alternatives. Then we choose any card with the
same rank for the second card in the pair, where there are 3 possible
alternatives. Then we choose the third card which must not be the same rank
as the first two, where there are 48 alternatives. The fourth card must not
be the same rank as the others, so there are 44 alternatives. Likewise, the
final card has 40 alternatives.

So the final answer is, remembering to divide by $5!$ because we don't care
about order,
$ (vec(5,2) dot 52 dot 3 dot 48 dot 44 dot 40) / 5! $

Alternatively, we can find a way to build an unordered hand with the
requirements. First we choose the rank of the pair, then we choose two suits
for that rank, then we choose the remaining 3 different ranks, and finally a
suit for each of the ranks. Then, noting that we will now omit constructing
the tuple and explicitly listing alternatives for brevity, we have
$ 13 dot vec(4,2) dot vec(12, 3) dot 4^3 $

Both approaches give the same answer.
]
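
Both formulas are easy to compare numerically (a quick check; 1,098,240 is
the standard one-pair count):

```python
from math import comb, factorial

ordered = comb(5, 2) * 52 * 3 * 48 * 44 * 40  # ordered 6-tuple construction
direct = 13 * comb(4, 2) * comb(12, 3) * 4 ** 3  # unordered construction

assert ordered // factorial(5) == direct == 1098240
print(direct)  # 1098240
```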

= Discussion section #datetime(day: 22, month: 1, year: 2025).display()

= Lecture #datetime(day: 23, month: 1, year: 2025).display()

== Independence

#definition("Independence")[
Two events $A subset Omega$ and $B subset Omega$ are independent if and only if
$ P(B sect A) = P(B)P(A) $
"Joint probability is equal to product of their marginal probabilities."
]

#fact[This definition must be used to show the independence of two events.]

#fact[
If $A$ and $B$ are independent, then,
$
P(A | B) = underbrace((P(A sect B)) / P(B), "conditional probability") = (P(A) P(B)) / P(B) = P(A)
$
]

#example[
Flip a fair coin 3 times. Let the events:

- $A$ = we have exactly one tails among the first 2 flips
- $B$ = we have exactly one tails among the last 2 flips
- $D$ = we get exactly one tails among all 3 flips

Show that $A$ and $B$ are independent.
What about $B$ and $D$?

Compute all of the possible events, then we see that

$
P(A sect B) = (hash (A sect B)) / (hash Omega) = 2 / 8 = 4 / 8 dot 4 / 8 = P(A) P(B)
$

So they are independent.

Repeat the same reasoning for $B$ and $D$, and we see that they are not independent.
]
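
The sample space has only 8 outcomes, so both independence claims can be
verified mechanically; a Python sketch (my event encodings):

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))
p = lambda ev: Fraction(len(ev), len(omega))

a = {w for w in omega if w[:2].count("T") == 1}  # one tails in first 2 flips
b = {w for w in omega if w[1:].count("T") == 1}  # one tails in last 2 flips
d = {w for w in omega if w.count("T") == 1}      # one tails in all 3 flips

print(p(a & b) == p(a) * p(b))  # True:  A, B independent
print(p(b & d) == p(b) * p(d))  # False: B, D not independent
```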

#example[
Suppose we have 4 red and 7 green balls in an urn. We choose two balls with replacement. Let

- $A$ = the first ball is red
- $B$ = the second ball is green

Are $A$ and $B$ independent?

$
hash Omega = 11 times 11 = 121 \
hash A = 4 dot 11 = 44 \
hash B = 11 dot 7 = 77 \
hash (A sect B) = 4 dot 7 = 28
$

So $P(A sect B) = 28 / 121 = 44 / 121 dot 77 / 121 = P(A) P(B)$, and the events are independent.
]

#definition[
Events $A_1, ..., A_n$ are independent (mutually independent) if for every collection $A_(i_1), ..., A_(i_k)$, where $2 <= k <= n$ and $1 <= i_1 < i_2 < dots.c < i_k <= n$,

$
P(A_(i_1) sect A_(i_2) sect dots.c sect A_(i_k)) = P(A_(i_1)) P(A_(i_2)) dots.c P(A_(i_k))
$
]

#definition[
We say that the events $A_1, ..., A_n$ are *pairwise independent* if any two
different events $A_i$ and $A_j$ are independent for any $i != j$.
]

= Lecture #datetime(day: 27, year: 2025, month: 1).display()

== Bernoulli trials

The setup: the experiment has exactly two outcomes:
- Success -- $S$ or 1
- Failure -- $F$ or 0

Additionally:
$
P(S) = p, (0 < p < 1) \
P(F) = 1 - p = q
$

Construct the probability mass function:

$
P(X = 1) = p \
P(X = 0) = 1 - p
$

Write it as:

$ p_X (k) = p^k (1-p)^(1-k) $

for $k = 1$ and $k = 0$.

== Binomial distribution

The setup: very similar to Bernoulli, trials have exactly 2 outcomes. A bunch
of Bernoulli trials in a row.

Importantly: $p$ and $q$ are defined exactly the same in all trials.

This ties the binomial distribution to the sampling with replacement model,
since each trial does not affect the next.

We conduct $n$ *independent* trials of this experiment. Example with coins: each
flip independently has a $1/2$ chance of heads or tails (holds same for die,
rigged coin, etc).

$n$ is fixed, i.e. known ahead of time.

== Binomial random variable

Let $X = hash$ of successes in $n$ independent trials. Each outcome is a
sequence of $n$ trials, i.e. an $omega$ of the form $omega = S F F dots.c F$
of length $n$, and $Omega$ is the set of all such sequences.

Then $X(omega) = 0,1,2,...,n$ can take $n + 1$ possible values. The
probability of any particular sequence is given by the product of the
individual trial probabilities.

#example[
$ omega = S F F S F dots.c S = (p q q p q dots.c p) $
]

So $P(X = 0) = P(F F F dots.c F) = q dot q dot dots.c dot q = q^n$.

And
$
P(X = 1) = P(S F F dots.c F) + P(F S F F dots.c F) + dots.c + P(F F F dots.c F S) \
= underbrace(n, "possible outcomes") dot p^1 dot q^(n-1) \
= vec(n, 1) dot p^1 dot q^(n-1)
$

Now we can generalize

$
P(X = 2) = vec(n,2) p^2 q^(n-2)
$

How about all successes?

$
P(X = n) = P(S S dots.c S) = p^n
$

We see that for all failures we have $q^n$ and all successes we have $p^n$.
Otherwise we use our method above.

In general, here is the probability mass function for the binomial random variable

$
P(X = k) = vec(n, k) p^k q^(n-k), "for" k = 0,1,2,...,n
$

The binomial distribution is very powerful: whenever we repeat an independent
binary choice, it gives the probability of each possible number of successes.

To summarize the characterization of the binomial random variable:

- $n$ independent trials
- each trial results in binary success or failure
- with probability of success $p$, identically across trials

with $X = hash$ successes in *fixed* $n$ trials.

$ X ~ "Bin"(n,p) $

with probability mass function

$
P(X = x) = vec(n,x) p^x (1 - p)^(n-x) = p(x) "for" x = 0,1,2,...,n
$

We see this is in fact the binomial theorem!

$
p(x) >= 0, quad sum^n_(x=0) p(x) = sum^n_(x=0) vec(n,x) p^x q^(n-x) = (p + q)^n
$

In fact,
$
(p + q)^n = (p + (1 - p))^n = 1
$
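
A numeric check of this normalization claim (illustrative; any $n$ and $p$
work):

```python
from math import comb, isclose

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 1 / 6
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
assert isclose(total, 1.0)  # the pmf sums to (p + q)^n = 1
print(total)
```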

#example[
A family has 5 children. What is the probability that the number of males is 2, if we
assume births are independent and the probability of a male is 0.5?

First we check binomial criteria: $n$ independent trials, well formed
$S$/$F$, probability the same across trials. Let's say male is $S$ and
otherwise $F$.

We have $n=5$ and $p = 0.5$. We just need $P(X = 2)$.

$
P(X = 2) = vec(5,2) (0.5)^2 (0.5)^3 \
= (5 dot 4) / (2 dot 1) (1 / 2)^5 = 10 / 32
$
]

#example[
What is the probability of getting exactly three aces (1's) out of 10 throws
of a fair die?

Seems a little trickier but we can still write this as well defined $S$/$F$.
Let $S$ be getting an ace and $F$ be anything else.

Then $p = 1/6$ and $n = 10$. We want $P(X=3)$. So

$
P(X=3) = vec(10,3) p^3 q^7 = vec(10,3) (1 / 6)^3 (5 / 6)^7 \
approx 0.15505
$
]

#example[
Suppose we have two types of candy, red and black, with $a$ red and $b$ black. Select $n$ candies. Let $X$
be the number of red candies among the $n$ selected.

2 cases.

- case 1: with replacement: Binomial distribution, $n$, $p = a/(a + b)$.
  $ P(X = 2) = vec(n,2) (a / (a+b))^2 (b / (a+b))^(n-2) $
- case 2: without replacement: then use counting
  $ P(X = x) = (vec(a,x) vec(b,n-x)) / vec(a+b,n) = p(x) $
]

We've done case 2 before, but now we introduce a random variable to represent
it.

$ P(X = x) = (vec(a,x) vec(b,n-x)) / vec(a+b,n) = p(x) $

is known as a *Hypergeometric distribution*.

== Hypergeometric distribution

There are different characterizations of the parameters, but

$ X ~ "Hypergeom"(hash "total", hash "successes", "sample size") $

For example,
$ X ~ "Hypergeom"(N, a, n) "where" N = a+b $

In the textbook, it's
$ X ~ "Hypergeom"(N, N_a, n) $

#remark[
If the sample size $n$ is very small relative to $a + b$, then both cases give
similar (approximately the same) answers.
]

For instance, if we're sampling for blood types from UCSB, and we take a
student out without replacement, we don't really change the population
substantially. So both answers give a similar result.

Suppose we have two types of items, type $A$ and type $B$. Let $N_A$ be $hash$
type $A$, $N_B$ $hash$ type $B$. $N = N_A + N_B$ is the total number of
objects.

We sample $n$ items *without replacement* ($n <= N$) with order not mattering.
Denote by $X$ the number of type $A$ objects in our sample.

#definition[
Let $0 <= N_A <= N$ and $1 <= n <= N$ be integers. A random variable $X$ has the *hypergeometric distribution* with parameters $(N, N_A, n)$ if $X$ takes values in the set ${0,1,...,n}$ and has p.m.f.

$ P(X = k) = (vec(N_A,k) vec(N-N_A,n-k)) / vec(N,n) = p(k) $
]

#example[
Let $N_A = 10$ defectives. Let $N_B = 90$ non-defectives. We select $n=5$ without replacement. What is the probability that 2 of the 5 selected are defective?

$
X ~ "Hypergeom" (N = 100, N_A = 10, n = 5)
$

We want $P(X=2)$.

$
P(X=2) = (vec(10,2) vec(90,3)) / vec(100,5) approx 0.0702
$
]
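
The same arithmetic in Python (sanity check only):

```python
from math import comb

def hypergeom_pmf(k, N, N_A, n):
    """P(X = k) for X ~ Hypergeom(N, N_A, n)."""
    return comb(N_A, k) * comb(N - N_A, n - k) / comb(N, n)

print(round(hypergeom_pmf(2, 100, 10, 5), 4))  # 0.0702
```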

#remark[
Make sure you can distinguish when a problem is binomial or when it is
hypergeometric. This is very important on exams.

Recall that both ask about the number of successes, in a fixed number of trials.
But binomial is sampling with replacement (each trial is independent) and
sampling without replacement is hypergeometric.
]

#example[
A cat gives birth to 6 kittens. 2 are male, 4 are female. Your neighbor comes and picks up 3 kittens randomly to take home with them.

How do we define the random variable? What is the p.m.f.?

Let $X$ be the number of male cats in the neighbor's selection.

$ X ~ "Hypergeom"(N = 6, N_A = 2, n = 3) $
and $X$ takes values in ${0,1,2}$. Find the p.m.f. by finding probabilities for these values.

$
&P(X = 0) = (vec(2,0) vec(4,3)) / vec(6,3) = 4 / 20 \
&P(X = 1) = (vec(2,1) vec(4,2)) / vec(6,3) = 12 / 20 \
&P(X = 2) = (vec(2,2) vec(4,1)) / vec(6,3) = 4 / 20 \
&P(X = 3) = (vec(2,3) vec(4,0)) / vec(6,3) = 0
$

Note that for $P(X=3)$, we are asking for 3 successes (drawing males) where
there are only 2 males, so it must be 0.
]

== Geometric distribution

Consider an infinite sequence of independent trials, e.g. the number of attempts
until I make a basket.

Let $X_i$ denote the outcome of the $i^"th"$ trial, where success is 1 and failure is 0. Let $N$ be the number of trials needed to observe the first success in a sequence of independent trials with probability of success $p$.

We fail $k-1$ times and succeed on the $k^"th"$ try. Then:

$
P(N = k) = P(X_1 = 0, X_2 = 0, ..., X_(k-1) = 0, X_k = 1) = (1 - p)^(k-1) p
$

This is the probability of failure raised to the number of failures, times the
probability of success.

The key characteristic in these trials: we keep going until we succeed. There's
no $n$ choose $k$ in front like the binomial distribution because there's
exactly one sequence that gives us success.

#definition[
Let $0 < p <= 1$. A random variable $X$ has the geometric distribution with
success parameter $p$ if the possible values of $X$ are ${1,2,3,...}$ and $X$
satisfies

$
P(X=k) = (1-p)^(k-1) p
$

for positive integers $k$. Abbreviate this by $X ~ "Geom"(p)$.
]

#example[
What is the probability it takes more than seven rolls of a fair die to roll a
six?

Let $X$ be the number of rolls of a fair die until the first six. Then $X ~
"Geom"(1/6)$. Now we just want $P(X > 7)$.

$
P(X > 7) = sum^infinity_(k=8) P(X=k) = sum^infinity_(k=8) (5 / 6)^(k-1) 1 / 6
$

Re-indexing,

$
sum^infinity_(k=8) (5 / 6)^(k-1) 1 / 6 = 1 / 6 (5 / 6)^7 sum^infinity_(j=0) (5 / 6)^j
$

Now we calculate by standard methods:

$
1 / 6 (5 / 6)^7 sum^infinity_(j=0) (5 / 6)^j = 1 / 6 (5 / 6)^7 dot 1 / (1-5 / 6) =
(5 / 6)^7
$
]
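
As a check on the closed form, one can also simulate the experiment directly
(a rough Monte Carlo sketch, mine; the estimate fluctuates around $(5/6)^7 approx 0.279$):

```python
import random

def first_six():
    """Roll a fair die until a six appears; return the number of rolls."""
    rolls = 1
    while random.randint(1, 6) != 6:
        rolls += 1
    return rolls

trials = 200_000
estimate = sum(first_six() > 7 for _ in range(trials)) / trials
print(estimate, (5 / 6) ** 7)  # both close to 0.279
```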