From 503af186234b3e88595ac6b5e6e5a6066cdea8cc Mon Sep 17 00:00:00 2001
From: Youwen Wu <youwenw@gmail.com>
Date: Sun, 19 Jan 2025 02:07:58 -0800
Subject: [PATCH] auto-update(nvim): 2025-01-19 02:07:58

---
 .../pstat-120a/course-notes/main.typ          | 158 +++++++++++++++++-
 documents/by-course/pstat-120a/hw1/main.typ   |  23 +++
 2 files changed, 173 insertions(+), 8 deletions(-)

diff --git a/documents/by-course/pstat-120a/course-notes/main.typ b/documents/by-course/pstat-120a/course-notes/main.typ
index ca78488..c07b837 100644
--- a/documents/by-course/pstat-120a/course-notes/main.typ
+++ b/documents/by-course/pstat-120a/course-notes/main.typ
@@ -1,7 +1,7 @@
-#import "./dvd.typ": *
+#import "@youwen/zen:0.1.0": *
 #import "@preview/ctheorems:1.1.3": *
 
-#show: dvdtyp.with(
+#show: zen.with(
   title: "PSTAT120A Course Notes",
   author: "Youwen Wu",
   date: "Winter 2025",
@@ -10,6 +10,13 @@
 
 #outline()
 
+= Introduction
+
+PSTAT 120A is an introductory course on probability and statistics. However, it
+is a theoretical course rather than an applied statistics course. You will not learn
+how to read or conduct real-world statistical studies. Leave your $p$-values at
+home; this ain't your momma's AP Stats.
+
 = Lecture #datetime(day: 6, month: 1, year: 2025).display()
 
 == Preliminaries
@@ -237,6 +244,12 @@ Requires equally likely outcomes and finite sample spaces.
 
 == Relative frequency approach
 
+An approach commonly taken by applied statisticians who work in the disgusting
+real world. This is where we are generally occupied with irrelevant concerns
+like accurate sampling and $p$-values and such. I am told this is covered in
+PSTAT 120B, so hopefully I can avoid ever taking that class (as a pure math
+major).
+
 $
   P(A) = (hash "of times" A "occurs in large number of trials") / (hash "of trials")
 $
@@ -252,14 +265,26 @@ its parlance to lend credibility to subjective judgements of confidence.
 
 == Axiomatic approach
 
-Our focus in PSTAT 120A. It seems rather silly to call this approach axiomatic
-given we are essentially just defining a function with a few given properties
-and deriving theorems from it while working atop our pre-existing (shaky,
-non-rigorous) "axioms" of set theory, but this is the terminology that the
-course uses.
+Consider a random experiment. Then:
 
 #definition[
-  Let $P : X -> RR$ be a function satisfying the following axioms (properties).
+  The *sample space* $Omega$ is the set of all possible outcomes of the
+  experiment.
+]
+
+#definition[
+  Elements of $Omega$ are called *sample points*.
+]
+
+#definition[
+  Subsets of $Omega$ are called *events*. The collection of events in $Omega$
+  (in other terms, the power set of $Omega$) is denoted by $cal(F)$.
+]
+
+#definition[
+  The *probability measure*, or probability distribution, or simply probability, is a function $P$.
+
+  Let $P : cal(F) -> RR$ be a function satisfying the following axioms (properties).
 
   + $P(A) >= 0, forall A$
   + $P(Omega) = 1$
@@ -267,6 +292,15 @@ course uses.
     $ P(union.big_(i=1)^infinity A_i) = sum_(i=1)^infinity P(A_i) $
 ]
 
+The 3-tuple $(Omega, cal(F), P)$ is called a *probability space*.
+
+#remark[
+  In more advanced texts you will see $cal(F)$ introduced as a so-called
+  $sigma$-algebra. A $sigma$-algebra on a set $Omega$ is a nonempty collection
+  $Sigma$ of subsets of $Omega$ that is closed under set complement, countable
+  unions, and as a corollary, countable intersections.
+]
+
 Now let us show various results with $P$.
 
 #proposition[
@@ -450,3 +484,111 @@ Properties of the #smallcaps[pdf]:
 #example[
   Waiting time for bus: $Omega = {s : s >= 0}$.
 ]
+
+= Notes on counting
+
+The cardinality of $A$ is given by $hash A$. Let us develop methods for finding
+$hash A$ from a description of the set $A$ (in other words, methods for
+counting).
+
+== General multiplication principle
+
+#fact[
+  Let $A$ and $B$ be finite sets, $k in ZZ^+$. Then let $f : A -> B$ be a
+  function such that each element in $B$ is the image of exactly $k$ elements
+  in $A$ (such a function is called _$k$-to-one_). Then $hash A = k dot hash
+  B$.
+]<ktoone>
+
+#example[
+  Four fully loaded 10-seater vans transported people to the picnic. How many
+  people were transported?
+
+  Apply @ktoone: let $A$ be the set of people, $B$ the set of vans, and $f : A -> B$ the map sending each person to the van they ride in. Then $f$ is a 10-to-one function and $hash B = 4$, so $hash A = 10 dot 4 = 40$.
+]
+
+#definition[
+  An $n$-tuple is an ordered sequence of $n$ elements.
+]
+
+Many of our methods in probability rely on multiplying the numbers of outcomes
+of successive choices to obtain the total number of combined outcomes. We make this explicit below in @tuplemultiplication.
+
+#fact[
+  Suppose a set of $n$-tuples $(a_1, ..., a_n)$ obeys these rules:
+
+  + There are $r_1$ choices for the first entry $a_1$.
+  + Once the first $k$ entries $a_1, ..., a_k$ have been chosen, the number of alternatives for the next entry $a_(k+1)$ is $r_(k+1)$, regardless of the previous choices.
+
+  Then the total number of $n$-tuples is the product $r_1 dot r_2 dot dots dot r_n$.
+]<tuplemultiplication>
+
+#proof[
+  We proceed by induction on $n$. The statement is trivially true for $n = 1$,
+  since there are $r_1$ choices of $a_1$ for a 1-tuple $(a_1)$.
+
+  Let $A$ be the set of all possible $n$-tuples and $B$ be the set of all
+  possible $(n+1)$-tuples, and assume the statement is true for $A$. Note
+  that each $n$-tuple $(a_1, ..., a_n)$ in $A$ extends to exactly $r_(n+1)$
+  tuples in $B$, one for each choice of $a_(n+1)$.
+
+  Let $f : B -> A$ be a function which takes each $(n+1)$-tuple and truncates the $a_(n+1)$ term, leaving us with just an $n$-tuple of the form $(a_1, a_2, ..., a_n)$.
+  $ f((a_1, ..., a_n, a_(n + 1))) = (a_1, ..., a_n) $
+  Now notice that $f$ is precisely an $r_(n+1)$-to-one function! Recall by
+  our assumption that @tuplemultiplication is true for $n$-tuples, so $A$ has $r_1 dot
+  r_2 dot ... dot r_n$ elements, or $hash A = r_1 dot ... dot r_n$. Then by
+  @ktoone, we have $hash B = hash A dot r_(n+1) = r_1 dot r_2 dot
+  ... dot r_(n+1)$. Our induction is complete and we have proved @tuplemultiplication.
+]
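+
+#example[
+  As a small illustration of @tuplemultiplication (the alphabet and word
+  length are our own choice, not from lecture): how many 3-letter strings over
+  the 26-letter English alphabet have no two adjacent letters equal?
+
+  There are $r_1 = 26$ choices for the first letter. Whatever it is, the
+  second letter can be any of the other $r_2 = 25$ letters, and likewise the
+  third can be any of the $r_3 = 25$ letters different from the second. Which
+  letters are allowed depends on the previous choices, but how many does not,
+  so the count is
+  $ r_1 dot r_2 dot r_3 = 26 dot 25 dot 25 = 16250 $
+]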
+
+@tuplemultiplication is sometimes called the _general multiplication principle_.
+
+We can use @tuplemultiplication to derive counting formulas for various
+situations. Let $A_1, A_2, ..., A_n$ be finite sets. Then as a corollary of
+@tuplemultiplication, we can count the number of $n$-tuples in a finite
+Cartesian product of $A_1, A_2, ..., A_n$.
+
+#fact[
+  Let $A_1, A_2, ..., A_n$ be finite sets. Then
+
+  $
+    hash (A_1 times A_2 times ... times A_n) = (hash A_1) dot (hash A_2) dot ... dot (hash A_n) = product_(i=1)^n (hash A_i)
+  $
+]
+
+#example[
+  How many distinct subsets does a set of size $n$ have?
+
+  The answer is $2^n$. Each subset can be encoded as an $n$-tuple with entries 0
+  or 1, where the $i$th entry is 1 if the $i$th element of the set is in the
+  subset and 0 if it is not.
+
+  Thus the number of subsets is the same as the cardinality of
+  $ {0,1} times ... times {0,1} = {0,1}^n $
+  which is $2^n$.
+
+  This is why given a set $X$ with cardinality $aleph$, we write the
+  cardinality of the power set of $X$ as $2^aleph$.
+]
+
+== Permutations
+
+Now we can use the multiplication principle to count permutations.
+
+#fact[
+  Consider all $k$-tuples $(a_1, ..., a_k)$ that can be constructed from a set $A$ of size $n$, with $n >= k$, without repetition. The total number of these $k$-tuples is
+  $ (n)_k = n dot (n - 1) dot ... dot (n - k + 1) = (n!) / ((n-k)!) $
+
+  In particular, with $k=n$, each $n$-tuple is an ordering or _permutation_ of $A$. So the total number of permutations of a set of $n$ elements is $n!$.
+]
+
+#proof[
+  We construct the $k$-tuples sequentially. For the first element, we choose
+  one element from $A$ with $n$ alternatives. The next element has $n - 1$
+  alternatives. In general, after $j$ elements are chosen, there are $n - j$
+  alternatives for the next element, so the $k$th has $n - k + 1$ alternatives.
+
+  Then by @tuplemultiplication, with $r_j = n - j + 1$ for $j = 1, ..., k$,
+  the number of $k$-tuples is $n dot (n - 1) dot ... dot (n - k + 1) =
+  (n)_k$.
+]
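+
+#example[
+  A quick check of the formula with numbers of our own choosing: in a race
+  with 8 runners, the number of ways to award gold, silver, and bronze is the
+  number of 3-tuples without repetition from a set of size 8,
+  $ (8)_3 = 8 dot 7 dot 6 = 336 = (8!) / (5!) $
+  Assuming a reasonably recent Typst, the built-in `calc.perm` should agree:
+  #calc.perm(8, 3).
+]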
diff --git a/documents/by-course/pstat-120a/hw1/main.typ b/documents/by-course/pstat-120a/hw1/main.typ
index 14f867d..57c6e6c 100644
--- a/documents/by-course/pstat-120a/hw1/main.typ
+++ b/documents/by-course/pstat-120a/hw1/main.typ
@@ -73,3 +73,26 @@
         $ {{x_1, x_2, x_3, x_4} : x_i >= 0, i = 1,...,6 sum_(j=1)^4 x_j = 6} $
       ]
   ]
+
++ #[
+    #set enum(numbering: "a)", spacing: 2em)
+
+    + #[
+        We want to determine how many ways to choose 8 people from 27 people, or $vec(27,8) = 2220075$.
+      ]
+    + #[
+        This is the same as choosing 4 of the 12 men and 4 of the 15 women, and pairing each group of men with each group of women once. So,
+        $ vec(12,4) times vec(15, 4) = 675675 $
+      ]
+    + #[
+        First we determine the number of ways to choose fewer than 2 women.
+
+        $ vec(15, 0) vec(12, 8) + vec(15, 1) times vec(12,7) $
+        Then the total number of ways to choose 8 people, from part a, is $vec(27,8)$.
+
+        Then the chance of forming a committee with fewer than 2 women is
+        $ (vec(15, 0) vec(12, 8) + vec(15, 1) vec(12,7)) / vec(27,8) $
+        So, taking the complement, the chance of at least 2 women (our final answer) is
+        $ 1 - (vec(15, 0) vec(12, 8) + vec(15, 1) vec(12,7)) / vec(27,8) $
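+
+        As a sanity check on the arithmetic (worth re-verifying by hand):
+        $vec(15, 0) vec(12, 8) = 495$ and $vec(15, 1) vec(12, 7) = 15 dot 792 = 11880$, so
+        $ 1 - 12375 / 2220075 = 1 - 5 / 897 = 892 / 897 approx 0.9944 $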
+      ]
+  ]