From cb17974289b4ad2d487f6f18f9477268a5d9dc6e Mon Sep 17 00:00:00 2001
From: Youwen Wu
Date: Mon, 10 Feb 2025 03:00:55 -0800
Subject: [PATCH] auto-update(nvim): 2025-02-10 03:00:55

---
 .../pstat-120a/course-notes/main.typ | 245 +++++++++++++++++-
 1 file changed, 239 insertions(+), 6 deletions(-)

diff --git a/documents/by-course/pstat-120a/course-notes/main.typ b/documents/by-course/pstat-120a/course-notes/main.typ
index 006a512..b6c7dca 100644
--- a/documents/by-course/pstat-120a/course-notes/main.typ
+++ b/documents/by-course/pstat-120a/course-notes/main.typ
@@ -12,10 +12,7 @@
 
 = Introduction
 
-PSTAT 120A is an introductory course on probability and statistics. However, it
-is a theoretical course rather an applied statistics course. You will not learn
-how to read or conduct real-world statistical studies. Leave your $p$-values at
-home, this ain't your momma's AP Stats.
+PSTAT 120A is an introductory course on probability and statistics with an emphasis on theory.
 
 = Lecture #datetime(day: 6, month: 1, year: 2025).display()
 
@@ -772,8 +769,6 @@ us generalize to more than two colors.
 
 Both approaches give the same answer.
 ]
 
-= Discussion section #datetime(day: 22, month: 1, year: 2025).display()
-
 = Lecture #datetime(day: 23, month: 1, year: 2025).display()
 
 == Independence
 
@@ -1144,6 +1139,244 @@ exactly one sequence that gives us success.
 $
 ]
 
+= Notes on textbook chapter 3
+
+Recall that a random variable $X$ is a function $X : Omega -> RR$ that assigns
+a real number to each outcome $omega in Omega$. The _probability distribution_
+of $X$ records its important probabilistic information: it is a description of
+the probabilities $P(X in B)$ for subsets $B subset RR$. Below we describe the
+probability mass function and the probability density function.
+
+A random variable $X$ is discrete if there is a countable set $A$ such that
+$P(X in A) = 1$. We call $k$ a possible value if $P(X = k) > 0$.
+
+The probability distribution of a discrete random variable is entirely
+determined by its p.m.f. $p(k) = P(X = k)$. The p.m.f. is a function from the
+set of possible values of $X$ into $[0,1]$. When we want to indicate which
+random variable the p.m.f. belongs to, we write $p_X (k)$.
+
+By the axioms of probability,
+
+$
+  sum_k p_X (k) = sum_k P(X=k) = 1
+$
+
+For a subset $B subset RR$,
+
+$
+  P(X in B) = sum_(k in B) p_X (k)
+$
+
+Now we introduce another major class of random variables.
+
+#definition[
+  Let $X$ be a random variable. If $f$ satisfies
+
+  $
+    P(X <= b) = integral^b_(-infinity) f(x) dif x
+  $
+
+  for all $b in RR$, then $f$ is the *probability density function* of $X$.
+]
+
+The probability that $X in (-infinity, b]$ is equal to the area under the graph
+of $f$ from $-infinity$ to $b$.
+
+A corollary is the following.
+
+#fact[
+  $ P(X in B) = integral_B f(x) dif x $
+
+  for any $B subset RR$ where integration makes sense.
+]
+
+The set can be bounded or unbounded, or any collection of intervals.
+
+#fact[
+  $ P(a <= X <= b) = integral_a^b f(x) dif x $
+  $ P(X > a) = integral_a^infinity f(x) dif x $
+]
+
+#fact[
+  If a random variable $X$ has density function $f$, then individual point
+  values have probability zero:
+
+  $ P(X = c) = integral_c^c f(x) dif x = 0, forall c in RR $
+]
+
+#remark[
+  It follows that a random variable with a density function is not discrete.
+  Also, the probabilities of intervals are not changed by including or
+  excluding endpoints.
+]
+
+How do we determine which functions are p.d.f.s? Since $P(-infinity < X <
+infinity) = 1$, a p.d.f. $f$ must satisfy
+
+$
+  f(x) >= 0 forall x in RR \
+  integral^infinity_(-infinity) f(x) dif x = 1
+$
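+
+To make these two conditions concrete, here is a quick check (an illustrative
+example, not from the textbook): take $f(x) = 2 x$ for $x in [0,1]$ and
+$f(x) = 0$ elsewhere.
+
+#example[
+  The function $f$ is nonnegative everywhere, and
+
+  $ integral^infinity_(-infinity) f(x) dif x = integral_0^1 2 x dif x = 1 $
+
+  so $f$ is a valid p.d.f. For a random variable $X$ with this density, the
+  defining property gives
+
+  $ P(X <= 1/2) = integral_0^(1/2) 2 x dif x = 1/4 $
+]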
+
+#fact[
+  Random variables with density functions are called _continuous_ random
+  variables. This does not imply that the random variable is a continuous
+  function on $Omega$, but it is standard terminology.
+]
+
+#definition[
+  Let $[a,b]$ be a bounded interval on the real line. A random variable $X$ has
+  the *uniform distribution* on $[a,b]$ if $X$ has density function
+
+  $
+    f(x) = cases(
+      1/(b-a)", if" x in [a,b],
+      0", if" x in.not [a,b]
+    )
+  $
+
+  Abbreviate this by $X ~ "Unif"[a,b]$.
+]
+
+= Notes on week 3 lecture slides
+
+== Negative binomial
+
+Consider a sequence of Bernoulli trials with the following characteristics:
+
+- Each trial results in either success or failure.
+- The probability of success $p$ is the same on each trial.
+- The trials are independent (note that the number of trials is not fixed in
+  advance).
+- The experiment continues until $k$ successes are observed, where $k$ is a
+  given parameter.
+
+Then if $X$ is the number of trials necessary until $k$ successes are observed,
+we say $X$ is a *negative binomial* random variable.
+
+#definition[
+  Let $k in ZZ^+$ and $0 < p <= 1$. A random variable $X$ has the negative
+  binomial distribution with parameters $(k,p)$ if the possible values of $X$
+  are the integers ${k, k+1, k+2, ...}$ and the p.m.f. is
+
+  $
+    P(X = n) = vec(n-1, k-1) p^k (1-p)^(n-k) "for" n >= k
+  $
+
+  Abbreviate this by $X ~ "Negbin"(k,p)$.
+]
+
+#example[
+  Steph Curry has a three point percentage of approximately $43%$. What is the
+  probability that Steph makes his third three-point basket on his $5^"th"$
+  attempt?
+
+  Let $X$ be the number of attempts required to observe the 3rd success. Then
+
+  $
+    X ~ "Negbin"(k = 3, p = 0.43)
+  $
+
+  So,
+  $
+    P(X = 5) &= vec(5-1,3-1) (0.43)^3 (1 - 0.43)^(5-3) \
+             &= vec(4,2) (0.43)^3 (0.57)^2 \
+             &approx 0.155
+  $
+]
+
+== Poisson distribution
+
+The normalization of the Poisson p.m.f. follows from the Taylor expansion
+
+$
+  e^lambda = sum_(k=0)^infinity lambda^k / k!
+$
+
+which implies that
+
+$
+  sum_(k=0)^infinity e^(-lambda) lambda^k / k! = e^(-lambda) e^lambda = 1
+$
+
+#definition[
+  For an integer-valued random variable $X$, we say $X ~ "Poisson"(lambda)$ if
+  it has p.m.f.
+
+  $ P(X = k) = e^(-lambda) lambda^k / k! $
+
+  for $k in {0,1,2,...}$, where $lambda > 0$ is a parameter. By the Taylor
+  expansion above,
+
+  $
+    sum_(k = 0)^infinity P(X=k) = 1
+  $
+]
+
+The Poisson distribution arises from the binomial. The approximation applies in
+the binomial context when $n$ is very large ($n >= 100$) and $p$ is very small
+($p <= 0.05$), such that $n p$ is a moderate number ($n p < 10$).
+
+Then $X$ approximately follows a Poisson distribution with $lambda = n p$:
+
+$
+  P("Bin"(n,p) = k) approx P("Poisson"(lambda = n p) = k)
+$
+
+for $k = 0,1,...,n$.
+
+#example[
+  The number of typing errors on a page of a textbook.
+
+  Let
+
+  - $n$ be the number of letters or symbols per page (large),
+  - $p$ be the probability of an error, small enough that $n p = lambda = 0.1$
+    as $n -> infinity$ and $p -> 0$.
+
+  What is the probability of exactly 1 error?
+
+  We can approximate the distribution of $X$ with a $"Poisson"(lambda = 0.1)$
+  distribution:
+
+  $
+    P(X = 1) = (e^(-0.1) (0.1)^1) / 1! = 0.09048
+  $
+]
+
+#example[
+  The number of reported auto accidents in a big city on any given day.
+
+  Let
+
+  - $n$ be the number of autos on the road (large),
+  - $p$ be the probability that any individual auto has an accident, small
+    enough that $n p = lambda = 2$ as $n -> infinity$ and $p -> 0$.
+
+  What is the probability of no accidents today?
+
+  We can approximate $X$ by $"Poisson"(lambda = 2)$:
+
+  $
+    P(X = 0) = (e^(-2) (2)^0) / 0! = 0.1353
+  $
+]
+
+Another discrete example:
+
+#example[
+  Suppose we have an election with candidates $B$ and $W$. A total of 10,000
+  ballots were cast such that
+
+  $
+    10,000 "votes" cases(5005 space B, 4995 space W)
+  $
+
+  But 15 ballots had irregularities and were disqualified. What is the
+  probability that the election results will change?
+
+  There are three combinations of disqualified ballots that would result in a
+  different election outcome: 13 $B$ and 2 $W$, 14 $B$ and 1 $W$, and 15 $B$
+  and 0 $W$. The probability of these is sketched below.
+]
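+
+One way to finish this computation (a sketch, not worked in the original
+notes, assuming the 15 disqualified ballots are a uniformly random subset of
+the 10,000): the number $b$ of disqualified $B$-ballots is hypergeometric, so
+
+$
+  P("result changes") = sum_(b=13)^(15) (vec(5005, b) vec(4995, 15-b)) / vec(10000, 15)
+$
+
+Since roughly half the ballots are for $B$, this is close to the binomial tail
+$P("Bin"(15, 0.5) >= 13) = (vec(15,13) + vec(15,14) + vec(15,15)) / 2^(15)
+approx 0.0037$, so the election result is very unlikely to change.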
+
 = Lecture #datetime(day: 3, month: 2, year: 2025).display()
 
 == CDFs, PMFs, PDFs