From 34b9104855adb83b1c76a8cad4477f47822bc280 Mon Sep 17 00:00:00 2001
From: Youwen Wu
Date: Sat, 1 Mar 2025 20:23:15 -0800
Subject: [PATCH] auto-update(nvim): 2025-03-01 20:23:15

---
 .../pstat-120a/course-notes/main.typ | 152 ++++++++++++++++--
 1 file changed, 135 insertions(+), 17 deletions(-)

diff --git a/documents/by-course/pstat-120a/course-notes/main.typ b/documents/by-course/pstat-120a/course-notes/main.typ
index c1770a2..3181015 100644
--- a/documents/by-course/pstat-120a/course-notes/main.typ
+++ b/documents/by-course/pstat-120a/course-notes/main.typ
@@ -17,24 +17,8 @@ scribe's, not the instructor's.
 
 = Lecture #datetime(day: 6, month: 1, year: 2025).display()
 
-== Preliminaries
-
-#definition[
-  Statistics is the science dealing with the collection, summarization,
-  analysis, and interpretation of data.
-]
-
 == Set theory for dummies
 
-A terse introduction to elementary naive set theory and the basic operations
-upon them.
-
-#remark[
-  Keep in mind that without $cal(Z F C)$ or another model of set theory that
-  resolves fundamental issues, our set theory is subject to paradoxes like
-  Russell's. Whoops, the universe doesn't exist.
-]
-
 #definition[
   A *set* is a collection of elements.
 ]
@@ -2186,7 +2170,76 @@ indicator of where the center of the distribution
 lies.
 
 = President's Day lecture
 
-...
+== Quantiles
+
+#definition[
+  For $p in (0,1)$, the *$p^"th"$ quantile* of a random variable $X$ is any
+  $x in RR$ satisfying
+  $
+    P(X >= x) >= 1 - p "and" P(X <= x) >= p
+  $
+]
+
+We see that the median is the $0.5^"th"$ quantile. The $p = 0.25$ quantile
+is called the "first quartile" (Q1), and the $p = 0.75$ quantile the "third
+quartile" (Q3).
+
+$Q 3 - Q 1$ is called the *interquartile range* (IQR).
+
+== Variance
+
+Variance is a measure of spread, or _variation_, about the mean. Variance
+is the *expected squared deviation* from the mean.
+
+#definition[
+  Let $X$ be a random variable with mean $mu$. The variance of $X$ is given by
+  $
+    "Var"(X) = E[(X-mu)^2] = sigma_X^2
+  $
+  If $X$ is discrete with PMF $p_X (x)$, then the variance is
+  $
+    "Var"(X) = sum_x (x-mu)^2 p_X (x)
+  $
+  If $X$ is continuous with PDF $f_X (x)$, then the variance is
+  $
+    "Var"(X) = integral^infinity_(-infinity) (x-mu)^2 f_X (x) dif x
+  $
+]
+
+Variance is the same as the second central moment.
+
+#fact[
+  $sigma_X = sqrt("Var"(X))$ is the "standard deviation" of $X$.
+]
+
+These quantities tell us how far the distribution is spread out about its
+mean.
+
+#example[Fair die][
+  Find the variance of the value of a single roll of a fair die.
+  $
+    sigma_X^2 = "Var"(X) &= E[(X-3.5)^2] \
+    &= sum_("all" x) (x-3.5)^2 dot p_X (x) \
+    &= (2.5^2 + 1.5^2 + 0.5^2 + 0.5^2 + 1.5^2 + 2.5^2) dot 1 / 6 \
+    &= 35 / 12
+  $
+]
+
+#example[Continuous $X$][
+  Let $X$ be a continuous RV with PDF $f_X (x) = cases(1 &"for" 0 < x < 1, 0 &"otherwise")$
+
+  Find $E[X]$:
+  $
+    E[X] = integral_0^1 x dot f_X (x) dif x = 1 / 2
+  $
+
+  Find $"Var"(X)$:
+  $
+    E[(X - 1 / 2)^2] &= integral_0^1 (x - 1 / 2)^2 dot f_X (x) dif x \
+    &= 1 / 12
+  $
+]
+
+An easier formulation of variance is given by
+$
+  "Var"(X) equiv E[(X-mu)^2] = E[X^2] - mu^2
+$
+which follows from expanding the square and applying linearity of
+expectation: $E[(X-mu)^2] = E[X^2] - 2 mu E[X] + mu^2 = E[X^2] - mu^2$.
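+
+To make the shortcut concrete, here is the fair-die example redone with it.
+
+#example[Fair die, via the shortcut][
+  For a single roll of a fair die, $E[X^2] = (1 + 4 + 9 + 16 + 25 + 36) / 6 = 91 / 6$
+  and $mu = 3.5$, so
+  $
+    "Var"(X) = E[X^2] - mu^2 = 91 / 6 - 49 / 4 = 35 / 12
+  $
+  which agrees with the direct computation above.
+]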
 
 = Lecture #datetime(day: 19, month: 2, year: 2025).display()
 
@@ -2462,3 +2515,68 @@ $
   p_(X_1,X_2,...,X_n) (k_1,k_2,...,k_n) >= 0
 $
 ]
+
+= Joint distributions
+
+== Introduction
+
+We now look at two or more random variables at the same time, treating $n$
+random variables as the coordinates of an $n$-dimensional *random vector*.
+In fact, just as a random variable is a function $Omega -> RR$, a random
+vector is a vector-valued function; for two variables,
+$
+  vec(X, Y) : Omega -> RR^2
+$
+The probability distribution of $(X_1,X_2,...,X_n)$ is now represented by
+$
+  P((X_1,X_2,...,X_n) in B)
+$
+where $B$ ranges over subsets of $RR^n$. The probability distribution of the
+random vector is the *joint distribution*. The probability distributions of
+the individual coordinates $X_j$ are the *marginal distributions*.
+
+== Discrete joint distributions
+
+Let $X$ and $Y$ both be discrete random variables defined on a common
+$Omega$. Then the joint PMF is given by
+$
+  P(X=x, Y=y) equiv p_(X,Y) (x,y)
+$
+with the property that
+$
+  sum_("all" x) sum_("all" y) p_(X,Y) (x,y) = 1
+$
+
+#definition[
+  Let $X_1,X_2,...,X_n$ be discrete random variables defined on a common
+  $Omega$. Then their *joint probability mass function* is given by
+  $
+    p(k_1,k_2,...,k_n) = P(X_1 = k_1, X_2 = k_2, ..., X_n = k_n)
+  $
+  for all possible values $k_1,k_2,...,k_n$ of $X_1,X_2,...,X_n$.
+]
+
+#fact[
+  The joint probability in set notation looks like
+  $
+    P(X_1 = k_1, X_2 = k_2, ..., X_n = k_n) = P({X_1=k_1} sect {X_2 = k_2} sect dots.c sect {X_n=k_n})
+  $
+  The joint PMF has the same properties as the PMF of a single random
+  variable, namely
+  $
+    p_(X_1,X_2,...,X_n) (k_1,k_2,...,k_n) >= 0 \
+    sum_(k_1,k_2,...,k_n) p_(X_1,X_2,...,X_n) (k_1,k_2,...,k_n) = 1
+  $
+]
+
+#fact[
+  Let $g : RR^n -> RR$ be a real-valued function on an $n$-vector. If
+  $X_1,X_2,...,X_n$ are discrete random variables with joint PMF $p$, then
+  $
+    E[g(X_1,X_2,...,X_n)] = sum_(k_1,k_2,...,k_n) g(k_1,k_2,...,k_n) p(k_1,k_2,...,k_n)
+  $
+  provided the sum is well defined.
+]
+
+#example[
+  Flip a fair coin three times. Let $X$ be the number of tails in the first
+  flip and $Y$ the total number of tails observed across all three flips.
+  Then the supports are $S_X = {0,1}$ and $S_Y = {0,1,2,3}$.
+
+  1. Find the joint PMF of $(X,Y)$, $p_(X,Y) (x,y)$.
+
+  Just record the probability of each of the respective events. For example,
+  $p_(X,Y) (0,1) = P(X = 0, Y = 1) = 2 / 8$, since exactly two of the 8
+  equally likely outcomes (HHT and HTH) have a heads first and one tail in
+  total. The full table is worked out below.
+]
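+
+Carrying the same counting through for every pair $(x,y)$ gives the full
+joint PMF; each of the 8 equally likely outcomes contributes $1/8$, and the
+entries sum to 1 as required:
+
+#table(
+  columns: 5,
+  [$p_(X,Y) (x,y)$], [$y=0$], [$y=1$], [$y=2$], [$y=3$],
+  [$x=0$], [$1/8$], [$2/8$], [$1/8$], [$0$],
+  [$x=1$], [$0$], [$1/8$], [$2/8$], [$1/8$],
+)
+
+Summing across a row gives the marginal PMF of $X$, and summing down a
+column gives the marginal PMF of $Y$.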