auto-update(nvim): 2025-03-01 20:23:15
parent: bb0b3da8b7
commit: 34b9104855
1 changed file with 135 additions and 17 deletions

@@ -17,24 +17,8 @@ scribe's, not the instructor's.

= Lecture #datetime(day: 6, month: 1, year: 2025).display()
== Preliminaries
#definition[
Statistics is the science dealing with the collection, summarization,
analysis, and interpretation of data.
]
== Set theory for dummies
A terse introduction to elementary naive set theory and the basic operations
on sets.
#remark[
Keep in mind that without $"ZFC"$ or another axiomatization of set theory that
resolves fundamental issues, our naive set theory is subject to paradoxes like
Russell's. Whoops, the universe doesn't exist.
]
#definition[
A *set* is a collection of elements.
]
@@ -2186,7 +2170,76 @@ indicator of where the center of the distribution lies.
= President's Day lecture
...
== Quantiles
#definition[
For $p in (0,1)$, the *$p^"th"$ quantile* of a random variable $X$ is any $x in RR$ satisfying
$
P(X >= x) >= 1 - p "and" P(X <= x) >= p
$
]
We see that the median is the $0.5^"th"$ quantile. $p = 0.25$ is called the
"first quartile" ($Q_1$), and $p = 0.75$ is called the "third quartile" ($Q_3$).

$Q_3 - Q_1$ is called the "IQR", the interquartile range.
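
As a sanity check, quantiles are easy to compute numerically. Here is a minimal
sketch (assuming `numpy` is available; note that `np.quantile` interpolates
between order statistics, which is only one of several conventions consistent
with the definition above):

```python
import numpy as np

# Hypothetical sample: 1000 rolls of a fair die.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1000)

q1, med, q3 = np.quantile(rolls, [0.25, 0.5, 0.75])
print(q1, med, q3)  # first quartile, median, third quartile
print(q3 - q1)      # the IQR
```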
== Variance
Variance is a measure of spread, or _variation_, about the mean: it is the
*expected squared deviation* from the mean.
#definition[
Let $X$ be a random variable with mean $mu$. The variance of $X$ is given by
$
"Var"(X) = E[(X-mu)^2] = sigma_X^2
$
If $X$ is discrete with PMF $p_X(x)$, then the variance is
$
"Var"(X) = sum_x (x-mu)^2 p_X (x)
$
If $X$ is continuous with PDF $f_X (x)$, then the variance is
$
"Var"(X) = integral^infinity_(-infinity) (x-mu)^2 f_X (x) dif x
$
]
Variance is the same as the second central moment.
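
In general, the $k^"th"$ *central moment* of $X$ is $E[(X - mu)^k]$; variance
is the $k = 2$ case.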
#fact[
$sigma_X = sqrt("Var"(X))$ is the "standard deviation" of $X$.
]
These tell us how spread out the distribution is.
#example[Fair die][
Find the variance for the value of a single roll of a fair die.
$
sigma_X^2 = "Var"(X) &= E[(X-3.5)^2] \
&= sum_("all" x) (x-3.5)^2 dot p_X (x) \
&= (2 dot (2.5^2 + 1.5^2 + 0.5^2)) / 6 = 35 / 12
$
]
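
We can double-check this by direct enumeration; a minimal sketch in Python,
using exact fractions:

```python
from fractions import Fraction

outcomes = range(1, 7)
mu = sum(Fraction(x) for x in outcomes) / 6                  # E[X] = 7/2
var = sum((x - mu) ** 2 * Fraction(1, 6) for x in outcomes)  # E[(X - mu)^2]
print(mu, var)  # 7/2 35/12
```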
#example[Continuous $X$][
Let $X$ be a continuous RV with PDF $f_X (x) = cases(1 &"for" 0 < x < 1, 0 &"otherwise")$
Find $E[X]$:
$
integral_0^1 x dot f_X (x) dif x = 1 / 2
$
Find $"Var"(X)$:
$
E[(X- 1 / 2)^2] &= integral_0^1 (x- 1 / 2)^2 dot f_X (x) dif x \
&= 1 / 12
$
]
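
The same answers are easy to check by simulation; a quick Monte Carlo sketch
(assuming `numpy` is available):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000)  # samples from the uniform PDF above
print(x.mean())            # approximately 1/2
print(x.var())             # approximately 1/12 ≈ 0.0833
```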
A more convenient formula for the variance is given by
$
"Var"(X) equiv E[(X-mu)^2] = E[X^2] - mu^2
$
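
To see why, expand the square and use linearity of expectation:

$
E[(X-mu)^2] = E[X^2 - 2 mu X + mu^2] = E[X^2] - 2 mu E[X] + mu^2 = E[X^2] - mu^2
$

since $E[X] = mu$. For the fair die, this gives $"Var"(X) = 91/6 - (7/2)^2 = 35/12$,
matching the earlier computation.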
= Lecture #datetime(day: 19, month: 2, year: 2025).display()
@@ -2462,3 +2515,68 @@ $
p_(X_1,X_2,...,X_n) (k_1,k_2,...,k_n) >= 0
$
]
= Joint distributions
== Introduction
We now look at two or more random variables at the same time: treat $n$ random
variables as the coordinates of an $n$-dimensional *random vector*. Just as a
random variable is a function $Omega -> RR$, a random vector is a vector-valued
function
$
vec(X, Y) : Omega -> RR^2
$
The probability distribution of $(X_1,X_2,...,X_n)$ is now represented by
$
P((X_1,X_2,...,X_n) in B)
$
where $B$ ranges over subsets of $RR^n$. The probability distribution of the
random vector is the *joint distribution*. The probability distributions of the
individual coordinates $X_j$ are the *marginal distributions*.
== Discrete joint distributions
Let $X$ and $Y$ both be discrete random variables defined on a common $Omega$. Then the joint PMF is given by
$
P(X=x, Y=y) equiv p_(X,Y) (x,y)
$
with the property that
$
sum_("all" x) sum_("all" y) p_(X,Y) (x,y) = 1
$
#definition[
Let $X_1,X_2,...,X_n$ be discrete random variables defined on a common $Omega$, then their *joint probability mass function* is given by:
$
p(k_1,k_2,...,k_n) = P(X_1 = k_1, X_2 = k_2, ..., X_n = k_n)
$
for all possible values $k_1,k_2,...,k_n$ of $X_1,X_2,...,X_n$.
]
#fact[
The joint probability in set notation looks like
$
P(X_1 = k_1, X_2 = k_2, ..., X_n = k_n) = P({X_1=k_1} sect {X_2 = k_2} sect dots.c sect {X_n=k_n})
$
The joint PMF has the same properties as the PMF of a single random variable, namely
$
p_(X_1,X_2,...,X_n) (k_1,k_2,...,k_n) >= 0 \
sum_(k_1,k_2,...,k_n) p_(X_1,X_2,...,X_n) (k_1,k_2,...,k_n) = 1
$
]
#fact[
Let $g : RR^n -> RR$ be a real-valued function on an $n$-vector. If $X_1,X_2,...,X_n$ are discrete random variables with joint PMF $p$ then
$
E[g(X_1,X_2,...,X_n)] = sum_(k_1,k_2,...,k_n) g(k_1,k_2,...,k_n) p(k_1,k_2,...,k_n)
$
provided the sum is well defined.
]
#example[
Flip a fair coin three times. Let $X$ be the number of tails in the first flip and $Y$ the total number of tails observed across all flips. Then the supports are $S_X = {0,1}$ and $S_Y = {0,1,2,3}$.
1. Find the joint PMF of $(X,Y)$, $p_(X,Y) (x,y)$.
Just record the probability of the corresponding event for each pair of values.
For example, the probability that $X = 0$ and $Y = 1$ is $p_(X,Y) (0,1) = 2/8$ (the outcomes HTH and HHT).
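
Enumerating all eight equally likely outcomes gives the full table:

#table(
  columns: 5,
  [$p_(X,Y)$], [$y = 0$], [$y = 1$], [$y = 2$], [$y = 3$],
  [$x = 0$], [$1/8$], [$2/8$], [$1/8$], [$0$],
  [$x = 1$], [$0$], [$1/8$], [$2/8$], [$1/8$],
)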
]
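
This bookkeeping is mechanical enough to automate. A minimal sketch that
enumerates the sample space, builds the joint PMF, and evaluates an expectation
as in the fact above (the choice $g(x, y) = x y$ is just for illustration):

```python
from itertools import product
from fractions import Fraction

# Sample space: all 2^3 equally likely outcomes of three flips (1 = tails).
pmf = {}
for flips in product((0, 1), repeat=3):
    x = flips[0]    # tails on the first flip
    y = sum(flips)  # total number of tails
    pmf[(x, y)] = pmf.get((x, y), 0) + Fraction(1, 8)

print(pmf[(0, 1)])                                  # 1/4, i.e. 2/8 as above
print(sum(pmf.values()))                            # 1: the joint PMF sums to 1
print(sum(x * y * p for (x, y), p in pmf.items()))  # E[g(X, Y)] = E[XY] = 1
```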