commit 6412a0ca6425635311ed2dd603ecd2e8bd8aa69f
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Sun Feb 16 22:01:59 2025 +0000

    Deploy to GitHub pages

diff --git a/CNAME b/CNAME
new file mode 100644
index 0000000..4f7e6a8
--- /dev/null
+++ b/CNAME
@@ -0,0 +1 @@
+blog.youwen.dev

diff --git a/a-haskellian-blog.html b/a-haskellian-blog.html
new file mode 100644
index 0000000..cb40b8f
--- /dev/null
+++ b/a-haskellian-blog.html
@@ -0,0 +1,269 @@

a haskellian blog | The Involution
The Involution.

a web-log about computers and math and hacking.

by Youwen Wu
+
+
+

+ a haskellian blog +

+

+ a purely functional...blog? +

+
2024-05-25
+
+ (last updated: 2024-05-25T12:00:00Z) +
+
+

Welcome! This is the first post on The Involution and also one that tests all +of the features.

+ + + + +
+

A monad is just a monoid in the category of endofunctors, what’s the problem?

+
+

haskell?

+

This entire blog is generated with hakyll. It's
a static site generator library written in Haskell, a purely functional
programming language. It's a library rather than a framework because it doesn't
come with as many batteries included as tools like Hugo or Astro: you set up
most of the site yourself by calling the library from Haskell.

+

Here’s a brief excerpt:

+
main :: IO ()
+main = hakyllWith config $ do
+    forM_
+        [ "CNAME"
+        , "favicon.ico"
+        , "robots.txt"
+        , "_config.yml"
+        , "images/*"
+        , "out/*"
+        , "fonts/*"
+        ]
+        $ \f -> match f $ do
+            route idRoute
+            compile copyFileCompiler
+

The code highlighting is also generated by hakyll.
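Under the hood, posts go through Pandoc, which emits the highlighted HTML. Roughly, a minimal sketch (not this site's actual rules; the "posts/*" pattern is only illustrative) looks like this:

{-# LANGUAGE OverloadedStrings #-}
import Hakyll

-- pandocCompiler hands each post to Pandoc, which renders markdown,
-- math, and syntax-highlighted code blocks to HTML
main :: IO ()
main = hakyll $ do
    match "posts/*" $ do
        route $ setExtension "html"
        compile $ pandocCompiler >>= relativizeUrls

The only extra piece the site has to supply is a stylesheet for the highlighting classes that Pandoc emits.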

+
+

why?

+

Haskell is a purely functional language with no mutable state. Its syntax +actually makes it pretty elegant for declaring routes and “rendering” pipelines.

+
  1. Haskell is cool.
  2. It comes with enough features that I don't feel like I have to build
     everything from scratch.
  3. It comes with Pandoc, a Haskell library for converting between markup
     formats. It's probably more powerful than anything you could do in nodejs.
     It renders all of the markdown to HTML as well as the math.
       - It supports KaTeX as well as MathML. I'm a little disappointed with
         the KaTeX though. It doesn't directly render it, but simply injects
         the KaTeX files and renders it client-side.
+

speaking of math

+

We can have math inline, like so:
\int_{-\infty}^\infty \, e^{-x^2}\,dx = \sqrt{\pi}. This site ships semantic
MathML math with its HTML, and the MathJax script to the client.

+

It’d be nice if MathML could just be used and supported across all browsers, but
unfortunately we still aren’t quite there yet. Firefox is the only one where
everything looks 80% of the way to LaTeX. On Safari and Chrome, even simple
equations like \sqrt{\pi} render improperly.

+

Pros of MathML:

+ +

Cons:

+ +

This site has MathJax render all of the math so it looks nice and standardized +across browsers, but the math still displays regardless (like say if MathJax +couldn’t load due to slow network) because of MathML. Best of both worlds.

+

Let’s try it now. Here’s a simple theorem:

+

a^n + b^n \ne c^n \, \forall\,\left\{ a,\,b,\,c \right\} \in \mathbb{Z} \land n \ge 3

+

The proof is trivial and will be left as an exercise to the reader.

+

seems a little overengineered

+

Probably is. Not as much as the old one, though.

+
diff --git a/an-assortment-of-preliminaries-on-linear-algebra.html b/an-assortment-of-preliminaries-on-linear-algebra.html
new file mode 100644
index 0000000..9737c35
--- /dev/null
+++ b/an-assortment-of-preliminaries-on-linear-algebra.html
@@ -0,0 +1,604 @@

An assortment of preliminaries on linear algebra | The Involution
+ An assortment of preliminaries on linear algebra +

+

+ and also a test for pandoc +

+
2025-02-15
+
+ +
+
+

This entire document was written in Typst and
directly translated to this file by Pandoc. It serves as a proof of concept of
a way to do static site generation from Typst files instead of Markdown.

+
+

I figured I should write this stuff down before I forgot it.

+

Basic Notions

+

Vector spaces

+

Before we can understand vectors, we need to first discuss vector +spaces. Thus far, you have likely encountered vectors primarily in +physics classes, generally in the two-dimensional plane. You may +conceptualize them as arrows in space. For vectors of size >3> 3, a hand +waving argument is made that they are essentially just arrows in higher +dimensional spaces.

+

It is helpful to take a step back from this primitive geometric +understanding of the vector. Let us build up a rigorous idea of vectors +from first principles.

+

Vector axioms

+

The so-called axioms of a vector space (which we’ll call the vector +space VV) are as follows:

+
  1. Commutativity: u + v = v + u,\text{ }\forall u,v \in V
  2. Associativity: (u + v) + w = u + (v + w),\text{ }\forall u,v,w \in V
  3. Zero vector: \exists a special vector, denoted 0, such that
     v + 0 = v,\text{ }\forall v \in V
  4. Additive inverse: \forall v \in V,\text{ }\exists w \in V\text{ such that }v + w = 0.
     Such an additive inverse is generally denoted -v
  5. Multiplicative identity: 1v = v,\text{ }\forall v \in V
  6. Multiplicative associativity: (\alpha\beta)v = \alpha(\beta v)\text{ }\forall v \in V,\text{ scalars }\alpha,\beta
  7. Distributive property for vectors: \alpha(u + v) = \alpha u + \alpha v\text{ }\forall u,v \in V,\text{ scalars }\alpha
  8. Distributive property for scalars: (\alpha + \beta)v = \alpha v + \beta v\text{ }\forall v \in V,\text{ scalars }\alpha,\beta

It is easy to show that the zero vector 00 and the additive inverse +v- v are unique. We leave the proof of this fact as an exercise.

+

These may seem difficult to memorize, but they are essentially the same +familiar algebraic properties of numbers you know from high school. The +important thing to remember is which operations are valid for what +objects. For example, you cannot add a vector and scalar, as it does not +make sense.

+

Remark. For those of you versed in computer science, you may recognize
this as essentially saying that you must ensure your operations are
type-safe. Adding a vector and a scalar is not just “wrong” in the same
sense that 1 + 1 = 3 is wrong, it is an invalid question entirely,
because vectors and scalars are different types of mathematical objects.
See [@chen2024digression] for more.
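To make the type-safety analogy concrete, here is a tiny Haskell sketch (purely illustrative, not from the original notes): with vectors and scalars as distinct types, adding them isn’t “false”, it simply does not type-check.

newtype Scalar = Scalar Double
newtype Vector = Vector [Double]

-- entrywise addition is only defined between two Vectors
addV :: Vector -> Vector -> Vector
addV (Vector xs) (Vector ys) = Vector (zipWith (+) xs ys)

main :: IO ()
main = let Vector v = addV (Vector [1, 2]) (Vector [3, 4]) in print v
-- addV (Vector [1, 2]) (Scalar 3) is rejected by the type checker:
-- a Scalar is simply not a Vector, so the question cannot even be asked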

+

Vectors big and small

+

In order to begin your descent into what mathematicians colloquially +recognize as abstract vapid nonsense, let’s discuss which fields +constitute a vector space. We have the familiar field of \mathbb{R} +where all scalars are real numbers, with corresponding vector spaces +n{\mathbb{R}}^{n}, where nn is the length of the vector. We generally +discuss 2D or 3D vectors, corresponding to vectors of length 2 or 3; in +our case, 2{\mathbb{R}}^{2} and 3{\mathbb{R}}^{3}.

+

However, vectors in n{\mathbb{R}}^{n} can really be of any length. +Vectors can be viewed as arbitrary length lists of numbers (for the +computer science folk: think C++ std::vector).

+

Example. (123456789)9\begin{pmatrix} +1 \\ +2 \\ +3 \\ +4 \\ +5 \\ +6 \\ +7 \\ +8 \\ +9 +\end{pmatrix} \in {\mathbb{R}}^{9}

+

Keep in mind that vectors need not be in n{\mathbb{R}}^{n} at all. +Recall that a vector space need only satisfy the aforementioned axioms +of a vector space.

+

Example. The vector space n{\mathbb{C}}^{n} is similar to +n{\mathbb{R}}^{n}, except it includes complex numbers. All complex +vector spaces are real vector spaces (as you can simply restrict them to +only use the real numbers), but not the other way around.

+

From now on, let us refer to vector spaces n{\mathbb{R}}^{n} and +n{\mathbb{C}}^{n} as 𝔽n{\mathbb{F}}^{n}.

+

In general, we can have a vector space where the scalars are in an +arbitrary field, as long as the axioms are satisfied.

+

Example. The vector space of all polynomials of at most degree 3, or +3{\mathbb{P}}^{3}. It is not yet clear what this vector may look like. +We shall return to this example once we discuss basis.

+

Vector addition and multiplication

+

Vector addition, represented by ++ can be done entrywise.

+

Example.

+

\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} = \begin{pmatrix} 1 + 4 \\ 2 + 5 \\ 3 + 6 \end{pmatrix} = \begin{pmatrix} 5 \\ 7 \\ 9 \end{pmatrix}
\qquad
\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \cdot \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} = \begin{pmatrix} 1 \cdot 4 \\ 2 \cdot 5 \\ 3 \cdot 6 \end{pmatrix} = \begin{pmatrix} 4 \\ 10 \\ 18 \end{pmatrix}

+

This is simple enough to understand. Again, the difficulty is simply +ensuring that you always perform operations with the correct types. +For example, once we introduce matrices, it doesn’t make sense to +multiply or add vectors and matrices in this fashion.

+

Vector-scalar multiplication

+

Multiplying a vector by a scalar simply results in each entry of the +vector being multiplied by the scalar.

+

Example.

+

β(abc)=(βaβbβc)\beta\begin{pmatrix} +a \\ +b \\ +c +\end{pmatrix} = \begin{pmatrix} +\beta \cdot a \\ +\beta \cdot b \\ +\beta \cdot c +\end{pmatrix}
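As a small aside (a sketch of my own, not part of the original notes), these entrywise operations are one-liners if we model vectors as plain Haskell lists:

addV, mulV :: Num a => [a] -> [a] -> [a]
addV = zipWith (+)          -- addV [1,2,3] [4,5,6] == [5,7,9]
mulV = zipWith (*)          -- mulV [1,2,3] [4,5,6] == [4,10,18]

scaleV :: Num a => a -> [a] -> [a]
scaleV beta = map (beta *)  -- scaleV 2 [1,2,3] == [2,4,6]

main :: IO ()
main = mapM_ print [addV [1, 2, 3] [4, 5, 6], mulV [1, 2, 3] [4, 5, 6], scaleV 2 [1, 2, 3]]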

+

Linear combinations

+

Given a vector space V, vectors v,w \in V, and scalars \alpha,\beta, the vector
\alpha v + \beta w is a linear combination of v and w.

+

Spanning systems

+

We say that a set of vectors v1,v2,,vnVv_{1},v_{2},\ldots,v_{n} \in V span VV +if the linear combination of the vectors can represent any arbitrary +vector vVv \in V.

+

Precisely, for every v \in V, there exist scalars \alpha_{1},\alpha_{2},\ldots,\alpha_{n} such that

\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{n}v_{n} = v

+

Note that any scalar αk\alpha_{k} could be 0. Therefore, it is possible +for a subset of a spanning system to also be a spanning system. The +proof of this fact is left as an exercise.

+

Intuition for linear independence and dependence

+

We say that vv and ww are linearly independent if vv cannot be +represented by the scaling of ww, and ww cannot be represented by the +scaling of vv. Otherwise, they are linearly dependent.

+

You may intuitively visualize linear dependence in the 2D plane as two +vectors both pointing in the same direction. Clearly, scaling one vector +will allow us to reach the other vector. Linear independence is +therefore two vectors pointing in different directions.

+

Of course, this definition applies to vectors in any 𝔽n{\mathbb{F}}^{n}.

+

Formal definition of linear dependence and independence

+

Let us formally define linear independence for arbitrary vectors in +𝔽n{\mathbb{F}}^{n}. Given a set of vectors

+

v1,v2,,vnVv_{1},v_{2},\ldots,v_{n} \in V

+

we say they are linearly independent iff. the equation

+

α1v1+α2v2++αnvn=0\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{n}v_{n} = 0

+

has only the trivial solution, that is, the unique choice of
\alpha_{1},\alpha_{2},\ldots,\alpha_{n} in which every \alpha_{i} is
zero.

+

Equivalently,

+

|α1|+|α2|++|αn|=0\left| \alpha_{1} \right| + \left| \alpha_{2} \right| + \ldots + \left| \alpha_{n} \right| = 0

+

More precisely,

+

\sum_{i = 1}^{n}\left| \alpha_{i} \right| = 0

+

Therefore, a set of vectors v1,v2,,vmv_{1},v_{2},\ldots,v_{m} is linearly +dependent if the opposite is true, that is there exists solution +α1,α2,,αm\alpha_{1},\alpha_{2},\ldots,\alpha_{m} to the equation

+

α1v1+α2v2++αmvm=0\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{m}v_{m} = 0

+

such that

+

\sum_{i = 1}^{m}\left| \alpha_{i} \right| \neq 0

+

Basis

+

We say a system of vectors v1,v2,,vnVv_{1},v_{2},\ldots,v_{n} \in V is a basis +in VV if the system is both linearly independent and spanning. That is, +the system must be able to represent any vector in VV as well as +satisfy our requirements for linear independence.

+

Equivalently, we may say that a system of vectors in VV is a basis in +VV if any vector vVv \in V admits a unique representation as a linear +combination of vectors in the system. This is equivalent to our previous +statement, that the system must be spanning and linearly independent.

+

Standard basis

+

We may define a standard basis for a vector space. By convention, the +standard basis in 2{\mathbb{R}}^{2} is

+

(10)(01)\begin{pmatrix} +1 \\ +0 +\end{pmatrix}\begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Verify that the above is in fact a basis (that is, linearly independent +and generating).

+

Recalling the definition of the basis, we can represent any vector in +2{\mathbb{R}}^{2} as the linear combination of the standard basis.

+

Therefore, for any arbitrary vector v2v \in {\mathbb{R}}^{2}, we can +represent it as

+

v=α1(10)+α2(01)v = \alpha_{1}\begin{pmatrix} +1 \\ +0 +\end{pmatrix} + \alpha_{2}\begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Let us call α1\alpha_{1} and α2\alpha_{2} the coordinates of the +vector. Then, we can write vv as

+

v=(α1α2)v = \begin{pmatrix} +\alpha_{1} \\ +\alpha_{2} +\end{pmatrix}

+

For example, the vector

+

(12)\begin{pmatrix} +1 \\ +2 +\end{pmatrix}

+

represents

+

1(10)+2(01)1 \cdot \begin{pmatrix} +1 \\ +0 +\end{pmatrix} + 2 \cdot \begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Verify that this aligns with your previous intuition of vectors.

+

You may recognize the standard basis in 2{\mathbb{R}}^{2} as the +familiar unit vectors

+

î,ĵ\hat{i},\hat{j}

+

This aligns with the fact that

+

(αβ)=αî+βĵ\begin{pmatrix} +\alpha \\ +\beta +\end{pmatrix} = \alpha\hat{i} + \beta\hat{j}

+

However, we may define a standard basis in any arbitrary vector space. +So, let

+

e1,e2,,ene_{1},e_{2},\ldots,e_{n}

+

be a standard basis in 𝔽n{\mathbb{F}}^{n}. Then, the coordinates +α1,α2,,αn\alpha_{1},\alpha_{2},\ldots,\alpha_{n} of a vector +v𝔽nv \in {\mathbb{F}}^{n} represent the following

+

\begin{pmatrix}
\alpha_{1} \\
\alpha_{2} \\
 \vdots \\
\alpha_{n}
\end{pmatrix} = \alpha_{1}e_{1} + \alpha_{2}e_{2} + \ldots + \alpha_{n}e_{n}

+

Using our new notation, the standard basis in 2{\mathbb{R}}^{2} is

+

e1=(10),e2=(01)e_{1} = \begin{pmatrix} +1 \\ +0 +\end{pmatrix},e_{2} = \begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Matrices

+

Before discussing any properties of matrices, let’s simply reiterate
what we learned in class about their notation. We say a matrix with m
rows and n columns (that is, each row has length n and each column has
height m) is an m \times n matrix.

+

Given a matrix

+

A=(123456789)A = \begin{pmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 \\ +7 & 8 & 9 +\end{pmatrix}

+

we refer to the entry in row jj and column kk as Aj,kA_{j,k} .

+

Matrix transpose

+

A formalism that is useful later on is called the transpose, and we +obtain it from a matrix AA by switching all the rows and columns. More +precisely, each row becomes a column instead. We use the notation +ATA^{T} to represent the transpose of AA.

+

(123456)T=(142536)\begin{pmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 +\end{pmatrix}^{T} = \begin{pmatrix} +1 & 4 \\ +2 & 5 \\ +3 & 6 +\end{pmatrix}

+

Formally, we can say (AT)j,k=Ak,j\left( A^{T} \right)_{j,k} = A_{k,j}
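In code, the row-swapping description is exactly what Data.List.transpose does for a rows-as-lists representation (a quick illustration, not from the original notes):

import Data.List (transpose)

-- each inner list is a row; transpose swaps rows and columns,
-- matching (A^T)_{j,k} = A_{k,j}
main :: IO ()
main = print (transpose [[1, 2, 3], [4, 5, 6]])
-- prints [[1,4],[2,5],[3,6]], the transpose from the example above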

+

Linear transformations

+

A linear transformation T:V \rightarrow W is a mapping between two
vector spaces V and W, such that the following axioms are
satisfied:

  1. Additivity: T(v + w) = T(v) + T(w),\forall v,w \in V
  2. Homogeneity: T(\alpha v) = \alpha T(v),\forall v \in V, for all scalars \alpha
+

Definition. TT is a linear transformation iff.

+

T(\alpha v + \beta w) = \alpha T(v) + \beta T(w),\ \forall v,w \in V,\ \text{scalars }\alpha,\beta

+

Abuse of notation. From now on, we may elide the parentheses and say +that T(v)=Tv,vVT(v) = Tv,\forall v \in V

+

Remark. A phrase that you may commonly hear is that linear +transformations preserve linearity. Essentially, straight lines remain +straight, parallel lines remain parallel, and the origin remains fixed +at 0. Take a moment to think about why this is true (at least, in lower +dimensional spaces you can visualize).

+

Examples.

+
  1. Rotation for V = W = {\mathbb{R}}^{2} (i.e. rotation in 2
     dimensions). Given v,w \in {\mathbb{R}}^{2}, and their linear
     combination v + w, a rotation of \gamma radians of v + w is
     equivalent to first rotating v and w individually by \gamma
     and then taking their linear combination.
  2. Differentiation of polynomials. In this case V = {\mathbb{P}}^{n}
     and W = {\mathbb{P}}^{n - 1}, where {\mathbb{P}}^{n} is the
     vector space of all polynomials of degree at most n.

     \frac{d}{dx}(\alpha v + \beta w) = \alpha\frac{d}{dx}v + \beta\frac{d}{dx}w,\forall v,w \in V,\forall\text{ scalars }\alpha,\beta
+

Matrices represent linear transformations

+

Suppose we wanted to represent a linear transformation
T:{\mathbb{F}}^{n} \rightarrow {\mathbb{F}}^{m}. I propose that we need
only encode how T acts on the standard basis of {\mathbb{F}}^{n}.

+

Using our intuition from lower dimensional vector spaces, we know that +the standard basis in 2{\mathbb{R}}^{2} is the unit vectors î\hat{i} +and ĵ\hat{j}. Because linear transformations preserve linearity (i.e. +all straight lines remain straight and parallel lines remain parallel), +we can encode any transformation as simply changing î\hat{i} and +ĵ\hat{j}. And indeed, if any vector v2v \in {\mathbb{R}}^{2} can be +represented as the linear combination of î\hat{i} and ĵ\hat{j} (this +is the definition of a basis), it makes sense both symbolically and +geometrically that we can represent all linear transformations as the +transformations of the basis vectors.

+

Example. To reflect all vectors v2v \in {\mathbb{R}}^{2} across the +yy-axis, we can simply change the standard basis to

+

(10)(01)\begin{pmatrix} + - 1 \\ +0 +\end{pmatrix}\begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Then, any vector in 2{\mathbb{R}}^{2} using this new basis will be +reflected across the yy-axis. Take a moment to justify this +geometrically.

+

Writing a linear transformation as matrix

+

For any linear transformation +T:𝔽m𝔽nT:{\mathbb{F}}^{m} \rightarrow {\mathbb{F}}^{n}, we can write it as an +n×mn \times m matrix AA. That is, there is a matrix AA with nn rows +and mm columns that can represent any linear transformation from +𝔽m𝔽n{\mathbb{F}}^{m} \rightarrow {\mathbb{F}}^{n}.

+

How should we write this matrix? Naturally, from our previous +discussion, we should write a matrix with each column being one of our +new transformed basis vectors.

+

Example. Our yy-axis reflection transformation from earlier. We write +the bases in a matrix

+

(1001)\begin{pmatrix} + - 1 & 0 \\ +0 & 1 +\end{pmatrix}

+

Matrix-vector multiplication

+

Perhaps you now see why the so-called matrix-vector multiplication is +defined the way it is. Recalling our definition of a basis, given a +basis in VV, any vector vVv \in V can be written as the linear +combination of the vectors in the basis. Then, given a linear +transformation represented by the matrix containing the new basis, we +simply write the linear combination with the new basis instead.

+

Example. Let us first write a vector in the standard basis in +2{\mathbb{R}}^{2} and then show how our matrix-vector multiplication +naturally corresponds to the definition of the linear transformation.

+

(12)2\begin{pmatrix} +1 \\ +2 +\end{pmatrix} \in {\mathbb{R}}^{2}

+

is the same as

+

1(10)+2(01)1 \cdot \begin{pmatrix} +1 \\ +0 +\end{pmatrix} + 2 \cdot \begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Then, to perform our reflection, we need only replace the basis vector +(10)\begin{pmatrix} +1 \\ +0 +\end{pmatrix} with (10)\begin{pmatrix} + - 1 \\ +0 +\end{pmatrix}.

+

Then, the reflected vector is given by

+

1(10)+2(01)=(12)1 \cdot \begin{pmatrix} + - 1 \\ +0 +\end{pmatrix} + 2 \cdot \begin{pmatrix} +0 \\ +1 +\end{pmatrix} = \begin{pmatrix} + - 1 \\ +2 +\end{pmatrix}

+

We can clearly see that this is exactly how the matrix multiplication

+

\begin{pmatrix}
 - 1 & 0 \\
0 & 1
\end{pmatrix} \cdot \begin{pmatrix}
1 \\
2
\end{pmatrix} is defined! The column-by-coordinate rule for
matrix-vector multiplication says that we multiply the k^{\text{th}}
entry of the vector by the corresponding k^{\text{th}} column of the
matrix and sum them all up (take their linear combination). This
algorithm intuitively follows from our definition of matrices.
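The column-by-coordinate rule is short enough to write down directly. A minimal Haskell sketch of my own (with the matrix stored as a list of its columns):

matVec :: Num a => [[a]] -> [a] -> [a]
matVec cols v = foldr1 (zipWith (+)) (zipWith (\x col -> map (x *) col) v cols)

main :: IO ()
main = print (matVec [[-1, 0], [0, 1]] [1, 2])
-- prints [-1,2]: the reflection of (1, 2) across the y-axis from earlier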

+

Matrix-matrix multiplication

+

As you may have noticed, a very similar natural definition arises for +the matrix-matrix multiplication. Multiplying two matrices ABA \cdot B +is essentially just taking each column of BB, and applying the linear +transformation defined by the matrix AA!
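Continuing the sketch above (again only an illustration): multiplying A by B is just mapping “apply A” over the columns of B.

-- both matrices stored as lists of columns, as in the previous sketch
matVec :: Num a => [[a]] -> [a] -> [a]
matVec cols v = foldr1 (zipWith (+)) (zipWith (\x col -> map (x *) col) v cols)

matMul :: Num a => [[a]] -> [[a]] -> [[a]]
matMul a = map (matVec a)

main :: IO ()
main = print (matMul [[-1, 0], [0, 1]] [[1, 0], [0, 1]])
-- reflecting the identity's columns recovers the reflection matrix itself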

+
diff --git a/atom.xml b/atom.xml
new file mode 100644
index 0000000..f9643d4
--- /dev/null
+++ b/atom.xml
@@ -0,0 +1,1761 @@

The Involution (https://blog.youwen.dev/atom.xml)
Youwen Wu <youwenw@gmail.com>
2025-02-16T00:00:00Z

Random variables, distributions, and probability theory
https://blog.youwen.dev/random-variables-distributions-and-probability-theory.html
2025-02-16T00:00:00Z
+

+ Random variables, distributions, and probability theory +

+

+ An overview of discrete and continuous random variables and their distributions and moment generating functions +

+
2025-02-16
+
+ +
+
+

These are some notes I’ve been collecting on random variables, their +distributions, expected values, and moment generating functions. I +thought I’d write them down somewhere useful.

+

These are almost extracted verbatim from my in-class notes, which I take +in real time using Typst. I simply wrote a tiny compatibility shim to +allow Pandoc to render them to the web.

+
+

Random variables

+

First, some brief exposition on random variables. Counterintuitively, a random
variable is actually a function.

+

Standard notation: Ω\Omega is a sample space, ωΩ\omega \in \Omega is an +event.

+

Definition.

+

A random variable XX is a function +X:ΩX:\Omega \rightarrow {\mathbb{R}} that takes the set of possible +outcomes in a sample space, and maps it to a measurable +space, typically (as in +our case) a subset of \mathbb{R}.

+

Definition.

+

The state space of a random variable XX is all of the values XX +can take.

+

Example.

+

Let XX be a random variable that takes on the values +{0,1,2,3}\left\{ 0,1,2,3 \right\}. Then the state space of XX is the set +{0,1,2,3}\left\{ 0,1,2,3 \right\}.

+

Discrete random variables

+

A random variable X is discrete if there is a countable set A such that
P(X \in A) = 1. k is a possible value if P(X = k) > 0. We discuss
continuous random variables later.

+

The probability distribution of X gives its important probabilistic
information. The probability distribution is a description of the
probabilities P(X \in B) for subsets B \subset {\mathbb{R}}. We describe
the probability density function and the cumulative distribution
function.

+

A discrete random variable has probability distribution entirely +determined by its probability mass function (hereafter abbreviated p.m.f +or PMF) p(k)=P(X=k)p(k) = P(X = k). The p.m.f. is a function from the set of +possible values of XX into [0,1]\lbrack 0,1\rbrack. Labeling the p.m.f. +with the random variable is done by pX(k)p_{X}(k).

+

pX: State space of X[0,1]p_{X}:\text{ State space of }X \rightarrow \lbrack 0,1\rbrack

+

By the axioms of probability,

+

kpX(k)=kP(X=k)=1\sum_{k}p_{X}(k) = \sum_{k}P(X = k) = 1

+

For a subset BB \subset {\mathbb{R}},

+

P(XB)=kBpX(k)P(X \in B) = \sum_{k \in B}p_{X}(k)

+

Continuous random variables

+

Now as promised we introduce another major class of random variables.

+

Definition.

+

Let XX be a random variable. If ff satisfies

+

P(Xb)=bf(x)dxP(X \leq b) = \int_{- \infty}^{b}f(x)dx

+

for all bb \in {\mathbb{R}}, then ff is the probability density +function (hereafter abbreviated p.d.f. or PDF) of XX.

+

We immediately see that the p.d.f. is analogous to the p.m.f. of the +discrete case.

+

The probability that X(,b]X \in ( - \infty,b\rbrack is equal to the area +under the graph of ff from - \infty to bb.

+

A corollary is the following.

+

Fact.

+

P(XB)=Bf(x)dxP(X \in B) = \int_{B}f(x)dx

+

for any BB \subset {\mathbb{R}} where integration makes sense.

+

The set can be bounded or unbounded, or any collection of intervals.

+

Fact.

+

P(aXb)=abf(x)dxP(a \leq X \leq b) = \int_{a}^{b}f(x)dx +P(X>a)=af(x)dxP(X > a) = \int_{a}^{\infty}f(x)dx

+

Fact.

+

If a random variable XX has density function ff then individual point +values have probability zero:

+

P(X=c)=ccf(x)dx=0,cP(X = c) = \int_{c}^{c}f(x)dx = 0,\forall c \in {\mathbb{R}}

+

Remark.

+

It follows a random variable with a density function is not discrete. An +immediate corollary of this is that the probabilities of intervals are +not changed by including or excluding endpoints. So P(Xk)P(X \leq k) and +P(X<k)P(X < k) are equivalent.

+

How to determine which functions are p.d.f.s? Since +P(<X<)=1P( - \infty < X < \infty) = 1, a p.d.f. ff must satisfy

+

f(x)0xf(x)dx=1\begin{array}{r} +f(x) \geq 0\forall x \in {\mathbb{R}} \\ +\int_{- \infty}^{\infty}f(x)dx = 1 +\end{array}

+

Fact.

+

Random variables with density functions are called continuous random +variables. This does not imply that the random variable is a continuous +function on Ω\Omega but it is standard terminology.

+

Discrete distributions

+

Recall that the probability distribution of XX gives its important +probabilistic information. Let us discuss some of these distributions.

+

In general we first consider the experiment’s properties and theorize +about the distribution that its random variable takes. We can then apply +the distribution to find out various pieces of probabilistic +information.

+

Bernoulli trials

+

A Bernoulli trial is the original “experiment.” It’s simply a single +trial with a binary “success” or “failure” outcome. Encode this T/F, 0 +or 1, or however you’d like. It becomes immediately useful in defining +more complex distributions, so let’s analyze its properties.

+

The setup: the experiment has exactly two outcomes:

+
    +
  • Success – SS or 1

  • +
  • Failure – FF or 0

  • +
+

Additionally: P(S)=p,(0<p<1)P(F)=1p=q\begin{array}{r} +P(S) = p,(0 < p < 1) \\ +P(F) = 1 - p = q +\end{array}

+

Construct the probability mass function:

+

P(X=1)=pP(X=0)=1p\begin{array}{r} +P(X = 1) = p \\ +P(X = 0) = 1 - p +\end{array}

+

Write it as:

+

p_{X}(k) = p^{k}(1 - p)^{1 - k}

+

for k=1k = 1 and k=0k = 0.

+

Binomial distribution

+

The setup: very similar to Bernoulli, trials have exactly 2 outcomes. A +bunch of Bernoulli trials in a row.

+

Importantly: pp and qq are defined exactly the same in all trials.

+

This ties the binomial distribution to the sampling with replacement +model, since each trial does not affect the next.

+

We conduct nn independent trials of this experiment. Example with +coins: each flip independently has a 12\frac{1}{2} chance of heads or +tails (holds same for die, rigged coin, etc).

+

nn is fixed, i.e. known ahead of time.

+

Binomial random variable

+

Let’s consider the random variable characterized by the binomial +distribution now.

+

Let X=#X = \# of successes in nn independent trials. For any particular +sequence of nn trials, it takes the form +Ω={ω} where ω=SFFF\Omega = \left\{ \omega \right\}\text{ where }\omega = SFF\cdots F and +is of length nn.

+

Then X(ω)=0,1,2,,nX(\omega) = 0,1,2,\ldots,n can take n+1n + 1 possible values. The +probability of any particular sequence is given by the product of the +individual trial probabilities.

+

Example.

+

ω=SFFSFS=(pqqpqp)\omega = SFFSF\cdots S = (pqqpq\cdots p)

+

So P(x=0)=P(FFFF)=qqq=qnP(x = 0) = P(FFF\cdots F) = q \cdot q \cdot \cdots \cdot q = q^{n}.

+

And \begin{array}{r}
P(X = 1) = P(SFF\cdots F) + P(FSFF\cdots F) + \cdots + P(FFF\cdots FS) \\
 = \underset{\text{ possible outcomes}}{\underbrace{n}} \cdot p^{1} \cdot q^{n - 1} \\
 = \begin{pmatrix}
n \\
1
\end{pmatrix} \cdot p^{1} \cdot q^{n - 1} \\
 = n \cdot p^{1} \cdot q^{n - 1}
\end{array}

+

Now we can generalize

+

P(X=2)=(n2)p2qn2P(X = 2) = \begin{pmatrix} +n \\ +2 +\end{pmatrix}p^{2}q^{n - 2}

+

How about all successes?

+

P(X=n)=P(SSS)=pnP(X = n) = P(SS\cdots S) = p^{n}

+

We see that for all failures we have qnq^{n} and all successes we have +pnp^{n}. Otherwise we use our method above.

+

In general, here is the probability mass function for the binomial +random variable

+

P(X=k)=(nk)pkqnk, for k=0,1,2,,nP(X = k) = \begin{pmatrix} +n \\ +k +\end{pmatrix}p^{k}q^{n - k},\text{ for }k = 0,1,2,\ldots,n

+

Binomial distribution is very powerful. Choosing between two things, +what are the probabilities?

+

To summarize the characterization of the binomial random variable:

+
    +
  • nn independent trials

  • +
  • each trial results in binary success or failure

  • +
  • with probability of success pp, identically across trials

  • +
+

with X=#X = \# successes in fixed nn trials.

+

X Bin(n,p)X\sim\text{ Bin}(n,p)

+

with probability mass function

+

P(X=x)=(nx)px(1p)nx=p(x) for x=0,1,2,,nP(X = x) = \begin{pmatrix} +n \\ +x +\end{pmatrix}p^{x}(1 - p)^{n - x} = p(x)\text{ for }x = 0,1,2,\ldots,n

+

We see this is in fact the binomial theorem!

+

p(x)0,x=0np(x)=x=0n(nx)pxqnx=(p+q)np(x) \geq 0,\sum_{x = 0}^{n}p(x) = \sum_{x = 0}^{n}\begin{pmatrix} +n \\ +x +\end{pmatrix}p^{x}q^{n - x} = (p + q)^{n}

+

In fact, (p+q)n=(p+(1p))n=1(p + q)^{n} = \left( p + (1 - p) \right)^{n} = 1

+

Example.

+

What is the probability of getting exactly three aces (1’s) out of 10 +throws of a fair die?

+

Seems a little trickier but we can still write this as well defined +SS/FF. Let SS be getting an ace and FF being anything else.

+

Then p=16p = \frac{1}{6} and n=10n = 10. We want P(X=3)P(X = 3). So

+

P(X=3)=(103)p3q7=(103)(16)3(56)70.15505\begin{array}{r} +P(X = 3) = \begin{pmatrix} +10 \\ +3 +\end{pmatrix}p^{3}q^{7} = \begin{pmatrix} +10 \\ +3 +\end{pmatrix}\left( \frac{1}{6} \right)^{3}\left( \frac{5}{6} \right)^{7} \\ + \approx 0.15505 +\end{array}
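As a sanity check (my own, not part of the original notes), the same number falls out of a few lines of Haskell:

-- binomial p.m.f., checked against the dice example above
choose :: Integer -> Integer -> Integer
choose n k = product [n - k + 1 .. n] `div` product [1 .. k]

binomialPmf :: Integer -> Integer -> Double -> Double
binomialPmf n k p = fromIntegral (choose n k) * p ^ k * (1 - p) ^ (n - k)

main :: IO ()
main = print (binomialPmf 10 3 (1 / 6))
-- prints roughly 0.15505, matching the hand computation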

+

With or without replacement?

+

I place particular emphasis on the fact that the binomial distribution
generally applies to cases where you’re sampling with replacement.
Consider the following example.

+

Suppose we have two types of candy, red and black: a red candies and b
black candies. Select n candies. Let X be the number of red candies
among the n selected.

+

2 cases.

+
    +
  • case 1: with replacement: Binomial Distribution, nn, +p=aa+bp = \frac{a}{a + b}.
  • +
+

P(X=2)=(n2)(aa+b)2(ba+b)n2P(X = 2) = \begin{pmatrix} +n \\ +2 +\end{pmatrix}\left( \frac{a}{a + b} \right)^{2}\left( \frac{b}{a + b} \right)^{n - 2}

+
    +
  • case 2: without replacement: then use counting
  • +
+

P(X=x)=(ax)(bnx)(a+bn)=p(x)P(X = x) = \frac{\begin{pmatrix} +a \\ +x +\end{pmatrix}\begin{pmatrix} +b \\ +n - x +\end{pmatrix}}{\begin{pmatrix} +a + b \\ +n +\end{pmatrix}} = p(x)

+

In case 2, we used the elementary counting techniques we are already +familiar with. Immediately we see a distinct case similar to the +binomial but when sampling without replacement. Let’s formalize this as +a random variable!

+

Hypergeometric distribution

+

Let’s introduce a random variable to represent a situation like case 2 +above.

+

Definition.

+

P(X=x)=(ax)(bnx)(a+bn)=p(x)P(X = x) = \frac{\begin{pmatrix} +a \\ +x +\end{pmatrix}\begin{pmatrix} +b \\ +n - x +\end{pmatrix}}{\begin{pmatrix} +a + b \\ +n +\end{pmatrix}} = p(x)

+

is known as a Hypergeometric distribution.

+

Abbreviate this by:

+

X Hypergeom(# total,# successes, sample size)X\sim\text{ Hypergeom}\left( \#\text{ total},\#\text{ successes},\text{ sample size} \right)

+

For example,

+

X Hypergeom(N,Na,n)X\sim\text{ Hypergeom}\left( N,N_{a},n \right)

+

Remark.

+

If the sample size n is very small relative to a + b, then both cases give
similar (approx. the same) answers.

+

For instance, if we’re sampling for blood types from UCSB, and we take a +student out without replacement, we don’t really change the sample size +substantially. So both answers give a similar result.

+

Suppose we have two types of items, type AA and type BB. Let NAN_{A} +be #\# type AA, NBN_{B} #\# type BB. N=NA+NBN = N_{A} + N_{B} is the +total number of objects.

+

We sample nn items without replacement (nNn \leq N) with order not +mattering. Denote by XX the number of type AA objects in our sample.

+

Definition.

+

Let 0NAN0 \leq N_{A} \leq N and 1nN1 \leq n \leq N be integers. A random +variable XX has the hypergeometric distribution with parameters +(N,NA,n)\left( N,N_{A},n \right) if XX takes values in the set +{0,1,,n}\left\{ 0,1,\ldots,n \right\} and has p.m.f.

+

P(X=k)=(NAk)(NNAnk)(Nn)=p(k)P(X = k) = \frac{\begin{pmatrix} +N_{A} \\ +k +\end{pmatrix}\begin{pmatrix} +N - N_{A} \\ +n - k +\end{pmatrix}}{\begin{pmatrix} +N \\ +n +\end{pmatrix}} = p(k)

+

Example.

+

Let NA=10N_{A} = 10 defectives. Let NB=90N_{B} = 90 non-defectives. We select +n=5n = 5 without replacement. What is the probability that 2 of the 5 +selected are defective?

+

X Hypergeom (N=100,NA=10,n=5)X\sim\text{ Hypergeom }\left( N = 100,N_{A} = 10,n = 5 \right)

+

We want P(X=2)P(X = 2).

+

P(X=2)=(102)(903)(1005)0.0702P(X = 2) = \frac{\begin{pmatrix} +10 \\ +2 +\end{pmatrix}\begin{pmatrix} +90 \\ +3 +\end{pmatrix}}{\begin{pmatrix} +100 \\ +5 +\end{pmatrix}} \approx 0.0702
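Again as a quick numerical check (an illustration of mine, not from the notes):

-- hypergeometric p.m.f. for the defectives example above
choose :: Integer -> Integer -> Integer
choose n k = product [n - k + 1 .. n] `div` product [1 .. k]

hypergeomPmf :: Integer -> Integer -> Integer -> Integer -> Double
hypergeomPmf bigN nA n k =
    fromIntegral (choose nA k * choose (bigN - nA) (n - k))
        / fromIntegral (choose bigN n)

main :: IO ()
main = print (hypergeomPmf 100 10 5 2)
-- prints roughly 0.0702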

+

Remark.

+

Make sure you can distinguish when a problem is binomial or when it is +hypergeometric. This is very important on exams.

+

Recall that both ask about number of successes, in a fixed number of +trials. But binomial is sample with replacement (each trial is +independent) and sampling without replacement is hypergeometric.

+

Geometric distribution

+

Consider an infinite sequence of independent trials. e.g. number of +attempts until I make a basket.

+

In fact we can think of this as a variation on the binomial +distribution. But in this case we don’t sample nn times and ask how +many successes we have, we sample as many times as we need for one +success. Later on we’ll see this is really a specific case of another +distribution, the negative binomial.

+

Let XiX_{i} denote the outcome of the ithi^{\text{th}} trial, where +success is 1 and failure is 0. Let NN be the number of trials needed to +observe the first success in a sequence of independent trials with +probability of success pp. Then

+

We fail k1k - 1 times and succeed on the kthk^{\text{th}} try. Then:

+

P(N=k)=P(X1=0,X2=0,,Xk1=0,Xk=1)=(1p)k1pP(N = k) = P\left( X_{1} = 0,X_{2} = 0,\ldots,X_{k - 1} = 0,X_{k} = 1 \right) = (1 - p)^{k - 1}p

+

This is the probability of failure raised to the number of failures,
times the probability of success.

+

The key characteristic in these trials, we keep going until we succeed. +There’s no nn choose kk in front like the binomial distribution +because there’s exactly one sequence that gives us success.

+

Definition.

+

Let 0<p10 < p \leq 1. A random variable XX has the geometric distribution +with success parameter pp if the possible values of XX are +{1,2,3,}\left\{ 1,2,3,\ldots \right\} and XX satisfies

+

P(X=k)=(1p)k1pP(X = k) = (1 - p)^{k - 1}p

+

for positive integers kk. Abbreviate this by X Geom(p)X\sim\text{ Geom}(p).

+

Example.

+

What is the probability it takes more than seven rolls of a fair die to +roll a six?

+

Let XX be the number of rolls of a fair die until the first six. Then +X Geom(16)X\sim\text{ Geom}\left( \frac{1}{6} \right). Now we just want +P(X>7)P(X > 7).

+

P(X>7)=k=8P(X=k)=k=8(56)k116P(X > 7) = \sum_{k = 8}^{\infty}P(X = k) = \sum_{k = 8}^{\infty}\left( \frac{5}{6} \right)^{k - 1}\frac{1}{6}

+

Re-indexing,

+

k=8(56)k116=16(56)7j=0(56)j\sum_{k = 8}^{\infty}\left( \frac{5}{6} \right)^{k - 1}\frac{1}{6} = \frac{1}{6}\left( \frac{5}{6} \right)^{7}\sum_{j = 0}^{\infty}\left( \frac{5}{6} \right)^{j}

+

Now we calculate by standard methods:

+

16(56)7j=0(56)j=16(56)71156=(56)7\frac{1}{6}\left( \frac{5}{6} \right)^{7}\sum_{j = 0}^{\infty}\left( \frac{5}{6} \right)^{j} = \frac{1}{6}\left( \frac{5}{6} \right)^{7} \cdot \frac{1}{1 - \frac{5}{6}} = \left( \frac{5}{6} \right)^{7}
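A quick check of this geometric tail probability (my own sketch, truncating the infinite sum at a large cutoff):

main :: IO ()
main = do
    let p = 1 / 6 :: Double
    print ((1 - p) ^ (7 :: Int))                                    -- closed form, ~0.2791
    print (sum [(1 - p) ^ (k - 1) * p | k <- [8 .. 2000 :: Int]])   -- truncated sum, same value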

+

Negative binomial

+

As promised, here’s the negative binomial.

+

Consider a sequence of Bernoulli trials with the following +characteristics:

+
    +
  • Each trial success or failure

  • +
  • Prob. of success pp is same on each trial

  • +
  • Trials are independent (notice they are not fixed to specific +number)

  • +
  • Experiment continues until rr successes are observed, where rr is +a given parameter

  • +
+

Then if XX is the number of trials necessary until rr successes are +observed, we say XX is a negative binomial random variable.

+

Immediately we see that the geometric distribution is just the negative +binomial with r=1r = 1.

+

Definition.

+

Let k+k \in {\mathbb{Z}}^{+} and 0<p10 < p \leq 1. A random variable XX +has the negative binomial distribution with parameters +{k,p}\left\{ k,p \right\} if the possible values of XX are the integers +{k,k+1,k+2,}\left\{ k,k + 1,k + 2,\ldots \right\} and the p.m.f. is

+

P(X=n)=(n1k1)pk(1p)nk for nkP(X = n) = \begin{pmatrix} +n - 1 \\ +k - 1 +\end{pmatrix}p^{k}(1 - p)^{n - k}\text{ for }n \geq k

+

Abbreviate this by X Negbin(k,p)X\sim\text{ Negbin}(k,p).

+

Example.

+

Steph Curry has a three point percentage of approx. 43%43\%. What is the +probability that Steph makes his third three-point basket on his +5th5^{\text{th}} attempt?

+

Let XX be number of attempts required to observe the 3rd success. Then,

+

X Negbin(k=3,p=0.43)X\sim\text{ Negbin}(k = 3,p = 0.43)

+

So, P(X=5)=(5131)(0.43)3(10.43)53=(42)(0.43)3(0.57)20.155\begin{aligned} +P(X = 5) & = {\begin{pmatrix} +5 - 1 \\ +3 - 1 +\end{pmatrix}(0.43)}^{3}(1 - 0.43)^{5 - 3} \\ + & = \begin{pmatrix} +4 \\ +2 +\end{pmatrix}(0.43)^{3}(0.57)^{2} \\ + & \approx 0.155 +\end{aligned}

+

Poisson distribution

+

This p.m.f. follows from the Taylor expansion

+

eλ=k=0λkk!e^{\lambda} = \sum_{k = 0}^{\infty}\frac{\lambda^{k}}{k!}

+

which implies that

+

k=0eλλkk!=eλeλ=1\sum_{k = 0}^{\infty}e^{- \lambda}\frac{\lambda^{k}}{k!} = e^{- \lambda}e^{\lambda} = 1

+

Definition.

+

For an integer valued random variable XX, we say +X Poisson(λ)X\sim\text{ Poisson}(\lambda) if it has p.m.f.

+

P(X=k)=eλλkk!P(X = k) = e^{- \lambda}\frac{\lambda^{k}}{k!}

+

for k{0,1,2,}k \in \left\{ 0,1,2,\ldots \right\} for λ>0\lambda > 0 and

+

k=0P(X=k)=1\sum_{k = 0}^{\infty}P(X = k) = 1

+

The Poisson arises from the Binomial. It applies in the binomial context +when nn is very large (n100n \geq 100) and pp is very small +p0.05p \leq 0.05, such that npnp is a moderate number (np<10np < 10).

+

Then XX follows a Poisson distribution with λ=np\lambda = np.

+

P(Bin(n,p)=k)P(Poisson(λ=np)=k)P\left( \text{Bin}(n,p) = k \right) \approx P\left( \text{Poisson}(\lambda = np) = k \right)

+

for k=0,1,,nk = 0,1,\ldots,n.

+

The Poisson distribution is useful for finding the probabilities of rare
events over a continuous interval of time. By knowing \lambda = np for
large n and small p, we can calculate many probabilities.

+

Example.

+

The number of typing errors in the page of a textbook.

+

Let

+
    +
  • nn be the number of letters of symbols per page (large)

  • +
  • pp be the probability of error, small enough such that

  • +
  • limnlimp0np=λ=0.1\lim\limits_{n \rightarrow \infty}\lim\limits_{p \rightarrow 0}np = \lambda = 0.1

  • +
+

What is the probability of exactly 1 error?

+

We can approximate the distribution of XX with a +Poisson(λ=0.1)\text{Poisson}(\lambda = 0.1) distribution

+

P(X=1)=e0.1(0.1)11!=0.09048P(X = 1) = \frac{e^{- 0.1}(0.1)^{1}}{1!} = 0.09048
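For concreteness (my own check, with made-up values of n = 1000 symbols per page and p = 0.0001, so that np = 0.1), the exact binomial probability and its Poisson approximation agree closely:

choose :: Integer -> Integer -> Integer
choose n k = product [n - k + 1 .. n] `div` product [1 .. k]

main :: IO ()
main = do
    let n = 1000 :: Integer
        p = 0.0001 :: Double
        lambda = fromIntegral n * p
    print (fromIntegral (choose n 1) * p * (1 - p) ^ (n - 1))  -- exact Bin(1000, 0.0001)
    print (exp (negate lambda) * lambda)                        -- Poisson(0.1), ~0.0905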

+

Continuous distributions

+

All of the distributions we’ve been analyzing have been discrete, that +is, they apply to random variables with a +countable state space. +Even when the state space is infinite, as in the negative binomial, it +is countable. We can think of it as indexing each trial with a natural +number 0,1,2,3,0,1,2,3,\ldots.

+

Now we turn our attention to continuous random variables that operate on
uncountably infinite state spaces. For example, if we sample uniformly
inside of the interval [0,1], there are an uncountably
infinite number of possible values we could obtain. We cannot index
these values by the natural numbers; by some theorems of set theory we
in fact know that the interval [0,1] has a bijection to
\mathbb{R} and has the cardinality of the continuum, 2^{\aleph_{0}}.

+

Additionally we notice that asking for the probability that we pick a +certain point in the interval [0,1]\lbrack 0,1\rbrack makes no sense, there +are an infinite amount of sample points! Intuitively we should think +that the probability of choosing any particular point is 0. However, we +should be able to make statements about whether we can choose a point +that lies within a subset, like [0,0.5]\lbrack 0,0.5\rbrack.

+

Let’s formalize these ideas.

+

Definition.

+

Let XX be a random variable. If we have a function ff such that

+

P(Xb)=bf(x)dxP(X \leq b) = \int_{- \infty}^{b}f(x)dx for all +bb \in {\mathbb{R}}, then ff is the probability density function +of XX.

+

The probability that the value of XX lies in (,b]( - \infty,b\rbrack +equals the area under the curve of ff from - \infty to bb.

+

If ff satisfies this definition, then for any BB \subset {\mathbb{R}} +for which integration makes sense,

+

P(XB)=Bf(x)dxP(X \in B) = \int_{B}f(x)dx

+

Remark.

+

Recall from our previous discussion of random variables that the PDF is +the analogue of the PMF for discrete random variables.

+

Properties of a CDF:

+

Any CDF F(x)=P(Xx)F(x) = P(X \leq x) satisfies

+
  1. Limits: F( - \infty) = 0 and F(\infty) = 1
  2. F(x) is non-decreasing in x: s < t \Rightarrow F(s) \leq F(t)
  3. P(a < X \leq b) = P(X \leq b) - P(X \leq a) = F(b) - F(a)

Like we mentioned before, we can only ask about things like +P(Xk)P(X \leq k), but not P(X=k)P(X = k). In fact P(X=k)=0P(X = k) = 0 for all kk. +An immediate corollary of this is that we can freely interchange \leq +and << and likewise for \geq and >>, since P(Xk)=P(X<k)P(X \leq k) = P(X < k) +if P(X=k)=0P(X = k) = 0.

+

Example.

+

Let XX be a continuous random variable with density (pdf)

+

f(x)={cx2for 0<x<20otherwise f(x) = \begin{cases} +cx^{2} & \text{for }0 < x < 2 \\ +0 & \text{otherwise } +\end{cases}

+
    +
  1. What is cc?
  2. +
+

c is such that
1 = \int_{- \infty}^{\infty}f(x)dx = \int_{0}^{2}cx^{2}dx = \frac{8c}{3}, so c = \frac{3}{8}.

+
    +
  1. Find the probability that XX is between 1 and 1.4.
  2. +
+

Integrate the curve between 1 and 1.4.

+

11.438x2dx=(x38)|11.4=0.218\begin{array}{r} +\int_{1}^{1.4}\frac{3}{8}x^{2}dx = \left( \frac{x^{3}}{8} \right)|_{1}^{1.4} \\ + = 0.218 +\end{array}

+

This is the probability that XX lies between 1 and 1.4.

+
    +
  1. Find the probability that XX is between 1 and 3.
  2. +
+

Idea: integrate between 1 and 3, be careful after 2.

+

\int_{1}^{2}\frac{3}{8}x^{2}dx + \int_{2}^{3}0dx = \frac{7}{8}

+
    +
  1. What is the CDF for P(Xx)P(X \leq x)? Integrate the curve to xx.
  2. +
+

F(x)=P(Xx)=xf(t)dt=0x38t2dt=x38\begin{array}{r} +F(x) = P(X \leq x) = \int_{- \infty}^{x}f(t)dt \\ + = \int_{0}^{x}\frac{3}{8}t^{2}dt \\ + = \frac{x^{3}}{8} +\end{array}

+

Important: include the range!

+

F(x)={0for x0x38for 0<x<21for x2F(x) = \begin{cases} +0 & \text{for }x \leq 0 \\ +\frac{x^{3}}{8} & \text{for }0 < x < 2 \\ +1 & \text{for }x \geq 2 +\end{cases}

+
  1. Find a point a such that you integrate up to that point to find
     exactly \frac{1}{2} of the area.

+

We want to find 12=P(Xa)\frac{1}{2} = P(X \leq a).

+

12=P(Xa)=F(a)=a38a=43\frac{1}{2} = P(X \leq a) = F(a) = \frac{a^{3}}{8} \Rightarrow a = \sqrt[3]{4}

+

Now let us discuss some named continuous distributions.

+

The (continuous) uniform distribution

+

The most simple and the best of the named distributions!

+

Definition.

+

Let [a,b]\lbrack a,b\rbrack be a bounded interval on the real line. A +random variable XX has the uniform distribution on the interval +[a,b]\lbrack a,b\rbrack if XX has the density function

+

f(x)={1bafor x[a,b]0for x[a,b]f(x) = \begin{cases} +\frac{1}{b - a} & \text{for }x \in \lbrack a,b\rbrack \\ +0 & \text{for }x \notin \lbrack a,b\rbrack +\end{cases}

+

Abbreviate this by X Unif [a,b]X\sim\text{ Unif }\lbrack a,b\rbrack.

+

The graph of Unif [a,b]\text{Unif }\lbrack a,b\rbrack is a constant line at +height 1ba\frac{1}{b - a} defined across [a,b]\lbrack a,b\rbrack. The +integral is just the area of a rectangle, and we can check it is 1.

+

Fact.

+

For X Unif [a,b]X\sim\text{ Unif }\lbrack a,b\rbrack, its cumulative distribution +function (CDF) is given by:

+

Fx(x)={0for x<axabafor x[a,b]1for x>bF_{x}(x) = \begin{cases} +0 & \text{for }x < a \\ +\frac{x - a}{b - a} & \text{for }x \in \lbrack a,b\rbrack \\ +1 & \text{for }x > b +\end{cases}

+

Fact.

+

If X Unif [a,b]X\sim\text{ Unif }\lbrack a,b\rbrack, and +[c,d][a,b]\lbrack c,d\rbrack \subset \lbrack a,b\rbrack, then +P(cXd)=cd1badx=dcbaP(c \leq X \leq d) = \int_{c}^{d}\frac{1}{b - a}dx = \frac{d - c}{b - a}

+

Example.

+

Let YY be a uniform random variable on [2,5]\lbrack - 2,5\rbrack. Find the +probability that its absolute value is at least 1.

+

Y takes values in the interval \lbrack - 2,5\rbrack, so the absolute
value is at least 1 iff.
Y \in \lbrack - 2, - 1\rbrack \cup \lbrack 1,5\rbrack.

+

The density function of YY is +f(x)=15(2)=17f(x) = \frac{1}{5 - ( - 2)} = \frac{1}{7} on [2,5]\lbrack - 2,5\rbrack +and 0 everywhere else.

+

So,

+

P(|Y|1)=P(Y[2,1][1,5])=P(2Y1)+P(1Y5)=57\begin{aligned} +P\left( |Y| \geq 1 \right) & = P\left( Y \in \lbrack - 2, - 1\rbrack \cup \lbrack 1,5\rbrack \right) \\ + & = P( - 2 \leq Y \leq - 1) + P(1 \leq Y \leq 5) \\ + & = \frac{5}{7} +\end{aligned}

+

The exponential distribution

+

The geometric distribution can be viewed as modeling waiting times, in a
discrete setting, i.e. we wait through n - 1 failures and obtain the
first success on the n^{\text{th}} trial.

+

The exponential distribution is the continuous analogue to the geometric
distribution, in that we often use it to model waiting times in the
continuous sense. For example, the time until the first customer enters the
barber shop.

+

Definition.

+

Let 0<λ<0 < \lambda < \infty. A random variable XX has the exponential +distribution with parameter λ\lambda if XX has PDF

+

f(x)={λeλxfor x00for x<0f(x) = \begin{cases} +\lambda e^{- \lambda x} & \text{for }x \geq 0 \\ +0 & \text{for }x < 0 +\end{cases}

+

Abbreviate this by X Exp(λ)X\sim\text{ Exp}(\lambda), the exponential +distribution with rate λ\lambda.

+

The CDF of the Exp(λ)\text{Exp}(\lambda) distribution is given by:

+

F(t) = \begin{cases}
0 & \text{if }t < 0 \\
1 - e^{- \lambda t} & \text{if }t \geq 0
\end{cases}

+

Example.

+

Suppose the length of a phone call, in minutes, is well modeled by an +exponential random variable with a rate λ=110\lambda = \frac{1}{10}.

+
    +
  1. What is the probability that a call takes more than 8 minutes?

  2. +
  3. What is the probability that a call takes between 8 and 22 minutes?

  4. +
+

Let XX be the length of the phone call, so that +X Exp(110)X\sim\text{ Exp}\left( \frac{1}{10} \right). Then we can find the +desired probability by:

+

P(X>8)=1P(X8)=1Fx(8)=1(1e(110)8)=e8100.4493\begin{aligned} +P(X > 8) & = 1 - P(X \leq 8) \\ + & = 1 - F_{x}(8) \\ + & = 1 - \left( 1 - e^{- \left( \frac{1}{10} \right) \cdot 8} \right) \\ + & = e^{- \frac{8}{10}} \approx 0.4493 +\end{aligned}

+

Now to find P(8 < X < 22), we can take the difference of the two tail
probabilities:

+

P(X>8)P(X22)=e810e22100.3385\begin{aligned} + & P(X > 8) - P(X \geq 22) \\ + & = e^{- \frac{8}{10}} - e^{- \frac{22}{10}} \\ + & \approx 0.3385 +\end{aligned}

+

Fact (Memoryless property of the exponential distribution).

+

Suppose that X Exp(λ)X\sim\text{ Exp}(\lambda). Then for any s,t>0s,t > 0, we +have P(X>t+s|X>t)=P(X>s)P\left( X > t + s~|~X > t \right) = P(X > s)

+

This is like saying if I’ve been waiting 5 minutes and then 3 minutes +for the bus, what is the probability that I’m gonna wait more than 5 + 3 +minutes, given that I’ve already waited 5 minutes? And that’s precisely +equal to just the probability I’m gonna wait more than 3 minutes.

+

Proof.

+

P(X>t+s|X>t)=P(X>t+sX>t)P(X>t)=P(X>t+s)P(X>t)=eλ(t+s)eλt=eλsP(X>s)\begin{array}{r} +P\left( X > t + s~|~X > t \right) = \frac{P(X > t + s \cap X > t)}{P(X > t)} \\ + = \frac{P(X > t + s)}{P(X > t)} = \frac{e^{- \lambda(t + s)}}{e^{- \lambda t}} = e^{- \lambda s} \\ + \equiv P(X > s) +\end{array}

+

Gamma distribution

+

Definition.

+

Let r,λ>0r,\lambda > 0. A random variable XX has the gamma +distribution with parameters (r,λ)(r,\lambda) if XX is nonnegative and +has probability density function

+

f(x) = \begin{cases}
\frac{\lambda^{r}x^{r - 1}}{\Gamma(r)}e^{- \lambda x} & \text{for }x \geq 0 \\
0 & \text{for }x < 0
\end{cases}

+

Abbreviate this by X Gamma(r,λ)X\sim\text{ Gamma}(r,\lambda).

+

The gamma function Γ(r)\Gamma(r) generalizes the factorial function and is +defined as

+

Γ(r)=0xr1exdx, for r>0\Gamma(r) = \int_{0}^{\infty}x^{r - 1}e^{- x}dx,\text{ for }r > 0

+

Special case: Γ(n)=(n1)!\Gamma(n) = (n - 1)! if n+n \in {\mathbb{Z}}^{+}.

+

Remark.

+

The Exp(λ)\text{Exp}(\lambda) distribution is a special case of the gamma +distribution, with parameter r=1r = 1.

+

The normal distribution

+

Also known as the Gaussian distribution, this is so important it gets +its own section.

+

Definition.

+

A random variable ZZ has the standard normal distribution if ZZ +has density function

+

φ(x)=12πex22\varphi(x) = \frac{1}{\sqrt{2\pi}}e^{- \frac{x^{2}}{2}} on the real +line. Abbreviate this by ZN(0,1)Z\sim N(0,1).

+

Fact (CDF of a standard normal random variable).

+

Let Z\sim N(0,1) be normally distributed. Then its CDF is given by
\Phi(x) = \int_{- \infty}^{x}\varphi(s)ds = \int_{- \infty}^{x}\frac{1}{\sqrt{2\pi}}e^{- \frac{s^{2}}{2}}ds

+

The normal distribution is so important that, instead of the standard
f_{Z}(x) and F_{Z}(x), we use the special \varphi(x) and
\Phi(x).

+

Fact.

+

es22ds=2π\int_{- \infty}^{\infty}e^{- \frac{s^{2}}{2}}ds = \sqrt{2\pi}

+

No closed form of the standard normal CDF Φ\Phi exists, so we are left +to either:

+
    +
  • approximate

  • +
  • use technology (calculator)

  • +
  • use the standard normal probability table in the textbook

  • +
+

To evaluate negative values, we can use the symmetry of the normal +distribution to apply the following identity:

+

Φ(x)=1Φ(x)\Phi( - x) = 1 - \Phi(x)

+

General normal distributions

+

We can compute any other parameters of the normal distribution using the +standard normal.

+

The general family of normal distributions is obtained by linear or +affine transformations of ZZ. Let μ\mu be real, and σ>0\sigma > 0, then

+

X=σZ+μX = \sigma Z + \mu is also a normally distributed random variable +with parameters (μ,σ2)\left( \mu,\sigma^{2} \right). The CDF of XX in terms +of Φ()\Phi( \cdot ) can be expressed as

+

FX(x)=P(Xx)=P(σZ+μx)=P(Zxμσ)=Φ(xμσ)\begin{aligned} +F_{X}(x) & = P(X \leq x) \\ + & = P(\sigma Z + \mu \leq x) \\ + & = P\left( Z \leq \frac{x - \mu}{\sigma} \right) \\ + & = \Phi(\frac{x - \mu}{\sigma}) +\end{aligned}

+

Also,

+

f(x)=F(x)=ddx[Φ(xuσ)]=1σφ(xuσ)=12πσ2e((xμ)2)2σ2f(x) = F\prime(x) = \frac{d}{dx}\left\lbrack \Phi(\frac{x - u}{\sigma}) \right\rbrack = \frac{1}{\sigma}\varphi(\frac{x - u}{\sigma}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{\frac{- \left( (x - \mu)^{2} \right)}{2\sigma^{2}}}

+

Definition.

+

Let μ\mu be real and σ>0\sigma > 0. A random variable XX has the +normal distribution with mean μ\mu and variance σ2\sigma^{2} if XX +has density function

+

f(x)=12πσ2e((xμ)2)2σ2f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{\frac{- \left( (x - \mu)^{2} \right)}{2\sigma^{2}}}

+

on the real line. Abbreviate this by +XN(μ,σ2)X\sim N\left( \mu,\sigma^{2} \right).

+

Fact.

+

Let XN(μ,σ2)X\sim N\left( \mu,\sigma^{2} \right) and Y=aX+bY = aX + b. Then +YN(aμ+b,a2σ2)Y\sim N\left( a\mu + b,a^{2}\sigma^{2} \right)

+

That is, YY is normally distributed with parameters +(aμ+b,a2σ2)\left( a\mu + b,a^{2}\sigma^{2} \right). In particular, +Z=XμσN(0,1)Z = \frac{X - \mu}{\sigma}\sim N(0,1) is a standard normal variable.

+

Expectation

+

Let’s discuss the expectation of a random variable, which is a similar +idea to the basic concept of mean.

+

Definition.

+

The expectation or mean of a discrete random variable XX is the +weighted average, with weights assigned by the corresponding +probabilities.

+

E(X)=all xixip(xi)E(X) = \sum_{\text{all }x_{i}}x_{i} \cdot p\left( x_{i} \right)

+

Example.

+

Find the expected value of a single roll of a fair die.

+
    +
  • X = the score (number of dots) shown by the die

  • +
  • x=1,2,3,4,5,6x = 1,2,3,4,5,6

  • +
  • p(x)=16,16,16,16,16,16p(x) = \frac{1}{6},\frac{1}{6},\frac{1}{6},\frac{1}{6},\frac{1}{6},\frac{1}{6}

  • +
+

E\lbrack X\rbrack = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \ldots + 6 \cdot \frac{1}{6} = \frac{7}{2} = 3.5

+

Binomial expected value

+

E[x]=npE\lbrack x\rbrack = np

+

Bernoulli expected value

+

Bernoulli is just binomial with one trial.

+

Recall that P(X=1)=pP(X = 1) = p and P(X=0)=1pP(X = 0) = 1 - p.

+

E[X]=1P(X=1)+0P(X=0)=pE\lbrack X\rbrack = 1 \cdot P(X = 1) + 0 \cdot P(X = 0) = p

+

Let AA be an event on Ω\Omega. Its indicator random variable IAI_{A} +is defined for ωΩ\omega \in \Omega by

+

IA(ω)={1, if ωA0, if ωAI_{A}(\omega) = \begin{cases} +1\text{, if } & \omega \in A \\ +0\text{, if } & \omega \notin A +\end{cases}

+

E[IA]=1P(A)=P(A)E\left\lbrack I_{A} \right\rbrack = 1 \cdot P(A) = P(A)

+

Geometric expected value

+

Let p[0,1]p \in \lbrack 0,1\rbrack and X Geom[p]X\sim\text{ Geom}\lbrack p\rbrack +be a geometric RV with probability of success pp. Recall that the +p.m.f. is pqk1pq^{k - 1}, where prob. of failure is defined by +q1pq ≔ 1 - p.

+

Then

+

E[X]=k=1kpqk1=pk=1kqk1\begin{aligned} +E\lbrack X\rbrack & = \sum_{k = 1}^{\infty}kpq^{k - 1} \\ + & = p \cdot \sum_{k = 1}^{\infty}k \cdot q^{k - 1} +\end{aligned}

+

Now recall from calculus that you can differentiate a power series term +by term inside its radius of convergence. So for |t|<1|t| < 1,

+

k=1ktk1=k=1ddttk=ddtk=1tk=ddt(11t)=1(1t)2E[x]=k=1kpqk1=pk=1kqk1=p(1(1q)2)=1p\begin{array}{r} +\sum_{k = 1}^{\infty}kt^{k - 1} = \sum_{k = 1}^{\infty}\frac{d}{dt}t^{k} = \frac{d}{dt}\sum_{k = 1}^{\infty}t^{k} = \frac{d}{dt}\left( \frac{1}{1 - t} \right) = \frac{1}{(1 - t)^{2}} \\ +\therefore E\lbrack x\rbrack = \sum_{k = 1}^{\infty}kpq^{k - 1} = p\sum_{k = 1}^{\infty}kq^{k - 1} = p\left( \frac{1}{(1 - q)^{2}} \right) = \frac{1}{p} +\end{array}
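The closed form E[X] = 1/p is also easy to confirm numerically; a tiny sketch of mine (truncating the series at a large cutoff) for p = 1/6 prints a value extremely close to 6:

main :: IO ()
main = print (sum [fromIntegral k * p * (1 - p) ^ (k - 1) | k <- [1 .. 5000 :: Int]])
  where
    p = 1 / 6 :: Double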

+

Expected value of a continuous RV

+

Definition.

+

The expectation or mean of a continuous random variable XX with density +function ff is

+

E[x]=xf(x)dxE\lbrack x\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx

+

An alternative symbol is μ=E[x]\mu = E\lbrack x\rbrack.

+

μ\mu is the “first moment” of XX, analogous to physics, it’s the +“center of gravity” of XX.

+

Remark.

+

In general when moving between discrete and continuous RV, replace sums +with integrals, p.m.f. with p.d.f., and vice versa.

+

Example.

+

Suppose XX is a continuous RV with p.d.f.

+

fX(x)={2x, 0<x<10, elsewheref_{X}(x) = \begin{cases} +2x\text{, } & 0 < x < 1 \\ +0\text{, } & \text{elsewhere} +\end{cases}

+

E[X]=xf(x)dx=01x2xdx=23E\lbrack X\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{0}^{1}x \cdot 2xdx = \frac{2}{3}

+

Example (Uniform expectation).

+

Let XX be a uniform random variable on the interval +[a,b]\lbrack a,b\rbrack with X Unif[a,b]X\sim\text{ Unif}\lbrack a,b\rbrack. Find +the expected value of XX.

+

E[X]=xf(x)dx=abxbadx=1baabxdx=1bab2a22=b+a2 midpoint formula\begin{array}{r} +E\lbrack X\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{a}^{b}\frac{x}{b - a}dx \\ + = \frac{1}{b - a}\int_{a}^{b}xdx = \frac{1}{b - a} \cdot \frac{b^{2} - a^{2}}{2} = \underset{\text{ midpoint formula}}{\underbrace{\frac{b + a}{2}}} +\end{array}

+

Example (Exponential expectation).

+

Find the expected value of an exponential RV, with p.d.f.

+

fX(x)={λeλx, x>00, elsewheref_{X}(x) = \begin{cases} +\lambda e^{- \lambda x}\text{, } & x > 0 \\ +0\text{, } & \text{elsewhere} +\end{cases}

+

E[x]=xf(x)dx=0xλeλxdx=λ0xeλxdx=λ[x1λeλx|x=0x=01λeλxdx]=1λ\begin{array}{r} +E\lbrack x\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{0}^{\infty}x \cdot \lambda e^{- \lambda x}dx \\ + = \lambda \cdot \int_{0}^{\infty}x \cdot e^{- \lambda x}dx \\ + = \lambda \cdot \left\lbrack \left. -x\frac{1}{\lambda}e^{- \lambda x} \right|_{x = 0}^{x = \infty} - \int_{0}^{\infty} - \frac{1}{\lambda}e^{- \lambda x}dx \right\rbrack \\ + = \frac{1}{\lambda} +\end{array}

+

Example (Uniform dartboard).

+

Our dartboard is a disk of radius r0r_{0} and the dart lands uniformly +at random on the disk when thrown. Let RR be the distance of the dart +from the center of the disk. Find E[R]E\lbrack R\rbrack given density +function

+

fR(t)={2tr02, 0tr00, t<0 or t>r0f_{R}(t) = \begin{cases} +\frac{2t}{r_{0}^{2}}\text{, } & 0 \leq t \leq r_{0} \\ +0\text{, } & t < 0\text{ or }t > r_{0} +\end{cases}

+

E[R]=tfR(t)dt=0r0t2tr02dt=23r0\begin{array}{r} +E\lbrack R\rbrack = \int_{- \infty}^{\infty}tf_{R}(t)dt \\ + = \int_{0}^{r_{0}}t \cdot \frac{2t}{r_{0}^{2}}dt \\ + = \frac{2}{3}r_{0} +\end{array}

+

Expectation of derived values

+

If we can find the expected value of XX, can we find the expected value +of X2X^{2}? More precisely, can we find +E[X2]E\left\lbrack X^{2} \right\rbrack?

+

If the distribution is easy to see, then this is trivial. Otherwise we +have the following useful property:

+

E[X2]=all xx2fX(x)dxE\left\lbrack X^{2} \right\rbrack = \int_{\text{all }x}x^{2}f_{X}(x)dx

+

(for continuous RVs).

+

And in the discrete case,

+

E[X2]=all xx2pX(x)E\left\lbrack X^{2} \right\rbrack = \sum_{\text{all }x}x^{2}p_{X}(x)

+

In fact E[X2]E\left\lbrack X^{2} \right\rbrack is so important that we call +it the mean square.

+

Fact.

+

More generally, a real valued function g(X)g(X) defined on the range of +XX is itself a random variable (with its own distribution).

+

We can find expected value of g(X)g(X) by

+

E[g(x)]=g(x)f(x)dxE\left\lbrack g(x) \right\rbrack = \int_{- \infty}^{\infty}g(x)f(x)dx

+

or

+

E\left\lbrack g(X) \right\rbrack = \sum_{\text{all }x}g(x)p_{X}(x)

+

Example.

+

You roll a fair die to determine the winnings (or losses) WW of a +player as follows:

+

W = \begin{cases}
 - 1\text{, if the roll is } 1, 2, \text{ or } 3 \\
1\text{, if the roll is a } 4 \\
3\text{, if the roll is } 5 \text{ or } 6
\end{cases}

+

What is the expected winnings/losses for the player during 1 roll of the +die?

+

Let XX denote the outcome of the roll of the die. Then we can define +our random variable as W=g(X)W = g(X) where the function gg is defined by +g(1)=g(2)=g(3)=1g(1) = g(2) = g(3) = - 1 and so on.

+

Note that P(W=1)=P(X=1X=2X=3)=12P(W = - 1) = P(X = 1 \cup X = 2 \cup X = 3) = \frac{1}{2}. +Likewise P(W=1)=P(X=4)=16P(W = 1) = P(X = 4) = \frac{1}{6}, and +P(W=3)=P(X=5X=6)=13P(W = 3) = P(X = 5 \cup X = 6) = \frac{1}{3}.

+

Then E[g(X)]=E[W]=(1)P(W=1)+(1)P(W=1)+(3)P(W=3)=12+16+1=23\begin{array}{r} +E\left\lbrack g(X) \right\rbrack = E\lbrack W\rbrack = ( - 1) \cdot P(W = - 1) + (1) \cdot P(W = 1) + (3) \cdot P(W = 3) \\ + = - \frac{1}{2} + \frac{1}{6} + 1 = \frac{2}{3} +\end{array}
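Reusing the expectation helper sketched earlier (again, my own illustration), the same computation in code:

-- winnings as a function of the die outcome
winnings :: Double -> Double
winnings x
  | x <= 3    = -1
  | x == 4    = 1
  | otherwise = 3

-- expectation [(winnings x, 1/6) | x <- [1..6]] is approximately 2/3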

+

Example.

+

A stick of length ll is broken at a uniformly chosen random location. +What is the expected length of the longer piece?

+

Idea: if you break it before the halfway point, then the longer piece +has length given by lxl - x. If you break it after the halfway point, +the longer piece has length xx.

+

Let the interval [0,l]\lbrack 0,l\rbrack represent the stick and let +X Unif[0,l]X\sim\text{ Unif}\lbrack 0,l\rbrack be the location where the stick is +broken. Then XX has density f(x)=1lf(x) = \frac{1}{l} on +[0,l]\lbrack 0,l\rbrack and 0 elsewhere.

+

Let g(x)g(x) be the length of the longer piece when the stick is broken at +xx,

+

g(x) = \begin{cases}
l - x\text{, } & 0 \leq x < \frac{l}{2} \\
x\text{, } & \frac{l}{2} \leq x \leq l
\end{cases}

+

Then E[g(X)]=g(x)f(x)dx=0l2lxldx+l2lxldx=34l\begin{array}{r} +E\left\lbrack g(X) \right\rbrack = \int_{- \infty}^{\infty}g(x)f(x)dx = \int_{0}^{\frac{l}{2}}\frac{l - x}{l}dx + \int_{\frac{l}{2}}^{l}\frac{x}{l}dx \\ + = \frac{3}{4}l +\end{array}

+

So we expect the longer piece to be 34\frac{3}{4} of the total length, +which is a bit pathological.
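A quick numerical check with l = 1 (my own sketch, using a crude midpoint Riemann sum): integrate the length of the longer piece against the uniform density.

-- approximate integral of max(x, 1 - x) over (0, 1); should be close to 0.75
longerPieceMean :: Double
longerPieceMean = sum [max x (1 - x) * dx | x <- [dx / 2, 3 * dx / 2 .. 1 - dx / 2]]
  where dx = 1e-4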

+

Moments of a random variable

+

We continue discussing expectation but we introduce new terminology.

+

Fact.

+

The nthn^{\text{th}} moment (or nthn^{\text{th}} raw moment) of a discrete +random variable XX with p.m.f. pX(x)p_{X}(x) is the expectation

+

E[Xn]=kknpX(k)=μnE\left\lbrack X^{n} \right\rbrack = \sum_{k}k^{n}p_{X}(k) = \mu_{n}

+

If XX is continuous, then we have analogously

+

E\left\lbrack X^{n} \right\rbrack = \int_{- \infty}^{\infty}x^{n}f_{X}(x)dx = \mu_{n}

+

The standard deviation is given by σ\sigma and the variance by σ2\sigma^{2}, which is related to the raw moments by

+

σ2=μ2(μ1)2\sigma^{2} = \mu_{2} - \left( \mu_{1} \right)^{2}
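In code, the raw moments and the variance identity above look like this (a sketch reusing the (value, probability) representation from before):

-- n-th raw moment of a discrete p.m.f.
moment :: Int -> [(Double, Double)] -> Double
moment n dist = sum [x ^ n * p | (x, p) <- dist]

-- variance via sigma^2 = mu_2 - mu_1^2
variance :: [(Double, Double)] -> Double
variance dist = moment 2 dist - moment 1 dist ^ 2

-- for the fair die, variance fairDie is approximately 35/12 ≈ 2.917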

+

μ3\mu_{3} is used to measure “skewness” / asymmetry of a distribution. +For example, the normal distribution is very symmetric.

+

μ4\mu_{4} is used to measure kurtosis/peakedness of a distribution.

+

Central moments

+

Previously we discussed “raw moments.” Be careful not to confuse them +with central moments.

+

Fact.

+

The nthn^{\text{th}} central moment of a discrete random variable XX +with p.m.f. pX(x)p_{X}(x) is the expected value of the difference about the +mean raised to the nthn^{\text{th}} power

+

E[(Xμ)n]=k(kμ)npX(k)=μnE\left\lbrack (X - \mu)^{n} \right\rbrack = \sum_{k}(k - \mu)^{n}p_{X}(k) = \mu\prime_{n}

+

And of course in the continuous case,

+

E[(Xμ)n]=(xμ)nfX(x)=μnE\left\lbrack (X - \mu)^{n} \right\rbrack = \int_{- \infty}^{\infty}(x - \mu)^{n}f_{X}(x) = \mu\prime_{n}

+

In particular,

+

\begin{array}{r}
\mu\prime_{1} = E\left\lbrack (X - \mu)^{1} \right\rbrack = \int_{- \infty}^{\infty}(x - \mu)f_{X}(x)dx \\
 = \int_{- \infty}^{\infty}xf_{X}(x)dx - \mu\int_{- \infty}^{\infty}f_{X}(x)dx = \mu - \mu \cdot 1 = 0 \\
\mu\prime_{2} = E\left\lbrack (X - \mu)^{2} \right\rbrack = \sigma_{X}^{2} = \text{ Var}(X)
\end{array}

+

Example.

+

Let YY be a uniformly chosen integer from +{0,1,2,,m}\left\{ 0,1,2,\ldots,m \right\}. Find the first and second moment of +YY.

+

The p.m.f. of YY is pY(k)=1m+1p_{Y}(k) = \frac{1}{m + 1} for +k[0,m]k \in \lbrack 0,m\rbrack. Thus,

+

E[Y]=k=0mk1m+1=1m+1k=0mk=m2\begin{array}{r} +E\lbrack Y\rbrack = \sum_{k = 0}^{m}k\frac{1}{m + 1} = \frac{1}{m + 1}\sum_{k = 0}^{m}k \\ + = \frac{m}{2} +\end{array}

+

Then,

+

E\left\lbrack Y^{2} \right\rbrack = \sum_{k = 0}^{m}k^{2}\frac{1}{m + 1} = \frac{1}{m + 1} \cdot \frac{m(m + 1)(2m + 1)}{6} = \frac{m(2m + 1)}{6}

+

Example.

+

Let c>0c > 0 and let UU be a uniform random variable on the interval +[0,c]\lbrack 0,c\rbrack. Find the nthn^{\text{th}} moment for UU for all +positive integers nn.

+

The density function of UU is

+

f(x)={1c, if x[0,c]0, otherwisef(x) = \begin{cases} +\frac{1}{c}\text{, if } & x \in \lbrack 0,c\rbrack \\ +0\text{, } & \text{otherwise} +\end{cases}

+

Therefore the nthn^{\text{th}} moment of UU is,

+

E\left\lbrack U^{n} \right\rbrack = \int_{- \infty}^{\infty}x^{n}f(x)dx = \frac{1}{c}\int_{0}^{c}x^{n}dx = \frac{c^{n}}{n + 1}

+

Example.

+

Suppose the random variable X Exp(λ)X\sim\text{ Exp}(\lambda). Find the second +moment of XX.

+

E[X2]=0x2λeλxdx=1λ20u2eudu=1λ2Γ(2+1)=2!λ2\begin{array}{r} +E\left\lbrack X^{2} \right\rbrack = \int_{0}^{\infty}x^{2}\lambda e^{- \lambda x}dx \\ + = \frac{1}{\lambda^{2}}\int_{0}^{\infty}u^{2}e^{- u}du \\ + = \frac{1}{\lambda^{2}}\Gamma(2 + 1) = \frac{2!}{\lambda^{2}} +\end{array}

+

Fact.

+

In general, to find the nthn^{\text{th}} moment of X Exp(λ)X\sim\text{ Exp}(\lambda), E\left\lbrack X^{n} \right\rbrack = \int_{0}^{\infty}x^{n}\lambda e^{- \lambda x}dx = \frac{n!}{\lambda^{n}}

+

Median and quartiles

+

When a random variable has rare (abnormal) values, its expectation may +be a bad indicator of where the center of the distribution lies.

+

Definition.

+

The median of a random variable XX is any real value mm that +satisfies

+

P(Xm)12 and P(Xm)12P(X \geq m) \geq \frac{1}{2}\text{ and }P(X \leq m) \geq \frac{1}{2}

+

With half the probability on both {Xm}\left\{ X \leq m \right\} and +{Xm}\left\{ X \geq m \right\}, the median is representative of the +midpoint of the distribution. We say that the median is more robust +because it is less affected by outliers. It is not necessarily unique.

+

Example.

+

Let XX be discretely uniformly distributed in the set \left\{ - 100,1,2,3,\ldots,9 \right\}, so XX has probability mass function p_{X}( - 100) = p_{X}(1) = \cdots = p_{X}(9) = \frac{1}{10}.

+

Find the expected value and median of XX.

+

E[X]=(100)110+(1)110++(9)110=5.5E\lbrack X\rbrack = ( - 100) \cdot \frac{1}{10} + (1) \cdot \frac{1}{10} + \cdots + (9) \cdot \frac{1}{10} = - 5.5

+

While the median is any number m[4,5]m \in \lbrack 4,5\rbrack.

+

The median reflects the fact that 90% of the values (and of the probability mass) lie in the range 1,2,\ldots,9, while the mean is heavily influenced by the value - 100.

+ +]]>
+
+ + An assortment of preliminaries on linear algebra + + https://blog.youwen.dev/an-assortment-of-preliminaries-on-linear-algebra.html + 2025-02-15T00:00:00Z + 2025-02-15T00:00:00Z + +
+

+ An assortment of preliminaries on linear algebra +

+

+ and also a test for pandoc +

+
2025-02-15
+
+ +
+
+

This entire document was written entirely in Typst and +directly translated to this file by Pandoc. It serves as a proof of concept of +a way to do static site generation from Typst files instead of Markdown.

+
+

I figured I should write this stuff down before I forgot it.

+

Basic Notions

+

Vector spaces

+

Before we can understand vectors, we need to first discuss vector +spaces. Thus far, you have likely encountered vectors primarily in +physics classes, generally in the two-dimensional plane. You may +conceptualize them as arrows in space. For vectors of size >3> 3, a hand +waving argument is made that they are essentially just arrows in higher +dimensional spaces.

+

It is helpful to take a step back from this primitive geometric +understanding of the vector. Let us build up a rigorous idea of vectors +from first principles.

+

Vector axioms

+

The so-called axioms of a vector space (which we’ll call the vector +space VV) are as follows:

+
    +
  1. Commutativity: u+v=v+u, u,vVu + v = v + u,\text{ }\forall u,v \in V

  2. +
  3. Associativity: +(u+v)+w=u+(v+w), u,v,wV(u + v) + w = u + (v + w),\text{ }\forall u,v,w \in V

  4. +
  5. Zero vector: \exists a special vector, denoted 00, such that +v+0=v, vVv + 0 = v,\text{ }\forall v \in V

  6. +
  7. Additive inverse: +vV, wV such that v+w=0\forall v \in V,\text{ }\exists w \in V\text{ such that }v + w = 0. +Such an additive inverse is generally denoted v- v

  8. +
  9. Multiplicative identity: 1v=v, vV1v = v,\text{ }\forall v \in V

  10. +
  11. Multiplicative associativity: +(αβ)v=α(βv) vV, scalars α,β(\alpha\beta)v = \alpha(\beta v)\text{ }\forall v \in V,\text{ scalars }\alpha,\beta

  12. +
  13. Distributive property for vectors: +α(u+v)=αu+αv u,vV, scalars α\alpha(u + v) = \alpha u + \alpha v\text{ }\forall u,v \in V,\text{ scalars }\alpha

  14. +
  15. Distributive property for scalars: +(α+β)v=αv+βv vV, scalars α,β(\alpha + \beta)v = \alpha v + \beta v\text{ }\forall v \in V,\text{ scalars }\alpha,\beta

  16. +
+

It is easy to show that the zero vector 00 and the additive inverse +v- v are unique. We leave the proof of this fact as an exercise.

+

These may seem difficult to memorize, but they are essentially the same +familiar algebraic properties of numbers you know from high school. The +important thing to remember is which operations are valid for what +objects. For example, you cannot add a vector and scalar, as it does not +make sense.

+

Remark. For those of you versed in computer science, you may recognize this as essentially saying that you must ensure your operations are type-safe. Adding a vector and a scalar is not just “wrong” in the same sense that 1+1=31 + 1 = 3 is wrong; it is an invalid question entirely, because vectors and scalars are different types of mathematical objects. See [@chen2024digression] for more.

+

Vectors big and small

+

In order to begin your descent into what mathematicians colloquially +recognize as abstract vapid nonsense, let’s discuss which fields +constitute a vector space. We have the familiar field of \mathbb{R} +where all scalars are real numbers, with corresponding vector spaces +n{\mathbb{R}}^{n}, where nn is the length of the vector. We generally +discuss 2D or 3D vectors, corresponding to vectors of length 2 or 3; in +our case, 2{\mathbb{R}}^{2} and 3{\mathbb{R}}^{3}.

+

However, vectors in n{\mathbb{R}}^{n} can really be of any length. +Vectors can be viewed as arbitrary length lists of numbers (for the +computer science folk: think C++ std::vector).

+

Example. (123456789)9\begin{pmatrix} +1 \\ +2 \\ +3 \\ +4 \\ +5 \\ +6 \\ +7 \\ +8 \\ +9 +\end{pmatrix} \in {\mathbb{R}}^{9}

+

Keep in mind that vectors need not be in n{\mathbb{R}}^{n} at all. +Recall that a vector space need only satisfy the aforementioned axioms +of a vector space.

+

Example. The vector space n{\mathbb{C}}^{n} is similar to +n{\mathbb{R}}^{n}, except it includes complex numbers. All complex +vector spaces are real vector spaces (as you can simply restrict them to +only use the real numbers), but not the other way around.

+

From now on, let us refer to vector spaces n{\mathbb{R}}^{n} and +n{\mathbb{C}}^{n} as 𝔽n{\mathbb{F}}^{n}.

+

In general, we can have a vector space where the scalars are in an +arbitrary field, as long as the axioms are satisfied.

+

Example. The vector space of all polynomials of at most degree 3, or 3{\mathbb{P}}^{3}. It is not yet clear what a vector in this space may look like. We shall return to this example once we discuss basis.

+

Vector addition and multiplication

+

Vector addition, represented by ++, is done entrywise; the entrywise product shown below works the same way.

+

Example.

+

(123)+(456)=(1+42+53+6)=(579)\begin{pmatrix} +1 \\ +2 \\ +3 +\end{pmatrix} + \begin{pmatrix} +4 \\ +5 \\ +6 +\end{pmatrix} = \begin{pmatrix} +1 + 4 \\ +2 + 5 \\ +3 + 6 +\end{pmatrix} = \begin{pmatrix} +5 \\ +7 \\ +9 +\end{pmatrix} (123)(456)=(142536)=(41018)\begin{pmatrix} +1 \\ +2 \\ +3 +\end{pmatrix} \cdot \begin{pmatrix} +4 \\ +5 \\ +6 +\end{pmatrix} = \begin{pmatrix} +1 \cdot 4 \\ +2 \cdot 5 \\ +3 \cdot 6 +\end{pmatrix} = \begin{pmatrix} +4 \\ +10 \\ +18 +\end{pmatrix}

+

This is simple enough to understand. Again, the difficulty is simply +ensuring that you always perform operations with the correct types. +For example, once we introduce matrices, it doesn’t make sense to +multiply or add vectors and matrices in this fashion.
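As a small sketch (my own, treating a vector as nothing more than a list of numbers), the entrywise operations above are one-liners in Haskell:

type Vector = [Double]

-- entrywise addition and entrywise product of two vectors of the same length
vAdd, vMul :: Vector -> Vector -> Vector
vAdd = zipWith (+)
vMul = zipWith (*)

-- vAdd [1,2,3] [4,5,6] == [5,7,9]
-- vMul [1,2,3] [4,5,6] == [4,10,18]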

+

Vector-scalar multiplication

+

Multiplying a vector by a scalar simply results in each entry of the +vector being multiplied by the scalar.

+

Example.

+

β(abc)=(βaβbβc)\beta\begin{pmatrix} +a \\ +b \\ +c +\end{pmatrix} = \begin{pmatrix} +\beta \cdot a \\ +\beta \cdot b \\ +\beta \cdot c +\end{pmatrix}
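Continuing the same sketch, scaling is just a map over the entries:

-- multiply every entry of a vector by a scalar
vScale :: Double -> [Double] -> [Double]
vScale beta = map (beta *)

-- vScale 2 [1,2,3] == [2,4,6]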

+

Linear combinations

+

Given a vector space V, vectors v, w ∈ V, and scalars α and β, the vector αv + βw is a linear combination of v and w.

+

Spanning systems

+

We say that a set of vectors v1,v2,,vnVv_{1},v_{2},\ldots,v_{n} \in V span VV +if the linear combination of the vectors can represent any arbitrary +vector vVv \in V.

+

Precisely, for every v \in V there exist scalars \alpha_{1},\alpha_{2},\ldots,\alpha_{n} such that

+

\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{n}v_{n} = v

+

Note that any scalar αk\alpha_{k} could be 0. Therefore, it is possible +for a subset of a spanning system to also be a spanning system. The +proof of this fact is left as an exercise.

+

Intuition for linear independence and dependence

+

We say that vv and ww are linearly independent if vv cannot be +represented by the scaling of ww, and ww cannot be represented by the +scaling of vv. Otherwise, they are linearly dependent.

+

You may intuitively visualize linear dependence in the 2D plane as two +vectors both pointing in the same direction. Clearly, scaling one vector +will allow us to reach the other vector. Linear independence is +therefore two vectors pointing in different directions.

+

Of course, this definition applies to vectors in any 𝔽n{\mathbb{F}}^{n}.

+

Formal definition of linear dependence and independence

+

Let us formally define linear independence for arbitrary vectors in +𝔽n{\mathbb{F}}^{n}. Given a set of vectors

+

v1,v2,,vnVv_{1},v_{2},\ldots,v_{n} \in V

+

we say they are linearly independent iff. the equation

+

α1v1+α2v2++αnvn=0\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{n}v_{n} = 0

+

has only the trivial solution \alpha_{1} = \alpha_{2} = \cdots = \alpha_{n} = 0.

+

Equivalently,

+

|α1|+|α2|++|αn|=0\left| \alpha_{1} \right| + \left| \alpha_{2} \right| + \ldots + \left| \alpha_{n} \right| = 0

+

More precisely,

+

\sum_{i = 1}^{n}\left| \alpha_{i} \right| = 0

+

Therefore, a set of vectors v_{1},v_{2},\ldots,v_{m} is linearly dependent if the opposite is true, that is, there exists a solution \alpha_{1},\alpha_{2},\ldots,\alpha_{m} to the equation

+

α1v1+α2v2++αmvm=0\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{m}v_{m} = 0

+

such that

+

\sum_{i = 1}^{m}\left| \alpha_{i} \right| \neq 0

+

Basis

+

We say a system of vectors v1,v2,,vnVv_{1},v_{2},\ldots,v_{n} \in V is a basis +in VV if the system is both linearly independent and spanning. That is, +the system must be able to represent any vector in VV as well as +satisfy our requirements for linear independence.

+

Equivalently, we may say that a system of vectors in VV is a basis in +VV if any vector vVv \in V admits a unique representation as a linear +combination of vectors in the system. This is equivalent to our previous +statement, that the system must be spanning and linearly independent.

+

Standard basis

+

We may define a standard basis for a vector space. By convention, the +standard basis in 2{\mathbb{R}}^{2} is

+

(10)(01)\begin{pmatrix} +1 \\ +0 +\end{pmatrix}\begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Verify that the above is in fact a basis (that is, linearly independent +and generating).

+

Recalling the definition of the basis, we can represent any vector in +2{\mathbb{R}}^{2} as the linear combination of the standard basis.

+

Therefore, for any arbitrary vector v2v \in {\mathbb{R}}^{2}, we can +represent it as

+

v=α1(10)+α2(01)v = \alpha_{1}\begin{pmatrix} +1 \\ +0 +\end{pmatrix} + \alpha_{2}\begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Let us call α1\alpha_{1} and α2\alpha_{2} the coordinates of the +vector. Then, we can write vv as

+

v=(α1α2)v = \begin{pmatrix} +\alpha_{1} \\ +\alpha_{2} +\end{pmatrix}

+

For example, the vector

+

(12)\begin{pmatrix} +1 \\ +2 +\end{pmatrix}

+

represents

+

1(10)+2(01)1 \cdot \begin{pmatrix} +1 \\ +0 +\end{pmatrix} + 2 \cdot \begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Verify that this aligns with your previous intuition of vectors.

+

You may recognize the standard basis in 2{\mathbb{R}}^{2} as the +familiar unit vectors

+

î,ĵ\hat{i},\hat{j}

+

This aligns with the fact that

+

(αβ)=αî+βĵ\begin{pmatrix} +\alpha \\ +\beta +\end{pmatrix} = \alpha\hat{i} + \beta\hat{j}

+

However, we may define a standard basis in any arbitrary vector space. +So, let

+

e1,e2,,ene_{1},e_{2},\ldots,e_{n}

+

be a standard basis in 𝔽n{\mathbb{F}}^{n}. Then, the coordinates +α1,α2,,αn\alpha_{1},\alpha_{2},\ldots,\alpha_{n} of a vector +v𝔽nv \in {\mathbb{F}}^{n} represent the following

+

\begin{pmatrix}
\alpha_{1} \\
\alpha_{2} \\
 \vdots \\
\alpha_{n}
\end{pmatrix} = \alpha_{1}e_{1} + \alpha_{2}e_{2} + \cdots + \alpha_{n}e_{n}

+

Using our new notation, the standard basis in 2{\mathbb{R}}^{2} is

+

e1=(10),e2=(01)e_{1} = \begin{pmatrix} +1 \\ +0 +\end{pmatrix},e_{2} = \begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Matrices

+

Before discussing any properties of matrices, let’s simply reiterate what we learned in class about their notation. We say a matrix with m rows and n columns (in less precise terms, a matrix with height m and width n) is an m × n matrix.

+

Given a matrix

+

A=(123456789)A = \begin{pmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 \\ +7 & 8 & 9 +\end{pmatrix}

+

we refer to the entry in row jj and column kk as Aj,kA_{j,k} .

+

Matrix transpose

+

A formalism that is useful later on is called the transpose, and we +obtain it from a matrix AA by switching all the rows and columns. More +precisely, each row becomes a column instead. We use the notation +ATA^{T} to represent the transpose of AA.

+

(123456)T=(142536)\begin{pmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 +\end{pmatrix}^{T} = \begin{pmatrix} +1 & 4 \\ +2 & 5 \\ +3 & 6 +\end{pmatrix}

+

Formally, we can say (AT)j,k=Ak,j\left( A^{T} \right)_{j,k} = A_{k,j}
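For what it’s worth, if a matrix is stored as a list of its rows, Data.List already provides exactly this operation (a small sketch, not part of the original note):

import Data.List (transpose)

-- a matrix as a list of rows
m :: [[Double]]
m = [[1, 2, 3], [4, 5, 6]]

-- transpose m == [[1,4],[2,5],[3,6]]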

+

Linear transformations

+

A linear transformation T:VWT:V \rightarrow W is a mapping between two +vector spaces VWV \rightarrow W, such that the following axioms are +satisfied:

+
    +
  1. T(v + w) = T(v) + T(w),\forall v,w \in V

  2. +
  3. T(\alpha v) + T(\beta w) = \alpha T(v) + \beta T(w),\forall v,w \in V, for all scalars \alpha,\beta

  4. +
+

Definition. TT is a linear transformation iff.

+

T(αv+βw)=αT(v)+βT(w)T(\alpha v + \beta w) = \alpha T(v) + \beta T(w)

+

Abuse of notation. From now on, we may elide the parentheses and say +that T(v)=Tv,vVT(v) = Tv,\forall v \in V

+

Remark. A phrase that you may commonly hear is that linear +transformations preserve linearity. Essentially, straight lines remain +straight, parallel lines remain parallel, and the origin remains fixed +at 0. Take a moment to think about why this is true (at least, in lower +dimensional spaces you can visualize).

+

Examples.

+
    +
  1. Rotation for V=W=2V = W = {\mathbb{R}}^{2} (i.e. rotation in 2 +dimensions). Given v,w2v,w \in {\mathbb{R}}^{2}, and their linear +combination v+wv + w, a rotation of γ\gamma radians of v+wv + w is +equivalent to first rotating vv and ww individually by γ\gamma +and then taking their linear combination.

  2. +
  3. Differentiation of polynomials. In this case V = {\mathbb{P}}^{n} and W = {\mathbb{P}}^{n - 1}, where {\mathbb{P}}^{n} is the vector space of all polynomials of degree at most n.

    +

    \frac{d}{dx}(\alpha v + \beta w) = \alpha\frac{d}{dx}v + \beta\frac{d}{dx}w,\forall v,w \in V,\forall\text{ scalars }\alpha,\beta

  4. +
+

Matrices represent linear transformations

+

Suppose we wanted to represent a linear transformation T:{\mathbb{F}}^{n} \rightarrow {\mathbb{F}}^{m}. I propose that we need only encode how T acts on the standard basis of {\mathbb{F}}^{n}.

+

Using our intuition from lower dimensional vector spaces, we know that +the standard basis in 2{\mathbb{R}}^{2} is the unit vectors î\hat{i} +and ĵ\hat{j}. Because linear transformations preserve linearity (i.e. +all straight lines remain straight and parallel lines remain parallel), +we can encode any transformation as simply changing î\hat{i} and +ĵ\hat{j}. And indeed, if any vector v2v \in {\mathbb{R}}^{2} can be +represented as the linear combination of î\hat{i} and ĵ\hat{j} (this +is the definition of a basis), it makes sense both symbolically and +geometrically that we can represent all linear transformations as the +transformations of the basis vectors.

+

Example. To reflect all vectors v2v \in {\mathbb{R}}^{2} across the +yy-axis, we can simply change the standard basis to

+

(10)(01)\begin{pmatrix} + - 1 \\ +0 +\end{pmatrix}\begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Then, any vector in 2{\mathbb{R}}^{2} using this new basis will be +reflected across the yy-axis. Take a moment to justify this +geometrically.

+

Writing a linear transformation as a matrix

+

For any linear transformation +T:𝔽m𝔽nT:{\mathbb{F}}^{m} \rightarrow {\mathbb{F}}^{n}, we can write it as an +n×mn \times m matrix AA. That is, there is a matrix AA with nn rows +and mm columns that can represent any linear transformation from +𝔽m𝔽n{\mathbb{F}}^{m} \rightarrow {\mathbb{F}}^{n}.

+

How should we write this matrix? Naturally, from our previous +discussion, we should write a matrix with each column being one of our +new transformed basis vectors.

+

Example. Our y-axis reflection transformation from earlier. We write the transformed basis vectors as the columns of a matrix:

+

(1001)\begin{pmatrix} + - 1 & 0 \\ +0 & 1 +\end{pmatrix}

+

Matrix-vector multiplication

+

Perhaps you now see why the so-called matrix-vector multiplication is +defined the way it is. Recalling our definition of a basis, given a +basis in VV, any vector vVv \in V can be written as the linear +combination of the vectors in the basis. Then, given a linear +transformation represented by the matrix containing the new basis, we +simply write the linear combination with the new basis instead.

+

Example. Let us first write a vector in the standard basis in +2{\mathbb{R}}^{2} and then show how our matrix-vector multiplication +naturally corresponds to the definition of the linear transformation.

+

(12)2\begin{pmatrix} +1 \\ +2 +\end{pmatrix} \in {\mathbb{R}}^{2}

+

is the same as

+

1(10)+2(01)1 \cdot \begin{pmatrix} +1 \\ +0 +\end{pmatrix} + 2 \cdot \begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Then, to perform our reflection, we need only replace the basis vector +(10)\begin{pmatrix} +1 \\ +0 +\end{pmatrix} with (10)\begin{pmatrix} + - 1 \\ +0 +\end{pmatrix}.

+

Then, the reflected vector is given by

+

1(10)+2(01)=(12)1 \cdot \begin{pmatrix} + - 1 \\ +0 +\end{pmatrix} + 2 \cdot \begin{pmatrix} +0 \\ +1 +\end{pmatrix} = \begin{pmatrix} + - 1 \\ +2 +\end{pmatrix}

+

We can clearly see that this is exactly how the matrix multiplication

+

(1001)(12)\begin{pmatrix} + - 1 & 0 \\ +0 & 1 +\end{pmatrix} \cdot \begin{pmatrix} +1 \\ +2 +\end{pmatrix} is defined! The column-by-coordinate rule for +matrix-vector multiplication says that we multiply the nthn^{\text{th}} +entry of the vector by the corresponding nthn^{\text{th}} column of the +matrix and sum them all up (take their linear combination). This +algorithm intuitively follows from our definition of matrices.
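Here is that column-by-coordinate rule as a short Haskell sketch (my own illustration, storing a matrix as the list of its columns):

-- a matrix represented as the list of its columns
type Matrix = [[Double]]

-- A·v as the linear combination of A's columns, weighted by the entries of v
matVec :: Matrix -> [Double] -> [Double]
matVec cols v = foldr1 (zipWith (+)) (zipWith scaleBy v cols)
  where scaleBy a = map (a *)

-- matVec [[-1, 0], [0, 1]] [1, 2] == [-1, 2], the reflection example above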

+

Matrix-matrix multiplication

+

As you may have noticed, a very similar natural definition arises for +the matrix-matrix multiplication. Multiplying two matrices ABA \cdot B +is essentially just taking each column of BB, and applying the linear +transformation defined by the matrix AA!
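Continuing the sketch above, matrix-matrix multiplication is then just mapping that operation over the columns of B:

-- A·B with both matrices stored as lists of columns: transform each column of B by A
matMul :: Matrix -> Matrix -> Matrix
matMul a = map (matVec a)

-- matMul [[-1,0],[0,1]] [[1,2],[3,4]] == [[-1,2],[-3,4]]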

+ +]]>
+
+ + Nix automatic hash updates made easy + + https://blog.youwen.dev/nix-automatic-hash-updates-made-easy.html + 2024-12-28T00:00:00Z + 2024-12-28T00:00:00Z + +
+

+ Nix automatic hash updates made easy +

+

+ keep your flakes up to date +

+
2024-12-28
+
+ +
+
+

Nix users often create flakes to package software out of tree, like this Zen +Browser flake I’ve been +maintaining. Keeping them up to date is a hassle though, since you have to +update the Subresource Integrity (SRI) hashes that Nix uses to ensure +reproducibility.

+

Here’s a neat method I’ve been using to cleanly handle automatic hash updates. +I use Nushell to easily work with data, prefetch +some hashes, and put it all in a JSON file that can be read by Nix at build +time.

+

First, let’s create a file called update.nu. At the top, place this shebang:

+
#!/usr/bin/env -S nix shell nixpkgs#nushell --command nu
+

This will execute the script in a Nushell environment, which is fetched by Nix.

+

Get the up to date URLs

+

We need to obtain the latest version of whatever software we want to update. +In this case, I’ll use GitHub releases as my source of truth.

+

You can use the GitHub API to fetch metadata about all the releases of a repository.

+
https://api.github.com/repos/($repo)/releases
+

Roughly speaking, the raw JSON returned by the GitHub releases API looks something like:

+
[
+   {tag_name: "foo", prerelease: false, ...},
+   {tag_name: "bar", prerelease: true, ...},
+   {tag_name: "foobar", prerelease: false, ...},
+]
+
+

Note that the ordering of the objects in the array is chronological.

+
+

Even if you aren’t using GitHub releases, as long as there is a reliable way to +programmatically fetch the latest download URLs of whatever software you’re +packaging, you can adapt this approach for your specific case.

+
+

We use Nushell’s http get to make a network request. Nushell will automatically detect and parse the JSON response into a Nushell table.

+

In my case, Zen Browser frequently publishes prerelease “twilight” builds which +we don’t want to update to. So, we ignore any releases tagged “twilight” or +marked “prerelease” by filtering them out with the where selector.

+

Finally, we retrieve the tag name of the item at the first index, which would +be the latest release (since the JSON array was chronologically sorted).

+
#!/usr/bin/env -S nix shell nixpkgs#nushell --command nu
+
+# get the latest tag of the latest release that isn't a prerelease
+def get_latest_release [repo: string] {
+  try {
+	http get $"https://api.github.com/repos/($repo)/releases"
+	  | where prerelease == false
+	  | where tag_name != "twilight"
+	  | get tag_name
+	  | get 0
+  } catch { |err| $"Failed to fetch latest release, aborting: ($err.msg)" }
+}
+

Prefetching SRI hashes

+

Now that we have the latest tags, we can easily obtain the latest download URLs, which are of the form:

+
https://github.com/zen-browser/desktop/releases/download/$tag/zen.linux-x86_64.tar.bz2
+https://github.com/zen-browser/desktop/releases/download/$tag/zen.linux-aarch64.tar.bz2
+

However, we still need the corresponding SRI hashes to pass to Nix.

+
src = fetchurl {
+   url = "https://github.com/zen-browser/desktop/releases/download/1.0.2-b.5/zen.linux-x86_64.tar.bz2";
+   hash = "sha256-00000000000000000000000000000000000000000000";
+};
+

The easiest way to obtain these new hashes is to update the URL and then set the hash property to an empty string (""). Nix will spit out a hash mismatch error with the correct hash. However, this is inconvenient for automated command line scripting.

+

The Nix documentation mentions +nix-prefetch-url +as a way to obtain these hashes, but as usual, it doesn’t work quite right and +has also been replaced by a more powerful but underdocumented experimental +feature instead.

+

The nix store +prefetch-file +command does what nix-prefetch-url is supposed to do, but handles the caveats +that lead to the wrong hash being produced automatically.

+

Let’s write a Nushell function that outputs the SRI hash of the given URL. We +tell prefetch-file to output structured JSON that we can parse.

+

Since Nushell is a shell, we can directly invoke shell commands like usual, +and then process their output with pipes.

+
def get_nix_hash [url: string] {
+  nix store prefetch-file --hash-type sha256 --json $url | from json | get hash
+}
+

Cool! Now get_nix_hash can give us SRI hashes that look like this:

+
sha256-K3zTCLdvg/VYQNsfeohw65Ghk8FAjhOl8hXU6REO4/s=
+

Putting it all together

+

Now that we’re able to fetch the latest release, obtain the download URLs, and +compute their SRI hashes, we have all the information we need to make an +automated update. However, these URLs are typically hardcoded in our Nix +expressions. The question remains as to how to update these values.

+

A common way I’ve seen updates performed is using something like sed to modify the Nix expressions in place. However, there’s actually a more maintainable and easier to understand approach.

+

Let’s have our Nushell script generate the URLs and hashes and place them in a +JSON file! Then, we’ll be able to read the JSON file from Nix and obtain the +URL and hash.

+
def generate_sources [] {
+  let tag = get_latest_release "zen-browser/desktop"
+  let prev_sources = open ./sources.json
+
+  if $tag == $prev_sources.version {
+	# everything up to date
+	return $tag
+  }
+
+  # generate the download URLs with the new tag
+  let x86_64_url = $"https://github.com/zen-browser/desktop/releases/download/($tag)/zen.linux-x86_64.tar.bz2"
+  let aarch64_url = $"https://github.com/zen-browser/desktop/releases/download/($tag)/zen.linux-aarch64.tar.bz2"
+
+  # create a Nushell record that maps cleanly to JSON
+  let sources = {
+    # add a version field as well for convenience
+	version: $tag
+
+	x86_64-linux: {
+	  url:  $x86_64_url
+	  hash: (get_nix_hash $x86_64_url)
+	}
+	aarch64-linux: {
+	  url: $aarch64_url
+	  hash: (get_nix_hash $aarch64_url)
+	}
+  }
+
+  echo $sources | save --force "sources.json"
+
+  return $tag
+}
+

Running this script with

+
chmod +x ./update.nu
+./update.nu
+

gives us the file sources.json:

+
{
+  "version": "1.0.2-b.5",
+  "x86_64-linux": {
+    "url": "https://github.com/zen-browser/desktop/releases/download/1.0.2-b.5/zen.linux-x86_64.tar.bz2",
+    "hash": "sha256-K3zTCLdvg/VYQNsfeohw65Ghk8FAjhOl8hXU6REO4/s="
+  },
+  "aarch64-linux": {
+    "url": "https://github.com/zen-browser/desktop/releases/download/1.0.2-b.5/zen.linux-aarch64.tar.bz2",
+    "hash": "sha256-NwIYylGal2QoWhWKtMhMkAAJQ6iNHfQOBZaxTXgvxAk="
+  }
+}
+

Now, let’s read this from Nix. My file organization looks like the following:

+
./
+| flake.nix
+| zen-browser-unwrapped.nix
+| ...other files...
+

zen-browser-unwrapped.nix contains the derivation for Zen Browser. Let’s add +version, url, and hash to its inputs:

+
{
+  stdenv,
+  fetchurl,
+  # add these below
+  version,
+  url,
+  hash,
+  ...
+}:
+stdenv.mkDerivation {
+   # inherit version from inputs
+  inherit version;
+  pname = "zen-browser-unwrapped";
+
+  src = fetchurl {
+    # inherit the URL and hash we obtain from the inputs
+    inherit url hash;
+  };
+}
+

Then in flake.nix, let’s provide the derivation with the data from sources.json:

+
let
+   supportedSystems = [
+     "x86_64-linux"
+     "aarch64-linux"
+   ];
+   forAllSystems = nixpkgs.lib.genAttrs supportedSystems;
+in
+{
+   # rest of file omitted for simplicity
+   packages = forAllSystems (
+     system:
+     let
+       pkgs = import nixpkgs { inherit system; };
+       # parse sources.json into a Nix attrset
+       sources = builtins.fromJSON (builtins.readFile ./sources.json);
+     in
+     rec {
+       zen-browser-unwrapped = pkgs.callPackage ./zen-browser-unwrapped.nix {
+         inherit (sources.${system}) hash url;
+         inherit (sources) version;
+
+         # if the above is difficult to understand, it is equivalent to the following:
+         # hash = sources.${system}.hash;
+         # url = sources.${system}.url;
+         # version = sources.version;
+       };
+     }
+   );
+}
+

Now, running nix build .#zen-browser-unwrapped will be able to use the hashes +and URLs from sources.json to build the package!

+

Automating it in CI

+

We now have a script that can automatically fetch releases and generate hashes +and URLs, as well as a way for Nix to use the outputted JSON to build +derivations. All that’s left is to fully automate it using CI!

+

We are going to use GitHub actions for this, as it’s free and easy and you’re +probably already hosting on GitHub.

+

Ensure you’ve set up actions for your repo and given it sufficient permissions.

+

We’re gonna run it on a cron timer that checks for updates at 8 PM PST every day.

+

We use DeterminateSystems’ actions to help set up Nix. Then, we simply run our +update script. Since we made the script return the tag it fetched, we can store +it in a variable and then use it in our commit message.

+
name: Update to latest version, and update flake inputs
+
+on:
+  schedule:
+    - cron: "0 4 * * *"
+  workflow_dispatch:
+
+jobs:
+  update:
+    name: Update flake inputs and browser
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout Repository
+        uses: actions/checkout@v4
+
+      - name: Check flake inputs
+        uses: DeterminateSystems/flake-checker-action@v4
+
+      - name: Install Nix
+        uses: DeterminateSystems/nix-installer-action@main
+
+      - name: Set up magic Nix cache
+        uses: DeterminateSystems/magic-nix-cache-action@main
+
+      - name: Check for update and perform update
+        run: |
+          git config --global user.name "github-actions[bot]"
+          git config --global user.email "github-actions[bot]@users.noreply.github.com"
+
+          chmod +x ./update.nu
+          export ZEN_LATEST_VER="$(./update.nu)"
+
+          git add -A
+          git commit -m "github-actions: update to $ZEN_LATEST_VER" || echo "Latest version is $ZEN_LATEST_VER, no updates found"
+
+          nix flake update --commit-lock-file
+
+          git push
+

Now, our repository will automatically check for and perform updates every day!

+ +]]>
+
+ + a haskellian blog + + https://blog.youwen.dev/a-haskellian-blog.html + 2024-05-25T00:00:00Z + 2024-05-25T12:00:00Z + +
+

+ a haskellian blog +

+

+ a purely functional...blog? +

+
2024-05-25
+
+ (last updated: 2024-05-25T12:00:00Z) +
+
+

Welcome! This is the first post on The Involution and also one that tests all +of the features.

+ + + + +
+

A monad is just a monoid in the category of endofunctors, what’s the problem?

+
+

haskell?

+

This entire blog is generated with hakyll. It’s +a library for generating static sites for Haskell, a purely functional +programming language. It’s a library because it doesn’t come with as many +batteries included as tools like Hugo or Astro. You set up most of the site +yourself by calling the library from Haskell.

+

Here’s a brief excerpt:

+
main :: IO ()
+main = hakyllWith config $ do
+    forM_
+        [ "CNAME"
+        , "favicon.ico"
+        , "robots.txt"
+        , "_config.yml"
+        , "images/*"
+        , "out/*"
+        , "fonts/*"
+        ]
+        $ \f -> match f $ do
+            route idRoute
+            compile copyFileCompiler
+

The code highlighting is also generated by hakyll.

+
+

why?

+

Haskell is a purely functional language with no mutable state. Its syntax +actually makes it pretty elegant for declaring routes and “rendering” pipelines.

+
    +
  1. Haskell is cool.
  2. +
  3. It comes with enough features that I don’t feel like I have to build +everything from scratch.
  4. +
  5. It comes with Pandoc, a Haskell library for converting between markdown +formats. It’s probably more powerful than anything you could do in nodejs. +It renders all of the markdown to HTML as well as the math. +
      +
    1. It supports KaTeX as well as MathML. I’m a little disappointed with the +KaTeX though. It doesn’t directly render it, but simply injects the KaTeX +files and renders it client-side.
    2. +
  6. +
+

speaking of math

+

We can have math inline, like so: +ex2dx=π\int_{-\infty}^\infty \, e^{-x^2}\,dx = \sqrt{\pi}. This site ships semantic +MathML math with its HTML, and the MathJax script to the client.

+

It’d be nice if MathML could just be used and supported across all browsers, but +unfortunately we still aren’t quite there yet. Firefox is the only one where +everything looks 80% of the way to LaTeX. On Safari and Chrome, even simple +equations like π\sqrt{\pi} render improperly.

+

Pros of MathML:

+
    +
  • A little more accessible
  • +
  • Can be rendered without additional stylesheets. I just installed the Latin +Modern font, but this isn’t even really necessary
  • +
  • Built-in to most browsers (#UseThePlatform)
  • +
+

Cons:

+
    +
  • Isn’t fully standardized. Might look different on different browsers
  • +
  • Rendering quality isn’t as good as KaTeX
  • +
+

This site has MathJax render all of the math so it looks nice and standardized +across browsers, but the math still displays regardless (like say if MathJax +couldn’t load due to slow network) because of MathML. Best of both worlds.

+

Let’s try it now. Here’s a simple theorem:

+

an+bncn{a,b,c}n3 +a^n + b^n \ne c^n \, \forall\,\left\{ a,\,b,\,c \right\} \in \mathbb{Z} \land n \ge 3 +

+

The proof is trivial and will be left as an exercise to the reader.

+

seems a little overengineered

+

Probably is. Not as much as the old one, though.

+ +]]>
+
+ +
diff --git a/css/code.css b/css/code.css new file mode 100644 index 0000000..fe0682d --- /dev/null +++ b/css/code.css @@ -0,0 +1 @@ +pre>code.sourceCode{white-space:pre;position:relative}pre>code.sourceCode>span{line-height:1.25}pre>code.sourceCode>span:empty{height:1.2em}.sourceCode{overflow:visible}code.sourceCode>span{color:inherit;text-decoration:inherit}div.sourceCode{margin:1em 0}pre.sourceCode{margin:0}@media screen{div.sourceCode{overflow:auto}}@media print{pre>code.sourceCode{white-space:pre-wrap}pre>code.sourceCode>span{display:inline-block;text-indent:-5em;padding-left:5em}}pre.numberSource code{counter-reset:source-line 0}pre.numberSource code>span{position:relative;left:-4em;counter-increment:source-line}pre.numberSource code>span>a:first-child::before{content:counter(source-line);position:relative;left:-1em;text-align:right;vertical-align:baseline;border:none;display:inline-block;-webkit-touch-callout:none;-webkit-user-select:none;-khtml-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none;padding:0 4px;width:4em;background-color:#232629;color:#7a7c7d}pre.numberSource{margin-left:3em;border-left:1px solid #7a7c7d;padding-left:4px}div.sourceCode{color:#cfcfc2;background-color:#232629}@media screen{pre>code.sourceCode>span>a:first-child::before{text-decoration:underline}}code span{color:#cfcfc2}code span.al{color:#95da4c;background-color:#4d1f24;font-weight:bold}code span.an{color:#3f8058}code span.at{color:#2980b9}code span.bn{color:#f67400}code span.bu{color:#7f8c8d}code span.cf{color:#fdbc4b;font-weight:bold}code span.ch{color:#3daee9}code span.cn{color:#27aeae;font-weight:bold}code span.co{color:#7a7c7d}code span.cv{color:#7f8c8d}code span.do{color:#a43340}code span.dt{color:#2980b9}code span.dv{color:#f67400}code span.er{color:#da4453;text-decoration:underline}code span.ex{color:#0099ff;font-weight:bold}code span.fl{color:#f67400}code span.fu{color:#8e44ad}code span.im{color:#27ae60}code span.in{color:#c45b00}code span.kw{color:#cfcfc2;font-weight:bold}code span.op{color:#cfcfc2}code span.ot{color:#27ae60}code span.pp{color:#27ae60}code span.re{color:#2980b9;background-color:#153042}code span.sc{color:#3daee9}code span.ss{color:#da4453}code span.st{color:#f44f4f}code span.va{color:#27aeae}code span.vs{color:#da4453}code span.wa{color:#da4453} \ No newline at end of file diff --git a/favicon.ico b/favicon.ico new file mode 100644 index 0000000..2096d33 Binary files /dev/null and b/favicon.ico differ diff --git a/images/conditional-finality.png b/images/conditional-finality.png new file mode 100644 index 0000000..6673b58 Binary files /dev/null and b/images/conditional-finality.png differ diff --git a/index.html b/index.html new file mode 100644 index 0000000..6fd791f --- /dev/null +++ b/index.html @@ -0,0 +1,260 @@ + + + + youwen wu | The Involution + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+

+ The Involution. +

+
+
+

+ a web-log about computers and math and hacking. +

+ by Youwen Wu + | + + | + +
+
+
+

Latest

+
+ +
+
+ + + + + diff --git a/nix-automatic-hash-updates-made-easy.html b/nix-automatic-hash-updates-made-easy.html new file mode 100644 index 0000000..cb67922 --- /dev/null +++ b/nix-automatic-hash-updates-made-easy.html @@ -0,0 +1,435 @@ + + + + Nix automatic hash updates made easy | The Involution + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+

+ The Involution. +

+
+
+

+ a web-log about computers and math and hacking. +

+ by Youwen Wu + | + + | + +
+
+
+

+ Nix automatic hash updates made easy +

+

+ keep your flakes up to date +

+
2024-12-28
+
+ +
+
+

Nix users often create flakes to package software out of tree, like this Zen +Browser flake I’ve been +maintaining. Keeping them up to date is a hassle though, since you have to +update the Subresource Integrity (SRI) hashes that Nix uses to ensure +reproducibility.

+

Here’s a neat method I’ve been using to cleanly handle automatic hash updates. +I use Nushell to easily work with data, prefetch +some hashes, and put it all in a JSON file that can be read by Nix at build +time.

+

First, let’s create a file called update.nu. At the top, place this shebang:

+
#!/usr/bin/env -S nix shell nixpkgs#nushell --command nu
+

This will execute the script in a Nushell environment, which is fetched by Nix.

+

Get the up to date URLs

+

We need to obtain the latest version of whatever software we want to update. +In this case, I’ll use GitHub releases as my source of truth.

+

You can use the GitHub API to fetch metadata about all the releases of a repository.

+
https://api.github.com/repos/($repo)/releases
+

Roughly speaking, the raw JSON returned by the GitHub releases API looks something like:

+
[
+   {tag_name: "foo", prerelease: false, ...},
+   {tag_name: "bar", prerelease: true, ...},
+   {tag_name: "foobar", prerelease: false, ...},
+]
+
+

Note that the ordering of the objects in the array is chronological.

+
+

Even if you aren’t using GitHub releases, as long as there is a reliable way to +programmatically fetch the latest download URLs of whatever software you’re +packaging, you can adapt this approach for your specific case.

+
+

We use Nushell’s http get to make a network request. Nushell will automatically detect and parse the JSON response into a Nushell table.

+

In my case, Zen Browser frequently publishes prerelease “twilight” builds which +we don’t want to update to. So, we ignore any releases tagged “twilight” or +marked “prerelease” by filtering them out with the where selector.

+

Finally, we retrieve the tag name of the item at the first index, which would +be the latest release (since the JSON array was chronologically sorted).

+
#!/usr/bin/env -S nix shell nixpkgs#nushell --command nu
+
+# get the latest tag of the latest release that isn't a prerelease
+def get_latest_release [repo: string] {
+  try {
+	http get $"https://api.github.com/repos/($repo)/releases"
+	  | where prerelease == false
+	  | where tag_name != "twilight"
+	  | get tag_name
+	  | get 0
+  } catch { |err| $"Failed to fetch latest release, aborting: ($err.msg)" }
+}
+

Prefetching SRI hashes

+

Now that we have the latest tags, we can easily obtain the latest download URLs, which are of the form:

+
https://github.com/zen-browser/desktop/releases/download/$tag/zen.linux-x86_64.tar.bz2
+https://github.com/zen-browser/desktop/releases/download/$tag/zen.linux-aarch64.tar.bz2
+

However, we still need the corresponding SRI hashes to pass to Nix.

+
src = fetchurl {
+   url = "https://github.com/zen-browser/desktop/releases/download/1.0.2-b.5/zen.linux-x86_64.tar.bz2";
+   hash = "sha256-00000000000000000000000000000000000000000000";
+};
+

The easiest way to obtain these new hashes is to update the URL and then set the hash property to an empty string (""). Nix will spit out a hash mismatch error with the correct hash. However, this is inconvenient for automated command line scripting.

+

The Nix documentation mentions +nix-prefetch-url +as a way to obtain these hashes, but as usual, it doesn’t work quite right and +has also been replaced by a more powerful but underdocumented experimental +feature instead.

+

The nix store +prefetch-file +command does what nix-prefetch-url is supposed to do, but handles the caveats +that lead to the wrong hash being produced automatically.

+

Let’s write a Nushell function that outputs the SRI hash of the given URL. We +tell prefetch-file to output structured JSON that we can parse.

+

Since Nushell is a shell, we can directly invoke shell commands like usual, +and then process their output with pipes.

+
def get_nix_hash [url: string] {
+  nix store prefetch-file --hash-type sha256 --json $url | from json | get hash
+}
+

Cool! Now get_nix_hash can give us SRI hashes that look like this:

+
sha256-K3zTCLdvg/VYQNsfeohw65Ghk8FAjhOl8hXU6REO4/s=
+

Putting it all together

+

Now that we’re able to fetch the latest release, obtain the download URLs, and +compute their SRI hashes, we have all the information we need to make an +automated update. However, these URLs are typically hardcoded in our Nix +expressions. The question remains as to how to update these values.

+

A common way I’ve seen updates performed is using something like sed to modify the Nix expressions in place. However, there’s actually a more maintainable and easier to understand approach.

+

Let’s have our Nushell script generate the URLs and hashes and place them in a +JSON file! Then, we’ll be able to read the JSON file from Nix and obtain the +URL and hash.

+
def generate_sources [] {
+  let tag = get_latest_release "zen-browser/desktop"
+  let prev_sources = open ./sources.json
+
+  if $tag == $prev_sources.version {
+	# everything up to date
+	return $tag
+  }
+
+  # generate the download URLs with the new tag
+  let x86_64_url = $"https://github.com/zen-browser/desktop/releases/download/($tag)/zen.linux-x86_64.tar.bz2"
+  let aarch64_url = $"https://github.com/zen-browser/desktop/releases/download/($tag)/zen.linux-aarch64.tar.bz2"
+
+  # create a Nushell record that maps cleanly to JSON
+  let sources = {
+    # add a version field as well for convenience
+	version: $tag
+
+	x86_64-linux: {
+	  url:  $x86_64_url
+	  hash: (get_nix_hash $x86_64_url)
+	}
+	aarch64-linux: {
+	  url: $aarch64_url
+	  hash: (get_nix_hash $aarch64_url)
+	}
+  }
+
+  echo $sources | save --force "sources.json"
+
+  return $tag
+}
+

Running this script with

+
chmod +x ./update.nu
+./update.nu
+

gives us the file sources.json:

+
{
+  "version": "1.0.2-b.5",
+  "x86_64-linux": {
+    "url": "https://github.com/zen-browser/desktop/releases/download/1.0.2-b.5/zen.linux-x86_64.tar.bz2",
+    "hash": "sha256-K3zTCLdvg/VYQNsfeohw65Ghk8FAjhOl8hXU6REO4/s="
+  },
+  "aarch64-linux": {
+    "url": "https://github.com/zen-browser/desktop/releases/download/1.0.2-b.5/zen.linux-aarch64.tar.bz2",
+    "hash": "sha256-NwIYylGal2QoWhWKtMhMkAAJQ6iNHfQOBZaxTXgvxAk="
+  }
+}
+

Now, let’s read this from Nix. My file organization looks like the following:

+
./
+| flake.nix
+| zen-browser-unwrapped.nix
+| ...other files...
+

zen-browser-unwrapped.nix contains the derivation for Zen Browser. Let’s add +version, url, and hash to its inputs:

+
{
+  stdenv,
+  fetchurl,
+  # add these below
+  version,
+  url,
+  hash,
+  ...
+}:
+stdenv.mkDerivation {
+   # inherit version from inputs
+  inherit version;
+  pname = "zen-browser-unwrapped";
+
+  src = fetchurl {
+    # inherit the URL and hash we obtain from the inputs
+    inherit url hash;
+  };
+}
+

Then in flake.nix, let’s provide the derivation with the data from sources.json:

+
let
+   supportedSystems = [
+     "x86_64-linux"
+     "aarch64-linux"
+   ];
+   forAllSystems = nixpkgs.lib.genAttrs supportedSystems;
+in
+{
+   # rest of file omitted for simplicity
+   packages = forAllSystems (
+     system:
+     let
+       pkgs = import nixpkgs { inherit system; };
+       # parse sources.json into a Nix attrset
+       sources = builtins.fromJSON (builtins.readFile ./sources.json);
+     in
+     rec {
+       zen-browser-unwrapped = pkgs.callPackage ./zen-browser-unwrapped.nix {
+         inherit (sources.${system}) hash url;
+         inherit (sources) version;
+
+         # if the above is difficult to understand, it is equivalent to the following:
+         # hash = sources.${system}.hash;
+         # url = sources.${system}.url;
+         # version = sources.version;
+       };
+     }
+   );
+}
+

Now, running nix build .#zen-browser-unwrapped will be able to use the hashes +and URLs from sources.json to build the package!

+

Automating it in CI

+

We now have a script that can automatically fetch releases and generate hashes +and URLs, as well as a way for Nix to use the outputted JSON to build +derivations. All that’s left is to fully automate it using CI!

+

We are going to use GitHub actions for this, as it’s free and easy and you’re +probably already hosting on GitHub.

+

Ensure you’ve set up actions for your repo and given it sufficient permissions.

+

We’re gonna run it on a cron timer that checks for updates at 8 PM PST every day.

+

We use DeterminateSystems’ actions to help set up Nix. Then, we simply run our +update script. Since we made the script return the tag it fetched, we can store +it in a variable and then use it in our commit message.

+
name: Update to latest version, and update flake inputs
+
+on:
+  schedule:
+    - cron: "0 4 * * *"
+  workflow_dispatch:
+
+jobs:
+  update:
+    name: Update flake inputs and browser
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout Repository
+        uses: actions/checkout@v4
+
+      - name: Check flake inputs
+        uses: DeterminateSystems/flake-checker-action@v4
+
+      - name: Install Nix
+        uses: DeterminateSystems/nix-installer-action@main
+
+      - name: Set up magic Nix cache
+        uses: DeterminateSystems/magic-nix-cache-action@main
+
+      - name: Check for update and perform update
+        run: |
+          git config --global user.name "github-actions[bot]"
+          git config --global user.email "github-actions[bot]@users.noreply.github.com"
+
+          chmod +x ./update.nu
+          export ZEN_LATEST_VER="$(./update.nu)"
+
+          git add -A
+          git commit -m "github-actions: update to $ZEN_LATEST_VER" || echo "Latest version is $ZEN_LATEST_VER, no updates found"
+
+          nix flake update --commit-lock-file
+
+          git push
+

Now, our repository will automatically check for and perform updates every day!

+
+ + + + + diff --git a/out/bundle.css b/out/bundle.css new file mode 100644 index 0000000..33c1e8b --- /dev/null +++ b/out/bundle.css @@ -0,0 +1 @@ +@import url("https://fonts.googleapis.com/css2?family=Merriweather:ital,wght@0,300;0,400;0,700;0,900;1,300;1,400;1,700;1,900&display=swap");@import url("https://fonts.googleapis.com/css2?family=Open+Sans:ital,wght@0,300..800;1,300..800&display=swap");*,::before,::after{--tw-border-spacing-x:0;--tw-border-spacing-y:0;--tw-translate-x:0;--tw-translate-y:0;--tw-rotate:0;--tw-skew-x:0;--tw-skew-y:0;--tw-scale-x:1;--tw-scale-y:1;--tw-pan-x:;--tw-pan-y:;--tw-pinch-zoom:;--tw-scroll-snap-strictness:proximity;--tw-gradient-from-position:;--tw-gradient-via-position:;--tw-gradient-to-position:;--tw-ordinal:;--tw-slashed-zero:;--tw-numeric-figure:;--tw-numeric-spacing:;--tw-numeric-fraction:;--tw-ring-inset:;--tw-ring-offset-width:0px;--tw-ring-offset-color:#fff;--tw-ring-color:rgb(59 130 246/0.5);--tw-ring-offset-shadow:0 0 #0000;--tw-ring-shadow:0 0 #0000;--tw-shadow:0 0 #0000;--tw-shadow-colored:0 0 #0000;--tw-blur:;--tw-brightness:;--tw-contrast:;--tw-grayscale:;--tw-hue-rotate:;--tw-invert:;--tw-saturate:;--tw-sepia:;--tw-drop-shadow:;--tw-backdrop-blur:;--tw-backdrop-brightness:;--tw-backdrop-contrast:;--tw-backdrop-grayscale:;--tw-backdrop-hue-rotate:;--tw-backdrop-invert:;--tw-backdrop-opacity:;--tw-backdrop-saturate:;--tw-backdrop-sepia:;--tw-contain-size:;--tw-contain-layout:;--tw-contain-paint:;--tw-contain-style:}::backdrop{--tw-border-spacing-x:0;--tw-border-spacing-y:0;--tw-translate-x:0;--tw-translate-y:0;--tw-rotate:0;--tw-skew-x:0;--tw-skew-y:0;--tw-scale-x:1;--tw-scale-y:1;--tw-pan-x:;--tw-pan-y:;--tw-pinch-zoom:;--tw-scroll-snap-strictness:proximity;--tw-gradient-from-position:;--tw-gradient-via-position:;--tw-gradient-to-position:;--tw-ordinal:;--tw-slashed-zero:;--tw-numeric-figure:;--tw-numeric-spacing:;--tw-numeric-fraction:;--tw-ring-inset:;--tw-ring-offset-width:0px;--tw-ring-offset-color:#fff;--tw-ring-color:rgb(59 130 246/0.5);--tw-ring-offset-shadow:0 0 #0000;--tw-ring-shadow:0 0 #0000;--tw-shadow:0 0 #0000;--tw-shadow-colored:0 0 #0000;--tw-blur:;--tw-brightness:;--tw-contrast:;--tw-grayscale:;--tw-hue-rotate:;--tw-invert:;--tw-saturate:;--tw-sepia:;--tw-drop-shadow:;--tw-backdrop-blur:;--tw-backdrop-brightness:;--tw-backdrop-contrast:;--tw-backdrop-grayscale:;--tw-backdrop-hue-rotate:;--tw-backdrop-invert:;--tw-backdrop-opacity:;--tw-backdrop-saturate:;--tw-backdrop-sepia:;--tw-contain-size:;--tw-contain-layout:;--tw-contain-paint:;--tw-contain-style:}/* ! 
tailwindcss v3.4.17 | MIT License | https://tailwindcss.com */*,::before,::after{box-sizing:border-box;border-width:0;border-style:solid;border-color:#e5e7eb}::before,::after{--tw-content:''}html,:host{line-height:1.5;-webkit-text-size-adjust:100%;-moz-tab-size:4;-o-tab-size:4;tab-size:4;font-family:Open Sans,sans-serif;font-feature-settings:normal;font-variation-settings:normal;-webkit-tap-highlight-color:transparent}body{margin:0;line-height:inherit}hr{height:0;color:inherit;border-top-width:1px}abbr:where([title]){-webkit-text-decoration:underline dotted;text-decoration:underline dotted}h1,h2,h3,h4,h5,h6{font-size:inherit;font-weight:inherit}a{color:inherit;text-decoration:inherit}b,strong{font-weight:bolder}code,kbd,samp,pre{font-family:ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;font-feature-settings:normal;font-variation-settings:normal;font-size:1em}small{font-size:80%}sub,sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline}sub{bottom:-0.25em}sup{top:-0.5em}table{text-indent:0;border-color:inherit;border-collapse:collapse}button,input,optgroup,select,textarea{font-family:inherit;font-feature-settings:inherit;font-variation-settings:inherit;font-size:100%;font-weight:inherit;line-height:inherit;letter-spacing:inherit;color:inherit;margin:0;padding:0}button,select{text-transform:none}button,input:where([type='button']),input:where([type='reset']),input:where([type='submit']){-webkit-appearance:button;background-color:transparent;background-image:none}:-moz-focusring{outline:auto}:-moz-ui-invalid{box-shadow:none}progress{vertical-align:baseline}::-webkit-inner-spin-button,::-webkit-outer-spin-button{height:auto}[type='search']{-webkit-appearance:textfield;outline-offset:-2px}::-webkit-search-decoration{-webkit-appearance:none}::-webkit-file-upload-button{-webkit-appearance:button;font:inherit}summary{display:list-item}blockquote,dl,dd,h1,h2,h3,h4,h5,h6,hr,figure,p,pre{margin:0}fieldset{margin:0;padding:0}legend{padding:0}ol,ul,menu{list-style:none;margin:0;padding:0}dialog{padding:0}textarea{resize:vertical}input::-moz-placeholder, textarea::-moz-placeholder{opacity:1;color:#9ca3af}input::placeholder,textarea::placeholder{opacity:1;color:#9ca3af}button,[role="button"]{cursor:pointer}:disabled{cursor:default}img,svg,video,canvas,audio,iframe,embed,object{display:block;vertical-align:middle}img,video{max-width:100%;height:auto}[hidden]:where(:not([hidden="until-found"])){display:none}body{--tw-bg-opacity:1;background-color:rgb(250 244 237/var(--tw-bg-opacity,1));font-family:Merriweather,serif;--tw-text-opacity:1;color:rgb(68 64 60/var(--tw-text-opacity,1))}body:where(.dark,.dark *){--tw-bg-opacity:1;background-color:rgb(25 23 36/var(--tw-bg-opacity,1));--tw-text-opacity:1;color:rgb(224 222 244/var(--tw-text-opacity,1))}.container{width:100%}@media (min-width:640px){.container{max-width:640px}}@media (min-width:768px){.container{max-width:768px}}@media (min-width:1024px){.container{max-width:1024px}}@media (min-width:1280px){.container{max-width:1280px}}@media 
(min-width:1536px){.container{max-width:1536px}}.mx-4{margin-left:1rem;margin-right:1rem}.mx-auto{margin-left:auto;margin-right:auto}.my-1{margin-top:0.25rem;margin-bottom:0.25rem}.mb-1{margin-bottom:0.25rem}.mb-14{margin-bottom:3.5rem}.mb-3{margin-bottom:0.75rem}.mb-4{margin-bottom:1rem}.ml-2{margin-left:0.5rem}.mt-1{margin-top:0.25rem}.mt-14{margin-top:3.5rem}.mt-2{margin-top:0.5rem}.mt-4{margin-top:1rem}.mt-6{margin-top:1.5rem}.mt-8{margin-top:2rem}.inline-flex{display:inline-flex}.h-0\.5{height:0.125rem}.h-1{height:0.25rem}.w-fit{width:-moz-fit-content;width:fit-content}.w-full{width:100%}.max-w-3xl{max-width:48rem}.max-w-\[200px\]{max-width:200px}.max-w-sm{max-width:24rem}.flex-shrink{flex-shrink:1}.flex-grow{flex-grow:1}.items-center{align-items:center}.space-y-4>:not([hidden])~:not([hidden]){--tw-space-y-reverse:0;margin-top:calc(1rem * calc(1 - var(--tw-space-y-reverse)));margin-bottom:calc(1rem * var(--tw-space-y-reverse))}.text-nowrap{text-wrap:nowrap}.rounded-lg{border-radius:0.5rem}.rounded-md{border-radius:0.375rem}.rounded-xl{border-radius:0.75rem}.border-0{border-width:0px}.bg-accent-light{--tw-bg-opacity:1;background-color:rgb(121 117 147/var(--tw-bg-opacity,1))}.bg-muted-light{--tw-bg-opacity:1;background-color:rgb(152 147 165/var(--tw-bg-opacity,1))}.p-2{padding:0.5rem}.px-1{padding-left:0.25rem;padding-right:0.25rem}.px-2{padding-left:0.5rem;padding-right:0.5rem}.px-4{padding-left:1rem;padding-right:1rem}.pb-12{padding-bottom:3rem}.font-sans{font-family:Open Sans,sans-serif}.font-serif{font-family:Merriweather,serif}.text-2xl{font-size:1.5rem;line-height:2rem}.text-3xl{font-size:1.875rem;line-height:2.25rem}.text-4xl{font-size:2.25rem;line-height:2.5rem}.text-lg{font-size:1.125rem;line-height:1.75rem}.text-sm{font-size:0.875rem;line-height:1.25rem}.font-light{font-weight:300}.font-medium{font-weight:500}.italic{font-style:italic}.leading-relaxed{line-height:1.625}.tracking-wide{letter-spacing:0.025em}.text-accent-light{--tw-text-opacity:1;color:rgb(121 117 147/var(--tw-text-opacity,1))}.text-iris-light{--tw-text-opacity:1;color:rgb(144 122 169/var(--tw-text-opacity,1))}.transition-all{transition-property:all;transition-timing-function:cubic-bezier(0.4,0,0.2,1);transition-duration:150ms}.transition-colors{transition-property:color,background-color,border-color,text-decoration-color,fill,stroke;transition-timing-function:cubic-bezier(0.4,0,0.2,1);transition-duration:150ms}.duration-500{transition-duration:500ms}.duration-\[2s\]{transition-duration:2s}.external-link-muted{text-decoration-line:underline;text-decoration-color:#797593;text-decoration-color:#908caa;text-decoration-style:solid;text-decoration-thickness:2px;text-underline-offset:2px}.external-link-muted:hover{--tw-text-opacity:1;color:rgb(180 99 122/var(--tw-text-opacity,1))}.external-link-muted:hover:where(.dark,.dark *){--tw-text-opacity:1;color:rgb(235 111 146/var(--tw-text-opacity,1))}.post{h1{font-size:1.5rem;line-height:2rem}h1{font-weight:700}h2{position:relative}h2{margin-top:2rem}h2{width:-moz-fit-content;width:fit-content}h2{font-size:1.5rem;line-height:2rem}h2{font-weight:500}h2::after{position:absolute}h2::after{left:0px}h2::after{margin-top:0.5rem}h2::after{display:block}h2::after{height:0.25rem}h2::after{width:3rem}h2::after{border-radius:0.125rem}h2::after{--tw-bg-opacity:1;background-color:rgb(152 147 165/var(--tw-bg-opacity,1))}h2:where(.dark,.dark *)::after{--tw-bg-opacity:1;background-color:rgb(110 106 
134/var(--tw-bg-opacity,1))}h2::after{content:""}h3,h4,h5,h6{margin-top:2rem}h3,h4,h5,h6{font-size:1.25rem;line-height:1.75rem}h3,h4,h5,h6{font-weight:500}h3,h4,h5,h6{--tw-text-opacity:1;color:rgb(121 117 147/var(--tw-text-opacity,1))}h3:where(.dark,.dark *),h4:where(.dark,.dark *),h5:where(.dark,.dark *),h6:where(.dark,.dark *){--tw-text-opacity:1;color:rgb(144 140 170/var(--tw-text-opacity,1))}p{margin-top:1rem;margin-bottom:1rem}p{overflow-x:auto}p{font-weight:300}p{line-height:2}@media (min-width:640px){p{font-size:1.125rem;line-height:1.75rem}}@media (min-width:640px){p{line-height:2}}img{margin-left:auto;margin-right:auto}img{margin-top:1.5rem;margin-bottom:1.5rem}img{border-radius:0.5rem}div.sourceCode{border-radius:0.5rem}div.sourceCode{padding:1rem}ol,ul{list-style-position:inside}ol,ul{font-weight:300}ol,ul{line-height:2}@media (min-width:640px){ol,ul{font-size:1.125rem;line-height:1.75rem}}@media (min-width:640px){ol,ul{line-height:2}}ul{list-style-type:disc}ol{list-style-type:decimal}ol ol{margin-left:1rem}ol ol{list-style-type:disc}ol ol ol{margin-left:1rem}ol ol ol{list-style-type:"-"}li{margin-top:0.25rem;margin-bottom:0.25rem}hr{margin-top:2.5rem;margin-bottom:2.5rem}hr{margin-left:auto;margin-right:auto}hr{height:0.5rem}hr{max-width:3rem}hr{border-radius:0.75rem}hr{border-width:0px}hr{--tw-bg-opacity:1;background-color:rgb(152 147 165/var(--tw-bg-opacity,1))}hr:where(.dark,.dark *){--tw-bg-opacity:1;background-color:rgb(110 106 134/var(--tw-bg-opacity,1))}blockquote{margin-top:1rem;margin-bottom:1rem}blockquote{height:-moz-fit-content;height:fit-content}blockquote{border-left-width:4px}blockquote{--tw-border-opacity:1;border-color:rgb(121 117 147/var(--tw-border-opacity,1))}blockquote{padding-left:1rem;padding-right:1rem}blockquote{padding-top:0.125rem;padding-bottom:0.125rem}blockquote{font-style:italic}blockquote{--tw-text-opacity:1;color:rgb(121 117 147/var(--tw-text-opacity,1))}blockquote:where(.dark,.dark *){--tw-border-opacity:1;border-color:rgb(144 140 170/var(--tw-border-opacity,1))}blockquote:where(.dark,.dark *){--tw-text-opacity:1;color:rgb(144 140 170/var(--tw-text-opacity,1))}blockquote>p{margin:0px}a:not(code a){text-decoration-line:underline}a:not(code a){text-decoration-color:#b4637a}a:not(code a){text-decoration-color:#eb6f92}a:not(code a){text-decoration-style:solid}a:not(code a){text-decoration-thickness:2px}a:not(code a){text-underline-offset:2px}a:not(code a):hover{--tw-text-opacity:1;color:rgb(180 99 122/var(--tw-text-opacity,1))}a:not(code a):hover:where(.dark,.dark *){--tw-text-opacity:1;color:rgb(235 111 146/var(--tw-text-opacity,1))}figure{margin-top:0.5rem;margin-bottom:0.5rem}figure{display:inline-block}figure img{margin-bottom:0.5rem}figure img{vertical-align:top}figure figcaption{text-align:center}details{margin-top:1rem;margin-bottom:1rem}details{overflow-x:auto}details{font-weight:300}details{line-height:2}@media (min-width:640px){details{font-size:1.125rem;line-height:1.75rem}}@media (min-width:640px){details{line-height:2}}details summary{margin-bottom:0.25rem}details summary{cursor:pointer}}.hover\:text-accent-light:hover{--tw-text-opacity:1;color:rgb(121 117 147/var(--tw-text-opacity,1))}.hover\:text-love-light:hover{--tw-text-opacity:1;color:rgb(180 99 122/var(--tw-text-opacity,1))}.hover\:text-muted-light:hover{--tw-text-opacity:1;color:rgb(152 147 165/var(--tw-text-opacity,1))}.group:hover .group-hover\:max-w-\[250px\]{max-width:250px}.group:hover .group-hover\:bg-iris-light{--tw-bg-opacity:1;background-color:rgb(144 122 
169/var(--tw-bg-opacity,1))}.group:hover .group-hover\:text-iris-light{--tw-text-opacity:1;color:rgb(144 122 169/var(--tw-text-opacity,1))}.group:hover .group-hover\:text-muted-light{--tw-text-opacity:1;color:rgb(152 147 165/var(--tw-text-opacity,1))}@media (min-width:640px){.sm\:mr-2{margin-right:0.5rem}}@media (min-width:768px){.md\:mt-24{margin-top:6rem}.md\:space-y-8>:not([hidden])~:not([hidden]){--tw-space-y-reverse:0;margin-top:calc(2rem * calc(1 - var(--tw-space-y-reverse)));margin-bottom:calc(2rem * var(--tw-space-y-reverse))}.md\:text-5xl{font-size:3rem;line-height:1}}.dark\:bg-accent-dark:where(.dark,.dark *){--tw-bg-opacity:1;background-color:rgb(144 140 170/var(--tw-bg-opacity,1))}.dark\:bg-muted-dark:where(.dark,.dark *){--tw-bg-opacity:1;background-color:rgb(110 106 134/var(--tw-bg-opacity,1))}.dark\:text-accent-dark:where(.dark,.dark *){--tw-text-opacity:1;color:rgb(144 140 170/var(--tw-text-opacity,1))}.dark\:text-iris-dark:where(.dark,.dark *){--tw-text-opacity:1;color:rgb(196 167 231/var(--tw-text-opacity,1))}.dark\:hover\:text-love-dark:hover:where(.dark,.dark *){--tw-text-opacity:1;color:rgb(235 111 146/var(--tw-text-opacity,1))}.dark\:hover\:text-muted-dark:hover:where(.dark,.dark *){--tw-text-opacity:1;color:rgb(110 106 134/var(--tw-text-opacity,1))}.group:hover .dark\:group-hover\:bg-iris-dark:where(.dark,.dark *){--tw-bg-opacity:1;background-color:rgb(196 167 231/var(--tw-bg-opacity,1))}.group:hover .dark\:group-hover\:text-iris-dark:where(.dark,.dark *){--tw-text-opacity:1;color:rgb(196 167 231/var(--tw-text-opacity,1))}.group:hover .group-hover\:dark\:text-muted-dark:where(.dark,.dark *){--tw-text-opacity:1;color:rgb(110 106 134/var(--tw-text-opacity,1))} diff --git a/out/bundle.js b/out/bundle.js new file mode 100644 index 0000000..4602406 --- /dev/null +++ b/out/bundle.js @@ -0,0 +1 @@ +const e=window.matchMedia("(prefers-color-scheme: dark)").matches,t=()=>{document.documentElement.classList.remove("dark")},s=()=>{document.documentElement.classList.add("dark")};let o="dark"===localStorage.getItem("theme")?2:"light"===localStorage.getItem("theme")?1:0;const a=document.getElementById("theme-toggle");a.addEventListener("click",(()=>{switch(o=(o+1)%3,o){case 0:localStorage.removeItem("theme"),e?document.documentElement.classList.add("dark"):document.documentElement.classList.remove("dark"),a.innerText="theme: system";break;case 1:e?(localStorage.setItem("theme","light"),t(),a.innerText="theme: light"):(localStorage.setItem("theme","dark"),s(),a.innerText="theme: dark");break;case 2:e?(localStorage.setItem("theme","dark"),s(),a.innerText="theme: dark"):(localStorage.setItem("theme","light"),t(),a.innerText="theme: light")}}));const n=()=>{document.body.classList.remove("font-sans"),document.body.classList.remove("font-serif")},m=e=>{e&&"serif"===e&&(n(),document.body.classList.add("font-serif")),e&&"sans"===e&&(n(),document.body.classList.add("font-sans")),e||n()};let c=localStorage.getItem("font");m();const l=document.getElementById("font-toggle");l.addEventListener("click",(()=>{c=localStorage.getItem("font"),"sans"===c?(c="serif",l.innerText="serif",localStorage.setItem("font","serif")):(c="sans",l.innerText="sans",localStorage.setItem("font","sans")),m(c)})); diff --git a/random-variables-distributions-and-probability-theory.html b/random-variables-distributions-and-probability-theory.html new file mode 100644 index 0000000..c26c85d --- /dev/null +++ b/random-variables-distributions-and-probability-theory.html @@ -0,0 +1,1101 @@ + + + + Random variables, 
distributions, and probability theory | The Involution + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+

+ The Involution. +

+
+
+

+ a web-log about computers and math and hacking. +

+ by Youwen Wu + | + + | + +
+
+
+

+ Random variables, distributions, and probability theory +

+

+ An overview of discrete and continuous random variables and their distributions and moment generating functions +

+
2025-02-16
+
+ +
+
+

These are some notes I’ve been collecting on random variables, their +distributions, expected values, and moment generating functions. I +thought I’d write them down somewhere useful.

+

These are almost extracted verbatim from my in-class notes, which I take +in real time using Typst. I simply wrote a tiny compatibility shim to +allow Pandoc to render them to the web.

+
+

Random variables

+

First, some brief exposition on random variables. Counterintuitively, a random variable is actually a function.

+

Standard notation: Ω\Omega is a sample space, ωΩ\omega \in \Omega is an +event.

+

Definition.

+

A random variable XX is a function +X:ΩX:\Omega \rightarrow {\mathbb{R}} that takes the set of possible +outcomes in a sample space, and maps it to a measurable +space, typically (as in +our case) a subset of \mathbb{R}.

+

Definition.

+

The state space of a random variable XX is all of the values XX +can take.

+

Example.

+

Let XX be a random variable that takes on the values +{0,1,2,3}\left\{ 0,1,2,3 \right\}. Then the state space of XX is the set +{0,1,2,3}\left\{ 0,1,2,3 \right\}.

+

Discrete random variables

+

A random variable XX is discrete if there is countable AA such that +P(XA)=1P(X \in A) = 1. kk is a possible value if P(X=k)>0P(X = k) > 0. We discuss +continuous random variables later.

+

The probability distribution of X gives its important probabilistic information. The probability distribution is a description of the probabilities P(X \in B) for subsets B \subset {\mathbb{R}}. We describe it via the probability mass function (in the discrete case), the probability density function (in the continuous case), and the cumulative distribution function.

+

A discrete random variable has probability distribution entirely +determined by its probability mass function (hereafter abbreviated p.m.f +or PMF) p(k)=P(X=k)p(k) = P(X = k). The p.m.f. is a function from the set of +possible values of XX into [0,1]\lbrack 0,1\rbrack. Labeling the p.m.f. +with the random variable is done by pX(k)p_{X}(k).

+

pX: State space of X[0,1]p_{X}:\text{ State space of }X \rightarrow \lbrack 0,1\rbrack

+

By the axioms of probability,

+

kpX(k)=kP(X=k)=1\sum_{k}p_{X}(k) = \sum_{k}P(X = k) = 1

+

For a subset BB \subset {\mathbb{R}},

+

P(XB)=kBpX(k)P(X \in B) = \sum_{k \in B}p_{X}(k)

+

Continuous random variables

+

Now as promised we introduce another major class of random variables.

+

Definition.

+

Let XX be a random variable. If ff satisfies

+

P(Xb)=bf(x)dxP(X \leq b) = \int_{- \infty}^{b}f(x)dx

+

for all bb \in {\mathbb{R}}, then ff is the probability density +function (hereafter abbreviated p.d.f. or PDF) of XX.

+

We immediately see that the p.d.f. is analogous to the p.m.f. of the +discrete case.

+

The probability that X(,b]X \in ( - \infty,b\rbrack is equal to the area +under the graph of ff from - \infty to bb.

+

A corollary is the following.

+

Fact.

+

P(XB)=Bf(x)dxP(X \in B) = \int_{B}f(x)dx

+

for any BB \subset {\mathbb{R}} where integration makes sense.

+

The set can be bounded or unbounded, or any collection of intervals.

+

Fact.

+

P(aXb)=abf(x)dxP(a \leq X \leq b) = \int_{a}^{b}f(x)dx +P(X>a)=af(x)dxP(X > a) = \int_{a}^{\infty}f(x)dx

+

Fact.

+

If a random variable XX has density function ff then individual point +values have probability zero:

+

P(X=c)=ccf(x)dx=0,cP(X = c) = \int_{c}^{c}f(x)dx = 0,\forall c \in {\mathbb{R}}

+

Remark.

+

It follows that a random variable with a density function is not discrete. An immediate corollary is that the probabilities of intervals are not changed by including or excluding endpoints, so P(X \leq k) and P(X < k) are equivalent.

+

How to determine which functions are p.d.f.s? Since +P(<X<)=1P( - \infty < X < \infty) = 1, a p.d.f. ff must satisfy

+

f(x)0xf(x)dx=1\begin{array}{r} +f(x) \geq 0\forall x \in {\mathbb{R}} \\ +\int_{- \infty}^{\infty}f(x)dx = 1 +\end{array}

+

Fact.

+

Random variables with density functions are called continuous random +variables. This does not imply that the random variable is a continuous +function on Ω\Omega but it is standard terminology.

+

Discrete distributions

+

Recall that the probability distribution of XX gives its important +probabilistic information. Let us discuss some of these distributions.

+

In general we first consider the experiment’s properties and theorize +about the distribution that its random variable takes. We can then apply +the distribution to find out various pieces of probabilistic +information.

+

Bernoulli trials

+

A Bernoulli trial is the original “experiment.” It’s simply a single +trial with a binary “success” or “failure” outcome. Encode this T/F, 0 +or 1, or however you’d like. It becomes immediately useful in defining +more complex distributions, so let’s analyze its properties.

+

The setup: the experiment has exactly two outcomes:

success (denoted S) or failure (denoted F).

Additionally: P(S)=p,(0<p<1)P(F)=1p=q\begin{array}{r} +P(S) = p,(0 < p < 1) \\ +P(F) = 1 - p = q +\end{array}

+

Construct the probability mass function:

+

P(X=1)=pP(X=0)=1p\begin{array}{r} +P(X = 1) = p \\ +P(X = 0) = 1 - p +\end{array}

+

Write it as:

+

p_{X}(k) = p^{k}(1 - p)^{1 - k}

+

for k=1k = 1 and k=0k = 0.

+

Binomial distribution

+

The setup: very similar to Bernoulli, trials have exactly 2 outcomes. A +bunch of Bernoulli trials in a row.

+

Importantly: pp and qq are defined exactly the same in all trials.

+

This ties the binomial distribution to the sampling with replacement +model, since each trial does not affect the next.

+

We conduct nn independent trials of this experiment. Example with +coins: each flip independently has a 12\frac{1}{2} chance of heads or +tails (holds same for die, rigged coin, etc).

+

nn is fixed, i.e. known ahead of time.

+

Binomial random variable

+

Let’s consider the random variable characterized by the binomial +distribution now.

+

Let X=#X = \# of successes in nn independent trials. For any particular +sequence of nn trials, it takes the form +Ω={ω} where ω=SFFF\Omega = \left\{ \omega \right\}\text{ where }\omega = SFF\cdots F and +is of length nn.

+

Then X(ω)=0,1,2,,nX(\omega) = 0,1,2,\ldots,n can take n+1n + 1 possible values. The +probability of any particular sequence is given by the product of the +individual trial probabilities.

+

Example.

+

ω=SFFSFS=(pqqpqp)\omega = SFFSF\cdots S = (pqqpq\cdots p)

+

So P(x=0)=P(FFFF)=qqq=qnP(x = 0) = P(FFF\cdots F) = q \cdot q \cdot \cdots \cdot q = q^{n}.

+

And \begin{array}{r}
P(X = 1) = P(SFF\cdots F) + P(FSFF\cdots F) + \cdots + P(FFF\cdots FS) \\
 = \underset{n\text{ possible positions for the success}}{\underbrace{n}} \cdot p^{1} \cdot q^{n - 1} \\
 = \begin{pmatrix}
n \\
1
\end{pmatrix} \cdot p^{1} \cdot q^{n - 1}
\end{array}

+

Now we can generalize

+

P(X=2)=(n2)p2qn2P(X = 2) = \begin{pmatrix} +n \\ +2 +\end{pmatrix}p^{2}q^{n - 2}

+

How about all successes?

+

P(X=n)=P(SSS)=pnP(X = n) = P(SS\cdots S) = p^{n}

+

We see that for all failures we have qnq^{n} and all successes we have +pnp^{n}. Otherwise we use our method above.

+

In general, here is the probability mass function for the binomial +random variable

+

P(X=k)=(nk)pkqnk, for k=0,1,2,,nP(X = k) = \begin{pmatrix} +n \\ +k +\end{pmatrix}p^{k}q^{n - k},\text{ for }k = 0,1,2,\ldots,n

+

The binomial distribution is very powerful: whenever we repeat an independent two-outcome trial a fixed number of times, it gives the probability of each possible number of successes.

+

To summarize the characterization of the binomial random variable:

a fixed number n of independent trials, each trial with the same success probability p and exactly two outcomes,

with X=#X = \# successes in fixed nn trials.

+

X Bin(n,p)X\sim\text{ Bin}(n,p)

+

with probability mass function

+

P(X=x)=(nx)px(1p)nx=p(x) for x=0,1,2,,nP(X = x) = \begin{pmatrix} +n \\ +x +\end{pmatrix}p^{x}(1 - p)^{n - x} = p(x)\text{ for }x = 0,1,2,\ldots,n

+

We see this is in fact the binomial theorem!

+

p(x)0,x=0np(x)=x=0n(nx)pxqnx=(p+q)np(x) \geq 0,\sum_{x = 0}^{n}p(x) = \sum_{x = 0}^{n}\begin{pmatrix} +n \\ +x +\end{pmatrix}p^{x}q^{n - x} = (p + q)^{n}

+

In fact, (p+q)n=(p+(1p))n=1(p + q)^{n} = \left( p + (1 - p) \right)^{n} = 1

+

Example.

+

What is the probability of getting exactly three aces (1’s) out of 10 +throws of a fair die?

+

Seems a little trickier but we can still write this as well defined +SS/FF. Let SS be getting an ace and FF being anything else.

+

Then p=16p = \frac{1}{6} and n=10n = 10. We want P(X=3)P(X = 3). So

+

P(X=3)=(103)p3q7=(103)(16)3(56)70.15505\begin{array}{r} +P(X = 3) = \begin{pmatrix} +10 \\ +3 +\end{pmatrix}p^{3}q^{7} = \begin{pmatrix} +10 \\ +3 +\end{pmatrix}\left( \frac{1}{6} \right)^{3}\left( \frac{5}{6} \right)^{7} \\ + \approx 0.15505 +\end{array}

+
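As a quick numerical sanity check (not part of the original notes), here is a small Python sketch, assuming Python 3.8+ for math.comb, that reproduces the number above.

from math import comb

n, k, p = 10, 3, 1 / 6
# P(X = 3) = C(10, 3) * (1/6)^3 * (5/6)^7
prob = comb(n, k) * p**k * (1 - p) ** (n - k)
print(prob)  # ~0.15505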

With or without replacement?

+

I place particular emphasis on the fact that the binomial distribution +generally applies to cases where you’re sampling with replacement. +Consider the following: Example.

+

Suppose we have two types of candy, red and black. Select nn candies. +Let XX be the number of red candies among nn selected.

+

2 cases.

Case 1: sample with replacement. With a red and b black candies in total, each of the n draws independently yields a red candy with probability \frac{a}{a+b}, so X is binomial and, for example,

P(X=2)=(n2)(aa+b)2(ba+b)n2P(X = 2) = \begin{pmatrix} +n \\ +2 +\end{pmatrix}\left( \frac{a}{a + b} \right)^{2}\left( \frac{b}{a + b} \right)^{n - 2}

Case 2: sample without replacement. Then, by direct counting,

P(X=x)=(ax)(bnx)(a+bn)=p(x)P(X = x) = \frac{\begin{pmatrix} +a \\ +x +\end{pmatrix}\begin{pmatrix} +b \\ +n - x +\end{pmatrix}}{\begin{pmatrix} +a + b \\ +n +\end{pmatrix}} = p(x)

+

In case 2, we used the elementary counting techniques we are already +familiar with. Immediately we see a distinct case similar to the +binomial but when sampling without replacement. Let’s formalize this as +a random variable!

+

Hypergeometric distribution

+

Let’s introduce a random variable to represent a situation like case 2 +above.

+

Definition.

+

P(X=x)=(ax)(bnx)(a+bn)=p(x)P(X = x) = \frac{\begin{pmatrix} +a \\ +x +\end{pmatrix}\begin{pmatrix} +b \\ +n - x +\end{pmatrix}}{\begin{pmatrix} +a + b \\ +n +\end{pmatrix}} = p(x)

+

is known as a Hypergeometric distribution.

+

Abbreviate this by:

+

X Hypergeom(# total,# successes, sample size)X\sim\text{ Hypergeom}\left( \#\text{ total},\#\text{ successes},\text{ sample size} \right)

+

For example,

+

X Hypergeom(N,Na,n)X\sim\text{ Hypergeom}\left( N,N_{a},n \right)

+

Remark.

+

If n is very small relative to a + b, then both cases give similar (approximately the same) answers.

+

For instance, if we’re sampling for blood types from UCSB, and we take a +student out without replacement, we don’t really change the sample size +substantially. So both answers give a similar result.

+

Suppose we have two types of items, type AA and type BB. Let NAN_{A} +be #\# type AA, NBN_{B} #\# type BB. N=NA+NBN = N_{A} + N_{B} is the +total number of objects.

+

We sample nn items without replacement (nNn \leq N) with order not +mattering. Denote by XX the number of type AA objects in our sample.

+

Definition.

+

Let 0NAN0 \leq N_{A} \leq N and 1nN1 \leq n \leq N be integers. A random +variable XX has the hypergeometric distribution with parameters +(N,NA,n)\left( N,N_{A},n \right) if XX takes values in the set +{0,1,,n}\left\{ 0,1,\ldots,n \right\} and has p.m.f.

+

P(X=k)=(NAk)(NNAnk)(Nn)=p(k)P(X = k) = \frac{\begin{pmatrix} +N_{A} \\ +k +\end{pmatrix}\begin{pmatrix} +N - N_{A} \\ +n - k +\end{pmatrix}}{\begin{pmatrix} +N \\ +n +\end{pmatrix}} = p(k)

+

Example.

+

Let NA=10N_{A} = 10 defectives. Let NB=90N_{B} = 90 non-defectives. We select +n=5n = 5 without replacement. What is the probability that 2 of the 5 +selected are defective?

+

X Hypergeom (N=100,NA=10,n=5)X\sim\text{ Hypergeom }\left( N = 100,N_{A} = 10,n = 5 \right)

+

We want P(X=2)P(X = 2).

+

P(X=2)=(102)(903)(1005)0.0702P(X = 2) = \frac{\begin{pmatrix} +10 \\ +2 +\end{pmatrix}\begin{pmatrix} +90 \\ +3 +\end{pmatrix}}{\begin{pmatrix} +100 \\ +5 +\end{pmatrix}} \approx 0.0702

+
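A short Python check of the hypergeometric calculation above (standard library only); the parameters N, N_A, n and k are the ones from the example.

from math import comb

N, N_A, n, k = 100, 10, 5, 2
# P(X = 2) = C(10, 2) * C(90, 3) / C(100, 5)
prob = comb(N_A, k) * comb(N - N_A, n - k) / comb(N, n)
print(prob)  # ~0.0702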

Remark.

+

Make sure you can distinguish when a problem is binomial or when it is +hypergeometric. This is very important on exams.

+

Recall that both ask about number of successes, in a fixed number of +trials. But binomial is sample with replacement (each trial is +independent) and sampling without replacement is hypergeometric.

+

Geometric distribution

+

Consider an infinite sequence of independent trials. e.g. number of +attempts until I make a basket.

+

In fact we can think of this as a variation on the binomial +distribution. But in this case we don’t sample nn times and ask how +many successes we have, we sample as many times as we need for one +success. Later on we’ll see this is really a specific case of another +distribution, the negative binomial.

+

Let XiX_{i} denote the outcome of the ithi^{\text{th}} trial, where +success is 1 and failure is 0. Let NN be the number of trials needed to +observe the first success in a sequence of independent trials with +probability of success pp. Then

+

We fail k1k - 1 times and succeed on the kthk^{\text{th}} try. Then:

+

P(N=k)=P(X1=0,X2=0,,Xk1=0,Xk=1)=(1p)k1pP(N = k) = P\left( X_{1} = 0,X_{2} = 0,\ldots,X_{k - 1} = 0,X_{k} = 1 \right) = (1 - p)^{k - 1}p

+

This is the probability of failure raised to the number of failures, times the probability of success.

+

The key characteristic of these trials is that we keep going until we succeed. There's no n choose k factor in front like in the binomial distribution, because exactly one sequence of outcomes gives the first success on the k-th trial.

+

Definition.

+

Let 0<p10 < p \leq 1. A random variable XX has the geometric distribution +with success parameter pp if the possible values of XX are +{1,2,3,}\left\{ 1,2,3,\ldots \right\} and XX satisfies

+

P(X=k)=(1p)k1pP(X = k) = (1 - p)^{k - 1}p

+

for positive integers kk. Abbreviate this by X Geom(p)X\sim\text{ Geom}(p).

+

Example.

+

What is the probability it takes more than seven rolls of a fair die to +roll a six?

+

Let XX be the number of rolls of a fair die until the first six. Then +X Geom(16)X\sim\text{ Geom}\left( \frac{1}{6} \right). Now we just want +P(X>7)P(X > 7).

+

P(X>7)=k=8P(X=k)=k=8(56)k116P(X > 7) = \sum_{k = 8}^{\infty}P(X = k) = \sum_{k = 8}^{\infty}\left( \frac{5}{6} \right)^{k - 1}\frac{1}{6}

+

Re-indexing,

+

k=8(56)k116=16(56)7j=0(56)j\sum_{k = 8}^{\infty}\left( \frac{5}{6} \right)^{k - 1}\frac{1}{6} = \frac{1}{6}\left( \frac{5}{6} \right)^{7}\sum_{j = 0}^{\infty}\left( \frac{5}{6} \right)^{j}

+

Now we calculate by standard methods:

+

16(56)7j=0(56)j=16(56)71156=(56)7\frac{1}{6}\left( \frac{5}{6} \right)^{7}\sum_{j = 0}^{\infty}\left( \frac{5}{6} \right)^{j} = \frac{1}{6}\left( \frac{5}{6} \right)^{7} \cdot \frac{1}{1 - \frac{5}{6}} = \left( \frac{5}{6} \right)^{7}

+
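To convince ourselves numerically, here is a small Python sketch: it compares the closed form (5/6)^7 with a partial sum of the geometric p.m.f., truncated far enough out that the remainder is negligible.

p = 1 / 6
closed_form = (1 - p) ** 7                                      # P(X > 7) = (5/6)^7
partial_sum = sum((1 - p) ** (k - 1) * p for k in range(8, 2000))
print(closed_form, partial_sum)                                 # both ~0.279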

Negative binomial

+

As promised, here’s the negative binomial.

+

Consider a sequence of Bernoulli trials with the following +characteristics:

the trials are independent and each has the same probability p of success.

Then if XX is the number of trials necessary until rr successes are +observed, we say XX is a negative binomial random variable.

+

Immediately we see that the geometric distribution is just the negative +binomial with r=1r = 1.

+

Definition.

+

Let k+k \in {\mathbb{Z}}^{+} and 0<p10 < p \leq 1. A random variable XX +has the negative binomial distribution with parameters +{k,p}\left\{ k,p \right\} if the possible values of XX are the integers +{k,k+1,k+2,}\left\{ k,k + 1,k + 2,\ldots \right\} and the p.m.f. is

+

P(X=n)=(n1k1)pk(1p)nk for nkP(X = n) = \begin{pmatrix} +n - 1 \\ +k - 1 +\end{pmatrix}p^{k}(1 - p)^{n - k}\text{ for }n \geq k

+

Abbreviate this by X Negbin(k,p)X\sim\text{ Negbin}(k,p).

+

Example.

+

Steph Curry has a three point percentage of approx. 43%43\%. What is the +probability that Steph makes his third three-point basket on his +5th5^{\text{th}} attempt?

+

Let XX be number of attempts required to observe the 3rd success. Then,

+

X Negbin(k=3,p=0.43)X\sim\text{ Negbin}(k = 3,p = 0.43)

+

So, P(X=5)=(5131)(0.43)3(10.43)53=(42)(0.43)3(0.57)20.155\begin{aligned} +P(X = 5) & = {\begin{pmatrix} +5 - 1 \\ +3 - 1 +\end{pmatrix}(0.43)}^{3}(1 - 0.43)^{5 - 3} \\ + & = \begin{pmatrix} +4 \\ +2 +\end{pmatrix}(0.43)^{3}(0.57)^{2} \\ + & \approx 0.155 +\end{aligned}

+
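Again, a quick Python check of the arithmetic (0.43 and the attempt counts are the example's numbers):

from math import comb

k, p, n = 3, 0.43, 5
# P(X = 5) = C(4, 2) * 0.43^3 * 0.57^2
prob = comb(n - 1, k - 1) * p**k * (1 - p) ** (n - k)
print(prob)  # ~0.155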

Poisson distribution

+

This p.m.f. follows from the Taylor expansion

+

eλ=k=0λkk!e^{\lambda} = \sum_{k = 0}^{\infty}\frac{\lambda^{k}}{k!}

+

which implies that

+

k=0eλλkk!=eλeλ=1\sum_{k = 0}^{\infty}e^{- \lambda}\frac{\lambda^{k}}{k!} = e^{- \lambda}e^{\lambda} = 1

+

Definition.

+

For an integer valued random variable XX, we say +X Poisson(λ)X\sim\text{ Poisson}(\lambda) if it has p.m.f.

+

P(X=k)=eλλkk!P(X = k) = e^{- \lambda}\frac{\lambda^{k}}{k!}

+

for k{0,1,2,}k \in \left\{ 0,1,2,\ldots \right\} for λ>0\lambda > 0 and

+

k=0P(X=k)=1\sum_{k = 0}^{\infty}P(X = k) = 1

+

The Poisson arises from the Binomial. It applies in the binomial context +when nn is very large (n100n \geq 100) and pp is very small +p0.05p \leq 0.05, such that npnp is a moderate number (np<10np < 10).

+

Then XX follows a Poisson distribution with λ=np\lambda = np.

+

P(Bin(n,p)=k)P(Poisson(λ=np)=k)P\left( \text{Bin}(n,p) = k \right) \approx P\left( \text{Poisson}(\lambda = np) = k \right)

+

for k=0,1,,nk = 0,1,\ldots,n.

+

The Poisson distribution is useful for finding the probabilities of rare events over a continuous interval of time. Knowing λ = np for large n and small p, we can calculate many probabilities.

+

Example.

+

The number of typing errors in the page of a textbook.

+

Let X be the number of errors on a page, where the number of characters per page is large and the chance that any particular character is mistyped is small, with rate λ = np = 0.1 errors per page.

What is the probability of exactly 1 error?

+

We can approximate the distribution of XX with a +Poisson(λ=0.1)\text{Poisson}(\lambda = 0.1) distribution

+

P(X=1)=e0.1(0.1)11!=0.09048P(X = 1) = \frac{e^{- 0.1}(0.1)^{1}}{1!} = 0.09048

+
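To see the binomial-to-Poisson approximation in action, here is a small Python sketch. The notes only fix λ = 0.1, so the particular n = 1000 and p = 0.0001 below are hypothetical values chosen so that np = 0.1.

from math import comb, exp, factorial

n, p = 1000, 0.0001        # hypothetical: chosen so that lambda = n * p = 0.1
lam = n * p
k = 1

binom = comb(n, k) * p**k * (1 - p) ** (n - k)
poisson = exp(-lam) * lam**k / factorial(k)
print(binom, poisson)      # both ~0.0905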

Continuous distributions

+

All of the distributions we’ve been analyzing have been discrete, that +is, they apply to random variables with a +countable state space. +Even when the state space is infinite, as in the negative binomial, it +is countable. We can think of it as indexing each trial with a natural +number 0,1,2,3,0,1,2,3,\ldots.

+

Now we turn our attention to continuous random variables that operate on uncountably infinite state spaces. For example, if we sample uniformly inside of the interval [0,1], there are uncountably many possible values we could obtain. We cannot index these values by the natural numbers; in fact, set theory tells us that the interval [0,1] is in bijection with \mathbb{R} and has the cardinality of the continuum.

+

Additionally, we notice that asking for the probability of picking one particular point in the interval [0,1] makes little sense: there are infinitely many sample points, so intuitively the probability of choosing any particular point should be 0. However, we should still be able to make statements about the probability of landing in a subset, like [0, 0.5].

+

Let’s formalize these ideas.

+

Definition.

+

Let XX be a random variable. If we have a function ff such that

+

P(Xb)=bf(x)dxP(X \leq b) = \int_{- \infty}^{b}f(x)dx for all +bb \in {\mathbb{R}}, then ff is the probability density function +of XX.

+

The probability that the value of XX lies in (,b]( - \infty,b\rbrack +equals the area under the curve of ff from - \infty to bb.

+

If ff satisfies this definition, then for any BB \subset {\mathbb{R}} +for which integration makes sense,

+

P(XB)=Bf(x)dxP(X \in B) = \int_{B}f(x)dx

+

Remark.

+

Recall from our previous discussion of random variables that the PDF is +the analogue of the PMF for discrete random variables.

+

Properties of a CDF:

+

Any CDF F(x) = P(X \leq x) satisfies

1. Integrates to unity: F(-\infty) = 0, F(\infty) = 1
2. F(x) is non-decreasing in x: s < t \Rightarrow F(s) \leq F(t)
3. P(a < X \leq b) = P(X \leq b) - P(X \leq a) = F(b) - F(a)

Like we mentioned before, we can only ask about things like +P(Xk)P(X \leq k), but not P(X=k)P(X = k). In fact P(X=k)=0P(X = k) = 0 for all kk. +An immediate corollary of this is that we can freely interchange \leq +and << and likewise for \geq and >>, since P(Xk)=P(X<k)P(X \leq k) = P(X < k) +if P(X=k)=0P(X = k) = 0.

+

Example.

+

Let XX be a continuous random variable with density (pdf)

+

f(x)={cx2for 0<x<20otherwise f(x) = \begin{cases} +cx^{2} & \text{for }0 < x < 2 \\ +0 & \text{otherwise } +\end{cases}

+
    +
1. What is c?
+

c is such that 1 = \int_{- \infty}^{\infty}f(x)dx = \int_{0}^{2}cx^{2}dx = \frac{8c}{3}, so c = \frac{3}{8}.

+
    +
2. Find the probability that X is between 1 and 1.4.
+

Integrate the curve between 1 and 1.4.

+

11.438x2dx=(x38)|11.4=0.218\begin{array}{r} +\int_{1}^{1.4}\frac{3}{8}x^{2}dx = \left( \frac{x^{3}}{8} \right)|_{1}^{1.4} \\ + = 0.218 +\end{array}

+

This is the probability that XX lies between 1 and 1.4.

+
    +
3. Find the probability that X is between 1 and 3.
+

Idea: integrate between 1 and 3, be careful after 2.

+

\int_{1}^{2}\frac{3}{8}x^{2}dx + \int_{2}^{3}0\,dx = \left( \frac{x^{3}}{8} \right)\Big|_{1}^{2} = 1 - \frac{1}{8} = \frac{7}{8}

+
    +
4. What is the CDF F(x) = P(X \leq x)? Integrate the curve up to x.
+

F(x)=P(Xx)=xf(t)dt=0x38t2dt=x38\begin{array}{r} +F(x) = P(X \leq x) = \int_{- \infty}^{x}f(t)dt \\ + = \int_{0}^{x}\frac{3}{8}t^{2}dt \\ + = \frac{x^{3}}{8} +\end{array}

+

Important: include the range!

+

F(x)={0for x0x38for 0<x<21for x2F(x) = \begin{cases} +0 & \text{for }x \leq 0 \\ +\frac{x^{3}}{8} & \text{for }0 < x < 2 \\ +1 & \text{for }x \geq 2 +\end{cases}

+
    +
5. Find a point a such that integrating up to that point gives exactly \frac{1}{2} the area.

+

We want to find 12=P(Xa)\frac{1}{2} = P(X \leq a).

+

12=P(Xa)=F(a)=a38a=43\frac{1}{2} = P(X \leq a) = F(a) = \frac{a^{3}}{8} \Rightarrow a = \sqrt[3]{4}

+
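The whole example can be checked numerically; here is a minimal Python sketch, assuming the density cx² on (0, 2) as above, that evaluates the CDF x³/8 at the points used in the questions.

def F(x):
    # CDF of the example: 0 below 0, x^3 / 8 on [0, 2], 1 above 2
    return min(max(x, 0.0), 2.0) ** 3 / 8

print(F(1.4) - F(1))                  # P(1 < X < 1.4) ~0.218
print(F(3) - F(1))                    # P(1 < X < 3) = 1 - 1/8 = 0.875
print(4 ** (1 / 3), F(4 ** (1 / 3)))  # the median a = cbrt(4), F(a) = 0.5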

Now let us discuss some named continuous distributions.

+

The (continuous) uniform distribution

+

The most simple and the best of the named distributions!

+

Definition.

+

Let [a,b]\lbrack a,b\rbrack be a bounded interval on the real line. A +random variable XX has the uniform distribution on the interval +[a,b]\lbrack a,b\rbrack if XX has the density function

+

f(x)={1bafor x[a,b]0for x[a,b]f(x) = \begin{cases} +\frac{1}{b - a} & \text{for }x \in \lbrack a,b\rbrack \\ +0 & \text{for }x \notin \lbrack a,b\rbrack +\end{cases}

+

Abbreviate this by X Unif [a,b]X\sim\text{ Unif }\lbrack a,b\rbrack.

+

The graph of Unif [a,b]\text{Unif }\lbrack a,b\rbrack is a constant line at +height 1ba\frac{1}{b - a} defined across [a,b]\lbrack a,b\rbrack. The +integral is just the area of a rectangle, and we can check it is 1.

+

Fact.

+

For X Unif [a,b]X\sim\text{ Unif }\lbrack a,b\rbrack, its cumulative distribution +function (CDF) is given by:

+

Fx(x)={0for x<axabafor x[a,b]1for x>bF_{x}(x) = \begin{cases} +0 & \text{for }x < a \\ +\frac{x - a}{b - a} & \text{for }x \in \lbrack a,b\rbrack \\ +1 & \text{for }x > b +\end{cases}

+

Fact.

+

If X Unif [a,b]X\sim\text{ Unif }\lbrack a,b\rbrack, and +[c,d][a,b]\lbrack c,d\rbrack \subset \lbrack a,b\rbrack, then +P(cXd)=cd1badx=dcbaP(c \leq X \leq d) = \int_{c}^{d}\frac{1}{b - a}dx = \frac{d - c}{b - a}

+

Example.

+

Let YY be a uniform random variable on [2,5]\lbrack - 2,5\rbrack. Find the +probability that its absolute value is at least 1.

+

Y takes values in the interval [-2, 5], so the absolute value is at least 1 iff Y \in \lbrack -2, -1\rbrack \cup \lbrack 1, 5\rbrack.

+

The density function of YY is +f(x)=15(2)=17f(x) = \frac{1}{5 - ( - 2)} = \frac{1}{7} on [2,5]\lbrack - 2,5\rbrack +and 0 everywhere else.

+

So,

+

P(|Y|1)=P(Y[2,1][1,5])=P(2Y1)+P(1Y5)=57\begin{aligned} +P\left( |Y| \geq 1 \right) & = P\left( Y \in \lbrack - 2, - 1\rbrack \cup \lbrack 1,5\rbrack \right) \\ + & = P( - 2 \leq Y \leq - 1) + P(1 \leq Y \leq 5) \\ + & = \frac{5}{7} +\end{aligned}

+
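A Monte Carlo sketch in Python (the sample size is chosen arbitrarily) whose estimate should land near the exact answer 5/7 ≈ 0.714:

import random

samples = 200_000
hits = sum(abs(random.uniform(-2, 5)) >= 1 for _ in range(samples))
print(hits / samples, 5 / 7)   # estimate vs. exact value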

The exponential distribution

+

The geometric distribution can be viewed as modeling waiting times, in a +discrete setting, i.e. we wait for n1n - 1 failures to arrive at the +nthn^{\text{th}} success.

+

The exponential distribution is the continuous analogue of the geometric distribution, in that we often use it to model waiting times in the continuous sense, for example, the time until the first customer enters the barber shop.

+

Definition.

+

Let 0<λ<0 < \lambda < \infty. A random variable XX has the exponential +distribution with parameter λ\lambda if XX has PDF

+

f(x)={λeλxfor x00for x<0f(x) = \begin{cases} +\lambda e^{- \lambda x} & \text{for }x \geq 0 \\ +0 & \text{for }x < 0 +\end{cases}

+

Abbreviate this by X Exp(λ)X\sim\text{ Exp}(\lambda), the exponential +distribution with rate λ\lambda.

+

The CDF of the Exp(λ)\text{Exp}(\lambda) distribution is given by:

+

F(t) = \begin{cases}
0 & \text{if }t < 0 \\
1 - e^{- \lambda t} & \text{if }t \geq 0
\end{cases}

+

Example.

+

Suppose the length of a phone call, in minutes, is well modeled by an +exponential random variable with a rate λ=110\lambda = \frac{1}{10}.

+
    +
1. What is the probability that a call takes more than 8 minutes?
2. What is the probability that a call takes between 8 and 22 minutes?
+

Let XX be the length of the phone call, so that +X Exp(110)X\sim\text{ Exp}\left( \frac{1}{10} \right). Then we can find the +desired probability by:

+

P(X>8)=1P(X8)=1Fx(8)=1(1e(110)8)=e8100.4493\begin{aligned} +P(X > 8) & = 1 - P(X \leq 8) \\ + & = 1 - F_{x}(8) \\ + & = 1 - \left( 1 - e^{- \left( \frac{1}{10} \right) \cdot 8} \right) \\ + & = e^{- \frac{8}{10}} \approx 0.4493 +\end{aligned}

+

Now to find P(8<X<22)P(8 < X < 22), we can take the difference in CDFs:

+

P(X>8)P(X22)=e810e22100.3385\begin{aligned} + & P(X > 8) - P(X \geq 22) \\ + & = e^{- \frac{8}{10}} - e^{- \frac{22}{10}} \\ + & \approx 0.3385 +\end{aligned}

+
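Both phone-call probabilities follow directly from the CDF; here is a short Python check using the λ = 1/10 from the example.

from math import exp

lam = 1 / 10

def F(t):
    # CDF of Exp(lam)
    return 1 - exp(-lam * t) if t >= 0 else 0.0

print(1 - F(8))       # P(X > 8)       ~0.4493
print(F(22) - F(8))   # P(8 < X < 22)  ~0.3385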

Fact (Memoryless property of the exponential distribution).

+

Suppose that X Exp(λ)X\sim\text{ Exp}(\lambda). Then for any s,t>0s,t > 0, we +have P(X>t+s|X>t)=P(X>s)P\left( X > t + s~|~X > t \right) = P(X > s)

+

This is like saying: if I've already been waiting 5 minutes for the bus, what is the probability that I end up waiting more than 5 + 3 minutes in total? That's precisely equal to the probability that I wait more than 3 minutes, as if the first 5 minutes had never happened.

+

Proof.

+

P(X>t+s|X>t)=P(X>t+sX>t)P(X>t)=P(X>t+s)P(X>t)=eλ(t+s)eλt=eλsP(X>s)\begin{array}{r} +P\left( X > t + s~|~X > t \right) = \frac{P(X > t + s \cap X > t)}{P(X > t)} \\ + = \frac{P(X > t + s)}{P(X > t)} = \frac{e^{- \lambda(t + s)}}{e^{- \lambda t}} = e^{- \lambda s} \\ + \equiv P(X > s) +\end{array}

+
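The memoryless property can also be seen empirically. The following Python simulation is a sketch with arbitrarily chosen λ = 0.5, t = 5, s = 3; the conditional and unconditional tail frequencies should both come out near e^{-λs} ≈ 0.223.

import random

lam, t, s = 0.5, 5.0, 3.0
draws = [random.expovariate(lam) for _ in range(500_000)]

survived_t = [x for x in draws if x > t]
cond = sum(x > t + s for x in survived_t) / len(survived_t)   # P(X > t+s | X > t)
uncond = sum(x > s for x in draws) / len(draws)               # P(X > s)
print(cond, uncond)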

Gamma distribution

+

Definition.

+

Let r,λ>0r,\lambda > 0. A random variable XX has the gamma +distribution with parameters (r,λ)(r,\lambda) if XX is nonnegative and +has probability density function

+

f(x) = \begin{cases}
\frac{\lambda^{r}x^{r - 1}}{\Gamma(r)}e^{- \lambda x} & \text{for }x \geq 0 \\
0 & \text{for }x < 0
\end{cases}

+

Abbreviate this by X Gamma(r,λ)X\sim\text{ Gamma}(r,\lambda).

+

The gamma function Γ(r)\Gamma(r) generalizes the factorial function and is +defined as

+

Γ(r)=0xr1exdx, for r>0\Gamma(r) = \int_{0}^{\infty}x^{r - 1}e^{- x}dx,\text{ for }r > 0

+

Special case: Γ(n)=(n1)!\Gamma(n) = (n - 1)! if n+n \in {\mathbb{Z}}^{+}.

+

Remark.

+

The Exp(λ)\text{Exp}(\lambda) distribution is a special case of the gamma +distribution, with parameter r=1r = 1.

+

The normal distribution

+

Also known as the Gaussian distribution, this is so important it gets +its own section.

+

Definition.

+

A random variable ZZ has the standard normal distribution if ZZ +has density function

+

φ(x)=12πex22\varphi(x) = \frac{1}{\sqrt{2\pi}}e^{- \frac{x^{2}}{2}} on the real +line. Abbreviate this by ZN(0,1)Z\sim N(0,1).

+

Fact (CDF of a standard normal random variable).

+

Let Z\sim N(0,1) be normally distributed. Then its CDF is given by
\Phi(x) = \int_{- \infty}^{x}\varphi(s)ds = \int_{- \infty}^{x}\frac{1}{\sqrt{2\pi}}e^{- \frac{s^{2}}{2}}ds

+

The normal distribution is so important, instead of the standard +fZ(x)f_{Z(x)} and Fz(x)F_{z(x)}, we use the special φ(x)\varphi(x) and +Φ(x)\Phi(x).

+

Fact.

+

es22ds=2π\int_{- \infty}^{\infty}e^{- \frac{s^{2}}{2}}ds = \sqrt{2\pi}

+

No closed form of the standard normal CDF Φ exists, so we are left to either look up values of Φ in a precomputed table or evaluate it numerically with software.

To evaluate negative values, we can use the symmetry of the normal +distribution to apply the following identity:

+

Φ(x)=1Φ(x)\Phi( - x) = 1 - \Phi(x)

+
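Numerically, Φ can be evaluated with the error function available in most standard libraries; here is a Python sketch using the identity Φ(x) = (1 + erf(x/√2))/2, which also illustrates the symmetry identity above.

from math import erf, sqrt

def Phi(x: float) -> float:
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

print(Phi(1.96))                  # ~0.975
print(Phi(-1.0), 1 - Phi(1.0))    # symmetry: both ~0.1587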

General normal distributions

+

We can compute any other parameters of the normal distribution using the +standard normal.

+

The general family of normal distributions is obtained by linear or +affine transformations of ZZ. Let μ\mu be real, and σ>0\sigma > 0, then

+

X=σZ+μX = \sigma Z + \mu is also a normally distributed random variable +with parameters (μ,σ2)\left( \mu,\sigma^{2} \right). The CDF of XX in terms +of Φ()\Phi( \cdot ) can be expressed as

+

FX(x)=P(Xx)=P(σZ+μx)=P(Zxμσ)=Φ(xμσ)\begin{aligned} +F_{X}(x) & = P(X \leq x) \\ + & = P(\sigma Z + \mu \leq x) \\ + & = P\left( Z \leq \frac{x - \mu}{\sigma} \right) \\ + & = \Phi(\frac{x - \mu}{\sigma}) +\end{aligned}

+

Also,

+

f(x) = F_{X}'(x) = \frac{d}{dx}\left\lbrack \Phi\left(\frac{x - \mu}{\sigma}\right) \right\rbrack = \frac{1}{\sigma}\varphi\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{- \frac{(x - \mu)^{2}}{2\sigma^{2}}}

+

Definition.

+

Let μ\mu be real and σ>0\sigma > 0. A random variable XX has the +normal distribution with mean μ\mu and variance σ2\sigma^{2} if XX +has density function

+

f(x)=12πσ2e((xμ)2)2σ2f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{\frac{- \left( (x - \mu)^{2} \right)}{2\sigma^{2}}}

+

on the real line. Abbreviate this by +XN(μ,σ2)X\sim N\left( \mu,\sigma^{2} \right).

+

Fact.

+

Let XN(μ,σ2)X\sim N\left( \mu,\sigma^{2} \right) and Y=aX+bY = aX + b. Then +YN(aμ+b,a2σ2)Y\sim N\left( a\mu + b,a^{2}\sigma^{2} \right)

+

That is, YY is normally distributed with parameters +(aμ+b,a2σ2)\left( a\mu + b,a^{2}\sigma^{2} \right). In particular, +Z=XμσN(0,1)Z = \frac{X - \mu}{\sigma}\sim N(0,1) is a standard normal variable.

+
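As an illustration of standardization (with hypothetical parameters, not taken from the notes): for X ~ N(10, 4) we can compute P(9 < X ≤ 13) by converting to Z-scores.

from math import erf, sqrt

def Phi(x: float) -> float:
    return 0.5 * (1 + erf(x / sqrt(2)))

mu, sigma = 10.0, 2.0            # hypothetical mean and standard deviation
prob = Phi((13 - mu) / sigma) - Phi((9 - mu) / sigma)
print(prob)                       # Phi(1.5) - Phi(-0.5) ~ 0.6247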

Expectation

+

Let’s discuss the expectation of a random variable, which is a similar +idea to the basic concept of mean.

+

Definition.

+

The expectation or mean of a discrete random variable XX is the +weighted average, with weights assigned by the corresponding +probabilities.

+

E(X)=all xixip(xi)E(X) = \sum_{\text{all }x_{i}}x_{i} \cdot p\left( x_{i} \right)

+

Example.

+

Find the expected value of a single roll of a fair die.

+ +

E\lbrack X\rbrack = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6} = \frac{7}{2} = 3.5

+

Binomial expected value

+

E[x]=npE\lbrack x\rbrack = np

+

Bernoulli expected value

+

Bernoulli is just binomial with one trial.

+

Recall that P(X=1)=pP(X = 1) = p and P(X=0)=1pP(X = 0) = 1 - p.

+

E[X]=1P(X=1)+0P(X=0)=pE\lbrack X\rbrack = 1 \cdot P(X = 1) + 0 \cdot P(X = 0) = p

+

Let AA be an event on Ω\Omega. Its indicator random variable IAI_{A} +is defined for ωΩ\omega \in \Omega by

+

IA(ω)={1, if ωA0, if ωAI_{A}(\omega) = \begin{cases} +1\text{, if } & \omega \in A \\ +0\text{, if } & \omega \notin A +\end{cases}

+

E[IA]=1P(A)=P(A)E\left\lbrack I_{A} \right\rbrack = 1 \cdot P(A) = P(A)

+

Geometric expected value

+

Let p[0,1]p \in \lbrack 0,1\rbrack and X Geom[p]X\sim\text{ Geom}\lbrack p\rbrack +be a geometric RV with probability of success pp. Recall that the +p.m.f. is pqk1pq^{k - 1}, where prob. of failure is defined by +q1pq ≔ 1 - p.

+

Then

+

E[X]=k=1kpqk1=pk=1kqk1\begin{aligned} +E\lbrack X\rbrack & = \sum_{k = 1}^{\infty}kpq^{k - 1} \\ + & = p \cdot \sum_{k = 1}^{\infty}k \cdot q^{k - 1} +\end{aligned}

+

Now recall from calculus that you can differentiate a power series term +by term inside its radius of convergence. So for |t|<1|t| < 1,

+

k=1ktk1=k=1ddttk=ddtk=1tk=ddt(11t)=1(1t)2E[x]=k=1kpqk1=pk=1kqk1=p(1(1q)2)=1p\begin{array}{r} +\sum_{k = 1}^{\infty}kt^{k - 1} = \sum_{k = 1}^{\infty}\frac{d}{dt}t^{k} = \frac{d}{dt}\sum_{k = 1}^{\infty}t^{k} = \frac{d}{dt}\left( \frac{1}{1 - t} \right) = \frac{1}{(1 - t)^{2}} \\ +\therefore E\lbrack x\rbrack = \sum_{k = 1}^{\infty}kpq^{k - 1} = p\sum_{k = 1}^{\infty}kq^{k - 1} = p\left( \frac{1}{(1 - q)^{2}} \right) = \frac{1}{p} +\end{array}

+
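A numeric check of E[X] = 1/p, with an arbitrary p = 0.3 and the series truncated where its terms are negligible:

p = 0.3
q = 1 - p
approx = sum(k * p * q ** (k - 1) for k in range(1, 10_000))
print(approx, 1 / p)   # both ~3.3333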

Expected value of a continuous RV

+

Definition.

+

The expectation or mean of a continuous random variable XX with density +function ff is

+

E[x]=xf(x)dxE\lbrack x\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx

+

An alternative symbol is μ=E[x]\mu = E\lbrack x\rbrack.

+

μ\mu is the “first moment” of XX, analogous to physics, it’s the +“center of gravity” of XX.

+

Remark.

+

In general when moving between discrete and continuous RV, replace sums +with integrals, p.m.f. with p.d.f., and vice versa.

+

Example.

+

Suppose XX is a continuous RV with p.d.f.

+

fX(x)={2x, 0<x<10, elsewheref_{X}(x) = \begin{cases} +2x\text{, } & 0 < x < 1 \\ +0\text{, } & \text{elsewhere} +\end{cases}

+

E[X]=xf(x)dx=01x2xdx=23E\lbrack X\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{0}^{1}x \cdot 2xdx = \frac{2}{3}

+

Example (Uniform expectation).

+

Let XX be a uniform random variable on the interval +[a,b]\lbrack a,b\rbrack with X Unif[a,b]X\sim\text{ Unif}\lbrack a,b\rbrack. Find +the expected value of XX.

+

E[X]=xf(x)dx=abxbadx=1baabxdx=1bab2a22=b+a2 midpoint formula\begin{array}{r} +E\lbrack X\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{a}^{b}\frac{x}{b - a}dx \\ + = \frac{1}{b - a}\int_{a}^{b}xdx = \frac{1}{b - a} \cdot \frac{b^{2} - a^{2}}{2} = \underset{\text{ midpoint formula}}{\underbrace{\frac{b + a}{2}}} +\end{array}

+

Example (Exponential expectation).

+

Find the expected value of an exponential RV, with p.d.f.

+

fX(x)={λeλx, x>00, elsewheref_{X}(x) = \begin{cases} +\lambda e^{- \lambda x}\text{, } & x > 0 \\ +0\text{, } & \text{elsewhere} +\end{cases}

+

E[x]=xf(x)dx=0xλeλxdx=λ0xeλxdx=λ[x1λeλx|x=0x=01λeλxdx]=1λ\begin{array}{r} +E\lbrack x\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{0}^{\infty}x \cdot \lambda e^{- \lambda x}dx \\ + = \lambda \cdot \int_{0}^{\infty}x \cdot e^{- \lambda x}dx \\ + = \lambda \cdot \left\lbrack \left. -x\frac{1}{\lambda}e^{- \lambda x} \right|_{x = 0}^{x = \infty} - \int_{0}^{\infty} - \frac{1}{\lambda}e^{- \lambda x}dx \right\rbrack \\ + = \frac{1}{\lambda} +\end{array}

+

Example (Uniform dartboard).

+

Our dartboard is a disk of radius r0r_{0} and the dart lands uniformly +at random on the disk when thrown. Let RR be the distance of the dart +from the center of the disk. Find E[R]E\lbrack R\rbrack given density +function

+

fR(t)={2tr02, 0tr00, t<0 or t>r0f_{R}(t) = \begin{cases} +\frac{2t}{r_{0}^{2}}\text{, } & 0 \leq t \leq r_{0} \\ +0\text{, } & t < 0\text{ or }t > r_{0} +\end{cases}

+

E[R]=tfR(t)dt=0r0t2tr02dt=23r0\begin{array}{r} +E\lbrack R\rbrack = \int_{- \infty}^{\infty}tf_{R}(t)dt \\ + = \int_{0}^{r_{0}}t \cdot \frac{2t}{r_{0}^{2}}dt \\ + = \frac{2}{3}r_{0} +\end{array}

+

Expectation of derived values

+

If we can find the expected value of XX, can we find the expected value +of X2X^{2}? More precisely, can we find +E[X2]E\left\lbrack X^{2} \right\rbrack?

+

If the distribution is easy to see, then this is trivial. Otherwise we +have the following useful property:

+

E[X2]=all xx2fX(x)dxE\left\lbrack X^{2} \right\rbrack = \int_{\text{all }x}x^{2}f_{X}(x)dx

+

(for continuous RVs).

+

And in the discrete case,

+

E[X2]=all xx2pX(x)E\left\lbrack X^{2} \right\rbrack = \sum_{\text{all }x}x^{2}p_{X}(x)

+

In fact E[X2]E\left\lbrack X^{2} \right\rbrack is so important that we call +it the mean square.

+

Fact.

+

More generally, a real valued function g(X)g(X) defined on the range of +XX is itself a random variable (with its own distribution).

+

We can find expected value of g(X)g(X) by

+

E[g(x)]=g(x)f(x)dxE\left\lbrack g(x) \right\rbrack = \int_{- \infty}^{\infty}g(x)f(x)dx

+

or

+

E[g(x)]=all xg(x)f(x)E\left\lbrack g(x) \right\rbrack = \sum_{\text{all }x}g(x)f(x)

+

Example.

+

You roll a fair die to determine the winnings (or losses) WW of a +player as follows:

+

W={1,iftherollis1,2,or31,iftherollisa43,iftherollis5or6W = \begin{cases} + - 1,\ if\ the\ roll\ is\ 1,\ 2,\ or\ 3 \\ +1,\ if\ the\ roll\ is\ a\ 4 \\ +3,\ if\ the\ roll\ is\ 5\ or\ 6 +\end{cases}

+

What is the expected winnings/losses for the player during 1 roll of the +die?

+

Let XX denote the outcome of the roll of the die. Then we can define +our random variable as W=g(X)W = g(X) where the function gg is defined by +g(1)=g(2)=g(3)=1g(1) = g(2) = g(3) = - 1 and so on.

+

Note that P(W=1)=P(X=1X=2X=3)=12P(W = - 1) = P(X = 1 \cup X = 2 \cup X = 3) = \frac{1}{2}. +Likewise P(W=1)=P(X=4)=16P(W = 1) = P(X = 4) = \frac{1}{6}, and +P(W=3)=P(X=5X=6)=13P(W = 3) = P(X = 5 \cup X = 6) = \frac{1}{3}.

+

Then E[g(X)]=E[W]=(1)P(W=1)+(1)P(W=1)+(3)P(W=3)=12+16+1=23\begin{array}{r} +E\left\lbrack g(X) \right\rbrack = E\lbrack W\rbrack = ( - 1) \cdot P(W = - 1) + (1) \cdot P(W = 1) + (3) \cdot P(W = 3) \\ + = - \frac{1}{2} + \frac{1}{6} + 1 = \frac{2}{3} +\end{array}

+
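The same expectation can be computed directly from the p.m.f. of W; here is a tiny Python check using exact fractions.

from fractions import Fraction

g = {1: -1, 2: -1, 3: -1, 4: 1, 5: 3, 6: 3}     # winnings for each die face
expected = sum(Fraction(1, 6) * w for w in g.values())
print(expected)   # 2/3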

Example.

+

A stick of length ll is broken at a uniformly chosen random location. +What is the expected length of the longer piece?

+

Idea: if you break it before the halfway point, then the longer piece +has length given by lxl - x. If you break it after the halfway point, +the longer piece has length xx.

+

Let the interval [0,l]\lbrack 0,l\rbrack represent the stick and let +X Unif[0,l]X\sim\text{ Unif}\lbrack 0,l\rbrack be the location where the stick is +broken. Then XX has density f(x)=1lf(x) = \frac{1}{l} on +[0,l]\lbrack 0,l\rbrack and 0 elsewhere.

+

Let g(x)g(x) be the length of the longer piece when the stick is broken at +xx,

+

g(x) = \begin{cases}
l - x\text{, } & 0 \leq x < \frac{l}{2} \\
x\text{, } & \frac{l}{2} \leq x \leq l
\end{cases}

+

Then E[g(X)]=g(x)f(x)dx=0l2lxldx+l2lxldx=34l\begin{array}{r} +E\left\lbrack g(X) \right\rbrack = \int_{- \infty}^{\infty}g(x)f(x)dx = \int_{0}^{\frac{l}{2}}\frac{l - x}{l}dx + \int_{\frac{l}{2}}^{l}\frac{x}{l}dx \\ + = \frac{3}{4}l +\end{array}

+

So we expect the longer piece to be 34\frac{3}{4} of the total length, +which is a bit pathological.

+
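A Monte Carlo sketch of the stick-breaking example, with an assumed length l = 1, whose average should come out near 3/4:

import random

l = 1.0
samples = 200_000
longer = [max(x, l - x) for x in (random.uniform(0, l) for _ in range(samples))]
print(sum(longer) / len(longer))   # ~0.75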

Moments of a random variable

+

We continue discussing expectation but we introduce new terminology.

+

Fact.

+

The nthn^{\text{th}} moment (or nthn^{\text{th}} raw moment) of a discrete +random variable XX with p.m.f. pX(x)p_{X}(x) is the expectation

+

E[Xn]=kknpX(k)=μnE\left\lbrack X^{n} \right\rbrack = \sum_{k}k^{n}p_{X}(k) = \mu_{n}

+

If XX is continuous, then we have analogously

+

E[Xn]=xnfX(x)=μnE\left\lbrack X^{n} \right\rbrack = \int_{- \infty}^{\infty}x^{n}f_{X}(x) = \mu_{n}

+

The standard deviation is given by \sigma and the variance is given by \sigma^{2}, where

+

σ2=μ2(μ1)2\sigma^{2} = \mu_{2} - \left( \mu_{1} \right)^{2}

+

μ3\mu_{3} is used to measure “skewness” / asymmetry of a distribution. +For example, the normal distribution is very symmetric.

+

μ4\mu_{4} is used to measure kurtosis/peakedness of a distribution.

+

Central moments

+

Previously we discussed “raw moments.” Be careful not to confuse them +with central moments.

+

Fact.

+

The nthn^{\text{th}} central moment of a discrete random variable XX +with p.m.f. pX(x)p_{X}(x) is the expected value of the difference about the +mean raised to the nthn^{\text{th}} power

+

E[(Xμ)n]=k(kμ)npX(k)=μnE\left\lbrack (X - \mu)^{n} \right\rbrack = \sum_{k}(k - \mu)^{n}p_{X}(k) = \mu\prime_{n}

+

And of course in the continuous case,

+

E[(Xμ)n]=(xμ)nfX(x)=μnE\left\lbrack (X - \mu)^{n} \right\rbrack = \int_{- \infty}^{\infty}(x - \mu)^{n}f_{X}(x) = \mu\prime_{n}

+

In particular,

\begin{array}{r}
\mu'_{1} = E\left\lbrack (X - \mu)^{1} \right\rbrack = \int_{- \infty}^{\infty}(x - \mu)f_{X}(x)dx \\
 = \int_{- \infty}^{\infty}xf_{X}(x)dx - \int_{- \infty}^{\infty}\mu f_{X}(x)dx = \mu - \mu \cdot 1 = 0 \\
\mu'_{2} = E\left\lbrack (X - \mu)^{2} \right\rbrack = \sigma_{X}^{2} = \text{Var}(X)
\end{array}

+

Example.

+

Let YY be a uniformly chosen integer from +{0,1,2,,m}\left\{ 0,1,2,\ldots,m \right\}. Find the first and second moment of +YY.

+

The p.m.f. of YY is pY(k)=1m+1p_{Y}(k) = \frac{1}{m + 1} for +k[0,m]k \in \lbrack 0,m\rbrack. Thus,

+

E[Y]=k=0mk1m+1=1m+1k=0mk=m2\begin{array}{r} +E\lbrack Y\rbrack = \sum_{k = 0}^{m}k\frac{1}{m + 1} = \frac{1}{m + 1}\sum_{k = 0}^{m}k \\ + = \frac{m}{2} +\end{array}

+

Then,

+

E\left\lbrack Y^{2} \right\rbrack = \sum_{k = 0}^{m}k^{2}\frac{1}{m + 1} = \frac{1}{m + 1} \cdot \frac{m(m + 1)(2m + 1)}{6} = \frac{m(2m + 1)}{6}

+

Example.

+

Let c>0c > 0 and let UU be a uniform random variable on the interval +[0,c]\lbrack 0,c\rbrack. Find the nthn^{\text{th}} moment for UU for all +positive integers nn.

+

The density function of UU is

+

f(x)={1c, if x[0,c]0, otherwisef(x) = \begin{cases} +\frac{1}{c}\text{, if } & x \in \lbrack 0,c\rbrack \\ +0\text{, } & \text{otherwise} +\end{cases}

+

Therefore the nthn^{\text{th}} moment of UU is,

+

E\left\lbrack U^{n} \right\rbrack = \int_{- \infty}^{\infty}x^{n}f(x)dx = \int_{0}^{c}\frac{x^{n}}{c}dx = \frac{c^{n}}{n + 1}

+

Example.

+

Suppose the random variable X Exp(λ)X\sim\text{ Exp}(\lambda). Find the second +moment of XX.

+

E[X2]=0x2λeλxdx=1λ20u2eudu=1λ2Γ(2+1)=2!λ2\begin{array}{r} +E\left\lbrack X^{2} \right\rbrack = \int_{0}^{\infty}x^{2}\lambda e^{- \lambda x}dx \\ + = \frac{1}{\lambda^{2}}\int_{0}^{\infty}u^{2}e^{- u}du \\ + = \frac{1}{\lambda^{2}}\Gamma(2 + 1) = \frac{2!}{\lambda^{2}} +\end{array}

+

Fact.

+

In general, to find the n^{\text{th}} moment of X\sim\text{Exp}(\lambda),
E\left\lbrack X^{n} \right\rbrack = \int_{0}^{\infty}x^{n}\lambda e^{- \lambda x}dx = \frac{n!}{\lambda^{n}}

+
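A simulation sketch of E[X^n] = n!/λ^n, with arbitrarily chosen λ = 2 and n = 3 (so the exact value is 3!/2³ = 0.75):

import random
from math import factorial

lam, n = 2.0, 3
xs = [random.expovariate(lam) for _ in range(500_000)]
print(sum(x**n for x in xs) / len(xs), factorial(n) / lam**n)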

Median and quartiles

+

When a random variable has rare (abnormal) values, its expectation may +be a bad indicator of where the center of the distribution lies.

+

Definition.

+

The median of a random variable XX is any real value mm that +satisfies

+

P(Xm)12 and P(Xm)12P(X \geq m) \geq \frac{1}{2}\text{ and }P(X \leq m) \geq \frac{1}{2}

+

With half the probability on both {Xm}\left\{ X \leq m \right\} and +{Xm}\left\{ X \geq m \right\}, the median is representative of the +midpoint of the distribution. We say that the median is more robust +because it is less affected by outliers. It is not necessarily unique.

+

Example.

+

Let X be discretely uniformly distributed on the set \left\{ -100, 1, 2, 3, \ldots, 9 \right\}, so X has probability mass function p_{X}(-100) = p_{X}(1) = \cdots = p_{X}(9) = \frac{1}{10}.

+

Find the expected value and median of XX.

+

E[X]=(100)110+(1)110++(9)110=5.5E\lbrack X\rbrack = ( - 100) \cdot \frac{1}{10} + (1) \cdot \frac{1}{10} + \cdots + (9) \cdot \frac{1}{10} = - 5.5

+

While the median is any number m[4,5]m \in \lbrack 4,5\rbrack.

+

The median reflects the fact that 90% of the values (and of the probability) lie in the range 1, 2, \ldots, 9, while the mean is heavily influenced by the -100 value.

+
+ + + + + diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..eb05362 --- /dev/null +++ b/robots.txt @@ -0,0 +1,2 @@ +User-agent: * +Disallow: diff --git a/rss.xml b/rss.xml new file mode 100644 index 0000000..0d755cb --- /dev/null +++ b/rss.xml @@ -0,0 +1,1759 @@ + + + + The Involution + https://blog.youwen.dev + + + Sun, 16 Feb 2025 00:00:00 UT + + Random variables, distributions, and probability theory + https://blog.youwen.dev/random-variables-distributions-and-probability-theory.html + +
+

+ Random variables, distributions, and probability theory +

+

+ An overview of discrete and continuous random variables and their distributions and moment generating functions +

+
2025-02-16
+
+ +
+
+

These are some notes I’ve been collecting on random variables, their +distributions, expected values, and moment generating functions. I +thought I’d write them down somewhere useful.

+

These are almost extracted verbatim from my in-class notes, which I take +in real time using Typst. I simply wrote a tiny compatibility shim to +allow Pandoc to render them to the web.

+
+

Random variables

+

First, some brief exposition on random variables. Counterintuitively, a random variable is actually a function.

+

Standard notation: \Omega is a sample space, and \omega \in \Omega is an outcome (a single sample point).

+

Definition.

+

A random variable XX is a function +X:ΩX:\Omega \rightarrow {\mathbb{R}} that takes the set of possible +outcomes in a sample space, and maps it to a measurable +space, typically (as in +our case) a subset of \mathbb{R}.

+

Definition.

+

The state space of a random variable XX is all of the values XX +can take.

+

Example.

+

Let XX be a random variable that takes on the values +{0,1,2,3}\left\{ 0,1,2,3 \right\}. Then the state space of XX is the set +{0,1,2,3}\left\{ 0,1,2,3 \right\}.

+

Discrete random variables

+

A random variable XX is discrete if there is countable AA such that +P(XA)=1P(X \in A) = 1. kk is a possible value if P(X=k)>0P(X = k) > 0. We discuss +continuous random variables later.

+

The probability distribution of X gives its important probabilistic information. The probability distribution is a description of the probabilities P(X \in B) for subsets B \subset {\mathbb{R}}. We describe it via the probability mass (or density) function and the cumulative distribution function.

+

A discrete random variable has probability distribution entirely +determined by its probability mass function (hereafter abbreviated p.m.f +or PMF) p(k)=P(X=k)p(k) = P(X = k). The p.m.f. is a function from the set of +possible values of XX into [0,1]\lbrack 0,1\rbrack. Labeling the p.m.f. +with the random variable is done by pX(k)p_{X}(k).

+

pX: State space of X[0,1]p_{X}:\text{ State space of }X \rightarrow \lbrack 0,1\rbrack

+

By the axioms of probability,

+

kpX(k)=kP(X=k)=1\sum_{k}p_{X}(k) = \sum_{k}P(X = k) = 1

+

For a subset BB \subset {\mathbb{R}},

+

P(XB)=kBpX(k)P(X \in B) = \sum_{k \in B}p_{X}(k)

+

Continuous random variables

+

Now as promised we introduce another major class of random variables.

+

Definition.

+

Let XX be a random variable. If ff satisfies

+

P(Xb)=bf(x)dxP(X \leq b) = \int_{- \infty}^{b}f(x)dx

+

for all bb \in {\mathbb{R}}, then ff is the probability density +function (hereafter abbreviated p.d.f. or PDF) of XX.

+

We immediately see that the p.d.f. is analogous to the p.m.f. of the +discrete case.

+

The probability that X(,b]X \in ( - \infty,b\rbrack is equal to the area +under the graph of ff from - \infty to bb.

+

A corollary is the following.

+

Fact.

+

P(XB)=Bf(x)dxP(X \in B) = \int_{B}f(x)dx

+

for any BB \subset {\mathbb{R}} where integration makes sense.

+

The set can be bounded or unbounded, or any collection of intervals.

+

Fact.

+

P(aXb)=abf(x)dxP(a \leq X \leq b) = \int_{a}^{b}f(x)dx +P(X>a)=af(x)dxP(X > a) = \int_{a}^{\infty}f(x)dx

+

Fact.

+

If a random variable XX has density function ff then individual point +values have probability zero:

+

P(X=c)=ccf(x)dx=0,cP(X = c) = \int_{c}^{c}f(x)dx = 0,\forall c \in {\mathbb{R}}

+

Remark.

+

It follows that a random variable with a density function is not discrete. An immediate corollary of this is that the probabilities of intervals are not changed by including or excluding endpoints. So P(X \leq k) and P(X < k) are equivalent.

+

How to determine which functions are p.d.f.s? Since +P(<X<)=1P( - \infty < X < \infty) = 1, a p.d.f. ff must satisfy

+

f(x)0xf(x)dx=1\begin{array}{r} +f(x) \geq 0\forall x \in {\mathbb{R}} \\ +\int_{- \infty}^{\infty}f(x)dx = 1 +\end{array}

+

Fact.

+

Random variables with density functions are called continuous random +variables. This does not imply that the random variable is a continuous +function on Ω\Omega but it is standard terminology.

+

Discrete distributions

+

Recall that the probability distribution of XX gives its important +probabilistic information. Let us discuss some of these distributions.

+

In general we first consider the experiment’s properties and theorize +about the distribution that its random variable takes. We can then apply +the distribution to find out various pieces of probabilistic +information.

+

Bernoulli trials

+

A Bernoulli trial is the original “experiment.” It’s simply a single +trial with a binary “success” or “failure” outcome. Encode this T/F, 0 +or 1, or however you’d like. It becomes immediately useful in defining +more complex distributions, so let’s analyze its properties.

+

The setup: the experiment has exactly two outcomes:

+
    +
  • Success – SS or 1

  • +
  • Failure – FF or 0

  • +
+

Additionally: P(S)=p,(0<p<1)P(F)=1p=q\begin{array}{r} +P(S) = p,(0 < p < 1) \\ +P(F) = 1 - p = q +\end{array}

+

Construct the probability mass function:

+

P(X=1)=pP(X=0)=1p\begin{array}{r} +P(X = 1) = p \\ +P(X = 0) = 1 - p +\end{array}

+

Write it as:

+

p_{X}(k) = p^{k}(1 - p)^{1 - k}

+

for k=1k = 1 and k=0k = 0.

+

Binomial distribution

+

The setup: very similar to Bernoulli, trials have exactly 2 outcomes. A +bunch of Bernoulli trials in a row.

+

Importantly: pp and qq are defined exactly the same in all trials.

+

This ties the binomial distribution to the sampling with replacement +model, since each trial does not affect the next.

+

We conduct nn independent trials of this experiment. Example with +coins: each flip independently has a 12\frac{1}{2} chance of heads or +tails (holds same for die, rigged coin, etc).

+

nn is fixed, i.e. known ahead of time.

+

Binomial random variable

+

Let’s consider the random variable characterized by the binomial +distribution now.

+

Let X=#X = \# of successes in nn independent trials. For any particular +sequence of nn trials, it takes the form +Ω={ω} where ω=SFFF\Omega = \left\{ \omega \right\}\text{ where }\omega = SFF\cdots F and +is of length nn.

+

Then X(ω)=0,1,2,,nX(\omega) = 0,1,2,\ldots,n can take n+1n + 1 possible values. The +probability of any particular sequence is given by the product of the +individual trial probabilities.

+

Example.

+

For the outcome \omega = SFFSF\cdots S, the probability is P(\omega) = p \cdot q \cdot q \cdot p \cdot q \cdots p.

+

So P(x=0)=P(FFFF)=qqq=qnP(x = 0) = P(FFF\cdots F) = q \cdot q \cdot \cdots \cdot q = q^{n}.

+

And \begin{array}{r}
P(X = 1) = P(SFF\cdots F) + P(FSFF\cdots F) + \cdots + P(FFF\cdots FS) \\
 = \underset{n\text{ possible positions for the single }S}{\underbrace{n}} \cdot p^{1} \cdot q^{n - 1} \\
 = \begin{pmatrix}
n \\
1
\end{pmatrix} \cdot p^{1} \cdot q^{n - 1} = n \cdot p \cdot q^{n - 1}
\end{array}

+

Now we can generalize

+

P(X=2)=(n2)p2qn2P(X = 2) = \begin{pmatrix} +n \\ +2 +\end{pmatrix}p^{2}q^{n - 2}

+

How about all successes?

+

P(X=n)=P(SSS)=pnP(X = n) = P(SS\cdots S) = p^{n}

+

We see that for all failures we have qnq^{n} and all successes we have +pnp^{n}. Otherwise we use our method above.

+

In general, here is the probability mass function for the binomial +random variable

+

P(X=k)=(nk)pkqnk, for k=0,1,2,,nP(X = k) = \begin{pmatrix} +n \\ +k +\end{pmatrix}p^{k}q^{n - k},\text{ for }k = 0,1,2,\ldots,n

+

The binomial distribution is very versatile: whenever each trial is a choice between two outcomes, it describes the probabilities for the number of successes.

+

To summarize the characterization of the binomial random variable:

+
    +
  • nn independent trials

  • +
  • each trial results in binary success or failure

  • +
  • with probability of success pp, identically across trials

  • +
+

with X=#X = \# successes in fixed nn trials.

+

X Bin(n,p)X\sim\text{ Bin}(n,p)

+

with probability mass function

+

P(X=x)=(nx)px(1p)nx=p(x) for x=0,1,2,,nP(X = x) = \begin{pmatrix} +n \\ +x +\end{pmatrix}p^{x}(1 - p)^{n - x} = p(x)\text{ for }x = 0,1,2,\ldots,n

+

We see this is in fact the binomial theorem!

+

p(x)0,x=0np(x)=x=0n(nx)pxqnx=(p+q)np(x) \geq 0,\sum_{x = 0}^{n}p(x) = \sum_{x = 0}^{n}\begin{pmatrix} +n \\ +x +\end{pmatrix}p^{x}q^{n - x} = (p + q)^{n}

+

In fact, (p+q)n=(p+(1p))n=1(p + q)^{n} = \left( p + (1 - p) \right)^{n} = 1

+

Example.

+

What is the probability of getting exactly three aces (1’s) out of 10 +throws of a fair die?

+

Seems a little trickier but we can still write this as well defined +SS/FF. Let SS be getting an ace and FF being anything else.

+

Then p=16p = \frac{1}{6} and n=10n = 10. We want P(X=3)P(X = 3). So

+

P(X=3)=(103)p3q7=(103)(16)3(56)70.15505\begin{array}{r} +P(X = 3) = \begin{pmatrix} +10 \\ +3 +\end{pmatrix}p^{3}q^{7} = \begin{pmatrix} +10 \\ +3 +\end{pmatrix}\left( \frac{1}{6} \right)^{3}\left( \frac{5}{6} \right)^{7} \\ + \approx 0.15505 +\end{array}

+
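This is easy to sanity-check numerically. Here is a quick sketch in Python (standard library only), restating the same numbers as above:

from math import comb

n, p = 10, 1 / 6                        # ten rolls, P(ace) = 1/6
print(comb(n, 3) * p**3 * (1 - p)**7)   # ~0.15505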

With or without replacement?

+

I place particular emphasis on the fact that the binomial distribution +generally applies to cases where you’re sampling with replacement. +Consider the following: Example.

+

Suppose we have two types of candy, red and black. Select nn candies. +Let XX be the number of red candies among nn selected.

+

2 cases.

+
    +
• case 1: with replacement: binomial distribution with n trials and p = \frac{a}{a + b}, where a is the number of red candies and b the number of black candies.
  • +
+

P(X=2)=(n2)(aa+b)2(ba+b)n2P(X = 2) = \begin{pmatrix} +n \\ +2 +\end{pmatrix}\left( \frac{a}{a + b} \right)^{2}\left( \frac{b}{a + b} \right)^{n - 2}

+
    +
  • case 2: without replacement: then use counting
  • +
+

P(X=x)=(ax)(bnx)(a+bn)=p(x)P(X = x) = \frac{\begin{pmatrix} +a \\ +x +\end{pmatrix}\begin{pmatrix} +b \\ +n - x +\end{pmatrix}}{\begin{pmatrix} +a + b \\ +n +\end{pmatrix}} = p(x)

+

In case 2, we used the elementary counting techniques we are already +familiar with. Immediately we see a distinct case similar to the +binomial but when sampling without replacement. Let’s formalize this as +a random variable!

+

Hypergeometric distribution

+

Let’s introduce a random variable to represent a situation like case 2 +above.

+

Definition.

+

P(X=x)=(ax)(bnx)(a+bn)=p(x)P(X = x) = \frac{\begin{pmatrix} +a \\ +x +\end{pmatrix}\begin{pmatrix} +b \\ +n - x +\end{pmatrix}}{\begin{pmatrix} +a + b \\ +n +\end{pmatrix}} = p(x)

+

is known as a Hypergeometric distribution.

+

Abbreviate this by:

+

X Hypergeom(# total,# successes, sample size)X\sim\text{ Hypergeom}\left( \#\text{ total},\#\text{ successes},\text{ sample size} \right)

+

For example,

+

X Hypergeom(N,Na,n)X\sim\text{ Hypergeom}\left( N,N_{a},n \right)

+

Remark.

+

If the sample size n is very small relative to a + b, then both cases give similar (approximately the same) answers.

+

For instance, if we're sampling for blood types at UCSB and we take one student out without replacement, we barely change the composition of the remaining population. So both approaches give a similar result.

+

Suppose we have two types of items, type AA and type BB. Let NAN_{A} +be #\# type AA, NBN_{B} #\# type BB. N=NA+NBN = N_{A} + N_{B} is the +total number of objects.

+

We sample nn items without replacement (nNn \leq N) with order not +mattering. Denote by XX the number of type AA objects in our sample.

+

Definition.

+

Let 0NAN0 \leq N_{A} \leq N and 1nN1 \leq n \leq N be integers. A random +variable XX has the hypergeometric distribution with parameters +(N,NA,n)\left( N,N_{A},n \right) if XX takes values in the set +{0,1,,n}\left\{ 0,1,\ldots,n \right\} and has p.m.f.

+

P(X=k)=(NAk)(NNAnk)(Nn)=p(k)P(X = k) = \frac{\begin{pmatrix} +N_{A} \\ +k +\end{pmatrix}\begin{pmatrix} +N - N_{A} \\ +n - k +\end{pmatrix}}{\begin{pmatrix} +N \\ +n +\end{pmatrix}} = p(k)

+

Example.

+

Let NA=10N_{A} = 10 defectives. Let NB=90N_{B} = 90 non-defectives. We select +n=5n = 5 without replacement. What is the probability that 2 of the 5 +selected are defective?

+

X Hypergeom (N=100,NA=10,n=5)X\sim\text{ Hypergeom }\left( N = 100,N_{A} = 10,n = 5 \right)

+

We want P(X=2)P(X = 2).

+

P(X=2)=(102)(903)(1005)0.0702P(X = 2) = \frac{\begin{pmatrix} +10 \\ +2 +\end{pmatrix}\begin{pmatrix} +90 \\ +3 +\end{pmatrix}}{\begin{pmatrix} +100 \\ +5 +\end{pmatrix}} \approx 0.0702

+
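Again, a quick numerical check, sketched in Python with the same numbers as above:

from math import comb

N, N_A, n = 100, 10, 5                                   # population, defectives, sample size
print(comb(N_A, 2) * comb(N - N_A, n - 2) / comb(N, n))  # ~0.0702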

Remark.

+

Make sure you can distinguish when a problem is binomial or when it is +hypergeometric. This is very important on exams.

+

Recall that both ask about number of successes, in a fixed number of +trials. But binomial is sample with replacement (each trial is +independent) and sampling without replacement is hypergeometric.

+

Geometric distribution

+

Consider an infinite sequence of independent trials. e.g. number of +attempts until I make a basket.

+

In fact we can think of this as a variation on the binomial +distribution. But in this case we don’t sample nn times and ask how +many successes we have, we sample as many times as we need for one +success. Later on we’ll see this is really a specific case of another +distribution, the negative binomial.

+

Let XiX_{i} denote the outcome of the ithi^{\text{th}} trial, where +success is 1 and failure is 0. Let NN be the number of trials needed to +observe the first success in a sequence of independent trials with +probability of success pp. Then

+

We fail k1k - 1 times and succeed on the kthk^{\text{th}} try. Then:

+

P(N=k)=P(X1=0,X2=0,,Xk1=0,Xk=1)=(1p)k1pP(N = k) = P\left( X_{1} = 0,X_{2} = 0,\ldots,X_{k - 1} = 0,X_{k} = 1 \right) = (1 - p)^{k - 1}p

+

This is the probability of failure raised to the number of failures, times the probability of success.

+

The key characteristic in these trials, we keep going until we succeed. +There’s no nn choose kk in front like the binomial distribution +because there’s exactly one sequence that gives us success.

+

Definition.

+

Let 0<p10 < p \leq 1. A random variable XX has the geometric distribution +with success parameter pp if the possible values of XX are +{1,2,3,}\left\{ 1,2,3,\ldots \right\} and XX satisfies

+

P(X=k)=(1p)k1pP(X = k) = (1 - p)^{k - 1}p

+

for positive integers kk. Abbreviate this by X Geom(p)X\sim\text{ Geom}(p).

+

Example.

+

What is the probability it takes more than seven rolls of a fair die to +roll a six?

+

Let XX be the number of rolls of a fair die until the first six. Then +X Geom(16)X\sim\text{ Geom}\left( \frac{1}{6} \right). Now we just want +P(X>7)P(X > 7).

+

P(X>7)=k=8P(X=k)=k=8(56)k116P(X > 7) = \sum_{k = 8}^{\infty}P(X = k) = \sum_{k = 8}^{\infty}\left( \frac{5}{6} \right)^{k - 1}\frac{1}{6}

+

Re-indexing,

+

k=8(56)k116=16(56)7j=0(56)j\sum_{k = 8}^{\infty}\left( \frac{5}{6} \right)^{k - 1}\frac{1}{6} = \frac{1}{6}\left( \frac{5}{6} \right)^{7}\sum_{j = 0}^{\infty}\left( \frac{5}{6} \right)^{j}

+

Now we calculate by standard methods:

+

16(56)7j=0(56)j=16(56)71156=(56)7\frac{1}{6}\left( \frac{5}{6} \right)^{7}\sum_{j = 0}^{\infty}\left( \frac{5}{6} \right)^{j} = \frac{1}{6}\left( \frac{5}{6} \right)^{7} \cdot \frac{1}{1 - \frac{5}{6}} = \left( \frac{5}{6} \right)^{7}

+
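We can check both the closed form and the tail sum directly; a small Python sketch (the truncation point 2000 is an arbitrary choice, large enough that the remaining tail is negligible):

p = 1 / 6
print((1 - p) ** 7)                                          # closed form, ~0.2791
print(sum((1 - p) ** (k - 1) * p for k in range(8, 2000)))   # truncated tail sum, same value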

Negative binomial

+

As promised, here’s the negative binomial.

+

Consider a sequence of Bernoulli trials with the following +characteristics:

+
    +
  • Each trial success or failure

  • +
  • Prob. of success pp is same on each trial

  • +
  • Trials are independent (notice they are not fixed to specific +number)

  • +
  • Experiment continues until rr successes are observed, where rr is +a given parameter

  • +
+

Then if XX is the number of trials necessary until rr successes are +observed, we say XX is a negative binomial random variable.

+

Immediately we see that the geometric distribution is just the negative +binomial with r=1r = 1.

+

Definition.

+

Let k+k \in {\mathbb{Z}}^{+} and 0<p10 < p \leq 1. A random variable XX +has the negative binomial distribution with parameters +{k,p}\left\{ k,p \right\} if the possible values of XX are the integers +{k,k+1,k+2,}\left\{ k,k + 1,k + 2,\ldots \right\} and the p.m.f. is

+

P(X=n)=(n1k1)pk(1p)nk for nkP(X = n) = \begin{pmatrix} +n - 1 \\ +k - 1 +\end{pmatrix}p^{k}(1 - p)^{n - k}\text{ for }n \geq k

+

Abbreviate this by X Negbin(k,p)X\sim\text{ Negbin}(k,p).

+

Example.

+

Steph Curry has a three point percentage of approx. 43%43\%. What is the +probability that Steph makes his third three-point basket on his +5th5^{\text{th}} attempt?

+

Let XX be number of attempts required to observe the 3rd success. Then,

+

X Negbin(k=3,p=0.43)X\sim\text{ Negbin}(k = 3,p = 0.43)

+

So, P(X=5)=(5131)(0.43)3(10.43)53=(42)(0.43)3(0.57)20.155\begin{aligned} +P(X = 5) & = {\begin{pmatrix} +5 - 1 \\ +3 - 1 +\end{pmatrix}(0.43)}^{3}(1 - 0.43)^{5 - 3} \\ + & = \begin{pmatrix} +4 \\ +2 +\end{pmatrix}(0.43)^{3}(0.57)^{2} \\ + & \approx 0.155 +\end{aligned}

+
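A quick check of the arithmetic, sketched in Python:

from math import comb

k, p, n = 3, 0.43, 5                                    # third success on the fifth attempt
print(comb(n - 1, k - 1) * p**k * (1 - p) ** (n - k))   # ~0.155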

Poisson distribution

+

This p.m.f. follows from the Taylor expansion

+

eλ=k=0λkk!e^{\lambda} = \sum_{k = 0}^{\infty}\frac{\lambda^{k}}{k!}

+

which implies that

+

k=0eλλkk!=eλeλ=1\sum_{k = 0}^{\infty}e^{- \lambda}\frac{\lambda^{k}}{k!} = e^{- \lambda}e^{\lambda} = 1

+

Definition.

+

For an integer valued random variable XX, we say +X Poisson(λ)X\sim\text{ Poisson}(\lambda) if it has p.m.f.

+

P(X=k)=eλλkk!P(X = k) = e^{- \lambda}\frac{\lambda^{k}}{k!}

+

for k{0,1,2,}k \in \left\{ 0,1,2,\ldots \right\} for λ>0\lambda > 0 and

+

k=0P(X=k)=1\sum_{k = 0}^{\infty}P(X = k) = 1

+

The Poisson arises from the Binomial. It applies in the binomial context +when nn is very large (n100n \geq 100) and pp is very small +p0.05p \leq 0.05, such that npnp is a moderate number (np<10np < 10).

+

Then XX follows a Poisson distribution with λ=np\lambda = np.

+

P(Bin(n,p)=k)P(Poisson(λ=np)=k)P\left( \text{Bin}(n,p) = k \right) \approx P\left( \text{Poisson}(\lambda = np) = k \right)

+

for k=0,1,,nk = 0,1,\ldots,n.

+

The Poisson distribution is useful for finding the probabilities of rare events over a continuous interval of time. By knowing \lambda = np for large n and small p, we can calculate many probabilities.

+

Example.

+

The number of typing errors in the page of a textbook.

+

Let

+
    +
• n be the number of letters or symbols per page (large)

  • +
  • pp be the probability of error, small enough such that

  • +
  • limnlimp0np=λ=0.1\lim\limits_{n \rightarrow \infty}\lim\limits_{p \rightarrow 0}np = \lambda = 0.1

  • +
+

What is the probability of exactly 1 error?

+

We can approximate the distribution of XX with a +Poisson(λ=0.1)\text{Poisson}(\lambda = 0.1) distribution

+

P(X=1)=e0.1(0.1)11!=0.09048P(X = 1) = \frac{e^{- 0.1}(0.1)^{1}}{1!} = 0.09048

+
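To see the approximation in action, here is a sketch in Python; the particular values n = 1000 and p = 0.0001 are just an illustrative choice with np = 0.1:

from math import comb, exp

n, p = 1000, 0.0001
lam, k = n * p, 1
print(comb(n, k) * p**k * (1 - p) ** (n - k))   # exact binomial, ~0.090495
print(exp(-lam) * lam**k)                       # Poisson approximation (k! = 1), ~0.090484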

Continuous distributions

+

All of the distributions we’ve been analyzing have been discrete, that +is, they apply to random variables with a +countable state space. +Even when the state space is infinite, as in the negative binomial, it +is countable. We can think of it as indexing each trial with a natural +number 0,1,2,3,0,1,2,3,\ldots.

+

Now we turn our attention to continuous random variables that operate on uncountably infinite state spaces. For example, if we sample uniformly inside of the interval \lbrack 0,1\rbrack, there are uncountably many possible values we could obtain. We cannot index these values by the natural numbers; in fact, the interval \lbrack 0,1\rbrack is in bijection with \mathbb{R} and has the cardinality of the continuum.

+

Additionally we notice that asking for the probability that we pick a +certain point in the interval [0,1]\lbrack 0,1\rbrack makes no sense, there +are an infinite amount of sample points! Intuitively we should think +that the probability of choosing any particular point is 0. However, we +should be able to make statements about whether we can choose a point +that lies within a subset, like [0,0.5]\lbrack 0,0.5\rbrack.

+

Let’s formalize these ideas.

+

Definition.

+

Let XX be a random variable. If we have a function ff such that

+

P(Xb)=bf(x)dxP(X \leq b) = \int_{- \infty}^{b}f(x)dx for all +bb \in {\mathbb{R}}, then ff is the probability density function +of XX.

+

The probability that the value of XX lies in (,b]( - \infty,b\rbrack +equals the area under the curve of ff from - \infty to bb.

+

If ff satisfies this definition, then for any BB \subset {\mathbb{R}} +for which integration makes sense,

+

P(XB)=Bf(x)dxP(X \in B) = \int_{B}f(x)dx

+

Remark.

+

Recall from our previous discussion of random variables that the PDF is +the analogue of the PMF for discrete random variables.

+

Properties of a CDF:

+

Any CDF F(x)=P(Xx)F(x) = P(X \leq x) satisfies

+
    +
1. Limits: F( - \infty) = 0 and F(\infty) = 1

  2. +
3. F(x) is non-decreasing in x

  4. +
+

s<tF(s)F(t)s < t \Rightarrow F(s) \leq F(t)

+
    +
  1. P(a<Xb)=P(Xb)P(Xa)=F(b)F(a)P(a < X \leq b) = P(X \leq b) - P(X \leq a) = F(b) - F(a)
  2. +
+

Like we mentioned before, we can only ask about things like +P(Xk)P(X \leq k), but not P(X=k)P(X = k). In fact P(X=k)=0P(X = k) = 0 for all kk. +An immediate corollary of this is that we can freely interchange \leq +and << and likewise for \geq and >>, since P(Xk)=P(X<k)P(X \leq k) = P(X < k) +if P(X=k)=0P(X = k) = 0.

+

Example.

+

Let XX be a continuous random variable with density (pdf)

+

f(x)={cx2for 0<x<20otherwise f(x) = \begin{cases} +cx^{2} & \text{for }0 < x < 2 \\ +0 & \text{otherwise } +\end{cases}

+
    +
  1. What is cc?
  2. +
+

c is such that 1 = \int_{- \infty}^{\infty}f(x)\,dx = \int_{0}^{2}cx^{2}\,dx = \frac{8c}{3}, so c = \frac{3}{8}

+
    +
  1. Find the probability that XX is between 1 and 1.4.
  2. +
+

Integrate the curve between 1 and 1.4.

+

11.438x2dx=(x38)|11.4=0.218\begin{array}{r} +\int_{1}^{1.4}\frac{3}{8}x^{2}dx = \left( \frac{x^{3}}{8} \right)|_{1}^{1.4} \\ + = 0.218 +\end{array}

+

This is the probability that XX lies between 1 and 1.4.

+
    +
  1. Find the probability that XX is between 1 and 3.
  2. +
+

Idea: integrate between 1 and 3, be careful after 2.

+

\int_{1}^{2}\frac{3}{8}x^{2}\,dx + \int_{2}^{3}0\,dx = \frac{2^{3} - 1^{3}}{8} = \frac{7}{8}

+
    +
  1. What is the CDF for P(Xx)P(X \leq x)? Integrate the curve to xx.
  2. +
+

F(x)=P(Xx)=xf(t)dt=0x38t2dt=x38\begin{array}{r} +F(x) = P(X \leq x) = \int_{- \infty}^{x}f(t)dt \\ + = \int_{0}^{x}\frac{3}{8}t^{2}dt \\ + = \frac{x^{3}}{8} +\end{array}

+

Important: include the range!

+

F(x)={0for x0x38for 0<x<21for x2F(x) = \begin{cases} +0 & \text{for }x \leq 0 \\ +\frac{x^{3}}{8} & \text{for }0 < x < 2 \\ +1 & \text{for }x \geq 2 +\end{cases}

+
    +
1. Find a point a such that integrating the density up to a gives exactly \frac{1}{2} of the total area.
  2. +
+


+

We want to find 12=P(Xa)\frac{1}{2} = P(X \leq a).

+

12=P(Xa)=F(a)=a38a=43\frac{1}{2} = P(X \leq a) = F(a) = \frac{a^{3}}{8} \Rightarrow a = \sqrt[3]{4}

+
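All of the computations in this example are easy to verify at once, since the CDF has the closed form x^3/8 on (0, 2); a quick Python sketch:

def F(x):                  # CDF of X on [0, 2]: the integral of (3/8) t^2 from 0 to x
    return x**3 / 8

print(F(2) - F(0))         # total probability, 1.0
print(F(1.4) - F(1))       # P(1 <= X <= 1.4), ~0.218
print(F(2) - F(1))         # P(1 <= X <= 3) = P(1 <= X <= 2), 0.875
print(F(4 ** (1 / 3)))     # the point a = cube root of 4 splits the area in half, 0.5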

Now let us discuss some named continuous distributions.

+

The (continuous) uniform distribution

+

The most simple and the best of the named distributions!

+

Definition.

+

Let [a,b]\lbrack a,b\rbrack be a bounded interval on the real line. A +random variable XX has the uniform distribution on the interval +[a,b]\lbrack a,b\rbrack if XX has the density function

+

f(x)={1bafor x[a,b]0for x[a,b]f(x) = \begin{cases} +\frac{1}{b - a} & \text{for }x \in \lbrack a,b\rbrack \\ +0 & \text{for }x \notin \lbrack a,b\rbrack +\end{cases}

+

Abbreviate this by X Unif [a,b]X\sim\text{ Unif }\lbrack a,b\rbrack.

+

The graph of Unif [a,b]\text{Unif }\lbrack a,b\rbrack is a constant line at +height 1ba\frac{1}{b - a} defined across [a,b]\lbrack a,b\rbrack. The +integral is just the area of a rectangle, and we can check it is 1.

+

Fact.

+

For X Unif [a,b]X\sim\text{ Unif }\lbrack a,b\rbrack, its cumulative distribution +function (CDF) is given by:

+

Fx(x)={0for x<axabafor x[a,b]1for x>bF_{x}(x) = \begin{cases} +0 & \text{for }x < a \\ +\frac{x - a}{b - a} & \text{for }x \in \lbrack a,b\rbrack \\ +1 & \text{for }x > b +\end{cases}

+

Fact.

+

If X Unif [a,b]X\sim\text{ Unif }\lbrack a,b\rbrack, and +[c,d][a,b]\lbrack c,d\rbrack \subset \lbrack a,b\rbrack, then +P(cXd)=cd1badx=dcbaP(c \leq X \leq d) = \int_{c}^{d}\frac{1}{b - a}dx = \frac{d - c}{b - a}

+

Example.

+

Let YY be a uniform random variable on [2,5]\lbrack - 2,5\rbrack. Find the +probability that its absolute value is at least 1.

+

Y takes values in the interval \lbrack - 2,5\rbrack, so the absolute value is at least 1 iff. Y \in \lbrack - 2, - 1\rbrack \cup \lbrack 1,5\rbrack.

+

The density function of YY is +f(x)=15(2)=17f(x) = \frac{1}{5 - ( - 2)} = \frac{1}{7} on [2,5]\lbrack - 2,5\rbrack +and 0 everywhere else.

+

So,

+

P(|Y|1)=P(Y[2,1][1,5])=P(2Y1)+P(1Y5)=57\begin{aligned} +P\left( |Y| \geq 1 \right) & = P\left( Y \in \lbrack - 2, - 1\rbrack \cup \lbrack 1,5\rbrack \right) \\ + & = P( - 2 \leq Y \leq - 1) + P(1 \leq Y \leq 5) \\ + & = \frac{5}{7} +\end{aligned}

+
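A Monte Carlo check of the same probability, sketched in Python (10^6 samples is an arbitrary choice):

import random

trials = 10**6
hits = sum(abs(random.uniform(-2, 5)) >= 1 for _ in range(trials))
print(hits / trials)       # ~5/7 ~ 0.714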

The exponential distribution

+

The geometric distribution can be viewed as modeling waiting times in a discrete setting, i.e. we wait through n - 1 failures to arrive at the first success on the n^{\text{th}} trial.

+

The exponential distribution is the continuous analogue to the geometric distribution, in that we often use it to model waiting times in the continuous sense. For example, the waiting time until the first customer enters the barber shop.

+

Definition.

+

Let 0<λ<0 < \lambda < \infty. A random variable XX has the exponential +distribution with parameter λ\lambda if XX has PDF

+

f(x)={λeλxfor x00for x<0f(x) = \begin{cases} +\lambda e^{- \lambda x} & \text{for }x \geq 0 \\ +0 & \text{for }x < 0 +\end{cases}

+

Abbreviate this by X Exp(λ)X\sim\text{ Exp}(\lambda), the exponential +distribution with rate λ\lambda.

+

The CDF of the Exp(λ)\text{Exp}(\lambda) distribution is given by:

+

F(t) = \begin{cases}
0 & \text{if }t < 0 \\
1 - e^{- \lambda t} & \text{if }t \geq 0
\end{cases}

+

Example.

+

Suppose the length of a phone call, in minutes, is well modeled by an +exponential random variable with a rate λ=110\lambda = \frac{1}{10}.

+
    +
  1. What is the probability that a call takes more than 8 minutes?

  2. +
  3. What is the probability that a call takes between 8 and 22 minutes?

  4. +
+

Let XX be the length of the phone call, so that +X Exp(110)X\sim\text{ Exp}\left( \frac{1}{10} \right). Then we can find the +desired probability by:

+

P(X>8)=1P(X8)=1Fx(8)=1(1e(110)8)=e8100.4493\begin{aligned} +P(X > 8) & = 1 - P(X \leq 8) \\ + & = 1 - F_{x}(8) \\ + & = 1 - \left( 1 - e^{- \left( \frac{1}{10} \right) \cdot 8} \right) \\ + & = e^{- \frac{8}{10}} \approx 0.4493 +\end{aligned}

+

Now to find P(8<X<22)P(8 < X < 22), we can take the difference in CDFs:

+

P(X>8)P(X22)=e810e22100.3385\begin{aligned} + & P(X > 8) - P(X \geq 22) \\ + & = e^{- \frac{8}{10}} - e^{- \frac{22}{10}} \\ + & \approx 0.3385 +\end{aligned}

+
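The same two probabilities, checked numerically in a short Python sketch:

from math import exp

lam = 1 / 10
print(exp(-lam * 8))                    # P(X > 8), ~0.4493
print(exp(-lam * 8) - exp(-lam * 22))   # P(8 < X < 22), ~0.3385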

Fact (Memoryless property of the exponential distribution).

+

Suppose that X Exp(λ)X\sim\text{ Exp}(\lambda). Then for any s,t>0s,t > 0, we +have P(X>t+s|X>t)=P(X>s)P\left( X > t + s~|~X > t \right) = P(X > s)

+

This is like saying: given that I've already waited 5 minutes for the bus, the probability that I end up waiting more than 5 + 3 minutes in total is precisely the probability that I wait more than 3 additional minutes.

+

Proof.

+

P(X>t+s|X>t)=P(X>t+sX>t)P(X>t)=P(X>t+s)P(X>t)=eλ(t+s)eλt=eλsP(X>s)\begin{array}{r} +P\left( X > t + s~|~X > t \right) = \frac{P(X > t + s \cap X > t)}{P(X > t)} \\ + = \frac{P(X > t + s)}{P(X > t)} = \frac{e^{- \lambda(t + s)}}{e^{- \lambda t}} = e^{- \lambda s} \\ + \equiv P(X > s) +\end{array}

+
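The memoryless property is also easy to observe empirically; a Python sketch using the bus-waiting numbers above (t = 5, s = 3, and a rate of 1/10 chosen arbitrarily):

import random

lam, t, s = 1 / 10, 5, 3
samples = [random.expovariate(lam) for _ in range(10**6)]
survivors = [x for x in samples if x > t]
print(sum(x > t + s for x in survivors) / len(survivors))   # P(X > t + s | X > t)
print(sum(x > s for x in samples) / len(samples))           # P(X > s); both ~ e^(-0.3) ~ 0.74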

Gamma distribution

+

Definition.

+

Let r,λ>0r,\lambda > 0. A random variable XX has the gamma +distribution with parameters (r,λ)(r,\lambda) if XX is nonnegative and +has probability density function

+

f(x) = \begin{cases}
\frac{\lambda^{r}x^{r - 1}}{\Gamma(r)}e^{- \lambda x} & \text{for }x \geq 0 \\
0 & \text{for }x < 0
\end{cases}

+

Abbreviate this by X Gamma(r,λ)X\sim\text{ Gamma}(r,\lambda).

+

The gamma function Γ(r)\Gamma(r) generalizes the factorial function and is +defined as

+

Γ(r)=0xr1exdx, for r>0\Gamma(r) = \int_{0}^{\infty}x^{r - 1}e^{- x}dx,\text{ for }r > 0

+

Special case: Γ(n)=(n1)!\Gamma(n) = (n - 1)! if n+n \in {\mathbb{Z}}^{+}.

+

Remark.

+

The Exp(λ)\text{Exp}(\lambda) distribution is a special case of the gamma +distribution, with parameter r=1r = 1.

+

The normal distribution

+

Also known as the Gaussian distribution, this is so important it gets +its own section.

+

Definition.

+

A random variable ZZ has the standard normal distribution if ZZ +has density function

+

φ(x)=12πex22\varphi(x) = \frac{1}{\sqrt{2\pi}}e^{- \frac{x^{2}}{2}} on the real +line. Abbreviate this by ZN(0,1)Z\sim N(0,1).

+

Fact (CDF of a standard normal random variable).

+

Let Z\sim N(0,1) be normally distributed. Then its CDF is given by
\Phi(x) = \int_{- \infty}^{x}\varphi(s)\,ds = \int_{- \infty}^{x}\frac{1}{\sqrt{2\pi}}e^{- \frac{s^{2}}{2}}\,ds

+

The normal distribution is so important that, instead of the usual f_{Z}(x) and F_{Z}(x), we use the special notation \varphi(x) and \Phi(x).

+

Fact.

+

es22ds=2π\int_{- \infty}^{\infty}e^{- \frac{s^{2}}{2}}ds = \sqrt{2\pi}

+

No closed form of the standard normal CDF Φ\Phi exists, so we are left +to either:

+
    +
  • approximate

  • +
  • use technology (calculator)

  • +
  • use the standard normal probability table in the textbook

  • +
+

To evaluate negative values, we can use the symmetry of the normal +distribution to apply the following identity:

+

Φ(x)=1Φ(x)\Phi( - x) = 1 - \Phi(x)

+

General normal distributions

+

We can compute any other parameters of the normal distribution using the +standard normal.

+

The general family of normal distributions is obtained by linear or +affine transformations of ZZ. Let μ\mu be real, and σ>0\sigma > 0, then

+

X=σZ+μX = \sigma Z + \mu is also a normally distributed random variable +with parameters (μ,σ2)\left( \mu,\sigma^{2} \right). The CDF of XX in terms +of Φ()\Phi( \cdot ) can be expressed as

+

FX(x)=P(Xx)=P(σZ+μx)=P(Zxμσ)=Φ(xμσ)\begin{aligned} +F_{X}(x) & = P(X \leq x) \\ + & = P(\sigma Z + \mu \leq x) \\ + & = P\left( Z \leq \frac{x - \mu}{\sigma} \right) \\ + & = \Phi(\frac{x - \mu}{\sigma}) +\end{aligned}

+

Also,

+

f(x) = F\prime(x) = \frac{d}{dx}\left\lbrack \Phi\left( \frac{x - \mu}{\sigma} \right) \right\rbrack = \frac{1}{\sigma}\varphi\left( \frac{x - \mu}{\sigma} \right) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{- \frac{(x - \mu)^{2}}{2\sigma^{2}}}

+

Definition.

+

Let μ\mu be real and σ>0\sigma > 0. A random variable XX has the +normal distribution with mean μ\mu and variance σ2\sigma^{2} if XX +has density function

+

f(x)=12πσ2e((xμ)2)2σ2f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}e^{\frac{- \left( (x - \mu)^{2} \right)}{2\sigma^{2}}}

+

on the real line. Abbreviate this by +XN(μ,σ2)X\sim N\left( \mu,\sigma^{2} \right).

+

Fact.

+

Let XN(μ,σ2)X\sim N\left( \mu,\sigma^{2} \right) and Y=aX+bY = aX + b. Then +YN(aμ+b,a2σ2)Y\sim N\left( a\mu + b,a^{2}\sigma^{2} \right)

+

That is, YY is normally distributed with parameters +(aμ+b,a2σ2)\left( a\mu + b,a^{2}\sigma^{2} \right). In particular, +Z=XμσN(0,1)Z = \frac{X - \mu}{\sigma}\sim N(0,1) is a standard normal variable.

+
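Since \Phi has no closed form, in practice one standardizes and then evaluates \Phi numerically (via a table, a calculator, or the error function). A small Python sketch; the parameters mu = 10, sigma = 2, and the query point x = 13 are arbitrary illustrative values:

from math import erf, sqrt

def Phi(z):                      # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, x = 10, 2, 13
print(Phi((x - mu) / sigma))     # P(X <= 13) for X ~ N(10, 4), ~0.9332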

Expectation

+

Let’s discuss the expectation of a random variable, which is a similar +idea to the basic concept of mean.

+

Definition.

+

The expectation or mean of a discrete random variable XX is the +weighted average, with weights assigned by the corresponding +probabilities.

+

E(X)=all xixip(xi)E(X) = \sum_{\text{all }x_{i}}x_{i} \cdot p\left( x_{i} \right)

+

Example.

+

Find the expected value of a single roll of a fair die.

+
    +
• X = the score of the roll (the number of dots showing)

  • +
  • x=1,2,3,4,5,6x = 1,2,3,4,5,6

  • +
  • p(x)=16,16,16,16,16,16p(x) = \frac{1}{6},\frac{1}{6},\frac{1}{6},\frac{1}{6},\frac{1}{6},\frac{1}{6}

  • +
+

E\lbrack X\rbrack = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6} = \frac{7}{2} = 3.5

+

Binomial expected value

+

E[x]=npE\lbrack x\rbrack = np

+

Bernoulli expected value

+

Bernoulli is just binomial with one trial.

+

Recall that P(X=1)=pP(X = 1) = p and P(X=0)=1pP(X = 0) = 1 - p.

+

E[X]=1P(X=1)+0P(X=0)=pE\lbrack X\rbrack = 1 \cdot P(X = 1) + 0 \cdot P(X = 0) = p

+

Let AA be an event on Ω\Omega. Its indicator random variable IAI_{A} +is defined for ωΩ\omega \in \Omega by

+

IA(ω)={1, if ωA0, if ωAI_{A}(\omega) = \begin{cases} +1\text{, if } & \omega \in A \\ +0\text{, if } & \omega \notin A +\end{cases}

+

E[IA]=1P(A)=P(A)E\left\lbrack I_{A} \right\rbrack = 1 \cdot P(A) = P(A)

+

Geometric expected value

+

Let p[0,1]p \in \lbrack 0,1\rbrack and X Geom[p]X\sim\text{ Geom}\lbrack p\rbrack +be a geometric RV with probability of success pp. Recall that the +p.m.f. is pqk1pq^{k - 1}, where prob. of failure is defined by +q1pq ≔ 1 - p.

+

Then

+

E[X]=k=1kpqk1=pk=1kqk1\begin{aligned} +E\lbrack X\rbrack & = \sum_{k = 1}^{\infty}kpq^{k - 1} \\ + & = p \cdot \sum_{k = 1}^{\infty}k \cdot q^{k - 1} +\end{aligned}

+

Now recall from calculus that you can differentiate a power series term +by term inside its radius of convergence. So for |t|<1|t| < 1,

+

k=1ktk1=k=1ddttk=ddtk=1tk=ddt(11t)=1(1t)2E[x]=k=1kpqk1=pk=1kqk1=p(1(1q)2)=1p\begin{array}{r} +\sum_{k = 1}^{\infty}kt^{k - 1} = \sum_{k = 1}^{\infty}\frac{d}{dt}t^{k} = \frac{d}{dt}\sum_{k = 1}^{\infty}t^{k} = \frac{d}{dt}\left( \frac{1}{1 - t} \right) = \frac{1}{(1 - t)^{2}} \\ +\therefore E\lbrack x\rbrack = \sum_{k = 1}^{\infty}kpq^{k - 1} = p\sum_{k = 1}^{\infty}kq^{k - 1} = p\left( \frac{1}{(1 - q)^{2}} \right) = \frac{1}{p} +\end{array}

+
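The 1/p answer matches simulation; a Python sketch with p = 1/6 (say, rolling a die until the first six):

import random

p = 1 / 6

def trials_until_success():
    k = 1
    while random.random() >= p:   # failure with probability 1 - p
        k += 1
    return k

n = 10**5
print(sum(trials_until_success() for _ in range(n)) / n)   # ~1/p = 6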

Expected value of a continuous RV

+

Definition.

+

The expectation or mean of a continuous random variable XX with density +function ff is

+

E[x]=xf(x)dxE\lbrack x\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx

+

An alternative symbol is μ=E[x]\mu = E\lbrack x\rbrack.

+

μ\mu is the “first moment” of XX, analogous to physics, it’s the +“center of gravity” of XX.

+

Remark.

+

In general when moving between discrete and continuous RV, replace sums +with integrals, p.m.f. with p.d.f., and vice versa.

+

Example.

+

Suppose XX is a continuous RV with p.d.f.

+

fX(x)={2x, 0<x<10, elsewheref_{X}(x) = \begin{cases} +2x\text{, } & 0 < x < 1 \\ +0\text{, } & \text{elsewhere} +\end{cases}

+

E[X]=xf(x)dx=01x2xdx=23E\lbrack X\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{0}^{1}x \cdot 2xdx = \frac{2}{3}

+

Example (Uniform expectation).

+

Let XX be a uniform random variable on the interval +[a,b]\lbrack a,b\rbrack with X Unif[a,b]X\sim\text{ Unif}\lbrack a,b\rbrack. Find +the expected value of XX.

+

E[X]=xf(x)dx=abxbadx=1baabxdx=1bab2a22=b+a2 midpoint formula\begin{array}{r} +E\lbrack X\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{a}^{b}\frac{x}{b - a}dx \\ + = \frac{1}{b - a}\int_{a}^{b}xdx = \frac{1}{b - a} \cdot \frac{b^{2} - a^{2}}{2} = \underset{\text{ midpoint formula}}{\underbrace{\frac{b + a}{2}}} +\end{array}

+

Example (Exponential expectation).

+

Find the expected value of an exponential RV, with p.d.f.

+

fX(x)={λeλx, x>00, elsewheref_{X}(x) = \begin{cases} +\lambda e^{- \lambda x}\text{, } & x > 0 \\ +0\text{, } & \text{elsewhere} +\end{cases}

+

E[x]=xf(x)dx=0xλeλxdx=λ0xeλxdx=λ[x1λeλx|x=0x=01λeλxdx]=1λ\begin{array}{r} +E\lbrack x\rbrack = \int_{- \infty}^{\infty}x \cdot f(x)dx = \int_{0}^{\infty}x \cdot \lambda e^{- \lambda x}dx \\ + = \lambda \cdot \int_{0}^{\infty}x \cdot e^{- \lambda x}dx \\ + = \lambda \cdot \left\lbrack \left. -x\frac{1}{\lambda}e^{- \lambda x} \right|_{x = 0}^{x = \infty} - \int_{0}^{\infty} - \frac{1}{\lambda}e^{- \lambda x}dx \right\rbrack \\ + = \frac{1}{\lambda} +\end{array}

+

Example (Uniform dartboard).

+

Our dartboard is a disk of radius r0r_{0} and the dart lands uniformly +at random on the disk when thrown. Let RR be the distance of the dart +from the center of the disk. Find E[R]E\lbrack R\rbrack given density +function

+

fR(t)={2tr02, 0tr00, t<0 or t>r0f_{R}(t) = \begin{cases} +\frac{2t}{r_{0}^{2}}\text{, } & 0 \leq t \leq r_{0} \\ +0\text{, } & t < 0\text{ or }t > r_{0} +\end{cases}

+

E[R]=tfR(t)dt=0r0t2tr02dt=23r0\begin{array}{r} +E\lbrack R\rbrack = \int_{- \infty}^{\infty}tf_{R}(t)dt \\ + = \int_{0}^{r_{0}}t \cdot \frac{2t}{r_{0}^{2}}dt \\ + = \frac{2}{3}r_{0} +\end{array}

+

Expectation of derived values

+

If we can find the expected value of XX, can we find the expected value +of X2X^{2}? More precisely, can we find +E[X2]E\left\lbrack X^{2} \right\rbrack?

+

If the distribution is easy to see, then this is trivial. Otherwise we +have the following useful property:

+

E[X2]=all xx2fX(x)dxE\left\lbrack X^{2} \right\rbrack = \int_{\text{all }x}x^{2}f_{X}(x)dx

+

(for continuous RVs).

+

And in the discrete case,

+

E[X2]=all xx2pX(x)E\left\lbrack X^{2} \right\rbrack = \sum_{\text{all }x}x^{2}p_{X}(x)

+

In fact E[X2]E\left\lbrack X^{2} \right\rbrack is so important that we call +it the mean square.

+

Fact.

+

More generally, a real valued function g(X)g(X) defined on the range of +XX is itself a random variable (with its own distribution).

+

We can find expected value of g(X)g(X) by

+

E[g(x)]=g(x)f(x)dxE\left\lbrack g(x) \right\rbrack = \int_{- \infty}^{\infty}g(x)f(x)dx

+

or

+

E[g(x)]=all xg(x)f(x)E\left\lbrack g(x) \right\rbrack = \sum_{\text{all }x}g(x)f(x)

+

Example.

+

You roll a fair die to determine the winnings (or losses) WW of a +player as follows:

+

W={1,iftherollis1,2,or31,iftherollisa43,iftherollis5or6W = \begin{cases} + - 1,\ if\ the\ roll\ is\ 1,\ 2,\ or\ 3 \\ +1,\ if\ the\ roll\ is\ a\ 4 \\ +3,\ if\ the\ roll\ is\ 5\ or\ 6 +\end{cases}

+

What is the expected winnings/losses for the player during 1 roll of the +die?

+

Let XX denote the outcome of the roll of the die. Then we can define +our random variable as W=g(X)W = g(X) where the function gg is defined by +g(1)=g(2)=g(3)=1g(1) = g(2) = g(3) = - 1 and so on.

+

Note that P(W=1)=P(X=1X=2X=3)=12P(W = - 1) = P(X = 1 \cup X = 2 \cup X = 3) = \frac{1}{2}. +Likewise P(W=1)=P(X=4)=16P(W = 1) = P(X = 4) = \frac{1}{6}, and +P(W=3)=P(X=5X=6)=13P(W = 3) = P(X = 5 \cup X = 6) = \frac{1}{3}.

+

Then E[g(X)]=E[W]=(1)P(W=1)+(1)P(W=1)+(3)P(W=3)=12+16+1=23\begin{array}{r} +E\left\lbrack g(X) \right\rbrack = E\lbrack W\rbrack = ( - 1) \cdot P(W = - 1) + (1) \cdot P(W = 1) + (3) \cdot P(W = 3) \\ + = - \frac{1}{2} + \frac{1}{6} + 1 = \frac{2}{3} +\end{array}

+

Example.

+

A stick of length ll is broken at a uniformly chosen random location. +What is the expected length of the longer piece?

+

Idea: if you break it before the halfway point, then the longer piece +has length given by lxl - x. If you break it after the halfway point, +the longer piece has length xx.

+

Let the interval [0,l]\lbrack 0,l\rbrack represent the stick and let +X Unif[0,l]X\sim\text{ Unif}\lbrack 0,l\rbrack be the location where the stick is +broken. Then XX has density f(x)=1lf(x) = \frac{1}{l} on +[0,l]\lbrack 0,l\rbrack and 0 elsewhere.

+

Let g(x)g(x) be the length of the longer piece when the stick is broken at +xx,

+

g(x) = \begin{cases}
l - x\text{, } & 0 \leq x < \frac{l}{2} \\
x\text{, } & \frac{l}{2} \leq x \leq l
\end{cases}

+

Then E[g(X)]=g(x)f(x)dx=0l2lxldx+l2lxldx=34l\begin{array}{r} +E\left\lbrack g(X) \right\rbrack = \int_{- \infty}^{\infty}g(x)f(x)dx = \int_{0}^{\frac{l}{2}}\frac{l - x}{l}dx + \int_{\frac{l}{2}}^{l}\frac{x}{l}dx \\ + = \frac{3}{4}l +\end{array}

+

So we expect the longer piece to be 34\frac{3}{4} of the total length, +which is a bit pathological.

+
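A Monte Carlo version of the same computation, sketched in Python with l = 1:

import random

l, n = 1.0, 10**6
avg = sum(max(x, l - x) for x in (random.uniform(0, l) for _ in range(n))) / n
print(avg)                 # ~0.75, i.e. 3/4 of the stick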

Moments of a random variable

+

We continue discussing expectation but we introduce new terminology.

+

Fact.

+

The nthn^{\text{th}} moment (or nthn^{\text{th}} raw moment) of a discrete +random variable XX with p.m.f. pX(x)p_{X}(x) is the expectation

+

E[Xn]=kknpX(k)=μnE\left\lbrack X^{n} \right\rbrack = \sum_{k}k^{n}p_{X}(k) = \mu_{n}

+

If XX is continuous, then we have analogously

+

E[Xn]=xnfX(x)=μnE\left\lbrack X^{n} \right\rbrack = \int_{- \infty}^{\infty}x^{n}f_{X}(x) = \mu_{n}

+

The standard deviation is given by \sigma, the variance is given by \sigma^{2}, and

+

σ2=μ2(μ1)2\sigma^{2} = \mu_{2} - \left( \mu_{1} \right)^{2}

+
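As a small concrete example of this identity, take a single roll of a fair die; a Python sketch:

faces = range(1, 7)
mu1 = sum(faces) / 6                    # first raw moment, 3.5
mu2 = sum(k * k for k in faces) / 6     # second raw moment, ~15.167
print(mu2 - mu1**2)                     # variance, ~2.917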

μ3\mu_{3} is used to measure “skewness” / asymmetry of a distribution. +For example, the normal distribution is very symmetric.

+

μ4\mu_{4} is used to measure kurtosis/peakedness of a distribution.

+

Central moments

+

Previously we discussed “raw moments.” Be careful not to confuse them +with central moments.

+

Fact.

+

The nthn^{\text{th}} central moment of a discrete random variable XX +with p.m.f. pX(x)p_{X}(x) is the expected value of the difference about the +mean raised to the nthn^{\text{th}} power

+

E[(Xμ)n]=k(kμ)npX(k)=μnE\left\lbrack (X - \mu)^{n} \right\rbrack = \sum_{k}(k - \mu)^{n}p_{X}(k) = \mu\prime_{n}

+

And of course in the continuous case,

+

E[(Xμ)n]=(xμ)nfX(x)=μnE\left\lbrack (X - \mu)^{n} \right\rbrack = \int_{- \infty}^{\infty}(x - \mu)^{n}f_{X}(x) = \mu\prime_{n}

+

In particular,

+

\begin{array}{r}
\mu\prime_{1} = E\left\lbrack (X - \mu)^{1} \right\rbrack = \int_{- \infty}^{\infty}(x - \mu)f_{X}(x)\,dx \\
 = \int_{- \infty}^{\infty}x f_{X}(x)\,dx - \int_{- \infty}^{\infty}\mu f_{X}(x)\,dx = \mu - \mu \cdot 1 = 0 \\
\mu\prime_{2} = E\left\lbrack (X - \mu)^{2} \right\rbrack = \sigma_{X}^{2} = \text{ Var}(X)
\end{array}

+

Example.

+

Let YY be a uniformly chosen integer from +{0,1,2,,m}\left\{ 0,1,2,\ldots,m \right\}. Find the first and second moment of +YY.

+

The p.m.f. of YY is pY(k)=1m+1p_{Y}(k) = \frac{1}{m + 1} for +k[0,m]k \in \lbrack 0,m\rbrack. Thus,

+

E[Y]=k=0mk1m+1=1m+1k=0mk=m2\begin{array}{r} +E\lbrack Y\rbrack = \sum_{k = 0}^{m}k\frac{1}{m + 1} = \frac{1}{m + 1}\sum_{k = 0}^{m}k \\ + = \frac{m}{2} +\end{array}

+

Then,

+

E\left\lbrack Y^{2} \right\rbrack = \sum_{k = 0}^{m}k^{2}\frac{1}{m + 1} = \frac{1}{m + 1} \cdot \frac{m(m + 1)(2m + 1)}{6} = \frac{m(2m + 1)}{6}

+

Example.

+

Let c>0c > 0 and let UU be a uniform random variable on the interval +[0,c]\lbrack 0,c\rbrack. Find the nthn^{\text{th}} moment for UU for all +positive integers nn.

+

The density function of UU is

+

f(x)={1c, if x[0,c]0, otherwisef(x) = \begin{cases} +\frac{1}{c}\text{, if } & x \in \lbrack 0,c\rbrack \\ +0\text{, } & \text{otherwise} +\end{cases}

+

Therefore the nthn^{\text{th}} moment of UU is,

+

E\left\lbrack U^{n} \right\rbrack = \int_{- \infty}^{\infty}x^{n}f(x)\,dx = \int_{0}^{c}\frac{x^{n}}{c}\,dx = \frac{c^{n}}{n + 1}

+

Example.

+

Suppose the random variable X Exp(λ)X\sim\text{ Exp}(\lambda). Find the second +moment of XX.

+

E[X2]=0x2λeλxdx=1λ20u2eudu=1λ2Γ(2+1)=2!λ2\begin{array}{r} +E\left\lbrack X^{2} \right\rbrack = \int_{0}^{\infty}x^{2}\lambda e^{- \lambda x}dx \\ + = \frac{1}{\lambda^{2}}\int_{0}^{\infty}u^{2}e^{- u}du \\ + = \frac{1}{\lambda^{2}}\Gamma(2 + 1) = \frac{2!}{\lambda^{2}} +\end{array}

+

Fact.

+

In general, to find the n^{\text{th}} moment of X\sim\text{ Exp}(\lambda),
E\left\lbrack X^{n} \right\rbrack = \int_{0}^{\infty}x^{n}\lambda e^{- \lambda x}\,dx = \frac{n!}{\lambda^{n}}

+

Median and quartiles

+

When a random variable has rare (abnormal) values, its expectation may +be a bad indicator of where the center of the distribution lies.

+

Definition.

+

The median of a random variable XX is any real value mm that +satisfies

+

P(Xm)12 and P(Xm)12P(X \geq m) \geq \frac{1}{2}\text{ and }P(X \leq m) \geq \frac{1}{2}

+

With half the probability on both {Xm}\left\{ X \leq m \right\} and +{Xm}\left\{ X \geq m \right\}, the median is representative of the +midpoint of the distribution. We say that the median is more robust +because it is less affected by outliers. It is not necessarily unique.

+

Example.

+

Let X be discretely uniformly distributed on the set \left\{ -100, 1, 2, 3, \ldots, 9 \right\}, so X has probability mass function p_{X}(-100) = p_{X}(1) = \cdots = p_{X}(9) = \frac{1}{10}.

+

Find the expected value and median of XX.

+

E[X]=(100)110+(1)110++(9)110=5.5E\lbrack X\rbrack = ( - 100) \cdot \frac{1}{10} + (1) \cdot \frac{1}{10} + \cdots + (9) \cdot \frac{1}{10} = - 5.5

+

While the median is any number m[4,5]m \in \lbrack 4,5\rbrack.

+

The median reflects the fact that 90% of the values (and of the probability) lie in the range 1, 2, \ldots, 9, while the mean is heavily influenced by the -100 value.
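The numbers in this example are quick to reproduce; a Python sketch:

vals = [-100] + list(range(1, 10))
print(sum(vals) / len(vals))              # mean, -5.5
print(sorted(vals)[4], sorted(vals)[5])   # middle pair 4 and 5, so any m in [4, 5] is a median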

+ +]]>
+ Sun, 16 Feb 2025 00:00:00 UT + https://blog.youwen.dev/random-variables-distributions-and-probability-theory.html + Youwen Wu +
+ + An assortment of preliminaries on linear algebra + https://blog.youwen.dev/an-assortment-of-preliminaries-on-linear-algebra.html + +
+

+ An assortment of preliminaries on linear algebra +

+

+ and also a test for pandoc +

+
2025-02-15
+
+ +
+
+

This entire document was written entirely in Typst and +directly translated to this file by Pandoc. It serves as a proof of concept of +a way to do static site generation from Typst files instead of Markdown.

+
+

I figured I should write this stuff down before I forgot it.

+

Basic Notions

+

Vector spaces

+

Before we can understand vectors, we need to first discuss vector +spaces. Thus far, you have likely encountered vectors primarily in +physics classes, generally in the two-dimensional plane. You may +conceptualize them as arrows in space. For vectors of size >3> 3, a hand +waving argument is made that they are essentially just arrows in higher +dimensional spaces.

+

It is helpful to take a step back from this primitive geometric +understanding of the vector. Let us build up a rigorous idea of vectors +from first principles.

+

Vector axioms

+

The so-called axioms of a vector space (which we’ll call the vector +space VV) are as follows:

+
    +
  1. Commutativity: u+v=v+u, u,vVu + v = v + u,\text{ }\forall u,v \in V

  2. +
  3. Associativity: +(u+v)+w=u+(v+w), u,v,wV(u + v) + w = u + (v + w),\text{ }\forall u,v,w \in V

  4. +
  5. Zero vector: \exists a special vector, denoted 00, such that +v+0=v, vVv + 0 = v,\text{ }\forall v \in V

  6. +
  7. Additive inverse: +vV, wV such that v+w=0\forall v \in V,\text{ }\exists w \in V\text{ such that }v + w = 0. +Such an additive inverse is generally denoted v- v

  8. +
  9. Multiplicative identity: 1v=v, vV1v = v,\text{ }\forall v \in V

  10. +
  11. Multiplicative associativity: +(αβ)v=α(βv) vV, scalars α,β(\alpha\beta)v = \alpha(\beta v)\text{ }\forall v \in V,\text{ scalars }\alpha,\beta

  12. +
  13. Distributive property for vectors: +α(u+v)=αu+αv u,vV, scalars α\alpha(u + v) = \alpha u + \alpha v\text{ }\forall u,v \in V,\text{ scalars }\alpha

  14. +
  15. Distributive property for scalars: +(α+β)v=αv+βv vV, scalars α,β(\alpha + \beta)v = \alpha v + \beta v\text{ }\forall v \in V,\text{ scalars }\alpha,\beta

  16. +
+

It is easy to show that the zero vector 00 and the additive inverse +v- v are unique. We leave the proof of this fact as an exercise.

+

These may seem difficult to memorize, but they are essentially the same +familiar algebraic properties of numbers you know from high school. The +important thing to remember is which operations are valid for what +objects. For example, you cannot add a vector and scalar, as it does not +make sense.

+

Remark. For those of you versed in computer science, you may recognize this as essentially saying that you must ensure your operations are type-safe. Adding a vector and a scalar is not just "wrong" in the same sense that 1 + 1 = 3 is wrong; it is an invalid question entirely, because vectors and scalars are different types of mathematical objects. See [@chen2024digression] for more.

+

Vectors big and small

+

In order to begin your descent into what mathematicians colloquially +recognize as abstract vapid nonsense, let’s discuss which fields +constitute a vector space. We have the familiar field of \mathbb{R} +where all scalars are real numbers, with corresponding vector spaces +n{\mathbb{R}}^{n}, where nn is the length of the vector. We generally +discuss 2D or 3D vectors, corresponding to vectors of length 2 or 3; in +our case, 2{\mathbb{R}}^{2} and 3{\mathbb{R}}^{3}.

+

However, vectors in n{\mathbb{R}}^{n} can really be of any length. +Vectors can be viewed as arbitrary length lists of numbers (for the +computer science folk: think C++ std::vector).

+

Example. (123456789)9\begin{pmatrix} +1 \\ +2 \\ +3 \\ +4 \\ +5 \\ +6 \\ +7 \\ +8 \\ +9 +\end{pmatrix} \in {\mathbb{R}}^{9}

+

Keep in mind that vectors need not be in n{\mathbb{R}}^{n} at all. +Recall that a vector space need only satisfy the aforementioned axioms +of a vector space.

+

Example. The vector space n{\mathbb{C}}^{n} is similar to +n{\mathbb{R}}^{n}, except it includes complex numbers. All complex +vector spaces are real vector spaces (as you can simply restrict them to +only use the real numbers), but not the other way around.

+

From now on, let us refer to vector spaces n{\mathbb{R}}^{n} and +n{\mathbb{C}}^{n} as 𝔽n{\mathbb{F}}^{n}.

+

In general, we can have a vector space where the scalars are in an +arbitrary field, as long as the axioms are satisfied.

+

Example. The vector space of all polynomials of at most degree 3, or +3{\mathbb{P}}^{3}. It is not yet clear what this vector may look like. +We shall return to this example once we discuss basis.

+

Vector addition. Multiplication

+

Vector addition, represented by ++ can be done entrywise.

+

Example.

+

(123)+(456)=(1+42+53+6)=(579)\begin{pmatrix} +1 \\ +2 \\ +3 +\end{pmatrix} + \begin{pmatrix} +4 \\ +5 \\ +6 +\end{pmatrix} = \begin{pmatrix} +1 + 4 \\ +2 + 5 \\ +3 + 6 +\end{pmatrix} = \begin{pmatrix} +5 \\ +7 \\ +9 +\end{pmatrix} (123)(456)=(142536)=(41018)\begin{pmatrix} +1 \\ +2 \\ +3 +\end{pmatrix} \cdot \begin{pmatrix} +4 \\ +5 \\ +6 +\end{pmatrix} = \begin{pmatrix} +1 \cdot 4 \\ +2 \cdot 5 \\ +3 \cdot 6 +\end{pmatrix} = \begin{pmatrix} +4 \\ +10 \\ +18 +\end{pmatrix}

+

This is simple enough to understand. Again, the difficulty is simply +ensuring that you always perform operations with the correct types. +For example, once we introduce matrices, it doesn’t make sense to +multiply or add vectors and matrices in this fashion.

+

Vector-scalar multiplication

+

Multiplying a vector by a scalar simply results in each entry of the +vector being multiplied by the scalar.

+

Example.

+

β(abc)=(βaβbβc)\beta\begin{pmatrix} +a \\ +b \\ +c +\end{pmatrix} = \begin{pmatrix} +\beta \cdot a \\ +\beta \cdot b \\ +\beta \cdot c +\end{pmatrix}

+

Linear combinations

+

Given a vector space V, vectors v,w \in V, and scalars \alpha,\beta, the vector \alpha v + \beta w is a linear combination of v and w.

+

Spanning systems

+

We say that a set of vectors v1,v2,,vnVv_{1},v_{2},\ldots,v_{n} \in V span VV +if the linear combination of the vectors can represent any arbitrary +vector vVv \in V.

+

Precisely, for every v \in V, there exist scalars \alpha_{1},\alpha_{2},\ldots,\alpha_{n} such that

+

\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{n}v_{n} = v

+

Note that any scalar αk\alpha_{k} could be 0. Therefore, it is possible +for a subset of a spanning system to also be a spanning system. The +proof of this fact is left as an exercise.

+

Intuition for linear independence and dependence

+

We say that vv and ww are linearly independent if vv cannot be +represented by the scaling of ww, and ww cannot be represented by the +scaling of vv. Otherwise, they are linearly dependent.

+

You may intuitively visualize linear dependence in the 2D plane as two +vectors both pointing in the same direction. Clearly, scaling one vector +will allow us to reach the other vector. Linear independence is +therefore two vectors pointing in different directions.

+

Of course, this definition applies to vectors in any 𝔽n{\mathbb{F}}^{n}.

+

Formal definition of linear dependence and independence

+

Let us formally define linear independence for arbitrary vectors in +𝔽n{\mathbb{F}}^{n}. Given a set of vectors

+

v1,v2,,vnVv_{1},v_{2},\ldots,v_{n} \in V

+

we say they are linearly independent iff. the equation

+

α1v1+α2v2++αnvn=0\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{n}v_{n} = 0

+

has only a unique set of solutions +α1,α2,,αn\alpha_{1},\alpha_{2},\ldots,\alpha_{n} such that all αn\alpha_{n} are +zero.

+

Equivalently,

+

|α1|+|α2|++|αn|=0\left| \alpha_{1} \right| + \left| \alpha_{2} \right| + \ldots + \left| \alpha_{n} \right| = 0

+

More precisely,

+

\sum_{i = 1}^{n}\left| \alpha_{i} \right| = 0

+

Therefore, a set of vectors v1,v2,,vmv_{1},v_{2},\ldots,v_{m} is linearly +dependent if the opposite is true, that is there exists solution +α1,α2,,αm\alpha_{1},\alpha_{2},\ldots,\alpha_{m} to the equation

+

α1v1+α2v2++αmvm=0\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{m}v_{m} = 0

+

such that

+

\sum_{i = 1}^{m}\left| \alpha_{i} \right| \neq 0

+
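For a small concrete illustration (the specific vectors here are just an illustrative choice): in {\mathbb{R}}^{2}, the vectors (1, 2) and (2, 4) are linearly dependent, since 2 \cdot (1, 2) - 1 \cdot (2, 4) = (0, 0) is a solution with nonzero coefficients. The vectors (1, 0) and (1, 1) are linearly independent, since \alpha_{1}(1, 0) + \alpha_{2}(1, 1) = (0, 0) forces \alpha_{2} = 0 from the second entry and then \alpha_{1} = 0 from the first.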

Basis

+

We say a system of vectors v1,v2,,vnVv_{1},v_{2},\ldots,v_{n} \in V is a basis +in VV if the system is both linearly independent and spanning. That is, +the system must be able to represent any vector in VV as well as +satisfy our requirements for linear independence.

+

Equivalently, we may say that a system of vectors in VV is a basis in +VV if any vector vVv \in V admits a unique representation as a linear +combination of vectors in the system. This is equivalent to our previous +statement, that the system must be spanning and linearly independent.

+

Standard basis

+

We may define a standard basis for a vector space. By convention, the +standard basis in 2{\mathbb{R}}^{2} is

+

(10)(01)\begin{pmatrix} +1 \\ +0 +\end{pmatrix}\begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Verify that the above is in fact a basis (that is, linearly independent +and generating).

+

Recalling the definition of the basis, we can represent any vector in +2{\mathbb{R}}^{2} as the linear combination of the standard basis.

+

Therefore, for any arbitrary vector v2v \in {\mathbb{R}}^{2}, we can +represent it as

+

v=α1(10)+α2(01)v = \alpha_{1}\begin{pmatrix} +1 \\ +0 +\end{pmatrix} + \alpha_{2}\begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Let us call α1\alpha_{1} and α2\alpha_{2} the coordinates of the +vector. Then, we can write vv as

+

v=(α1α2)v = \begin{pmatrix} +\alpha_{1} \\ +\alpha_{2} +\end{pmatrix}

+

For example, the vector

+

(12)\begin{pmatrix} +1 \\ +2 +\end{pmatrix}

+

represents

+

1(10)+2(01)1 \cdot \begin{pmatrix} +1 \\ +0 +\end{pmatrix} + 2 \cdot \begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Verify that this aligns with your previous intuition of vectors.

+

You may recognize the standard basis in 2{\mathbb{R}}^{2} as the +familiar unit vectors

+

î,ĵ\hat{i},\hat{j}

+

This aligns with the fact that

+

(αβ)=αî+βĵ\begin{pmatrix} +\alpha \\ +\beta +\end{pmatrix} = \alpha\hat{i} + \beta\hat{j}

+

However, we may define a standard basis in any arbitrary vector space. +So, let

+

e1,e2,,ene_{1},e_{2},\ldots,e_{n}

+

be a standard basis in 𝔽n{\mathbb{F}}^{n}. Then, the coordinates +α1,α2,,αn\alpha_{1},\alpha_{2},\ldots,\alpha_{n} of a vector +v𝔽nv \in {\mathbb{F}}^{n} represent the following

+

(α1α2⋮αn)=α1e1+α2e2+…+αnen\begin{pmatrix} +\alpha_{1} \\ +\alpha_{2} \\ + \vdots \\ +\alpha_{n} +\end{pmatrix} = \alpha_{1}e_{1} + \alpha_{2}e_{2} + \ldots + \alpha_{n}e_{n}

+

Using our new notation, the standard basis in 2{\mathbb{R}}^{2} is

+

e1=(10),e2=(01)e_{1} = \begin{pmatrix} +1 \\ +0 +\end{pmatrix},e_{2} = \begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Matrices

+

Before discussing any properties of matrices, let’s simply reiterate what we learned in class about their notation. We say a matrix with mm rows and nn columns (in less precise terms, a matrix of height mm and length nn) is an m×nm \times n matrix.

+

Given a matrix

+

A=(123456789)A = \begin{pmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 \\ +7 & 8 & 9 +\end{pmatrix}

+

we refer to the entry in row jj and column kk as Aj,kA_{j,k} .

+

Matrix transpose

+

A formalism that is useful later on is called the transpose, and we +obtain it from a matrix AA by switching all the rows and columns. More +precisely, each row becomes a column instead. We use the notation +ATA^{T} to represent the transpose of AA.

+

(123456)T=(142536)\begin{pmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 +\end{pmatrix}^{T} = \begin{pmatrix} +1 & 4 \\ +2 & 5 \\ +3 & 6 +\end{pmatrix}

+

Formally, we can say (AT)j,k=Ak,j\left( A^{T} \right)_{j,k} = A_{k,j}
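
Note also that transposing twice returns the original matrix, since every row becomes a column and then becomes a row again:

\left( A^{T} \right)^{T} = A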

+

Linear transformations

+

A linear transformation T:VWT:V \rightarrow W is a mapping between two vector spaces VV and WW such that the following axioms are satisfied:

+
    +
  1. T(u+v)=T(u)+T(v),u,vVT(u + v) = T(u) + T(v),\forall u,v \in V

  2. +
  3. T(αv)=αT(v),vVT(\alpha v) = \alpha T(v),\forall v \in V, for all scalars α\alpha

  4. +
+

Definition. TT is a linear transformation iff. for all vectors v,wVv,w \in V and all scalars α,β\alpha,\beta,

+

T(αv+βw)=αT(v)+βT(w)T(\alpha v + \beta w) = \alpha T(v) + \beta T(w)

+

Abuse of notation. From now on, we may elide the parentheses and say +that T(v)=Tv,vVT(v) = Tv,\forall v \in V

+

Remark. A phrase that you may commonly hear is that linear +transformations preserve linearity. Essentially, straight lines remain +straight, parallel lines remain parallel, and the origin remains fixed +at 0. Take a moment to think about why this is true (at least, in lower +dimensional spaces you can visualize).

+

Examples.

+
    +
  1. Rotation for V=W=2V = W = {\mathbb{R}}^{2} (i.e. rotation in 2 +dimensions). Given v,w2v,w \in {\mathbb{R}}^{2}, and their linear +combination v+wv + w, a rotation of γ\gamma radians of v+wv + w is +equivalent to first rotating vv and ww individually by γ\gamma +and then taking their linear combination.

  2. +
  3. Differentiation of polynomials. In this case V=nV = {\mathbb{P}}^{n} and W=n1W = {\mathbb{P}}^{n - 1}, where n{\mathbb{P}}^{n} is the vector space of all polynomials of degree at most nn (a short concrete check follows this list).

    +

    ddx(αv+βw)=αddxv+βddxw,v,wV, scalars α,β\frac{d}{dx}(\alpha v + \beta w) = \alpha\frac{d}{dx}v + \beta\frac{d}{dx}w,\forall v,w \in V,\forall\text{ scalars }\alpha,\beta

  4. +
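
As a quick concrete check of the differentiation example, take the polynomial 3x^{2} + 2x in \mathbb{P}^{2}:

\frac{d}{dx}\left( 3x^{2} + 2x \right) = 3 \cdot \frac{d}{dx}x^{2} + 2 \cdot \frac{d}{dx}x = 6x + 2

so differentiating a linear combination agrees with taking the same linear combination of the derivatives.

+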
+

Matrices represent linear transformations

+

Suppose we wanted to represent a linear transformation T:𝔽n𝔽mT:{\mathbb{F}}^{n} \rightarrow {\mathbb{F}}^{m}. I propose that we need only encode how TT acts on the standard basis of 𝔽n{\mathbb{F}}^{n}.

+

Using our intuition from lower dimensional vector spaces, we know that +the standard basis in 2{\mathbb{R}}^{2} is the unit vectors î\hat{i} +and ĵ\hat{j}. Because linear transformations preserve linearity (i.e. +all straight lines remain straight and parallel lines remain parallel), +we can encode any transformation as simply changing î\hat{i} and +ĵ\hat{j}. And indeed, if any vector v2v \in {\mathbb{R}}^{2} can be +represented as the linear combination of î\hat{i} and ĵ\hat{j} (this +is the definition of a basis), it makes sense both symbolically and +geometrically that we can represent all linear transformations as the +transformations of the basis vectors.

+

Example. To reflect all vectors v2v \in {\mathbb{R}}^{2} across the +yy-axis, we can simply change the standard basis to

+

(10)(01)\begin{pmatrix} + - 1 \\ +0 +\end{pmatrix}\begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Then, any vector in 2{\mathbb{R}}^{2} using this new basis will be +reflected across the yy-axis. Take a moment to justify this +geometrically.

+

Writing a linear transformation as a matrix

+

For any linear transformation +T:𝔽m𝔽nT:{\mathbb{F}}^{m} \rightarrow {\mathbb{F}}^{n}, we can write it as an +n×mn \times m matrix AA. That is, there is a matrix AA with nn rows +and mm columns that can represent any linear transformation from +𝔽m𝔽n{\mathbb{F}}^{m} \rightarrow {\mathbb{F}}^{n}.

+

How should we write this matrix? Naturally, from our previous +discussion, we should write a matrix with each column being one of our +new transformed basis vectors.

+

Example. Consider our yy-axis reflection transformation from earlier. We write the transformed basis vectors as the columns of a matrix:

+

(1001)\begin{pmatrix} + - 1 & 0 \\ +0 & 1 +\end{pmatrix}

+

Matrix-vector multiplication

+

Perhaps you now see why the so-called matrix-vector multiplication is +defined the way it is. Recalling our definition of a basis, given a +basis in VV, any vector vVv \in V can be written as the linear +combination of the vectors in the basis. Then, given a linear +transformation represented by the matrix containing the new basis, we +simply write the linear combination with the new basis instead.

+

Example. Let us first write a vector in the standard basis in +2{\mathbb{R}}^{2} and then show how our matrix-vector multiplication +naturally corresponds to the definition of the linear transformation.

+

(12)2\begin{pmatrix} +1 \\ +2 +\end{pmatrix} \in {\mathbb{R}}^{2}

+

is the same as

+

1(10)+2(01)1 \cdot \begin{pmatrix} +1 \\ +0 +\end{pmatrix} + 2 \cdot \begin{pmatrix} +0 \\ +1 +\end{pmatrix}

+

Then, to perform our reflection, we need only replace the basis vector +(10)\begin{pmatrix} +1 \\ +0 +\end{pmatrix} with (10)\begin{pmatrix} + - 1 \\ +0 +\end{pmatrix}.

+

Then, the reflected vector is given by

+

1(10)+2(01)=(12)1 \cdot \begin{pmatrix} + - 1 \\ +0 +\end{pmatrix} + 2 \cdot \begin{pmatrix} +0 \\ +1 +\end{pmatrix} = \begin{pmatrix} + - 1 \\ +2 +\end{pmatrix}

+

We can clearly see that this is exactly how the matrix multiplication

+

(1001)(12)\begin{pmatrix} + - 1 & 0 \\ +0 & 1 +\end{pmatrix} \cdot \begin{pmatrix} +1 \\ +2 +\end{pmatrix} is defined! The column-by-coordinate rule for +matrix-vector multiplication says that we multiply the nthn^{\text{th}} +entry of the vector by the corresponding nthn^{\text{th}} column of the +matrix and sum them all up (take their linear combination). This +algorithm intuitively follows from our definition of matrices.

+
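
In general, for a 2 \times 2 matrix the column-by-coordinate rule reads

\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = x \cdot \begin{pmatrix} a \\ c \end{pmatrix} + y \cdot \begin{pmatrix} b \\ d \end{pmatrix} = \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}

which recovers the familiar entry-by-entry formula as a consequence.

+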

Matrix-matrix multiplication

+

As you may have noticed, a very similar natural definition arises for +the matrix-matrix multiplication. Multiplying two matrices ABA \cdot B +is essentially just taking each column of BB, and applying the linear +transformation defined by the matrix AA!
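
For instance, applying the reflection matrix from earlier to each column of another matrix:

\begin{pmatrix} - 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} = \begin{pmatrix} - 1 & - 3 \\ 2 & 4 \end{pmatrix}

Each column of the product is the reflection applied to the corresponding column of the second matrix.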

+ +]]>
+ Sat, 15 Feb 2025 00:00:00 UT + https://blog.youwen.dev/an-assortment-of-preliminaries-on-linear-algebra.html + Youwen Wu +
+ + Nix automatic hash updates made easy + https://blog.youwen.dev/nix-automatic-hash-updates-made-easy.html + +
+

+ Nix automatic hash updates made easy +

+

+ keep your flakes up to date +

+
2024-12-28
+
+ +
+
+

Nix users often create flakes to package software out of tree, like this Zen +Browser flake I’ve been +maintaining. Keeping them up to date is a hassle though, since you have to +update the Subresource Integrity (SRI) hashes that Nix uses to ensure +reproducibility.

+

Here’s a neat method I’ve been using to cleanly handle automatic hash updates. +I use Nushell to easily work with data, prefetch +some hashes, and put it all in a JSON file that can be read by Nix at build +time.

+

First, let’s create a file called update.nu. At the top, place this shebang:

+
#!/usr/bin/env -S nix shell nixpkgs#nushell --command nu
+

This will execute the script in a Nushell environment, which is fetched by Nix.

+

Get the up-to-date URLs

+

We need to obtain the latest version of whatever software we want to update. +In this case, I’ll use GitHub releases as my source of truth.

+

You can use the GitHub API to fetch metadata about all the releases of a repository.

+
https://api.github.com/repos/($repo)/releases
+

Roughly speaking, the raw JSON returned by the GitHub releases API looks something like:

+
[
+   {tag_name: "foo", prerelease: false, ...},
+   {tag_name: "bar", prerelease: true, ...},
+   {tag_name: "foobar", prerelease: false, ...},
+]
+
+

Note that the objects in the array are ordered from newest to oldest, so the most recent release comes first.

+
+

Even if you aren’t using GitHub releases, as long as there is a reliable way to +programmatically fetch the latest download URLs of whatever software you’re +packaging, you can adapt this approach for your specific case.

+
+

We use Nushell’s http get to make a network request. Nushell will automatically detect and parse the JSON response into a Nushell table.
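
For example, you can peek at the parsed response in an interactive Nushell session (an illustrative one-liner; zen-browser/desktop is the repository used later in this post):

# fetch release metadata and keep only the fields we care about
+http get "https://api.github.com/repos/zen-browser/desktop/releases"
+  | select tag_name prerelease
+  | first 3
+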

+

In my case, Zen Browser frequently publishes prerelease “twilight” builds which +we don’t want to update to. So, we ignore any releases tagged “twilight” or +marked “prerelease” by filtering them out with the where selector.

+

Finally, we retrieve the tag name of the item at the first index, which is the latest release (since the array is sorted newest first).

+
#!/usr/bin/env -S nix shell nixpkgs#nushell --command nu
+
+# get the latest tag of the latest release that isn't a prerelease
+def get_latest_release [repo: string] {
+  try {
+	http get $"https://api.github.com/repos/($repo)/releases"
+	  | where prerelease == false
+	  | where tag_name != "twilight"
+	  | get tag_name
+	  | get 0
+  } catch { |err| $"Failed to fetch latest release, aborting: ($err.msg)" }
+}
+

Prefetching SRI hashes

+

Now that we have the latest tag, we can easily obtain the latest download URLs, which are of the form:

+
https://github.com/zen-browser/desktop/releases/download/$tag/zen.linux-x86_64.tar.bz2
+https://github.com/zen-browser/desktop/releases/download/$tag/zen.linux-aarch64.tar.bz2
+

However, we still need the corresponding SRI hashes to pass to Nix.

+
src = fetchurl {
+   url = "https://github.com/zen-browser/desktop/releases/download/1.0.2-b.5/zen.linux-x86_64.tar.bz2";
+   hash = "sha256-00000000000000000000000000000000000000000000";
+};
+

The easiest way to obtain these new hashes is to update the URL and then set the hash property to an empty string (""). Nix will spit out a hash mismatch error that contains the correct hash. However, this is inconvenient for automated command line scripting.

+

The Nix documentation mentions nix-prefetch-url as a way to obtain these hashes, but as usual, it doesn’t work quite right and has since been superseded by a more powerful but underdocumented experimental feature.

+

The nix store prefetch-file command does what nix-prefetch-url is supposed to do, and it automatically handles the caveats that would otherwise lead to the wrong hash being produced.

+

Let’s write a Nushell function that outputs the SRI hash of the given URL. We +tell prefetch-file to output structured JSON that we can parse.

+

Since Nushell is a shell, we can directly invoke shell commands like usual, +and then process their output with pipes.

+
def get_nix_hash [url: string] {
+  nix store prefetch-file --hash-type sha256 --json $url | from json | get hash
+}
+

Cool! Now get_nix_hash can give us SRI hashes that look like this:

+
sha256-K3zTCLdvg/VYQNsfeohw65Ghk8FAjhOl8hXU6REO4/s=
+
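
For example, calling it on the x86_64 tarball from the release used later in this post would print exactly such a hash (an illustrative invocation, not part of the original script):

# prefetch a release tarball and print its SRI hash
+get_nix_hash "https://github.com/zen-browser/desktop/releases/download/1.0.2-b.5/zen.linux-x86_64.tar.bz2"
+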

Putting it all together

+

Now that we’re able to fetch the latest release, obtain the download URLs, and +compute their SRI hashes, we have all the information we need to make an +automated update. However, these URLs are typically hardcoded in our Nix +expressions. The question remains as to how to update these values.

+

A common way I’ve seen updates performed is using something like sed to modify the Nix expressions in place. However, there’s actually a more maintainable and easier to understand approach.

+

Let’s have our Nushell script generate the URLs and hashes and place them in a +JSON file! Then, we’ll be able to read the JSON file from Nix and obtain the +URL and hash.

+
def generate_sources [] {
+  let tag = get_latest_release "zen-browser/desktop"
+  let prev_sources = open ./sources.json
+
+  if $tag == $prev_sources.version {
+	# everything up to date
+	return $tag
+  }
+
+  # generate the download URLs with the new tag
+  let x86_64_url = $"https://github.com/zen-browser/desktop/releases/download/($tag)/zen.linux-x86_64.tar.bz2"
+  let aarch64_url = $"https://github.com/zen-browser/desktop/releases/download/($tag)/zen.linux-aarch64.tar.bz2"
+
+  # create a Nushell record that maps cleanly to JSON
+  let sources = {
+    # add a version field as well for convenience
+	version: $tag
+
+	x86_64-linux: {
+	  url:  $x86_64_url
+	  hash: (get_nix_hash $x86_64_url)
+	}
+	aarch64-linux: {
+	  url: $aarch64_url
+	  hash: (get_nix_hash $aarch64_url)
+	}
+  }
+
+  echo $sources | save --force "sources.json"
+
+  return $tag
+}
+
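
The post relies on ./update.nu printing the fetched tag (the CI job below captures it), which implies the script ends with an entry point that calls generate_sources. A minimal sketch, using Nushell’s conventional main entry point (this part is assumed, not shown in the original script):

# hypothetical entry point: running ./update.nu invokes `main`, which
+# performs the update and emits the fetched tag so callers can capture it
+def main [] {
+  generate_sources
+}
+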

Running this script with

+
chmod +x ./update.nu
+./update.nu
+

gives us the file sources.json:

+
{
+  "version": "1.0.2-b.5",
+  "x86_64-linux": {
+    "url": "https://github.com/zen-browser/desktop/releases/download/1.0.2-b.5/zen.linux-x86_64.tar.bz2",
+    "hash": "sha256-K3zTCLdvg/VYQNsfeohw65Ghk8FAjhOl8hXU6REO4/s="
+  },
+  "aarch64-linux": {
+    "url": "https://github.com/zen-browser/desktop/releases/download/1.0.2-b.5/zen.linux-aarch64.tar.bz2",
+    "hash": "sha256-NwIYylGal2QoWhWKtMhMkAAJQ6iNHfQOBZaxTXgvxAk="
+  }
+}
+

Now, let’s read this from Nix. My file organization looks like the following:

+
./
+| flake.nix
+| zen-browser-unwrapped.nix
+| ...other files...
+

zen-browser-unwrapped.nix contains the derivation for Zen Browser. Let’s add +version, url, and hash to its inputs:

+
{
+  stdenv,
+  fetchurl,
+  # add these below
+  version,
+  url,
+  hash,
+  ...
+}:
+stdenv.mkDerivation {
+   # inherit version from inputs
+  inherit version;
+  pname = "zen-browser-unwrapped";
+
+  src = fetchurl {
+    # inherit the URL and hash we obtain from the inputs
+    inherit url hash;
+  };
+}
+

Then in flake.nix, let’s provide the derivation with the data from sources.json:

+
let
+   supportedSystems = [
+     "x86_64-linux"
+     "aarch64-linux"
+   ];
+   forAllSystems = nixpkgs.lib.genAttrs supportedSystems;
+in
+{
+   # rest of file omitted for simplicity
+   packages = forAllSystems (
+     system:
+     let
+       pkgs = import nixpkgs { inherit system; };
+       # parse sources.json into a Nix attrset
+       sources = builtins.fromJSON (builtins.readFile ./sources.json);
+     in
+     rec {
+       zen-browser-unwrapped = pkgs.callPackage ./zen-browser-unwrapped.nix {
+         inherit (sources.${system}) hash url;
+         inherit (sources) version;
+
+         # if the above is difficult to understand, it is equivalent to the following:
+         # hash = sources.${system}.hash;
+         # url = sources.${system}.url;
+         # version = sources.version;
+       };
+     }
+   );
+}
+

Now, running nix build .#zen-browser-unwrapped will use the hashes and URLs from sources.json to build the package!

+

Automating it in CI

+

We now have a script that can automatically fetch releases and generate hashes +and URLs, as well as a way for Nix to use the outputted JSON to build +derivations. All that’s left is to fully automate it using CI!

+

We are going to use GitHub Actions for this, as it’s free and easy and you’re probably already hosting on GitHub.

+

Ensure you’ve set up Actions for your repo and given the workflow sufficient permissions (it needs to be able to push commits to the repository).

+

We’re gonna run it on a cron timer that checks for updates at 8 PM PST every day.

+

We use DeterminateSystems’ actions to help set up Nix. Then, we simply run our +update script. Since we made the script return the tag it fetched, we can store +it in a variable and then use it in our commit message.

+
name: Update to latest version, and update flake inputs
+
+on:
+  schedule:
+    - cron: "0 4 * * *"
+  workflow_dispatch:
+
+jobs:
+  update:
+    name: Update flake inputs and browser
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout Repository
+        uses: actions/checkout@v4
+
+      - name: Check flake inputs
+        uses: DeterminateSystems/flake-checker-action@v4
+
+      - name: Install Nix
+        uses: DeterminateSystems/nix-installer-action@main
+
+      - name: Set up magic Nix cache
+        uses: DeterminateSystems/magic-nix-cache-action@main
+
+      - name: Check for update and perform update
+        run: |
+          git config --global user.name "github-actions[bot]"
+          git config --global user.email "github-actions[bot]@users.noreply.github.com"
+
+          chmod +x ./update.nu
+          export ZEN_LATEST_VER="$(./update.nu)"
+
+          git add -A
+          git commit -m "github-actions: update to $ZEN_LATEST_VER" || echo "Latest version is $ZEN_LATEST_VER, no updates found"
+
+          nix flake update --commit-lock-file
+
+          git push
+

Now, our repository will automatically check for and perform updates every day!

+ +]]>
+ Sat, 28 Dec 2024 00:00:00 UT + https://blog.youwen.dev/nix-automatic-hash-updates-made-easy.html + Youwen Wu +
+ + a haskellian blog + https://blog.youwen.dev/a-haskellian-blog.html + +
+

+ a haskellian blog +

+

+ a purely functional...blog? +

+
2024-05-25
+
+ (last updated: 2024-05-25T12:00:00Z) +
+
+

Welcome! This is the first post on The Involution and also one that tests all +of the features.

+ + + + +
+

A monad is just a monoid in the category of endofunctors, what’s the problem?

+
+

haskell?

+

This entire blog is generated with hakyll. It’s +a library for generating static sites for Haskell, a purely functional +programming language. It’s a library because it doesn’t come with as many +batteries included as tools like Hugo or Astro. You set up most of the site +yourself by calling the library from Haskell.

+

Here’s a brief excerpt:

+
main :: IO ()
+main = hakyllWith config $ do
+    forM_
+        [ "CNAME"
+        , "favicon.ico"
+        , "robots.txt"
+        , "_config.yml"
+        , "images/*"
+        , "out/*"
+        , "fonts/*"
+        ]
+        $ \f -> match f $ do
+            route idRoute
+            compile copyFileCompiler
+

The code highlighting is also generated by hakyll.

+
+

why?

+

Haskell is a purely functional language with no mutable state. Its syntax +actually makes it pretty elegant for declaring routes and “rendering” pipelines.

+
    +
  1. Haskell is cool.
  2. +
  3. It comes with enough features that I don’t feel like I have to build +everything from scratch.
  4. +
  5. It comes with Pandoc, a Haskell library for converting between markdown +formats. It’s probably more powerful than anything you could do in nodejs. +It renders all of the markdown to HTML as well as the math. +
      +
    1. It supports KaTeX as well as MathML. I’m a little disappointed with the +KaTeX though. It doesn’t directly render it, but simply injects the KaTeX +files and renders it client-side.
    2. +
  6. +
+

speaking of math

+

We can have math inline, like so: +ex2dx=π\int_{-\infty}^\infty \, e^{-x^2}\,dx = \sqrt{\pi}. This site ships semantic +MathML math with its HTML, and the MathJax script to the client.

+

It’d be nice if MathML could just be used and supported across all browsers, but +unfortunately we still aren’t quite there yet. Firefox is the only one where +everything looks 80% of the way to LaTeX. On Safari and Chrome, even simple +equations like π\sqrt{\pi} render improperly.

+

Pros of MathML:

+
    +
  • A little more accessible
  • +
  • Can be rendered without additional stylesheets. I just installed the Latin +Modern font, but this isn’t even really necessary
  • +
  • Built-in to most browsers (#UseThePlatform)
  • +
+

Cons:

+
    +
  • Isn’t fully standardized. Might look different on different browsers
  • +
  • Rendering quality isn’t as good as KaTeX
  • +
+

This site has MathJax render all of the math so it looks nice and standardized +across browsers, but the math still displays regardless (like say if MathJax +couldn’t load due to slow network) because of MathML. Best of both worlds.

+

Let’s try it now. Here’s a simple theorem:

+

an+bncn{a,b,c}n3 +a^n + b^n \ne c^n \, \forall\,\left\{ a,\,b,\,c \right\} \in \mathbb{Z} \land n \ge 3 +

+

The proof is trivial and will be left as an exercise to the reader.

+

seems a little overengineered

+

Probably is. Not as much as the old one, though.

+ +]]>
+ Sat, 25 May 2024 00:00:00 UT + https://blog.youwen.dev/a-haskellian-blog.html + Youwen Wu +
+ +
+
diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 0000000..68b1c07 --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,37 @@ + + + + https://blog.youwen.dev + daily + 1.0 + + + + https://blog.youwen.dev/random-variables-distributions-and-probability-theory.html + 2025-02-16 + weekly + 0.8 + + + + https://blog.youwen.dev/an-assortment-of-preliminaries-on-linear-algebra.html + 2025-02-15 + weekly + 0.8 + + + + https://blog.youwen.dev/nix-automatic-hash-updates-made-easy.html + 2024-12-28 + weekly + 0.8 + + + + https://blog.youwen.dev/a-haskellian-blog.html + 2024-05-25T12:00:00Z + weekly + 0.8 + + +