feat: add linear algebra post

Youwen Wu 2025-02-15 13:27:45 -08:00
parent cd765ab6bf
commit 5dc319a371
Signed by: youwen5
GPG key ID: 865658ED1FE61EC3


@@ -0,0 +1,551 @@
---
author: "Youwen Wu"
authorTwitter: "@youwen"
image: "https://wallpapercave.com/wp/wp12329537.png"
keywords: "linear algebra, algebra, math"
lang: "en"
title: "An assortment of preliminaries on linear algebra"
desc: "and also a test for pandoc"
---
This document was written entirely in [Typst](https://typst.app/) and
directly translated to this file by Pandoc. It serves as a proof of concept of
a way to do static site generation from Typst files instead of Markdown.
---
I figured I should write this stuff down before I forgot it.
# Basic Notions
## Vector spaces
Before we can understand vectors, we need to first discuss *vector
spaces*. Thus far, you have likely encountered vectors primarily in
physics classes, generally in the two-dimensional plane. You may
conceptualize them as arrows in space. For vectors of size $> 3$, a hand
waving argument is made that they are essentially just arrows in higher
dimensional spaces.
It is helpful to take a step back from this primitive geometric
understanding of the vector. Let us build up a rigorous idea of vectors
from first principles.
### Vector axioms
The so-called *axioms* of a *vector space* (which we'll call the vector
space $V$) are as follows:
1. Commutativity: $u + v = v + u,\text{ }\forall u,v \in V$
2. Associativity:
$(u + v) + w = u + (v + w),\text{ }\forall u,v,w \in V$
3. Zero vector: $\exists$ a special vector, denoted $0$, such that
$v + 0 = v,\text{ }\forall v \in V$
4. Additive inverse:
$\forall v \in V,\text{ }\exists w \in V\text{ such that }v + w = 0$.
Such an additive inverse is generally denoted $- v$
5. Multiplicative identity: $1v = v,\text{ }\forall v \in V$
6. Multiplicative associativity:
$(\alpha\beta)v = \alpha(\beta v)\text{ }\forall v \in V,\text{ scalars }\alpha,\beta$
7. Distributive property for vectors:
$\alpha(u + v) = \alpha u + \alpha v\text{ }\forall u,v \in V,\text{ scalars }\alpha$
8. Distributive property for scalars:
$(\alpha + \beta)v = \alpha v + \beta v\text{ }\forall v \in V,\text{ scalars }\alpha,\beta$
It is easy to show that the zero vector $0$ and the additive inverse
$- v$ are *unique*. We leave the proof of this fact as an exercise.
These may seem difficult to memorize, but they are essentially the same
familiar algebraic properties of numbers you know from high school. The
important thing to remember is which operations are valid for what
objects. For example, you cannot add a vector and scalar, as it does not
make sense.
*Remark*. For those of you versed in computer science, you may recognize
this as essentially saying that you must ensure your operations are
*type-safe*. Adding a vector and scalar is not just "wrong" in the same
sense that $1 + 1 = 3$ is wrong, it is an *invalid question* entirely
because vectors and scalars are different types of mathematical objects.
See [@chen2024digression] for more.
### Vectors big and small
In order to begin your descent into what mathematicians colloquially
recognize as *abstract vapid nonsense*, let's discuss which fields our
scalars may come from. We have the familiar field $\mathbb{R}$,
where all scalars are real numbers, with corresponding vector spaces
${\mathbb{R}}^{n}$, where $n$ is the length of the vector. We generally
discuss 2D or 3D vectors, corresponding to vectors of length 2 or 3; in
our case, ${\mathbb{R}}^{2}$ and ${\mathbb{R}}^{3}$.
However, vectors in ${\mathbb{R}}^{n}$ can really be of any length.
Vectors can be viewed as arbitrary length lists of numbers (for the
computer science folk: think C++ `std::vector`).
*Example*. $$\begin{pmatrix}
1 \\
2 \\
3 \\
4 \\
5 \\
6 \\
7 \\
8 \\
9
\end{pmatrix} \in {\mathbb{R}}^{9}$$
Keep in mind that vectors need not be in ${\mathbb{R}}^{n}$ at all.
Recall that a vector space need only satisfy the aforementioned *axioms
of a vector space*.
*Example*. The vector space ${\mathbb{C}}^{n}$ is similar to
${\mathbb{R}}^{n}$, except it includes complex numbers. All complex
vector spaces are real vector spaces (as you can simply restrict them to
only use the real numbers), but not the other way around.
From now on, let us refer to vector spaces ${\mathbb{R}}^{n}$ and
${\mathbb{C}}^{n}$ as ${\mathbb{F}}^{n}$.
In general, we can have a vector space where the scalars are in an
arbitrary field, as long as the axioms are satisfied.
*Example*. The vector space of all polynomials of at most degree 3, or
${\mathbb{P}}^{3}$. It is not yet clear what vectors in this space may look like.
We shall return to this example once we discuss *basis*.
## Vector addition and multiplication
Vector addition, represented by $+$, is performed entrywise. Entrywise
multiplication of two vectors works the same way, as the second example
below shows.
*Example.*
$$\begin{pmatrix}
1 \\
2 \\
3
\end{pmatrix} + \begin{pmatrix}
4 \\
5 \\
6
\end{pmatrix} = \begin{pmatrix}
1 + 4 \\
2 + 5 \\
3 + 6
\end{pmatrix} = \begin{pmatrix}
5 \\
7 \\
9
\end{pmatrix}$$ $$\begin{pmatrix}
1 \\
2 \\
3
\end{pmatrix} \cdot \begin{pmatrix}
4 \\
5 \\
6
\end{pmatrix} = \begin{pmatrix}
1 \cdot 4 \\
2 \cdot 5 \\
3 \cdot 6
\end{pmatrix} = \begin{pmatrix}
4 \\
10 \\
18
\end{pmatrix}$$
This is simple enough to understand. Again, the difficulty is simply
ensuring that you always perform operations with the correct *types*.
For example, once we introduce matrices, it doesn't make sense to
multiply or add vectors and matrices in this fashion.
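For the computer science folk, here is a minimal sketch of these entrywise
operations in Python using `numpy` (my choice of library here, not something
the post depends on):
```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Entrywise addition: add matching entries.
print(u + v)  # [5 7 9]

# Entrywise multiplication: multiply matching entries.
print(u * v)  # [ 4 10 18]
```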
## Vector-scalar multiplication
Multiplying a vector by a scalar simply results in each entry of the
vector being multiplied by the scalar.
*Example*.
$$\beta\begin{pmatrix}
a \\
b \\
c
\end{pmatrix} = \begin{pmatrix}
\beta \cdot a \\
\beta \cdot b \\
\beta \cdot c
\end{pmatrix}$$
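The same operation as a quick `numpy` sketch (again, purely illustrative):
```python
import numpy as np

beta = 2.5
v = np.array([1.0, 2.0, 3.0])

# Scaling a vector multiplies every entry by the scalar.
print(beta * v)  # [2.5 5.  7.5]
```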
## Linear combinations
Given vectors $v_{1},v_{2},\ldots,v_{n} \in V$ and scalars
$\alpha_{1},\alpha_{2},\ldots,\alpha_{n}$, the expression
$\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{n}v_{n}$ is a
*linear combination* of the vectors.
### Spanning systems
We say that a set of vectors $v_{1},v_{2},\ldots,v_{n} \in V$ *span* $V$
if the linear combination of the vectors can represent any arbitrary
vector $v \in V$.
Precisely, the system is spanning if for every $v \in V$ there exist
scalars $\alpha_{1},\alpha_{2},\ldots,\alpha_{n}$ such that
$$\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{n}v_{n} = v$$
Note that any scalar $\alpha_{k}$ could be 0. Therefore, it is possible
for a subset of a spanning system to also be a spanning system. The
proof of this fact is left as an exercise.
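To make this concrete, one way to check numerically whether a particular
vector lies in the span of a system is to solve for the scalars
$\alpha_{1},\ldots,\alpha_{n}$ directly. A minimal sketch, with vectors chosen
arbitrarily for illustration:
```python
import numpy as np

# Columns of A are the vectors v_1, v_2 of a candidate spanning system of R^2.
A = np.column_stack([np.array([1.0, 1.0]), np.array([1.0, -1.0])])
v = np.array([3.0, 1.0])

# Solve A @ alpha = v for the coefficients alpha_1, alpha_2.
alpha = np.linalg.solve(A, v)
print(alpha)                       # [2. 1.]
print(np.allclose(A @ alpha, v))   # True: v = 2*v_1 + 1*v_2
```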
### Intuition for linear independence and dependence
We say that $v$ and $w$ are linearly independent if $v$ cannot be
represented by the scaling of $w$, and $w$ cannot be represented by the
scaling of $v$. Otherwise, they are *linearly dependent*.
You may intuitively visualize linear dependence in the 2D plane as two
vectors both pointing in the same direction. Clearly, scaling one vector
will allow us to reach the other vector. Linear independence is
therefore two vectors pointing in different directions.
Of course, this definition applies to vectors in any ${\mathbb{F}}^{n}$.
### Formal definition of linear dependence and independence
Let us formally define linear independence for arbitrary vectors in
${\mathbb{F}}^{n}$. Given a set of vectors
$$v_{1},v_{2},\ldots,v_{n} \in V$$
we say they are linearly independent iff. the equation
$$\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{n}v_{n} = 0$$
has only the trivial solution, that is, the only scalars
$\alpha_{1},\alpha_{2},\ldots,\alpha_{n}$ satisfying it are all zero.
Equivalently,
$$\left| \alpha_{1} \right| + \left| \alpha_{2} \right| + \ldots + \left| \alpha_{n} \right| = 0$$
or, more compactly,
$$\sum_{i = 1}^{n}\left| \alpha_{i} \right| = 0$$
Therefore, a set of vectors $v_{1},v_{2},\ldots,v_{m}$ is linearly
dependent if the opposite is true, that is, there exists a solution
$\alpha_{1},\alpha_{2},\ldots,\alpha_{m}$ to the equation
$$\alpha_{1}v_{1} + \alpha_{2}v_{2} + \ldots + \alpha_{m}v_{m} = 0$$
such that
$$\sum_{i = 1}^{m}\left| \alpha_{i} \right| \neq 0$$
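Numerically, a common way to test linear independence is to stack the vectors
as columns of a matrix and compare its rank to the number of vectors. This is
only a sketch of one possible check:
```python
import numpy as np

def linearly_independent(vectors):
    # The vectors are independent iff the matrix whose columns they form
    # has rank equal to the number of vectors.
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

print(linearly_independent([np.array([1, 0]), np.array([0, 1])]))  # True
print(linearly_independent([np.array([1, 2]), np.array([2, 4])]))  # False: one is a scaling of the other
```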
### Basis
We say a system of vectors $v_{1},v_{2},\ldots,v_{n} \in V$ is a *basis*
in $V$ if the system is both linearly independent and spanning. That is,
the system must be able to represent any vector in $V$ as well as
satisfy our requirements for linear independence.
Equivalently, we may say that a system of vectors in $V$ is a basis in
$V$ if any vector $v \in V$ admits a *unique representation* as a linear
combination of vectors in the system. This is equivalent to our previous
statement, that the system must be spanning and linearly independent.
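To make the *unique representation* concrete: given a basis, the coordinates
of any vector are recovered by solving a single linear system. A sketch, with
a basis of ${\mathbb{R}}^{2}$ chosen arbitrarily for illustration:
```python
import numpy as np

# Columns are the basis vectors b_1, b_2 of R^2 (an arbitrary choice).
B = np.column_stack([np.array([1.0, 1.0]), np.array([0.0, 1.0])])
v = np.array([2.0, 5.0])

# The coordinates of v in this basis are the unique solution of B @ c = v.
coords = np.linalg.solve(B, v)
print(coords)                      # [2. 3.], i.e. v = 2*b_1 + 3*b_2
print(np.allclose(B @ coords, v))  # True
```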
### Standard basis
We may define a *standard basis* for a vector space. By convention, the
standard basis in ${\mathbb{R}}^{2}$ is
$$\begin{pmatrix}
1 \\
0
\end{pmatrix}\begin{pmatrix}
0 \\
1
\end{pmatrix}$$
Verify that the above is in fact a basis (that is, linearly independent
and generating).
Recalling the definition of the basis, we can represent any vector in
${\mathbb{R}}^{2}$ as the linear combination of the standard basis.
Therefore, for any arbitrary vector $v \in {\mathbb{R}}^{2}$, we can
represent it as
$$v = \alpha_{1}\begin{pmatrix}
1 \\
0
\end{pmatrix} + \alpha_{2}\begin{pmatrix}
0 \\
1
\end{pmatrix}$$
Let us call $\alpha_{1}$ and $\alpha_{2}$ the *coordinates* of the
vector. Then, we can write $v$ as
$$v = \begin{pmatrix}
\alpha_{1} \\
\alpha_{2}
\end{pmatrix}$$
For example, the vector
$$\begin{pmatrix}
1 \\
2
\end{pmatrix}$$
represents
$$1 \cdot \begin{pmatrix}
1 \\
0
\end{pmatrix} + 2 \cdot \begin{pmatrix}
0 \\
1
\end{pmatrix}$$
Verify that this aligns with your previous intuition of vectors.
You may recognize the standard basis in ${\mathbb{R}}^{2}$ as the
familiar unit vectors
$$\hat{i},\hat{j}$$
This aligns with the fact that
$$\begin{pmatrix}
\alpha \\
\beta
\end{pmatrix} = \alpha\hat{i} + \beta\hat{j}$$
However, we may define a standard basis in any arbitrary vector space.
So, let
$$e_{1},e_{2},\ldots,e_{n}$$
be a standard basis in ${\mathbb{F}}^{n}$. Then, the coordinates
$\alpha_{1},\alpha_{2},\ldots,\alpha_{n}$ of a vector
$v \in {\mathbb{F}}^{n}$ represent the following
$$\begin{pmatrix}
\alpha_{1} \\
\alpha_{2} \\
\vdots \\
\alpha_{n}
\end{pmatrix} = \alpha_{1}e_{1} + \alpha_{2}e_{2} + \ldots + \alpha_{n}e_{n}$$
Using our new notation, the standard basis in ${\mathbb{R}}^{2}$ is
$$e_{1} = \begin{pmatrix}
1 \\
0
\end{pmatrix},e_{2} = \begin{pmatrix}
0 \\
1
\end{pmatrix}$$
## Matrices
Before discussing any properties of matrices, let's simply reiterate
what we learned in class about their notation. We say a matrix with $m$
rows and $n$ columns (so each row has length $n$ and each column has
height $m$) is an $m \times n$ matrix.
Given a matrix
$$A = \begin{pmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{pmatrix}$$
we refer to the entry in row $j$ and column $k$ as $A_{j,k}$.
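If you play with this in code, keep in mind that most libraries (including
`numpy`, used for the sketch below) index from 0, so the entry written
$A_{j,k}$ here lives at `A[j - 1, k - 1]`:
```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(A.shape)  # (3, 3): 3 rows, 3 columns
print(A[1, 2])  # 6, i.e. A_{2,3} in the 1-indexed notation of the text
```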
### Matrix transpose
A formalism that is useful later on is called the *transpose*, and we
obtain it from a matrix $A$ by switching all the rows and columns. More
precisely, each row becomes a column instead. We use the notation
$A^{T}$ to represent the transpose of $A$.
$$\begin{pmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{pmatrix}^{T} = \begin{pmatrix}
1 & 4 \\
2 & 5 \\
3 & 6
\end{pmatrix}$$
Formally, we can say $\left( A^{T} \right)_{j,k} = A_{k,j}$
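The same example as a `numpy` sketch, where the transpose is simply `.T`:
```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

# The transpose swaps rows and columns: (A^T)_{j,k} = A_{k,j}.
print(A.T)
# [[1 4]
#  [2 5]
#  [3 6]]
print(A.shape, A.T.shape)  # (2, 3) (3, 2)
```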
## Linear transformations
A linear transformation $T:V \rightarrow W$ is a mapping between two
vector spaces $V$ and $W$, such that the following axioms are
satisfied:
1. Additivity: $T(v + w) = T(v) + T(w),\text{ }\forall v,w \in V$
2. Homogeneity: $T(\alpha v) = \alpha T(v),\text{ }\forall v \in V$, for
all scalars $\alpha$
*Definition*. Equivalently, $T$ is a linear transformation iff.
$$T(\alpha v + \beta w) = \alpha T(v) + \beta T(w),\forall v,w \in V,\text{ scalars }\alpha,\beta$$
*Abuse of notation*. From now on, we may elide the parentheses and say
that $$T(v) = Tv,\forall v \in V$$
*Remark*. A phrase that you may commonly hear is that linear
transformations preserve *linearity*. Essentially, straight lines remain
straight, parallel lines remain parallel, and the origin remains fixed
at 0. Take a moment to think about why this is true (at least, in lower
dimensional spaces you can visualize).
*Examples*.
1. Rotation for $V = W = {\mathbb{R}}^{2}$ (i.e. rotation in 2
dimensions). Given $v,w \in {\mathbb{R}}^{2}$, and their linear
combination $v + w$, a rotation of $\gamma$ radians of $v + w$ is
equivalent to first rotating $v$ and $w$ individually by $\gamma$
and then taking their linear combination.
2. Differentiation of polynomials. In this case $V = {\mathbb{P}}^{n}$
and $W = {\mathbb{P}}^{n - 1}$, where ${\mathbb{P}}^{n}$ is the
vector space of all polynomials of degree at most $n$.
$$\frac{d}{dx}(\alpha v + \beta w) = \alpha\frac{d}{dx}v + \beta\frac{d}{dx}w,\forall v,w \in V,\forall\text{ scalars }\alpha,\beta$$
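We can spot-check the second example numerically by representing polynomials
as coefficient lists; `numpy.polyder` computes the derivative of such a list,
and it respects linear combinations. A sketch, with the particular polynomials
and scalars chosen arbitrarily:
```python
import numpy as np

# Coefficients in descending degree: v = x^3 + 2x, w = 3x^2 - 1.
v = np.array([1.0, 0.0, 2.0, 0.0])
w = np.array([0.0, 3.0, 0.0, -1.0])
alpha, beta = 2.0, -5.0

lhs = np.polyder(alpha * v + beta * w)               # d/dx of the linear combination
rhs = alpha * np.polyder(v) + beta * np.polyder(w)   # linear combination of the derivatives
print(np.allclose(lhs, rhs))  # True: differentiation is linear
```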
## Matrices represent linear transformations
Suppose we wanted to represent a linear transformation
$T:{\mathbb{F}}^{n} \rightarrow {\mathbb{F}}^{m}$. I propose that we
need only encode how $T$ acts on the standard basis of ${\mathbb{F}}^{n}$.
Using our intuition from lower dimensional vector spaces, we know that
the standard basis in ${\mathbb{R}}^{2}$ is the unit vectors $\hat{i}$
and $\hat{j}$. Because linear transformations preserve linearity (i.e.
all straight lines remain straight and parallel lines remain parallel),
we can encode any transformation as simply changing $\hat{i}$ and
$\hat{j}$. And indeed, if any vector $v \in {\mathbb{R}}^{2}$ can be
represented as the linear combination of $\hat{i}$ and $\hat{j}$ (this
is the definition of a basis), it makes sense both symbolically and
geometrically that we can represent all linear transformations as the
transformations of the basis vectors.
*Example*. To reflect all vectors $v \in {\mathbb{R}}^{2}$ across the
$y$-axis, we can simply change the standard basis to
$$\begin{pmatrix}
- 1 \\
0
\end{pmatrix}\begin{pmatrix}
0 \\
1
\end{pmatrix}$$
Then, any vector in ${\mathbb{R}}^{2}$ using this new basis will be
reflected across the $y$-axis. Take a moment to justify this
geometrically.
### Writing a linear transformation as a matrix
For any linear transformation
$T:{\mathbb{F}}^{m} \rightarrow {\mathbb{F}}^{n}$, we can write it as an
$n \times m$ matrix $A$. That is, every linear transformation from
${\mathbb{F}}^{m}$ to ${\mathbb{F}}^{n}$ can be represented by some matrix
$A$ with $n$ rows and $m$ columns.
How should we write this matrix? Naturally, from our previous
discussion, we should write a matrix with each *column* being one of our
new transformed *basis* vectors.
*Example*. Our $y$-axis reflection transformation from earlier. We write
the transformed basis vectors as the columns of a matrix
$$\begin{pmatrix}
- 1 & 0 \\
0 & 1
\end{pmatrix}$$
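In code, this recipe, apply the transformation to each standard basis vector
and use the results as columns, looks something like the following sketch
(`reflect_y` is a hypothetical helper implementing our running example):
```python
import numpy as np

def reflect_y(v):
    # Reflect a vector in R^2 across the y-axis.
    return np.array([-v[0], v[1]])

# Column k of the matrix is the transformation applied to e_k.
basis = np.eye(2)  # columns are the standard basis vectors e_1, e_2
A = np.column_stack([reflect_y(basis[:, k]) for k in range(2)])
print(A)
# [[-1.  0.]
#  [ 0.  1.]]
```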
### Matrix-vector multiplication
Perhaps you now see why the so-called matrix-vector multiplication is
defined the way it is. Recalling our definition of a basis, given a
basis in $V$, any vector $v \in V$ can be written as the linear
combination of the vectors in the basis. Then, given a linear
transformation represented by the matrix containing the new basis, we
simply write the linear combination with the new basis instead.
*Example*. Let us first write a vector in the standard basis in
${\mathbb{R}}^{2}$ and then show how our matrix-vector multiplication
naturally corresponds to the definition of the linear transformation.
$$\begin{pmatrix}
1 \\
2
\end{pmatrix} \in {\mathbb{R}}^{2}$$
is the same as
$$1 \cdot \begin{pmatrix}
1 \\
0
\end{pmatrix} + 2 \cdot \begin{pmatrix}
0 \\
1
\end{pmatrix}$$
Then, to perform our reflection, we need only replace the basis vector
$\begin{pmatrix}
1 \\
0
\end{pmatrix}$ with $\begin{pmatrix}
- 1 \\
0
\end{pmatrix}$.
Then, the reflected vector is given by
$$1 \cdot \begin{pmatrix}
- 1 \\
0
\end{pmatrix} + 2 \cdot \begin{pmatrix}
0 \\
1
\end{pmatrix} = \begin{pmatrix}
- 1 \\
2
\end{pmatrix}$$
We can clearly see that this is exactly how the matrix multiplication
$$\begin{pmatrix}
- 1 & 0 \\
0 & 1
\end{pmatrix} \cdot \begin{pmatrix}
1 \\
2
\end{pmatrix}$$ is defined! The *column-by-coordinate* rule for
matrix-vector multiplication says that we multiply the $k^{\text{th}}$
entry of the vector by the corresponding $k^{\text{th}}$ column of the
matrix and sum them all up (take their linear combination). This
algorithm intuitively follows from our definition of matrices.
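A short sketch verifying that the column-by-coordinate rule agrees with the
built-in matrix-vector product in `numpy`:
```python
import numpy as np

A = np.array([[-1.0, 0.0],
              [0.0, 1.0]])
v = np.array([1.0, 2.0])

# Column-by-coordinate rule: scale each column of A by the matching entry of v, then sum.
by_columns = v[0] * A[:, 0] + v[1] * A[:, 1]
print(by_columns)                      # [-1.  2.]
print(np.allclose(by_columns, A @ v))  # True: identical to the built-in product
```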
### Matrix-matrix multiplication
As you may have noticed, a very similar natural definition arises for
the *matrix-matrix* multiplication. Multiplying two matrices $A \cdot B$
is essentially just taking each column of $B$, and applying the linear
transformation defined by the matrix $A$!
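The same idea, sketched in `numpy`: each column of $A \cdot B$ is $A$ applied
to the corresponding column of $B$.
```python
import numpy as np

A = np.array([[-1.0, 0.0],
              [0.0, 1.0]])
B = np.array([[1.0, 3.0],
              [2.0, 4.0]])

# Apply A to each column of B and reassemble the results as columns.
column_by_column = np.column_stack([A @ B[:, k] for k in range(B.shape[1])])
print(np.allclose(column_by_column, A @ B))  # True
```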