#import "@preview/unequivocal-ams:0.1.1": ams-article, theorem, proof

#show: ams-article.with(
  title: [A Digression on Abstract Linear Algebra],
  authors: (
    (
      name: "Youwen Wu",
      organization: [University of California, Santa Barbara],
      email: "youwen@ucsb.edu",
      url: "https://youwen.dev",
    ),
  ),
  bibliography: bibliography("refs.bib"),
)

= Introduction

Many introductory linear algebra classes focus on _application_. They teach you how to perform trivial numerical operations such as _matrix multiplication_, _matrix-vector multiplication_, _row reduction_, and other trite tasks better suited for computers.

This class is essentially useless. Linear algebra is really a much deeper subject when viewed through the lens of _linear maps_ and _vector spaces_. In particular, taking an abstract, point-free approach allows the freedom to prove theorems that generalize to linear algebra on arbitrary vector spaces, and indeed, even infinite-dimensional vector spaces.

If you are taking this course, you might as well learn linear algebra properly. Otherwise, you will have to re-learn it later on anyway. Completing a math course without gaining a theoretical appreciation for the topics at hand is a complete and utter waste of time.

= Basic Notions

== Vector spaces

Before we can understand vectors, we need to first discuss _vector spaces_. Thus far, you have likely encountered vectors primarily in physics classes, generally in the two-dimensional plane. You may conceptualize them as arrows in space. For vectors with more than 3 components, a hand-waving argument is made that they are essentially just arrows in higher-dimensional spaces.

It is helpful to take a step back from this primitive geometric understanding of the vector. Let us build up a rigorous idea of vectors from first principles.

=== Vector axioms

The so-called _axioms_ of a _vector space_ (which we'll call the vector space $V$) are as follows:

#enum[
  Commutativity: $u + v = v + u, " " forall u,v in V$
][
  Associativity: $(u + v) + w = u + (v + w), " " forall u,v,w in V$
][
  Zero vector: $exists$ a special vector, denoted $0$, such that $v + 0 = v, " " forall v in V$
][
  Additive inverse: $forall v in V, " " exists w in V "such that" v + w = 0$. Such an additive inverse is generally denoted $-v$
][
  Multiplicative identity: $1 v = v, " " forall v in V$
][
  Multiplicative associativity: $(alpha beta) v = alpha (beta v) " " forall v in V, "scalars" alpha, beta$
][
  Distributive property for vectors: $alpha (u + v) = alpha u + alpha v " " forall u,v in V, "scalars" alpha$
][
  Distributive property for scalars: $(alpha + beta) v = alpha v + beta v " " forall v in V, " scalars" alpha, beta$
]

It is easy to show that the zero vector $0$ and the additive inverse $-v$ are _unique_. We leave the proof of this fact as an exercise.

These may seem difficult to memorize, but they are essentially the same familiar algebraic properties of numbers you know from high school. The important thing to remember is which operations are valid for which objects. For example, you cannot add a vector and a scalar, as it does not make sense.

_Remark_. For those of you versed in computer science, you may recognize this as essentially saying that you must ensure your operations are _type-safe_. Adding a vector and a scalar is not just "wrong" in the same sense that $1 + 1 = 3$ is wrong, it is an _invalid question_ entirely, because vectors and scalars are different types of mathematical objects. See #cite(<chen2024digression>, form: "prose") for more.

=== Vectors big and small

In order to begin your descent into what mathematicians colloquially recognize as _abstract vapid nonsense_, let's discuss which sorts of objects constitute a vector space. We have the familiar field $RR$, where all scalars are real numbers, with corresponding vector spaces $RR^n$, where $n$ is the length of the vector. We generally discuss 2D or 3D vectors, corresponding to vectors of length 2 or 3; in our case, $RR^2$ and $RR^3$.

However, vectors in $RR^n$ can really be of any length. Vectors can be viewed as arbitrary-length lists of numbers (for the computer science folk: think C++ `std::vector`).

_Example_. $ vec(1,2,3,4,5,6,7,8,9) in RR^9 $

Keep in mind that vectors need not be in $RR^n$ at all. Recall that a vector space need only satisfy the aforementioned _axioms of a vector space_.

_Example_. The vector space $CC^n$ is similar to $RR^n$, except it includes complex numbers. Every complex vector space is also a real vector space (as you can simply restrict the scalars to the real numbers), but not the other way around.

From now on, let us refer to the vector spaces $RR^n$ and $CC^n$ collectively as $FF^n$.

In general, we can have a vector space where the scalars are in an arbitrary field, as long as the axioms are satisfied.

_Example_. The vector space of all polynomials of degree at most 3, denoted $PP^3$. It is not yet clear what these vectors may look like. We shall return to this example once we discuss _basis_.

== Vector addition. Multiplication

Vector addition, represented by $+$, and multiplication, represented by the $dot$ (dot) operator, can be done entrywise.

_Example._

$
  vec(1,2,3) + vec(4,5,6) = vec(1 + 4, 2 + 5, 3 + 6) = vec(5,7,9)
$
$
  vec(1,2,3) dot vec(4,5,6) = vec(1 dot 4, 2 dot 5, 3 dot 6) = vec(4,10,18)
$

This is simple enough to understand. Again, the difficulty is simply ensuring that you always perform operations with the correct _types_. For example, once we introduce matrices, it doesn't make sense to multiply or add vectors and matrices in this fashion.

== Vector-scalar multiplication

Multiplying a vector by a scalar simply results in each entry of the vector being multiplied by the scalar.

_Example_.

$ beta vec(a, b, c) = vec(beta dot a, beta dot b, beta dot c) $

== Linear combinations

Given a vector space $V$, vectors $v_1, v_2, ..., v_n in V$, and scalars $alpha_1, alpha_2, ..., alpha_n$, the vector $alpha_1 v_1 + alpha_2 v_2 + ... + alpha_n v_n$ is called a _linear combination_ of the vectors $v_1, v_2, ..., v_n$.

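_Example_. In $RR^2$, taking $v_1 = vec(1,2)$ and $v_2 = vec(3,-1)$ with scalars $alpha_1 = 2$ and $alpha_2 = -1$ gives the linear combination

$ 2 vec(1,2) - vec(3,-1) = vec(-1, 5) $
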
=== Spanning systems

We say that a set of vectors $v_1, v_2, ..., v_n in V$ _spans_ $V$ if any arbitrary vector $v in V$ can be represented as a linear combination of the vectors.

Precisely, for every $v in V$ there exist scalars $alpha_1, alpha_2, ..., alpha_n$ such that

$ alpha_1 v_1 + alpha_2 v_2 + ... + alpha_n v_n = v $

Note that any scalar $alpha_k$ could be 0. Therefore, it is possible for a subset of a spanning system to also be a spanning system. The proof of this fact is left as an exercise.

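For instance, the system $vec(1,0), vec(0,1), vec(1,1)$ spans $RR^2$, since any $vec(a, b)$ can be written as

$ vec(a, b) = a vec(1,0) + b vec(0,1) + 0 dot vec(1,1) $

The subset consisting of just $vec(1,0)$ and $vec(0,1)$ is itself a spanning system, illustrating the remark above.
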
=== Intuition for linear independence and dependence

We say that $v$ and $w$ are linearly independent if $v$ cannot be represented by a scaling of $w$, and $w$ cannot be represented by a scaling of $v$. Otherwise, they are _linearly dependent_.

You may intuitively visualize linear dependence in the 2D plane as two vectors both pointing in the same direction. Clearly, scaling one vector will allow us to reach the other vector. Linear independence is therefore two vectors pointing in different directions.

Of course, this definition applies to vectors in any $FF^n$.

=== Formal definition of linear dependence and independence

Let us formally define linear independence for arbitrary vectors in $FF^n$. Given a set of vectors

$ v_1, v_2, ..., v_n in V $

we say they are linearly independent iff. the equation

$ alpha_1 v_1 + alpha_2 v_2 + ... + alpha_n v_n = 0 $

has only the trivial solution, that is, the unique solution $alpha_1, alpha_2, ..., alpha_n$ in which all of the $alpha_i$ are zero.

Equivalently,

$ abs(alpha_1) + abs(alpha_2) + ... + abs(alpha_n) = 0 $

More precisely,

$ sum_(i=1)^n abs(alpha_i) = 0 $

Therefore, a set of vectors $v_1, v_2, ..., v_m$ is linearly dependent if the opposite is true, that is, there exists a solution $alpha_1, alpha_2, ..., alpha_m$ to the equation

$ alpha_1 v_1 + alpha_2 v_2 + ... + alpha_m v_m = 0 $

such that

$ sum_(i=1)^m abs(alpha_i) != 0 $

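For instance, the vectors $vec(1,2)$ and $vec(2,4)$ are linearly dependent, since

$ 2 vec(1,2) - vec(2,4) = vec(0,0) $

is a nontrivial solution. On the other hand, $vec(1,0)$ and $vec(0,1)$ are linearly independent: $alpha_1 vec(1,0) + alpha_2 vec(0,1) = vec(alpha_1, alpha_2)$ is zero only when $alpha_1 = alpha_2 = 0$.
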
=== Basis

We say a system of vectors $v_1, v_2, ..., v_n in V$ is a _basis_ in $V$ if the system is both linearly independent and spanning. That is, the system must be able to represent any vector in $V$ as well as satisfy our requirements for linear independence.

Equivalently, we may say that a system of vectors in $V$ is a basis in $V$ if any vector $v in V$ admits a _unique representation_ as a linear combination of vectors in the system. This is equivalent to our previous statement, that the system must be spanning and linearly independent.

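We can now return to our earlier example of $PP^3$, the vector space of polynomials of degree at most 3. A natural basis is the system of monomials

$ 1, x, x^2, x^3 $

since every polynomial $p in PP^3$ admits a unique representation

$ p(x) = alpha_0 + alpha_1 x + alpha_2 x^2 + alpha_3 x^3 $

so a polynomial in $PP^3$ may be viewed as the vector of its coefficients $vec(alpha_0, alpha_1, alpha_2, alpha_3)$.
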
=== Standard basis

We may define a _standard basis_ for a vector space. By convention, the standard basis in $RR^2$ is

$ vec(1, 0), vec(0, 1) $

Verify that the above is in fact a basis (that is, linearly independent and generating).

Recalling the definition of the basis, we can represent any vector in $RR^2$ as a linear combination of the standard basis vectors.

Therefore, for any arbitrary vector $v in RR^2$, we can represent it as

$ v = alpha_1 vec(1, 0) + alpha_2 vec(0,1) $

Let us call $alpha_1$ and $alpha_2$ the _coordinates_ of the vector. Then, we can write $v$ as

$ v = vec(alpha_1, alpha_2) $

For example, the vector

$ vec(1, 2) $

represents

$ 1 dot vec(1, 0) + 2 dot vec(0,1) $

Verify that this aligns with your previous intuition of vectors.

You may recognize the standard basis in $RR^2$ as the familiar unit vectors

$ hat(i), hat(j) $

This aligns with the fact that

$ vec(alpha, beta) = alpha hat(i) + beta hat(j) $

However, we may define a standard basis in any arbitrary vector space. So, let

$ e_1, e_2, ..., e_n $

be a standard basis in $FF^n$. Then, the coordinates $alpha_1, alpha_2, ..., alpha_n$ of a vector $v in FF^n$ represent the following

$
  vec(alpha_1, alpha_2, dots.v, alpha_n) = alpha_1 e_1 + alpha_2 e_2 + ... + alpha_n e_n
$

Using our new notation, the standard basis in $RR^2$ is

$ e_1 = vec(1,0), e_2 = vec(0,1) $

== Matrices

Before discussing any properties of matrices, let's simply reiterate what we learned in class about their notation. We say a matrix with $m$ rows and $n$ columns (in less precise terms, a matrix with height $m$ and length $n$) is an $m times n$ matrix.

Given a matrix

$ A = mat(1,2,3;4,5,6;7,8,9) $

we refer to the entry in row $j$ and column $k$ as $A_(j,k)$.

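For instance, with the matrix $A$ above, $A_(2,3) = 6$ (the entry in row 2, column 3), while $A_(3,2) = 8$.
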
=== Matrix transpose

A formalism that is useful later on is called the _transpose_, and we obtain it from a matrix $A$ by switching all the rows and columns. More precisely, each row becomes a column instead. We use the notation $A^T$ to represent the transpose of $A$.

$
  mat(1,2,3;4,5,6)^T = mat(1,4;2,5;3,6)
$

Formally, we can say $(A^T)_(j,k) = A_(k,j)$.

== Linear transformations

A linear transformation $T : V -> W$ is a mapping between two vector spaces $V$ and $W$, such that the following axioms are satisfied:

+ $T(v + w) = T(v) + T(w), forall v, w in V$
+ $T(alpha v) = alpha T(v), forall v in V$, for all scalars $alpha$

_Definition_. $T$ is a linear transformation iff. for all $v, w in V$ and all scalars $alpha, beta$,

$ T(alpha v + beta w) = alpha T(v) + beta T(w) $

_Abuse of notation_. From now on, we may elide the parentheses and say that

$ T(v) = T v, forall v in V $

_Remark_. A phrase that you may commonly hear is that linear transformations preserve _linearity_. Essentially, straight lines remain straight, parallel lines remain parallel, and the origin remains fixed at 0. Take a moment to think about why this is true (at least, in lower dimensional spaces you can visualize).

_Examples_.

+ #[Rotation for $V = W = RR^2$ (i.e. rotation in 2 dimensions). Given $v, w in RR^2$, and their linear combination $v + w$, a rotation of $gamma$ radians of $v + w$ is equivalent to first rotating $v$ and $w$ individually by $gamma$ and then taking their linear combination.]

+ #[Differentiation of polynomials. In this case $V = PP^n$ and $W = PP^(n - 1)$, where $PP^n$ is the vector space of all polynomials of degree at most $n$.

  $
    dif / (dif x) (alpha v + beta w) = alpha dif / (dif x) v + beta dif / (dif x) w, forall v, w in V, forall "scalars" alpha, beta
  $
]

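For instance, take $n = 2$, $v = x^2$, and $w = x$. Then

$ dif / (dif x) (alpha x^2 + beta x) = 2 alpha x + beta = alpha dif / (dif x) x^2 + beta dif / (dif x) x $

so differentiating the linear combination gives the same linear combination of the derivatives.
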
== Matrices represent linear transformations

Suppose we wanted to represent a linear transformation $T: FF^n -> FF^m$. I propose that we need only encode how $T$ acts on the standard basis of $FF^n$.

Using our intuition from lower dimensional vector spaces, we know that the standard basis in $RR^2$ is the unit vectors $hat(i)$ and $hat(j)$. Because linear transformations preserve linearity (i.e. all straight lines remain straight and parallel lines remain parallel), we can encode any transformation as simply changing $hat(i)$ and $hat(j)$. And indeed, since any vector $v in RR^2$ can be represented as a linear combination of $hat(i)$ and $hat(j)$ (this is the definition of a basis), it makes sense both symbolically and geometrically that we can represent all linear transformations as transformations of the basis vectors.

_Example_. To reflect all vectors $v in RR^2$ across the $y$-axis, we can simply change the standard basis to

$ vec(-1, 0), vec(0,1) $

Then, any vector in $RR^2$ written in this new basis will be reflected across the $y$-axis. Take a moment to justify this geometrically.

=== Writing a linear transformation as a matrix

For any linear transformation $T: FF^m -> FF^n$, we can write it as an $n times m$ matrix $A$. That is, any linear transformation from $FF^m$ to $FF^n$ can be represented by a matrix $A$ with $n$ rows and $m$ columns.

How should we write this matrix? Naturally, from our previous discussion, we should write a matrix with each _column_ being one of our new transformed _basis_ vectors.

_Example_. Our $y$-axis reflection transformation from earlier. We write the transformed basis vectors as the columns of a matrix:

$ mat(-1,0; 0,1) $

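Similarly, recall the rotation example from earlier. A counterclockwise rotation of the plane by $gamma$ radians sends $e_1 = vec(1,0)$ to $vec(cos gamma, sin gamma)$ and $e_2 = vec(0,1)$ to $vec(-sin gamma, cos gamma)$, so writing these transformed basis vectors as columns gives the matrix

$ mat(cos gamma, -sin gamma; sin gamma, cos gamma) $
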
=== Matrix-vector multiplication

Perhaps you now see why the so-called matrix-vector multiplication is defined the way it is. Recalling our definition of a basis, given a basis in $V$, any vector $v in V$ can be written as a linear combination of the vectors in the basis. Then, given a linear transformation represented by the matrix containing the new basis, we simply write the linear combination with the new basis instead.

_Example_. Let us first write a vector in the standard basis in $RR^2$ and then show how our matrix-vector multiplication naturally corresponds to the definition of the linear transformation.

$ vec(1, 2) in RR^2 $

is the same as

$ 1 dot vec(1, 0) + 2 dot vec(0, 1) $

Then, to perform our reflection, we need only replace the basis vector $vec(1, 0)$ with $vec(-1, 0)$.

Then, the reflected vector is given by

$ 1 dot vec(-1, 0) + 2 dot vec(0,1) = vec(-1, 2) $

We can clearly see that this is exactly how the matrix-vector multiplication

$ mat(-1, 0; 0, 1) dot vec(1, 2) $

is defined! The _column-by-coordinate_ rule for matrix-vector multiplication says that we multiply the $n^("th")$ entry of the vector by the corresponding $n^("th")$ column of the matrix and sum them all up (take their linear combination). This algorithm intuitively follows from our definition of matrices.

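As another quick check of the column-by-coordinate rule,

$ mat(1, 2; 3, 4) dot vec(5, 6) = 5 vec(1, 3) + 6 vec(2, 4) = vec(5, 15) + vec(12, 24) = vec(17, 39) $
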
=== Matrix-matrix multiplication

As you may have noticed, a very similar natural definition arises for the _matrix-matrix_ multiplication. Multiplying two matrices $A dot B$ is essentially just taking each column of $B$, and applying the linear transformation defined by the matrix $A$!

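For instance, let $A = mat(-1, 0; 0, 1)$ be our reflection from before and $B = mat(1, 3; 2, 4)$. The columns of $B$ are $vec(1, 2)$ and $vec(3, 4)$, and applying $A$ to each gives $vec(-1, 2)$ and $vec(-3, 4)$, so

$ A dot B = mat(-1, -3; 2, 4) $

Each column of the product is the reflection of the corresponding column of $B$.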