$
(dif) / (dif x) cosh(x) = sinh(x)
$
It's pretty easy to show these using their definitions, and derive the derivative of $tanh$.
== End of weeks 1-4
That was all of the content of week 1 to 4. Now we shift to weeks 5-7, where we studied more about vectors and their derivatives.
== Partial derivatives
These are the slopes of tangent lines for the graph in the direction of the
changing variable.
#theorem[Clairaut's theorem][
Suppose $f : RR^2 -> RR$ is defined on a disk $D$ that contains a point
$(a,b)$. If the functions $f_(x y)$ and $f_(y x)$ are continuous on this disk,
then $f_(x y) (a,b) = f_(y x) (a,b)$.
]
#theorem[Extended Clairaut][
Suppose $f : RR^2 -> RR$ is defined on a disk $D$ that contains $(a,b)$. If
all of the mixed partial derivatives are continuous everywhere in the disk $D$,
then the mixed partials are equal.
]
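These equalities are easy to check numerically. A minimal sketch in Python, using a hypothetical example function $f(x,y) = x^2 cos(y)$ (my choice, not from the notes) and central finite differences:

```python
import math

def f(x, y):
    return x**2 * math.cos(y)

def mixed_partials(f, x, y, h=1e-4):
    """Central-difference estimates of f_xy and f_yx at (x, y)."""
    fx = lambda x, y: (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h)
    f_xy = (fx(x, y + h) - fx(x, y - h)) / (2 * h)  # differentiate in x, then y
    f_yx = (fy(x + h, y) - fy(x - h, y)) / (2 * h)  # differentiate in y, then x
    return f_xy, f_yx

f_xy, f_yx = mixed_partials(f, 1.0, 0.5)
# By hand: f_xy = f_yx = -2x sin(y), so both should be about -2 sin(0.5)
```

Since $f$ here is smooth everywhere, Clairaut's theorem applies at every point; the two nested difference quotients agree to within discretization error.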
== Multivariable chain rule
The single-variable product rule actually follows from the multivariable chain rule. Recall:
$
(f(x) g(x))' = f'(x) g(x) + f(x) g'(x)
$
Then instead let's replace $f$ and $g$ with $x$ and $y$, such that we have something like
$
z = x y
$
Then let $x$ and $y$ be functions of $t$. So
$
z = x(t) y(t)
$
The partial derivatives are
$
(diff z) / (diff x) = y, (diff z) / (diff y) = x
$
By the multivariable chain rule,
$
(dif z) / (dif t) = (diff z) / (diff x) x'(t) + (diff z) / (diff y) y'(t) = x'(t) y(t) + x(t) y'(t)
$
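A quick numerical sanity check of this derivation, with hypothetical choices $x(t) = sin(t)$ and $y(t) = t^2$ (my example, not from the notes):

```python
import math

def x(t): return math.sin(t)
def y(t): return t * t

def z(t): return x(t) * y(t)  # z = x(t) y(t)

t0 = 1.3
h = 1e-6
# direct central-difference derivative of the product
dz_direct = (z(t0 + h) - z(t0 - h)) / (2 * h)
# chain rule: dz/dt = (dz/dx) x'(t) + (dz/dy) y'(t) = y x' + x y'
dz_chain = y(t0) * math.cos(t0) + x(t0) * 2 * t0
```

The two values agree up to the finite-difference error, which is exactly the product rule.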
== Implicit differentiation
It's similar to the single variable implicit differentiation, but remember to
hold the extraneous variables constant in practice.
#example[
Suppose you have a surface
$
3x^2 + 5 y z + z^3 = 0
$
And you want a partial derivative $(diff y)/(diff z)$ at some point. You can
use implicit differentiation by viewing the surface as a level set of some
larger function $F(x,y,z) = 3x^2 + 5y z + z^3$ where $F(x,y,z) = 0$.
Now differentiate both sides with respect to $z$, holding $x$ constant and
treating $y$ as a function of $z$:
$
0 = diff / (diff z)(3x^2) + diff / (diff z)(5y z) + diff / (diff z)(z^3) \
= 0 + (5z (diff y) / (diff z) + 5y) + 3z^2
$
Then solve for $(diff y)/(diff z)$.
]
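A numerical sanity check of this example: since the surface lets us solve for $y$ explicitly when $z != 0$, we can compare the implicit-differentiation result against a finite difference. A sketch; the test point $(1, -4\/5, 1)$ is my convenient choice, not from the notes:

```python
# On the surface 3x^2 + 5yz + z^3 = 0, solving for y gives
# y = -(3x^2 + z^3) / (5z), valid for z != 0.

def y_on_surface(x, z):
    return -(3 * x**2 + z**3) / (5 * z)

x0, z0 = 1.0, 1.0
y0 = y_on_surface(x0, z0)  # -4/5, so (1, -4/5, 1) lies on the surface

# implicit differentiation gives dy/dz = -(5y + 3z^2) / (5z)
dydz_formula = -(5 * y0 + 3 * z0**2) / (5 * z0)

# finite-difference check, holding x fixed
h = 1e-6
dydz_numeric = (y_on_surface(x0, z0 + h) - y_on_surface(x0, z0 - h)) / (2 * h)
```

Both approaches give $1\/5$ at this point.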
== Multivariable chain rule as matrix
Consider $z = f(x,y)$. Then the derivative of $z$ with respect to $x$ and $y$ would be a matrix:
$
mat((diff z)/(diff x), (diff z)/(diff y))
$
Now suppose the coordinate system changes to
$
x = 3u - v \
y = 2v
$
Now suppose we want $z_u$ and $z_v$ at $(x,y) = (3,6)$. Then this is actually
$(u,v) = (2,3)$ in $u v$ coordinates. The partials of $z$ with respect to $u$
and $v$ are just matrix multiplication:
$
mat(z_u,z_v) = mat(z_x, z_y) mat(x_u,x_v;y_u,y_v)
$
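This matrix multiplication can be verified numerically. A sketch using a hypothetical $z = f(x,y) = x y$ (my choice) with the coordinate change from the notes:

```python
def f(x, y): return x * y
def x_of(u, v): return 3 * u - v
def y_of(u, v): return 2 * v

def z_of(u, v): return f(x_of(u, v), y_of(u, v))

u0, v0 = 2.0, 3.0  # corresponds to (x, y) = (3, 6)
x0, y0 = x_of(u0, v0), y_of(u0, v0)

# [z_u  z_v] = [z_x  z_y] * [[x_u, x_v], [y_u, y_v]]
z_x, z_y = y0, x0            # partials of f(x, y) = x y
xu, xv, yu, yv = 3, -1, 0, 2  # partials of the coordinate change
z_u = z_x * xu + z_y * yu
z_v = z_x * xv + z_y * yv

# finite-difference check against direct differentiation of z(u, v)
h = 1e-6
z_u_num = (z_of(u0 + h, v0) - z_of(u0 - h, v0)) / (2 * h)
z_v_num = (z_of(u0, v0 + h) - z_of(u0, v0 - h)) / (2 * h)
```
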
== Differentials
Differentials are about linear approximation. Recall in the single variable
case we use the tangent line approximation and differentials to approximate
functions. In the multivariable case it's the tangent plane approximation and
the directional derivative.
Let $f$ be a function with two inputs and one output, say
$
f(x,y) = x^2 + x cos(y)
$
Then the tangent plane at $(x_0,y_0)$ is the plane that best approximates the
function at that point. Now we need two slopes instead of one.
Take a look at $(1,pi/2)$. The partial derivatives at that point are $2$ and
$-1$. The idea is we start at $(1,pi/2,1)$ and then move a slight nudge in
either the $x$ or $y$ direction. In the $x$ direction, we move by $Delta x$ and
get an increase in $z$ (height) of $2 dot Delta x$.
If we move a slight nudge in the $y$ direction $Delta y$, then our height
changes by $-1 dot Delta y$ (a decrease when $Delta y > 0$).
Then we have a tangent plane approximation of
$
Delta z approx 2 dot Delta x - 1 dot Delta y
$
And like in single variable, we can replace each $Delta$ with a differential $dif$.
$
dif z = 2 dot dif x - 1 dot dif y
$
Sidenote: how can we actually write the equation of the tangent plane in terms
of $x$, $y$, $z$? Just note that $x = 1 + Delta x$, $y = pi/2 + Delta y$, and
$z = 1 + Delta z$. So just substitute:
$
(z - 1) = 2(x-1) - 1(y-pi / 2)
$
So in general, the differential version is
$
dif z = m_x dif x + m_y dif y
$
and the tangent plane equation at $(x_0, y_0, z_0)$:
$
z = z_0 + m_x (x-x_0) + m_y (y-y_0)
$
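The approximation can be checked directly on the example above. A sketch, using the function $f(x,y) = x^2 + x cos(y)$ from the notes and my own choice of nudge:

```python
import math

def f(x, y):
    return x**2 + x * math.cos(y)

x0, y0 = 1.0, math.pi / 2
z0 = f(x0, y0)  # equals 1

# tangent plane: z = z_0 + m_x (x - x_0) + m_y (y - y_0), with m_x = 2, m_y = -1
def tangent_plane(x, y):
    return z0 + 2 * (x - x0) - 1 * (y - y0)

# for a small nudge, the plane should approximate f closely
dx, dy = 0.01, -0.02
exact = f(x0 + dx, y0 + dy)
approx = tangent_plane(x0 + dx, y0 + dy)
```

The error shrinks quadratically as the nudge shrinks, which is what "linear approximation" means.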
== Directional derivative
Now take another look at the linear approximation in $z$
$
Delta z approx m_x Delta x + m_y Delta y
$
The directional derivative reinterprets this as matrix multiplication
$
Delta z approx mat(m_x,m_y) vec(Delta x, Delta y)
$
This is the same as a dot product: the gradient, given by $vec(m_x, m_y)$,
dotted with a tiny movement vector.
== Directional derivative in detail
We mentioned this before, now let's discuss in more detail. If you move by
$Delta x$ and $Delta y$, in $x$ and $y$ directions, then the direction
derivative computes your change in height on the tangent plane.
Consider the question: what is the derivative of $f$ in the direction of the
vector $vec(3,4)$?
Now consider $m_x (3) + m_y (4) = 3m_x + 4m_y$. This is almost the answer, but
really this is the change in $f$ resulting from a movement in the direction of
$vec(3,4)$. If we want the derivative, we're asking for the slope. We have
rise, now run is $sqrt(3^2 + 4^2) = 5$, so the answer is $(3m_x + 4m_y)/5$.
A more intuitive way is to consider a unit vector $arrow(u) = 1/lr(|<3,4>|)
<3,4>$ that points in the same direction. So now the "run" is simply 1. Clearly
we get the same answer, but we have a good formula now
$
"Directional derivative" = nabla f dot arrow(u)
$
where $arrow(u)$ is the *unit vector* in our desired direction.
A geometric interpretation: the directional derivative in a given direction is
just the gradient vector projected onto that direction.
To recap:
We discussed two questions: what is the slope in a given direction, and which
direction has the steepest slope?
Let $arrow(F) = nabla f$ be the gradient vector of partial derivatives and
$arrow(v)$ be the movement vector in the $x y$-plane. Let $arrow(u)$ be a unit
vector pointing in the same direction.
The answer to the first question is $arrow(F) dot arrow(u)$, where $arrow(u)$
is a unit vector in the given direction.
The answer to the second question can be derived.
$
arrow(F) dot arrow(u) = lr(|arrow(F)|) lr(|arrow(u)|) cos(theta)
$
and $theta$ is the angle between $arrow(F)$ and $arrow(u)$. Since $cos theta$
reaches its maximum value at $theta = 0$, the maximum possible slope is
achieved in the same direction as $arrow(F)$, with a slope equal to the
magnitude of $arrow(F)$.
Takeaways:
- Movement in the direction of the gradient vector gives the "steepest" ascent of the function
- Movement perpendicular to the gradient has slope of 0
- Movement in the opposite direction of the gradient has maximum negative slope (sharpest descent) with same magnitude as gradient
- Any directional derivative in between can be calculated as a projection from the gradient
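These takeaways can be verified numerically. A sketch reusing the earlier example $f(x,y) = x^2 + x cos(y)$ and the direction $vec(3,4)$; the helper names are mine:

```python
import math

def f(x, y):
    return x**2 + x * math.cos(y)

def grad(x, y, h=1e-6):
    """Central-difference estimate of the gradient."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy

x0, y0 = 1.0, math.pi / 2
gx, gy = grad(x0, y0)  # approximately (2, -1)

# directional derivative toward <3, 4>: dot with the UNIT vector
ux, uy = 3 / 5, 4 / 5
dir_formula = gx * ux + gy * uy  # (3*2 + 4*(-1)) / 5 = 2/5

# finite difference along the direction agrees
h = 1e-6
dir_numeric = (f(x0 + h * ux, y0 + h * uy) - f(x0 - h * ux, y0 - h * uy)) / (2 * h)

# steepest ascent: the slope in the gradient's own direction is |gradient|
norm = math.hypot(gx, gy)
steepest = gx * (gx / norm) + gy * (gy / norm)
```
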
== Optimization
We spoke previously about the Lagrange multiplier. Now we discuss it in greater
detail.
When optimizing in two dimensions, we either optimize for all of $RR^2$, or on
a constraint in $RR^2$ (such as a curve). For the first case, we use critical
points. For the second, *Lagrange multipliers*.
== Critical points
Critical points occur when the gradient is zero or undefined. Both partials are
zero *or* at least one of them isn't defined.
Essentially, they occur when the tangent plane is flat. We can't just look at
$f''(x)$ like in single variable calculus, but we can take the determinant of
the second order partials for some sort of multivariable concavity. It measures
how much the pure partial derivatives dominate the mixed partial derivatives,
and they need to dominate enough that there is consistently
upward or downward curvature in every direction.
We find the critical points when $m_x = 0$ and $m_y = 0$ or either are
undefined. Then we classify them as follows.
Recall the second derivative test for a single variable function; now consider
the two-variable case.
$
Dif (x_0,y_0) = det mat(f_(x x) (x_0, y_0), f_(x y) (x_0, y_0); f_(y x) (x_0,y_0), f_(y y) (x_0, y_0)) = f_(x x) (x_0, y_0) f_(y y) (x_0, y_0) - f_(x y) (x_0, y_0)^2
$
- If $Dif > 0$ and $f_(x x) (x_0, y_0) > 0$, then $f(x_0, y_0)$ is a relative minimum.
- If $Dif > 0$ and $f_(x x) (x_0, y_0) < 0$, then $f(x_0, y_0)$ is a relative maximum.
- If $Dif < 0$, then $f(x_0, y_0)$ is neither; it's a saddle point.
- If $Dif = 0$, then the test is inconclusive.
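A minimal sketch of this classification in Python, applied to two hypothetical functions, $x^2 + y^2$ (minimum at the origin) and $x^2 - y^2$ (saddle), whose second-order partials are computed by hand:

```python
def classify(fxx, fyy, fxy):
    """Second derivative test: classify a critical point
    from its second-order partials."""
    D = fxx * fyy - fxy**2
    if D > 0 and fxx > 0:
        return "relative minimum"
    if D > 0 and fxx < 0:
        return "relative maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

# f = x^2 + y^2 at (0, 0): f_xx = 2, f_yy = 2, f_xy = 0
result_f = classify(2, 2, 0)
# g = x^2 - y^2 at (0, 0): g_xx = 2, g_yy = -2, g_xy = 0
result_g = classify(2, -2, 0)
```
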
== Lagrange multipliers
We discuss optimization on a restricted curve in our domain.
Idea: we should navigate along the curve and find where the directional
derivative is 0. Recall that this is the same as when the velocity vector is
perpendicular to the gradient. The issue is that we always have to parametrize
the curve.
To avoid this, Lagrange multipliers views the (implicit) constraint equation as
a level curve of another surface. If we take the gradient of that surface
everywhere on the level curve, then that gradient is parallel to the original
function's gradient at critical points.
So, we should be able to take the gradients of both the function and the
constraint function, and look for when one is a scalar multiple of the other.
Consider a function $f$ and a constraint $g$. Then we compute $nabla f$ and
$nabla g$, then solve for when $nabla f = lambda nabla g$. Then we can just plug in
points and figure it out.
#exercise[
Find the highest and lowest points on $f(x,y) = 81x^2 + y^2$ with the
constraint $4x^2 + y^2 = 9$. Let the second function be $g(x,y)$, and keep in
mind our constraint is essentially the level set where $g(x,y) = 9$.
]
Intuition: consider the function $g(x,y) = x^2 + y^2$, where our constraint is
the level set $g(x,y) = 25$. Notice, the gradient of $g$ is perpendicular to
its level set at any given point. So, when optimizing a function $f$ that is
$g$-constrained, we are really looking for where the gradient of $g$ is
parallel to the gradient of $f$. That is why we use a scalar multiple $lambda$
to relate them.
Computation:
We have $g(x,y) = 25$. We construct the equation
$
vec(f_x, f_y) = lambda vec(g_x,g_y)
$
Together with the constraint, this gives three equations
$
f_x = lambda g_x \
f_y = lambda g_y \
g(x,y) = 25
$
With an objective like $f(x,y) = x y$, the first two equations give
$y = 2 lambda x$ and $x = 2 lambda y$; substituting one into the other, we find
$
y = 4 lambda^2 y
$
If $y != 0$, then $lambda = plus.minus 1/2$. So we're looking for points on the
circle, with radius 5, such that $x = plus.minus y$. This gives 4 points to
consider: $(plus.minus 5/sqrt(2), plus.minus 5/sqrt(2))$.
If $y = 0$, then $x$ is forced to be 0, and $(0,0)$ is not on the circle. So we
ignore it.
Now we just compare our four candidates and find the greatest (or least) for
optimization!
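The computation above never names the objective explicitly; the equation $y = 4 lambda^2 y$ is consistent with $f(x,y) = x y$ (so that $f_x = y = 2 lambda x$ and $f_y = x = 2 lambda y$). Under that assumption, a brute-force sweep of the constraint circle confirms the candidate points:

```python
import math

# Hypothetical objective consistent with the worked computation above.
def f(x, y):
    return x * y

r = 5.0  # constraint circle x^2 + y^2 = 25

# sweep the circle and record the extreme values of f
values = [f(r * math.cos(t), r * math.sin(t))
          for t in (2 * math.pi * k / 10000 for k in range(10000))]
best, worst = max(values), min(values)

# Lagrange candidate at x = y = 5/sqrt(2): f = 25/2
candidate = f(5 / math.sqrt(2), 5 / math.sqrt(2))
```

The sweep's maximum and minimum match the candidate values $plus.minus 25\/2$, as the multiplier method predicts.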
== Functions
Let $A$ and $B$ be sets. A relation $R$ from $A$ to $B$ is a subset $R subset.eq A times B$.
#definition[
A *function* $f$ is a relation from $A$ to $B$ such that
1. $"Dom"(f) = A$.
2. If $(x,y) in f$ and $(x,z) in f$ then $y = z$.
]
That is, every element in $A$ is related to exactly one element in $B$. Note
that (2) is the vertical line test.
#fact[
A function from $A$ to $B$ is written
$