auto-update(nvim): 2025-02-25 04:37:54
parent
1d31bf8ebe
commit
ce5dc50006
2 changed files with 312 additions and 6 deletions
|
@ -772,5 +772,309 @@ $
|
|||
(dif) / (dif x) cosh(x) = sinh(x)
|
||||
$
|
||||
|
||||
It's pretty easy to show these using their definitions, and derive the
|
||||
derivative of $tanh$.
|
||||
It's pretty easy to show these using their definitions, and derive the derivative of $tanh$.
|
||||
|
||||
== End of weeks 1-4
|
||||
|
||||
That was all of the content of weeks 1 to 4. Now we shift to weeks 5-7, where we studied more about vectors and their derivatives.
|
||||
|
||||
== Partial derivatives
|
||||
|
||||
These are the slopes of tangent lines for the graph in the direction of the
|
||||
changing variable.
|
||||
|
||||
#theorem[Clairaut's theorem][
|
||||
Suppose $f : RR^2 -> RR$ is defined on a disk $D$ that contains a point
|
||||
$(a,b)$. If the functions $f_(x y)$ and $f_(y x)$ are continuous on this disk
|
||||
then they are the same.
|
||||
]
|
||||
|
||||
#theorem[Extended Clairaut][
|
||||
Suppose $f : RR^2 -> RR$ is defined on a disk $D$ that contains $(a,b)$. If
|
||||
all of the mixed partial derivatives are continuous everywhere in the disk $D$,
|
||||
then the mixed partials are equal.
|
||||
]
|
||||
|
||||
== Multivariable chain rule
|
||||
|
||||
The product rule actually follows from it. Recall:
|
||||
$
|
||||
(f(x) g(x))' = f'(x) g(x) + f(x) g'(x)
|
||||
$
|
||||
Then instead let's replace $f$ and $g$ with $x$ and $y$, such that we have something like
|
||||
$
|
||||
z = x y
|
||||
$
|
||||
Then let $x$ and $y$ be functions of $t$. So
|
||||
$
|
||||
z = x(t) y(t)
|
||||
$
|
||||
The partial derivatives are
|
||||
$
|
||||
(diff z) / (diff x) = y, (diff z) / (diff y) = x
|
||||
$
|
||||
By the multivariable chain rule,
|
||||
$
|
||||
(dif z) / (dif t) = x'(t) (diff z) / (diff x) + (diff z) / (diff y) y'(t) = x'(t) y(t) + x(t) y'(t)
|
||||
$
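As a quick sanity check, here is a minimal numerical sketch of the computation above. The choices $x(t) = cos(t)$ and $y(t) = t^2$ are assumptions made only for illustration; they are not from the lecture. It compares the chain rule value of the derivative of $z = x y$ with a central finite difference.

```python
import math

# Hypothetical choices for illustration only: x(t) = cos(t), y(t) = t^2.
def x(t):
    return math.cos(t)

def y(t):
    return t * t

def dx(t):
    return -math.sin(t)

def dy(t):
    return 2 * t

def z(t):
    return x(t) * y(t)

def dz_chain(t):
    # z_x = y and z_y = x, so dz/dt = x'(t) y(t) + x(t) y'(t)
    return dx(t) * y(t) + x(t) * dy(t)

def dz_numeric(t, h=1e-6):
    # Central difference approximation of dz/dt
    return (z(t + h) - z(t - h)) / (2 * h)

t0 = 0.7
print(dz_chain(t0), dz_numeric(t0))  # the two values should agree closely
```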
|
||||
|
||||
== Implicit differentiation
|
||||
|
||||
It's similar to the single variable implicit differentiation, but remember to
|
||||
hold the extraneous variables constant in practice.
|
||||
|
||||
#example[
|
||||
Suppose you have a surface
|
||||
$
|
||||
3x^2 + 5 y z + z^3 = 0
|
||||
$
|
||||
And you want a partial derivative $(diff y)/(diff z)$ at some point. You can
|
||||
use implicit differentiation by viewing the surface as a level set of some
|
||||
larger function $F(x,y,z) = 3x^2 + 5y z + z^3$ where $F(x,y,z) = 0$.
|
||||
|
||||
Now we differentiate both sides:
|
||||
$
|
||||
(diff F) / (diff z) = diff / (diff z)(3x^2) + diff / (diff z)(5y z) + diff / (diff z)(z^3) \
|
||||
= 0 + (5 z (diff y) / (diff z) + 5y) + 3z^2 = 0
|
||||
$
|
||||
Then solve for $(diff y)/(diff z)$.
|
||||
]
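A small numerical check of this example, as a sketch rather than a general method. Differentiating $5 y z$ with the product rule gives $5 z (diff y)/(diff z) + 5 y$, so solving for the partial gives $(diff y)/(diff z) = -(5y + 3z^2)/(5z)$ when $z != 0$. The sample point $x = 1$, $z = 2$ is an arbitrary choice for illustration.

```python
def y_on_surface(x, z):
    # Solve 3x^2 + 5yz + z^3 = 0 for y (requires z != 0)
    return -(3 * x**2 + z**3) / (5 * z)

def dy_dz_implicit(x, y, z):
    # From 5 z (dy/dz) + 5 y + 3 z^2 = 0, holding x constant
    return -(5 * y + 3 * z**2) / (5 * z)

def dy_dz_numeric(x, z, h=1e-6):
    # Central difference along the surface, holding x constant
    return (y_on_surface(x, z + h) - y_on_surface(x, z - h)) / (2 * h)

x0, z0 = 1.0, 2.0            # arbitrary sample point (assumption)
y0 = y_on_surface(x0, z0)    # the matching y value on the surface
print(dy_dz_implicit(x0, y0, z0), dy_dz_numeric(x0, z0))
```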
|
||||
|
||||
== Multivariable chain rule as matrix
|
||||
|
||||
Consider $z = f(x,y)$. Then the derivative of $z$ with respect to $x$ and $y$ would be a matrix:
|
||||
$
|
||||
mat((diff z)/(diff x), (diff z)/(diff y))
|
||||
$
|
||||
|
||||
Now suppose the coordinate system changes to
|
||||
$
|
||||
x = 3u - v \
|
||||
y = 2v
|
||||
$
|
||||
Now suppose we want $z_u$ and $z_v$ at $(x,y) = (3,6)$. Then this is actually
|
||||
$(u,v) = (2,3)$ in $u v$ coordinates. The partials of $z$ with respect to $u$
|
||||
and $v$ are just matrix multiplication:
|
||||
$
|
||||
mat(z_u,z_v) = mat(z_x, z_y) mat(x_u,x_v;y_u,y_v)
|
||||
$
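To make the matrix product concrete, here is a short sketch. The function $z = x^2 y$ is a hypothetical choice (the notes leave $f$ unspecified); the change of coordinates $x = 3u - v$, $y = 2v$ is the one above.

```python
def grad_xy(x, y):
    # Hypothetical z = x^2 * y, so z_x = 2xy and z_y = x^2
    return (2 * x * y, x * x)

# Rows are (x_u, x_v) and (y_u, y_v) for x = 3u - v, y = 2v
JACOBIAN = ((3, -1), (0, 2))

def grad_uv(u, v):
    # Convert to xy coordinates, then multiply the row vector (z_x, z_y)
    # by the Jacobian matrix
    x, y = 3 * u - v, 2 * v
    z_x, z_y = grad_xy(x, y)
    (x_u, x_v), (y_u, y_v) = JACOBIAN
    return (z_x * x_u + z_y * y_u, z_x * x_v + z_y * y_v)

print(grad_uv(2, 3))  # (108, -18)
```

Expanding $z = (3u - v)^2 (2v)$ and differentiating directly gives the same two values at $(u,v) = (2,3)$.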
|
||||
|
||||
== Differentials
|
||||
|
||||
Differentials are about linear approximation. Recall in the single variable
|
||||
case we use the tangent line approximation and differentials to approximate
|
||||
functions. In the multivariable case it's the tangent plane approximation and
|
||||
the directional derivative.
|
||||
|
||||
Let $f$ be a function with two inputs and one output, say
|
||||
$
|
||||
f(x,y) = x^2 + x cos(y)
|
||||
$
|
||||
|
||||
Then the tangent plane at $(x_0,y_0)$ is the plane that best approximates the
|
||||
function at that point. Now we need two slopes instead of one.
|
||||
|
||||
Take a look at $(1,pi/2)$. Then the partial derivatives at that point are 2 and
|
||||
-1. The idea is we start at $(1,pi/2,1)$ and then move a slight nudge in either
|
||||
the $x$ or $y$ directions. In the $x$ direction, we move by $Delta x$ and get
|
||||
an increase in $z$ (height) of $2 dot Delta x$.
|
||||
|
||||
If we move a slight nudge in the $y$ direction $Delta y$, then our height
|
||||
should increase (decrease) by $-1 dot Delta y$.
|
||||
|
||||
Then we have a tangent plane approximation of
|
||||
$
|
||||
Delta z approx 2 dot Delta x - 1 dot Delta y
|
||||
$
|
||||
|
||||
And like in single variable, we can replace the $Delta$ with $dif$.
|
||||
$
|
||||
dif z = 2 dot dif x - 1 dot dif y
|
||||
$
|
||||
Sidenote: how can we actually have the equation of the tangent plane in terms
|
||||
of $x$,$y$,$z$? Just note that $x = 1 + Delta x$, $y = pi/2 + Delta y$, and $z
|
||||
= 1 + Delta z$. So just substitute
|
||||
$
|
||||
(z - 1) = 2(x-1) - 1(y-pi / 2)
|
||||
$
|
||||
|
||||
So in general, the differential version is
|
||||
$
|
||||
dif z = m_x dif x + m_y dif y
|
||||
$
|
||||
|
||||
and the tangent plane equation at $(x_0, y_0, z_0)$:
|
||||
$
|
||||
z = z_0 + m_x (x-x_0) + m_y (y-y_0)
|
||||
$
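A brief sketch of how good the approximation is near the point, using the example $f(x,y) = x^2 + x cos(y)$ at $(1, pi/2)$ with slopes 2 and $-1$. The nudge sizes are arbitrary choices for illustration.

```python
import math

def f(x, y):
    return x**2 + x * math.cos(y)

def tangent_plane(x, y, x0=1.0, y0=math.pi / 2, z0=1.0, m_x=2.0, m_y=-1.0):
    # z = z_0 + m_x (x - x_0) + m_y (y - y_0)
    return z0 + m_x * (x - x0) + m_y * (y - y0)

# Small, arbitrary nudges away from (1, pi/2)
dx, dy = 0.01, 0.02
exact = f(1.0 + dx, math.pi / 2 + dy)
approx = tangent_plane(1.0 + dx, math.pi / 2 + dy)
print(exact, approx, abs(exact - approx))  # the error should be small
```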
|
||||
|
||||
== Directional derivative
|
||||
|
||||
Now take another look at the linear approximation in $z$
|
||||
$
|
||||
Delta z approx m_x Delta x + m_y Delta y
|
||||
$
|
||||
|
||||
The directional derivative reinterprets this as matrix multiplication
|
||||
$
|
||||
Delta z approx mat(m_x,m_y) vec(Delta x, Delta y)
|
||||
$
|
||||
This is the same as a dot product, in fact, the dot product of the gradient
|
||||
given by $vec(m_x,m_y)$ and a tiny movement vector.
|
||||
|
||||
== Directional derivative
|
||||
|
||||
We mentioned this before, now let's discuss in more detail. If you move by
|
||||
$Delta x$ and $Delta y$, in $x$ and $y$ directions, then the directional
|
||||
derivative computes your change in height on the tangent plane.
|
||||
|
||||
Consider the question "What is the derivative of $f$ in the direction of the
|
||||
vector $vec(3,4)$?"
|
||||
|
||||
Now consider $m_x (3) + m_y (4) = 3m_x + 4m_y$. This is almost the answer, but
|
||||
really this is the change in $f$ resulting from a movement in the direction of
|
||||
$vec(3,4)$. If we want the derivative, we're asking for the slope. We have
|
||||
rise, now run is $sqrt(3^2 + 4^2) = 5$, so the answer is $(3m_x + 4m_y)/5$.
|
||||
|
||||
A more intuitive way is to consider a unit vector $arrow(u) = 1/lr(|<3,4>|)
|
||||
<3,4>$ that points in the same direction. So now the "run" is simply 1. Clearly
|
||||
we get the same answer, but we have a good formula now
|
||||
|
||||
$
|
||||
"Directional derivative" = nabla f dot arrow(u)
|
||||
$
|
||||
|
||||
where $arrow(u)$ is the *unit vector* in our desired direction.
|
||||
|
||||
A geometric interpretation is that the directional derivative in a given
|
||||
direction is just the gradient vector projected in that direction.
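The "rise over run" argument can be sketched in a few lines. The gradient $(2, -1)$ is borrowed from the tangent plane example at $(1, pi/2)$; the function name `directional_derivative` is just a label chosen here.

```python
import math

def directional_derivative(grad, direction):
    # Normalize the direction so that the "run" is 1, then take the dot product
    norm = math.hypot(*direction)
    if norm == 0:
        raise ValueError("direction must be a nonzero vector")
    unit = tuple(c / norm for c in direction)
    return sum(g * u for g, u in zip(grad, unit))

# (3 * 2 + 4 * (-1)) / 5, which is approximately 0.4
print(directional_derivative((2.0, -1.0), (3.0, 4.0)))
```

Scaling the direction vector does not change the result, since only the unit vector enters the dot product.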
|
||||
|
||||
To recap:
|
||||
|
||||
We discuss two questions: what is the slope in a given direction, and what
|
||||
direction has the steepest slope?
|
||||
|
||||
Let $arrow(F) = nabla f$ be the gradient vector of partial derivatives and $arrow(v)$
|
||||
be the movement vector in the $x y$-plane. Let $arrow(u)$ be a unit vector
|
||||
pointing in the same direction.
|
||||
|
||||
The answer to the first question is $nabla f dot arrow(u)$, where
|
||||
$arrow(u)$ is a unit vector in the given direction.
|
||||
|
||||
The answer to the second question can be derived from the geometric form of the dot product:
|
||||
|
||||
$
|
||||
arrow(F) dot arrow(u) = lr(|arrow(F)|) lr(|arrow(u)|) cos(theta)
|
||||
$
|
||||
and $theta$ is the angle between $arrow(F)$ and $arrow(u)$. So since $cos
|
||||
theta$ reaches maximum value at $theta = 0$, the maximum possible slope is
|
||||
actually in the same direction as $arrow(F)$ with a slope equal to the magnitude
|
||||
of $arrow(F)$.
|
||||
|
||||
Takeaways:
|
||||
|
||||
- Movement in the direction of the gradient vector gives the "steepest" ascent of the function
|
||||
- Movement perpendicular to the gradient has slope of 0
|
||||
- Movement in the opposite direction of the gradient has maximum negative slope (sharpest descent) with the same magnitude as the gradient
|
||||
- Any directional derivative in between can be calculated as a projection from the gradient
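These takeaways can be checked by brute force. The sketch below sweeps unit vectors around the circle and looks for the steepest one; the gradient $(3, 4)$ and the step count are arbitrary choices for illustration.

```python
import math

def slope(grad, theta):
    # Directional derivative along the unit vector (cos(theta), sin(theta))
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)

def steepest_angle(grad, steps=3600):
    # Try evenly spaced directions and keep the one with the largest slope
    angles = [2 * math.pi * k / steps for k in range(steps)]
    return max(angles, key=lambda a: slope(grad, a))

grad = (3.0, 4.0)  # hypothetical gradient, magnitude 5
best = steepest_angle(grad)
print(best, math.atan2(grad[1], grad[0]))   # both angles should nearly agree
print(slope(grad, best))                    # close to the magnitude, 5
print(slope(grad, best + math.pi / 2))      # perpendicular: close to 0
print(slope(grad, best + math.pi))          # opposite: close to -5
```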
|
||||
|
||||
== Optimization
|
||||
|
||||
We spoke previously about the Lagrange multiplier. Now we discuss it in greater
|
||||
detail.
|
||||
|
||||
When optimizing in two dimensions, we either optimize for all of $RR^2$, or on
|
||||
a constraint in $RR^2$ (such as a curve). For the first case, we use critical
|
||||
points. For the second, *Lagrange multipliers*.
|
||||
|
||||
== Critical points
|
||||
|
||||
Critical points occur when the gradient is zero or undefined. Both partials are
|
||||
zero *or* at least one of them isn't defined.
|
||||
|
||||
Essentially, they occur when the tangent plane is flat. We can't just look at
|
||||
$f''(x)$ like in single variable calculus, but we can take the determinant of
|
||||
the second order partials for some sort of multivariable concavity. It measures
|
||||
how much the pure partial derivatives dominate the mixed partial derivatives,
|
||||
and they need to dominate to a certain extent such that there is consistently
|
||||
upward or downward curvature in every direction.
|
||||
|
||||
We find the critical points when $m_x = 0$ and $m_y = 0$ or either are
|
||||
undefined. Then we classify them as follows.
|
||||
|
||||
Recall the second derivative test for single variable functions. Now consider the
|
||||
two-variable case.
|
||||
|
||||
$
|
||||
Dif (x_0,y_0) = det mat(f_(x x) (x_0, y_0), f_(x y) (x_0, y_0); f_(y x) (x_0,y_0), f_(y y) (x_0, y_0)) = f_(x x) (x_0, y_0) f_(y y) (x_0, y_0) - f_(x y) (x_0, y_0)^2
|
||||
$
|
||||
|
||||
- If $Dif > 0$ and $f_(x x) (x_0, y_0) > 0$, then $f$ has a relative minimum at $(x_0, y_0)$.
|
||||
- If $Dif > 0$ and $f_(x x) (x_0, y_0) < 0$, then $f$ has a relative maximum at $(x_0, y_0)$.
|
||||
- If $Dif < 0$, then $f(x_0, y_0)$ is neither, and $(x_0, y_0)$ is a saddle point.
|
||||
- If $Dif = 0$, then the test is inconclusive.
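The classification rules translate directly into a small helper. This is only a sketch: the function name `classify` and the sample functions in the comments are chosen here for illustration, and the second partials are assumed to have been evaluated at the critical point already.

```python
def classify(f_xx, f_yy, f_xy):
    # Determinant of the matrix of second partials at the critical point
    d = f_xx * f_yy - f_xy**2
    if d > 0:
        # d > 0 forces f_xx != 0, so the sign of f_xx decides
        return "relative minimum" if f_xx > 0 else "relative maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive"

print(classify(2, 2, 0))    # x^2 + y^2 at the origin
print(classify(-2, -2, 0))  # -x^2 - y^2 at the origin
print(classify(2, -2, 0))   # x^2 - y^2 at the origin
print(classify(0, 0, 0))    # x^4 + y^4 at the origin: the test says nothing
```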
|
||||
|
||||
== Lagrange multipliers
|
||||
|
||||
We discuss optimization on a restricted curve in our domain.
|
||||
|
||||
Idea: we should navigate along the curve and find where the directional
|
||||
derivative is 0. Recall that this is the same as when the velocity vector is
|
||||
perpendicular to the gradient. The issue is that we always have to parametrize
|
||||
the curve.
|
||||
|
||||
To avoid this, the method of Lagrange multipliers views the (implicit) constraint equation as
|
||||
a level curve of another surface. If we take the gradient of that surface
|
||||
everywhere on the level curve, then that gradient is parallel to the original
|
||||
function's gradient at critical points.
|
||||
|
||||
So, we should be able to take the gradients of both the function and the
|
||||
constraint function, and look for when one is a scalar multiple of the other.
|
||||
|
||||
Consider a function $f$ and a constraint $g$. Then we compute $nabla f$ and
|
||||
$nabla g$, then solve for when $nabla f = lambda nabla g$. Then we evaluate $f$ at
|
||||
the resulting points and compare the values.
|
||||
|
||||
#exercise[
|
||||
Find the highest and lowest points on $f(x,y) = 81x^2 + y^2$ with the
|
||||
constraint $4x^2 + y^2 = 9$. Let the second function be $g(x,y)$, and keep in
|
||||
mind our constraint is essentially the level set where $g(x,y) = 9$.
|
||||
]
|
||||
|
||||
Intuition: consider a constraint function $g(x,y) = x^2 + y^2$, where our constraint is
|
||||
the level set where $g(x,y) = 25$. Notice, the gradient of $g$ is perpendicular
|
||||
to its level set at any given point. So, when optimizing on a function $f$ that
|
||||
is $g$-constrained, we are really looking for where the gradient of $g$ is
|
||||
parallel to the gradient of $f$. That is why we are using a scalar multiple
|
||||
$lambda$ to relate them.
|
||||
|
||||
Computation:
|
||||
|
||||
We have $g(x,y) = 25$. We construct the equation
|
||||
|
||||
$
|
||||
vec(f_x, f_y) = lambda vec(g_x,g_y)
|
||||
$
|
||||
|
||||
Together with the constraint, this gives three equations
|
||||
|
||||
$
|
||||
f_x = lambda g_x \
|
||||
f_y = lambda g_y \
|
||||
g(x,y) = 25
|
||||
$
|
||||
|
||||
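For instance, taking $f(x,y) = x y$ (an assumed objective, consistent with the equation below), the first two equations read $y = 2 lambda x$ and $x = 2 lambda y$. Substituting, we find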
|
||||
$
|
||||
y = 4 lambda^2 y
|
||||
$
|
||||
|
||||
If $y != 0$, then $lambda = plus.minus 1/2$. So we're looking for points on the
|
||||
circle, with radius 5, such that $x = plus.minus y$. This gives 4 points to
|
||||
consider: $(plus.minus 5/sqrt(2), plus.minus 5/sqrt(2))$.
|
||||
|
||||
If $y = 0$, then $x$ is forced to be 0, and $(0,0)$ is not on the circle. So we
|
||||
ignore it.
|
||||
|
||||
Now we just compare our four candidates and find the greatest (or least) for
|
||||
optimization!
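The comparison step can be sketched numerically. The notes do not state the objective for this computation, so $f(x,y) = x y$ is an assumption here; it is one choice that reproduces $y = 4 lambda^2 y$ with $g(x,y) = x^2 + y^2$. The brute-force sweep over the circle is only a cross-check, not part of the method.

```python
import math

def f(x, y):
    # Assumed objective (not stated in the notes): f(x, y) = x * y
    return x * y

R = 5.0                      # the constraint g(x, y) = 25 is a circle of radius 5
s = R / math.sqrt(2)
# The four candidates (+-5/sqrt(2), +-5/sqrt(2))
candidates = [(sx * s, sy * s) for sx in (1, -1) for sy in (1, -1)]
values = {p: f(*p) for p in candidates}

def brute_force(steps=3600):
    # Evaluate f at evenly spaced points on the circle and return (min, max)
    vals = [
        f(R * math.cos(2 * math.pi * k / steps), R * math.sin(2 * math.pi * k / steps))
        for k in range(steps)
    ]
    return min(vals), max(vals)

print(min(values.values()), max(values.values()))
print(brute_force())  # should roughly match the candidate extremes
```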
|
||||
|
|
|
@ -920,13 +920,15 @@ nonempty subsets of $A$ whose union is $A$.
|
|||
|
||||
== Functions
|
||||
|
||||
Let $A$ and $B$ be sets. A relation $R$ from $A$ to $B$ is a subset $R subset.eq A times B$.
|
||||
|
||||
#definition[
|
||||
A *function* $f$ from $A$ to $B$ relates each element of $"Dom"(R)$ to exactly
|
||||
one element of $"Rng"(R)$.
|
||||
A *function* $f$ is a relation from $A$ to $B$ such that
|
||||
1. $"Dom"(f) = A$.
|
||||
2. If $(x,y) in f$ and $(x,z) in f$ then $y = z$.
|
||||
]
|
||||
|
||||
That is, every element in $A$ is related to exactly one element in $B$. Note
|
||||
that (2) is the vertical line test.
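For finite sets, the two conditions can be checked mechanically. The sketch below represents a relation as a Python set of pairs; the name `is_function` and the sample sets are made up for illustration.

```python
def is_function(relation, a, b):
    images = {}
    for x, y in relation:
        if x not in a or y not in b:
            return False  # not even a relation from A to B
        if x in images and images[x] != y:
            return False  # condition (2) fails: x is related to two values
        images[x] = y
    return set(images) == set(a)  # condition (1): Dom(f) = A

A = {1, 2, 3}
B = {"a", "b"}
print(is_function({(1, "a"), (2, "a"), (3, "b")}, A, B))            # True
print(is_function({(1, "a"), (1, "b"), (2, "a"), (3, "b")}, A, B))  # False
print(is_function({(1, "a"), (2, "b")}, A, B))                      # False
```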
|
||||
|
||||
#fact[
|
||||
A function from $A$ to $B$ is written
|
||||
$
|
||||
|
|