
#import "@youwen/zen:0.1.0": *
#import "@preview/cetz:0.3.1"

#set math.equation(numbering: "(1)")
#show math.equation: it => {
  if it.block and not it.has("label") [
    #counter(math.equation).update(v => v - 1)
    #math.equation(it.body, block: true, numbering: none)#label("")
  ] else {
    it
  }
}

#show: zen.with(
  title: "Math 6A Course Notes",
  author: "Youwen Wu",
  date: "Winter 2025",
  subtitle: [Taught by Nathan Schley],
)

#outline()

= Lecture #datetime(day: 7, month: 1, year: 2025).display()

== Review of fundamental concepts

You can parameterize curves.

#example[Unit circle][
  $
    x = cos(t) \
    y = sin(t)
  $
]

For an explicit equation
$ y = f(x) $
parameterize it by setting
$ x = t \ y = f(t) $

Parameterize a line passing through two points $arrow(p)_1$ and $arrow(p)_2$ by
$ arrow(c)(t) = arrow(p)_1 + t (arrow(p)_2 - arrow(p)_1) $

Take the derivative of each component to find the velocity vector. The
magnitude of velocity is speed.

#example[
  $
    arrow(c)(t) = <5t, sin(t)> \
    arrow(v)(t) = <5, cos(t)>
  $
]

== Polar coordinates

Write a set of Cartesian coordinates in $RR^2$ as polar coordinates instead, by
a distance from origin $r$ and angle about the origin $theta$.

$ (x,y) -> (r, theta) $

= Lecture #datetime(day: 9, month: 1, year: 2025).display()

== Vectors

A dot product of two vectors is a generalization of the sense of size for a
point or vector.

#example[
  How far is the point $(x_1, x_2, x_3)$ from the origin? \
  Answer: $sqrt(x_1^2 + x_2^2 + x_3^2)$
]

#definition[
  For vectors $u$ and $v$, where
  $ v = vec(v_1, v_2, dots.v, v_n), u = vec(u_1, u_2, dots.v, u_n) $
  the dot product is defined as
  $ v dot u = sum_(i=1)^n v_i dot u_i $
]

#proposition[
  The dot product of two vectors is the product of their magnitudes and the cosine of the angle between.

  $ arrow(v) dot arrow(w) = ||arrow(v)|| dot ||arrow(w)|| cos theta $
]

= Lecture #datetime(day: 23, month: 1, year: 2025).display()

Midterm is next Thursday in class!

== Arclength and curvature

Easy way of finding curvature: reparameterize the curve with speed 1, then the
curvature is the magnitude of the acceleration. If we can't do that then we
need some other technique.

Given $arrow(c)(t) = <2t^(-1), 6, 2t>$, find the curvature $kappa(t)$.
$
  kappa (t) = (||arrow(c)'(t) times arrow(c)''(t)||) / (||arrow(c)'(t)||^3)
$
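
As a sketch of how the formula plays out for this curve (assuming $t != 0$ so
that $arrow(c)$ is defined; the intermediate steps are worked out here and are
not taken from the lecture):
$
  arrow(c)'(t) &= vec(-2t^(-2), 0, 2) \
  arrow(c)''(t) &= vec(4t^(-3), 0, 0) \
  arrow(c)'(t) times arrow(c)''(t) &= vec(0, 8t^(-3), 0) \
  ||arrow(c)'(t)|| &= sqrt(4t^(-4) + 4) = (2 sqrt(1 + t^4)) / t^2 \
  kappa(t) &= (8 abs(t)^(-3)) / (8 (1 + t^4)^(3 / 2) t^(-6)) = abs(t)^3 / (1 + t^4)^(3 / 2)
$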

== Arclength parametrization

Find an arc-length parametrization of $arrow(c)(t) = <e^t sin(t), e^t cos(t), 5e^t>$.

Let $s = 0$ when $t = 0$ and let $s$ be the arc length traveled along
the curve after $t$ seconds. Then we can find $s$ by integrating the curve's
speed over $t$.

$
  s(t) = integral^t_0 ||arrow(c)'(u)|| dif u
$
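
A worked sketch for this curve, following the recipe above (the algebra is
filled in here rather than copied from the lecture):
$
  arrow(c)'(t) &= vec(e^t (sin(t) + cos(t)), e^t (cos(t) - sin(t)), 5e^t) \
  ||arrow(c)'(t)|| &= e^t sqrt((sin(t) + cos(t))^2 + (cos(t) - sin(t))^2 + 25) = 3 sqrt(3) e^t \
  s(t) &= integral_0^t 3 sqrt(3) e^u dif u = 3 sqrt(3) (e^t - 1) \
  t &= ln(1 + s / (3 sqrt(3)))
$
Substituting this $t$ back into $arrow(c)(t)$ gives the arc-length
parametrization.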

= Lecture #datetime(day: 12, year: 2025, month: 2).display()

== Chain rule for multivariate functions

We find motivation for the chain rule.

Consider a hiker whose path is given by

$
  arrow(c) (t) = <x(t), y(t)>
$

and

$
  f(x,y) = x dot y
$

What does $x'(t)$ represent? Speed in $x$-direction. Likewise for $y'(t)$.

Say $x'(t) = 3$, $y'(t) = 4$. Then how far did we travel in $t$ seconds?

Suppose our slope in the $x$ direction is given by $m_x = 2$. Suppose the slope
in $y$ is $m_y = -2$. In fact $m_x = f_x (x,y)$ and $m_y = f_y (x,y)$ (here
$f_k$ is the partial derivative with respect to $k$).

So each change in $t$ of 1 raises our elevation by $2 dot 3 = 6$ meters from
moving in the $x$-direction and lowers it by $2 dot 4 = 8$ meters from moving
in the $y$-direction.

So the total change $Delta z$ is given by
$
  Delta z = m_x dot Delta x + m_y dot Delta y
$
and analogously in calculus land

$
  (dif z) / (dif t) = (diff f) / (diff x) dot x'(t) + (diff f) / (diff y) dot y'(t)
$<chain-rule>

In fact @chain-rule is the chain rule.

#fact[
  $
    (dif f) / (dif t) = (diff f) / (diff x) dot (diff x) / (diff t) + (diff f) / (diff y) dot (diff y) / (diff t) + (diff f) / (diff z) dot (diff z) / (diff t)
  $
]

#example[
  Consider $f(x) = x^x$. What is $f'(x)$?

  We can do this with logarithmic differentiation but we can also do this with the multivariable chain rule.

  $
    f(x,y) =
  $
]
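
One way the computation might go, as a sketch: introduce a two-variable
function $g(u,v) = u^v$ (the names $g$, $u$, $v$ are made up here for
illustration and are not from the lecture), set $u = x$ and $v = x$, and
assume $x > 0$.
$
  (diff g) / (diff u) = v u^(v-1), quad (diff g) / (diff v) = u^v ln(u) \
  f'(x) = (diff g) / (diff u) dot (dif u) / (dif x) + (diff g) / (diff v) dot (dif v) / (dif x) = x dot x^(x-1) + x^x ln(x) = x^x (1 + ln(x))
$
This agrees with what logarithmic differentiation gives.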

#example[
  Find the derivative $dif/(dif t) (f(x,y))$, where $f(x,y) = x^y$, $x(t) = t$,
  and $y(t) = 1$. Assume $t > 0$.
]

#example[
  Find the partial derivative $diff/(diff s) f(x,y,z)$ where $f(x,y,z) = x^2 y^2 + z^3$, and

  $
    x(s,t) = s t \
    y(s, t) = s^2 t \
    z(s,t) = s t^2
  $
]
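
A sketch of how this last example can be worked, applying the chain rule
through each intermediate variable and then substituting:
$
  (diff f) / (diff s) &= (diff f) / (diff x) dot (diff x) / (diff s) + (diff f) / (diff y) dot (diff y) / (diff s) + (diff f) / (diff z) dot (diff z) / (diff s) \
  &= 2 x y^2 dot t + 2 x^2 y dot 2 s t + 3 z^2 dot t^2 \
  &= 2 s^5 t^4 + 4 s^5 t^4 + 3 s^2 t^6 = 6 s^5 t^4 + 3 s^2 t^6
$
As a check, substituting first gives $f = s^6 t^4 + s^3 t^6$, whose partial
derivative in $s$ is the same expression.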

== Implicit differentiation

Review from single variable: given an equation $f(x,y) = 0$ we can differentiate
each term with respect to $x$, then collect all $(dif y)/(dif x)$ terms together
and solve for it as a variable to obtain $(dif y)/(dif x)$ in terms of $x$ and $y$.

We do something similar for more variables. Main idea: extraneous variables are
held constant in practice.

Example: consider the surface $3x^2 + 5y z + z^3 = 0$. We want $(diff y)/(diff
z)$ at some point. Use implicit differentiation by viewing the surface as a
level set of some larger function $F(x,y,z) = 3x^2 + 5y z + z^3$ (the level set
part is when $F(x,y,z) = 0$).

By applying the product rule (really the chain rule @chain-rule), treating $y$
as a function of $z$ and holding $x$ constant,
$
  (diff F) / (diff z) = diff / (diff z) (3x^2 + 5 y z + z^3) = 0 + (5 (diff y) / (diff z) z + 5y) + 3z^2 = 0 \
  (diff y) / (diff z) = - (5y + 3z^2) / (5z)
$

= Lecture #datetime(day: 18, year: 2025, month: 2).display()

== Critical points

When optimizing in 2D, the strategy depends on whether we're

- optimizing for all of $RR^2$ (or a region in $RR^2$)
- optimizing on a constraint (like a curve through $RR^n$)

We find critical points where the tangent plane is "flat": $m_x = 0$ and $m_y =
0$.

We classify critical points using the determinant of the matrix of second
partial derivatives (the Hessian).

$
  D = f_(x x) f_(y y) - f_(x y)^2
$

- if $D >0$ and $f_(x x) (x_0, y_0) > 0$, then $f(x_0, y_0)$ is a relative minimum.
- if $D > 0$ and $f_(x x) (x_0, y_0) < 0$, $f(x_0, y_0)$ is a relative maximum.
- if $D < 0$ then $f(x_0, y_0)$ is neither and we call it a saddle point.
- if $D = 0$ then we don't know

== Lagrange multipliers

Optimizing along a constraint curve. Idea: navigate along the curve and look for
where the directional derivative is zero.

#example[
  Find the highest and lowest points on $f(x,y) = 81x^2 + y^2$ with the
  constraint $4x^2 + y^2 = 9$.

  For notational purposes, we'll call $g(x,y) = 4x^2 + y^2$ and keep in mind
  we're looking for $g(x,y) = 9$.

  1. Find the gradients of $f$ and $g$.
    $
      vec(f_x,f_y) &= vec(162x,2y) \
      vec(g_x,g_y) &= vec(8x,2y)
    $
  2. We want to find the points where the gradients "align", i.e. we want these
  vectors to be parallel, that is:
  $
    nabla f = lambda nabla g \
    162x = 8x dot lambda \
    2y = 2y dot lambda
  $
  Remember to keep the constraint!
  $
    4x^2 + y^2 = 9
  $
  Breaking it down into cases,
  $
    162 = 8lambda => lambda = 81 / 4 "for" x != 0
  $
  which implies
  $
    2y (lambda - 1) = 0 \
    y = 0
  $
  since $lambda - 1$ is nonzero. So if $x$ is nonzero, $y$ must be zero.

  $
    4x^2 = 9 \
    x = plus.minus 3 / 2
  $
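  Now consider when $x = 0$. Then $y = plus.minus 3$. So our critical points
  are $(plus.minus 3/2, 0)$ and $(0, plus.minus 3)$. Finally, just plug these 4
  critical points into $f$ and find the biggest/smallest: $f(plus.minus 3/2, 0) = 729/4$
  is the maximum and $f(0, plus.minus 3) = 9$ is the minimum.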

]
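
As a numerical sanity check of the example above (a brute-force sketch in
Python, not the Lagrange method itself): the constraint $4x^2 + y^2 = 9$ can be
parameterized as $(3/2 cos(theta), 3 sin(theta))$, so we can sample $f$ along
it and read off the extremes. The function names and the sample count are
arbitrary choices made here.

```python
import math

def f(x, y):
    """Objective from the example: f(x, y) = 81x^2 + y^2."""
    return 81 * x**2 + y**2

def extrema_on_ellipse(samples=3600):
    """Sample f along the constraint 4x^2 + y^2 = 9.

    The ellipse is parameterized as (1.5 cos(theta), 3 sin(theta)),
    which satisfies the constraint for every theta.
    Returns the (value, x, y) tuples with the smallest and largest value.
    """
    values = []
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        x, y = 1.5 * math.cos(theta), 3 * math.sin(theta)
        values.append((f(x, y), x, y))
    return min(values), max(values)

lowest, highest = extrema_on_ellipse()
print("lowest (f, x, y):", lowest)
print("highest (f, x, y):", highest)
```

The sampled extremes should land at the candidates the multiplier equations
produce: the highest value where $y = 0$ and the lowest where $x = 0$.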

= Speedrun

In this chapter I wrote up notes for the entirety of the course, starting from
week 1, ending at week 8, because I skipped 80% of the classes up to the
midterm.

== Vector review

We know about functions $y = f(x)$. We can parameterize functions by expressing
them as a pair of coordinates $(x(t), y(t))$, modeling for example a particle
traveling through space with respect to time.

=== Derivative of parameterized curve

Given $(x(t), y(t))$, the derivative is given by
$
  (x'(t), y'(t))
$

=== Parameterizing an ellipse

Consider an ellipse

$
  x^2 / n + y^2 / m = r^2
$

Then we note that this is just a circle with $x$ stretched by a factor of
$sqrt(n)$ and likewise for $y$ by a factor of $sqrt(m)$. Then we can
parameterize the ellipse by

$
  (r sqrt(n) cos(theta), r sqrt(m) sin(theta))
$

A sanity check: when $n = 1$, $m = 1$, $r = 1$, we have a unit circle and the
parametrization reflects that.

#example[
  Consider the ellipse $x^2/9 + y^2/4 = 16$. Then the parametrization is
  $(12cos(theta), 8sin(theta))$, and this is indeed right.
]

To parameterize a line passing through points $arrow(p)_1$ and $arrow(p)_2$,
simply
$
  arrow(c) (t) = arrow(p)_1 + t(arrow(p)_2 - arrow(p)_1)
$
Algebraically we can justify this by noting $t=0$ gives $arrow(p)_1$ and $t=1$ gives $arrow(p)_2$.

=== Polar coordinates

Notation: $(r,theta)$ instead of $(x,y)$. Note that this is just highlighting
that we're parameterizing a curve in terms of a radius $r$ (also called
_modulus_) and argument (angle) $theta$.

To get $r$, see that $r^2 = x^2 + y^2$. For the angle, $tan(theta) = y/x$, so
$theta = arctan(y/x)$ when $x > 0$ (add $pi$ when $x < 0$).

To plot in terms of $x,y$, note that
$
  x = r cos(theta) \
  y = r sin(theta)
$
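
A minimal Python sketch of the two conversions. The helper names `to_polar`
and `to_cartesian` are made up for illustration; `math.atan2` is used in place
of a bare arctangent because it picks the correct quadrant on its own.

```python
import math

def to_polar(x, y):
    """Return (r, theta) for the Cartesian point (x, y), theta in (-pi, pi]."""
    r = math.hypot(x, y)        # r^2 = x^2 + y^2
    theta = math.atan2(y, x)    # quadrant-aware version of arctan(y / x)
    return r, theta

def to_cartesian(r, theta):
    """Return (x, y) using x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

# A point in the second quadrant, where arctan(y / x) alone would be wrong.
r, theta = to_polar(-1.0, 1.0)
print(r, theta, to_cartesian(r, theta))
```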

=== Vector properties

Many nonrigorous statements about vectors to add to our toolbox.

Two vectors are parallel if and only if the angle formed between them is 0 (or
$pi$, if they point in opposite directions).
Vectors can be added in linear combinations.

The magnitude of a vector with $n$ components is given by the $n$-dimensional
Pythagorean theorem.

$
  sqrt(a_1^2 + a_2^2 + dots.c + a_n^2)
$

A unit vector is a vector with magnitude 1.

=== Dot product

Dot products are useful for seeing properties about orthogonality, parallelism,
and projection.

#definition[
  The dot product takes two vectors and returns a single scalar. It is sometimes
  called the inner product.
]

We can view the dot product algebraically, and geometrically. In the first
sense, it's just the sum of the products of each pair of coordinates in the
vectors.

Let $A,B$ be vectors of length $n$, and let $a_i, b_i$ be the $i^"th"$ entries of
their respective vectors, then
$
  A dot B = a_1 dot b_1 + a_2 dot b_2 + dots.c + a_n dot b_n
$

Geometrically, it's the product of the magnitudes of the vectors and the cosine
of angle between them.

$
  A dot B = |A| dot |B| dot cos(theta)
$

where $theta$ is the angle between the vectors. It's nontrivial to prove this
and we don't have time.

Therefore, we know two vectors are parallel when $A dot B = |A| dot |B|$,
because $cos(theta) = 1$. We know two vectors are orthogonal if the dot product
is 0, because $cos(pi/2 + pi n)$ is 0.

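Dot products are very useful for projection. Pick a unit vector $arrow(u)$.
For any vector $arrow(w)$, $arrow(u) dot arrow(w)$ gives the size of the
part of $arrow(w)$ parallel to $arrow(u)$.

Also, note that subtracting this parallel part leaves the perpendicular part
$arrow(w) - (arrow(u) dot arrow(w)) arrow(u)$, whose magnitude is the shortest
distance between the tip of $arrow(w)$ and the line passing in the direction of
$arrow(u)$!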

=== Normalizing vectors

Normalizing a vector means obtaining a vector pointing in the same direction,
with magnitude 1. For a nonzero vector $v$, its normalized vector is given by

$
  v / (|v|)
$

where $|v|$ is the magnitude.

=== Cross product

The cross product $times$ is a binary operation on vectors. For our purposes
it's only defined in $RR^3$ (in fact it's defined in some other dimensions, but
in general it is not defined).

It produces a third vector perpendicular to both original vectors. Its
direction is determined by the right hand rule.

The magnitude of the cross product is given by

$
  |a times b| = |a| |b| sin theta
$

where $theta$ is the angle between $a$ and $b$. So we know two vectors are
parallel if the magnitude of the cross product is 0.

In fact this magnitude is also the area of the parallelogram spanned by the two
vectors. This is an alternative way to view the cross product being 0 implying
that the vectors are parallel.

We can compute the cross product by taking a determinant.

$
  a times b = det mat(i,j,k; a_1, a_2, a_3; b_1,b_2,b_3)
$

The cross product is anticommutative, so $a times b = -(b times a)$.
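
A small Python sketch of the determinant expansion, with vectors as plain
tuples. The helper names `cross` and `dot` and the two sample vectors are
choices made here for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors, expanded from the determinant."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)

def dot(a, b):
    """Dot product: sum of the products of matching coordinates."""
    return sum(x * y for x, y in zip(a, b))

a, b = (2, 0, -1), (0, 4, -3)
n = cross(a, b)
print(n)                      # perpendicular to both a and b
print(dot(n, a), dot(n, b))   # both dot products vanish
print(cross(b, a))            # anticommutative: the negative of n
```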

== Vector applications, building geometric intuition

Let's build some geometric intuition for working with vectors, especially in
$RR^3$.

=== Moving the equation of a line or plane while maintaining orientation

We consider two cases. If we're working with a parametric equation, then we can
just add a vector to the equation to shift everything by said vector.

Otherwise, with an implicit equation, let's consider only the plane (since we
need two implicit equations to specify a line, and at that point it's better to
solve the system and parameterize).

The plane equation $a x + b y + c z = r$ can be shifted $n$ units in the
positive $x$, $y$, or $z$ direction by replacing all $x$, $y$, or $z$ with $x -
n$ and so on. Then we can just multiply out and collect terms.

=== Moving equation of a line/plane to pass through a specific point

We want to do this without changing direction or orientation. We can just use
our technique discussed above for this.

First let's make sure the plane passes through the origin. If we have
$
  3x - 2y + 7z = 12
$
we can just set the right hand to $0$
$
  3x - 2y + 7z = 0
$
Note that by our technique of shifting the plane or line, we see that the
constant on the right side is determined entirely by shifts in space that
preserve direction/orientation. So we are sure that setting it to 0 does
nothing but move the plane/line through 0.

=== Equation of a line through a given point perpendicular to a plane

Let's say we have a point $(5,2,3)$. Let's consider both parametric planes and
implicitly defined planes.

Suppose the plane is given by
$
  2x + 3y + 4z = 12
$
Then observe that we need the line to be perpendicular to the plane but it
doesn't really matter where the plane is. Recall that we can easily shift a
plane around while preserving orientation. So let's just move the plane through
the origin again.

$
  2x + 3y + 4z = 0
$

Now note that we can obtain a perpendicular vector to the plane by finding a
vector perpendicular to every vector in this plane.

Then note that $vec(2,3,4)$ is one such perpendicular vector. See this by
$
  vec(2,3,4) dot vec(x,y,z) = 2x + 3y + 4z = 0
$
Then we can simply scale our perpendicular vector by a parameter $t$ to obtain
a parametric line that's perpendicular to the plane. Now we can just shift it
by our desired point, and it remains orthogonal while passing through the point
(at $t = 0$).

$
  vec(5,2,3) + t vec(2,3,4)
$

Now consider when we have a parametric equation, say
$
  vec(5,0,0) + s vec(2,0,-1) + t vec(0,4,-3)
$
Then as long as we're perpendicular to both of the vectors being multiplied by
$s$ and $t$, we're perpendicular to the plane. This is easy to show: just note
that the plane is given by the span of the basis vectors $vec(2,0,-1)$ and
$vec(0,4,-3)$ (shifted by $vec(5,0,0)$), so any vector perpendicular to both is
perpendicular to the entire plane.

So just take their cross product to get a desired vector.

=== Distance between point and a plane

Think geometrically. We really want to move in a perpendicular line from the
plane to the point (because that's the closest distance between them). We
should start on the point, but where do we stop on the plane?

Consider
$
  4x + y + 3z = 1
$
and we want the distance to $(1,1,-5)$. The perpendicular line passing through the point is
$
  vec(1,1,-5) + t vec(4,1,3)
$
The line "starts" at the point when $t=0$, so let's find a value of $t$ that
makes it stop precisely on the plane. To do this, simply note that our line is
really a parametric equation $(x,y,z)$ where
$x = 1 + 4t, y = 1 + t, z = -5 + 3t$. Then we can simply plug these into the
equation of the plane and solve for $t$ to get the value of $t$ where the line
meets the plane. Here $26t - 10 = 1$, so $t = 11/26$. The displacement from the
point to the plane is then $t vec(4,1,3)$, and its magnitude
$|t| dot lr(|vec(4,1,3)|) = 11/sqrt(26)$ is the distance between the point and plane.
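
The same recipe as a short Python sketch. The function name and its return
shape are illustrative choices; the plane is passed as its normal vector and
right-hand side.

```python
import math

def distance_point_to_plane(normal, rhs, point):
    """Distance from `point` to the plane normal . (x, y, z) = rhs.

    Walk along point + t * normal, solve for the t that lands on the
    plane, then measure the displacement t * normal.
    Returns (distance, t, foot), where foot is the landing point.
    """
    n_dot_p = sum(n * p for n, p in zip(normal, point))
    n_dot_n = sum(n * n for n in normal)
    t = (rhs - n_dot_p) / n_dot_n
    foot = tuple(p + t * n for p, n in zip(point, normal))
    return abs(t) * math.sqrt(n_dot_n), t, foot

# Plane 4x + y + 3z = 1 and point (1, 1, -5) from the example above.
dist, t, foot = distance_point_to_plane((4, 1, 3), 1, (1, 1, -5))
print(dist, t, foot)
```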

=== Area of a parallelogram formed by two vectors in $RR^3$

In $RR^2$ we can take the determinant. In $RR^3$ the determinant is the volume
of the parallelepiped. So instead we just take the magnitude of the cross
product.

=== Distance from point to line passing through two other points in $RR^2$

Note that there are multiple ways to do this. Let $P = (1,7)$, $A = (1,1)$, and
$B = (3,9)$. We want the distance from $P$ to the line between $A$ and $B$.

We could just find the line between them and then use a 2-dimensional version
of our point to plane technique (solve for a vector orthogonal to the line, in
the direction of $P$, passing through $P$), but since we're in $RR^2$, we can
just project $P - A$ onto the normalized direction of the line, $B - A$, and
subtract to get the perpendicular part.

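In particular, for two vectors $A$ and $B$ with the same base point, the
magnitude of the cross product $A times B$ is $|A| dot |B| dot sin theta$. So if
we want the distance from the tip of $B$ to the line spanned by $A$, we should
do $(|A times B|)/(|A|)$. For the points above, the vectors to use are $B - A$
(along the line) and $P - A$.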

If instead we want the length of the projection of $B$ onto $A$, we should do
$(A dot B)/(|A|)$. There are multiple ways to interpret this geometrically.
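
A Python sketch applying both formulas to the points $P$, $A$, $B$ above, using
the vectors $B - A$ (along the line) and $P - A$. The function names are made
up here, and in two dimensions the cross product is taken to mean its single
$z$-component.

```python
import math

def distance_point_to_line_2d(p, a, b):
    """Distance from p to the line through a and b (all 2D points)."""
    dx, dy = b[0] - a[0], b[1] - a[1]    # direction of the line, B - A
    wx, wy = p[0] - a[0], p[1] - a[1]    # vector from A to P
    cross_z = dx * wy - dy * wx          # z-component of (B - A) x (P - A)
    return abs(cross_z) / math.hypot(dx, dy)

def projection_length_2d(p, a, b):
    """Signed length of the projection of P - A onto B - A."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    wx, wy = p[0] - a[0], p[1] - a[1]
    return (dx * wx + dy * wy) / math.hypot(dx, dy)

P, A, B = (1, 7), (1, 1), (3, 9)
distance = distance_point_to_line_2d(P, A, B)
along = projection_length_2d(P, A, B)
print(distance, along)
```

The two results are the legs of a right triangle whose hypotenuse is $P - A$,
which is one of the geometric interpretations mentioned above.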

== Derivative of a curve

What is the derivative of a curve? We can view the derivative at some point $x$
as the slope of the tangent line. But that doesn't give the derivative of a
parametric curve traveling through the plane.

However, this is simple. Because our curve is parameterized, each coordinate
$x,y,z$ and so on is a function of $t$ alone. Therefore, we can collect another
vector, taking the derivative of each coordinate, which gives us a vector of
the rates at which each coordinate is changing (the velocity vector).

== Arclength parametrization

We want to find an arc length parametrization of a curve. That is, we want to
express a curve in terms of how far we've traveled on it.

Idea: let $s=0$ when $t=0$, and let $s$ be the arclength traveled after $t$
seconds. Then we can integrate the curve's speed over $t$ to find the arc
length.

$
  s(t) = integral_0^t ||arrow(c)'(u)|| dif u
$

Then, we can solve for $t$ in terms of $s$, and plug it back into our original
vector in terms of $t$, $arrow(c)(t)$. Then its position will be expressed
in terms of $s$, $arrow(c)(s)$, and we'll have a parametrization by arc
length.

A key notion here is that now the velocity vectors are tangent, but also unit
length (since we should imagine that we are always moving at unit speed along
the curve at any given point).

What direction does the _acceleration_ vector point in, then?

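For a parameterized curve $arrow(c)(t)$ with velocity $arrow(v)(t)$ and
acceleration $arrow(a)(t)$, the speed is the magnitude $|arrow(v)(t)|$. When
the speed is constant, $|arrow(v)(t)|^2 = arrow(v) dot arrow(v)$ doesn't change
with time, so differentiating gives $2 arrow(v) dot arrow(a) = 0$: the
acceleration is perpendicular to the velocity.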

A prototypical example: consider uniform circular motion. Then the angular
velocity (speed) is always constant, yet there is always an acceleration vector
pointing perpendicular to the tangent velocity vector (the centripetal
acceleration).

== Curvature

For a graph $y = f(x)$ in $RR^2$, the second derivative measures concavity, and
at points where $f'(x) = 0$ the curvature is exactly $|f''(x)|$. But this is
just a lucky coincidence. Let's think about the notion of curvature.

Somehow, the curvature measures the best-fitting second order approximation of
a curve. Curvature is a measure of concavity with respect to the direction
perpendicular to the direction of motion.

We do have a formula

$
  kappa(t) = (|arrow(c)'(t) times arrow(c)'' (t)|) / (|arrow(c)'(t)|^3)
$

== Building intuition for curvature

Curvature is essentially asking how closely our curve resembles a unit circle
at a given point. It follows that a unit circle has a curvature 1, and a
straight line has curvature 0.

Let's consider a parametric curve
$
  arrow(s)(t) = vec(t-sin(t), 1-cos(t))
$
Consider the unit tangent vectors to the curve at some points. We're
essentially asking "how much do these tangent vectors change direction?" and
considering points arbitrarily close. This is the essence of curvature.

Now let's think about some geometric intuition. Suppose you're moving along a
curve. It follows that if the curve is bending very sharply, the tangent
vectors are changing direction very fast, in larger increments. Likewise, if
the curve is straighter, the tangent vectors change direction more slowly. And
when you're traveling on a straight line, the tangent vectors don't change
direction at all.

We want a mathematical model of this notion of "changing tangent vectors." The
idea is that we can capture this with some sort of derivative, but with respect
to what? If we just want to capture when the tangent vector is changing
directions, we clearly want to ignore any change in the actual magnitude of the
tangent vector itself, since this has no bearing on the directional change. Put
another way, if you're traveling along the curved path, the speed at which you
go (the magnitude of the tangent velocity vector) really doesn't matter with
regard to curvature. We only care when the tangent velocity vector changes
direction!

So suppose $T$ is the unit tangent vector at each point. We want the rate of
change of $T$, its derivative, but _not_ $(dif T)/(dif t)$, with respect to
time. This is because we don't really care how _fast_ the tangent vector
changes with respect to time; curvature is about measuring how much the tangent
vector changes as we move some arbitrary distance on the curve!

Instead, we really want $(dif T)/(dif s)$, where $s$ is the arc length we've
traveled so far (from some arbitrarily chosen starting point). And this makes
intuitive sense, because we just want how fast the tangent vector changes
direction with regards to the distance we travel on the curve.

Now we may note that we can actually find curvature if we can find an
arc-length parametrization of $arrow(s)$! Because an arclength
parametrization always has unit speed, its derivative gives the tangent
vectors at every point, and we can differentiate with respect to arclength and
take the magnitude to obtain the curvature. That is,
$
  kappa = abs((dif T) / (dif s))
$

But if we can't find an arclength parametrization, we're out of luck. Let's
continue investigating.

Consider a prototypical example:

#example[
  Let $arrow(s)(t) = vec(cos(t) R, sin(t) R)$. We're drawing a circle with
  radius $R$.

  Let's differentiate with respect to $t$.
  $
    arrow(s)'(t) = vec(-sin(t) R, cos(t) R)
  $
  But we want unit tangent vectors, so let's normalize it. Call our unit
  tangent $T$.

  $
    T(t) = (arrow(s)'(t)) / (|arrow(s)'(t)|)
  $
  We have
  $
    |arrow(s)'(t)| = lr(|vec(-sin(t) R, cos(t) R)|) \
    = sqrt(sin^2(t) R^2 + cos^2(t) R^2) = R
  $
  Now we have our unit tangent vectors in terms of $t$.
  $
    T(t) = (arrow(s)'(t)) / R
  $
  We take
  $
    (dif T) / (dif t) = vec(-cos(t), -sin(t))
  $
  and the magnitude of this is just 1.

  We should immediately note that not all cases will be so easy. When taking
  $|arrow(s)'(t)|$, in general, we have a very disgusting square root that cannot
  be simplified.

  Now note:
  $
    abs((dif T)/(dif s)) = abs((dif T)/(dif t)) / abs((dif s)/(dif t)) = 1 / R
  $
  since $abs((dif s)/(dif t))$ is the speed $|arrow(s)'(t)| = R$.

  So in fact $kappa = 1/R$!
]

The key here is this equation:
$
  abs((dif T)/(dif s)) = abs((dif T)/(dif t)) / abs((dif s)/(dif t))
$
Although we didn't have an arclength parametrization of the curve, we note that
the magnitude of $(dif T)/(dif s)$ is given by the rate at which $T$ changes
with respect to time, divided by the speed at which we move along the curve to
"correct" for the discrepancies introduced by taking the derivative with
respect to time!

Obviously this is very nonrigorous. But I'm running out of time.

Now if we do a bunch more reasoning and nonsense we can obtain the formula
above, but at this point the goal seems to have been reached. We have an
intuitive understanding of what curvature should measure.

So recall the formula:

$
  kappa(t) = (|arrow(c)'(t) times arrow(c)'' (t)|) / (|arrow(c)'(t)|^3)
$

Essentially, it's saying that the area of the parallelogram formed by the
curve's velocity and acceleration vectors, divided by the cube of the speed,
gives us the curvature. Intuitively we see that the more the acceleration
vector diverges from the velocity vector, the sharper the velocity vector is
changing direction, which gives us a notion of curvature. And somehow dividing
by the speed cubed is normalizing out any influence due to speed to give us our
curvature.
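
A Python sketch of the formula, checked against the circle of radius $R$ from
the example above, where the curvature should come out as $1/R$. The helper
names are illustrative, and the planar circle is embedded in $RR^3$ with a zero
third coordinate so the cross product applies.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(v):
    """Magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v))

def curvature(velocity, acceleration):
    """kappa = |c' x c''| / |c'|^3 at a single point of the curve."""
    return norm(cross(velocity, acceleration)) / norm(velocity) ** 3

def circle_curvature(R, t):
    """Curvature of (R cos t, R sin t, 0) at parameter t."""
    velocity = (-R * math.sin(t), R * math.cos(t), 0.0)
    acceleration = (-R * math.cos(t), -R * math.sin(t), 0.0)
    return curvature(velocity, acceleration)

print(circle_curvature(2.0, 0.7))  # close to 1/2 for any t
```

A straight line has zero acceleration, so the same function returns 0 there,
matching the intuition above.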

== Quadric surfaces

We really only need to know the identities and derivatives to do some integral
hacks.

The quadric surface is the generalization of the conic section to $n$ dimensions.

Now recall that one conic section is the hyperbola. It turns out we can define
analogues of the trigonometric functions that parameterize a so-called unit
hyperbola instead of the unit circle. These functions are

$
  sinh(x) = (e^x -e^(-x)) / 2 \
  cosh(x) = (e^x + e^(-x)) / 2 \
  tanh(x) = sinh(x) / cosh(x)
$

The derivatives are

$
  (dif) / (dif x) sinh(x) = cosh(x) \
  (dif) / (dif x) cosh(x) = sinh(x)
$

It's pretty easy to show these using their definitions, and derive the derivative of $tanh$.
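
A sketch of that last derivation, using the quotient rule together with the
identity $cosh^2(x) - sinh^2(x) = 1$ (which follows by expanding the
definitions):
$
  dif / (dif x) tanh(x) = (cosh(x) dot cosh(x) - sinh(x) dot sinh(x)) / (cosh^2(x)) = 1 / (cosh^2(x))
$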

== End of weeks 1-4

That was all of the content of week 1 to 4. Now we shift to weeks 5-7, where we studied more about vectors and their derivatives.

== Partial derivatives

These are the slopes of tangent lines for the graph in the direction of the
changing variable.

#theorem[Clairaut's theorem][
  Suppose $f : RR^2 -> RR$ is defined on a disk $D$ that contains a point
  $(a,b)$. If the functions $f_(x y)$ and $f_(y x)$ are continuous on this disk
  then $f_(x y) (a,b) = f_(y x) (a,b)$.
]

#theorem[Extended Clairaut][
  Suppose $f : RR^2 -> RR$ is defined on a disk $D$ that contains $(a,b)$. If
  all of the mixed partial derivatives of a given order are continuous
  everywhere in the disk $D$, then the mixed partials of that order are equal,
  regardless of the order of differentiation.
]

== Multivariable chain rule

The product rule actually follows from it. Recall:
$
  (f(x) g(x))' = f'(x) g(x) + f(x) g'(x)
$
Then instead let's replace $f$ and $g$ with $x$ and $y$, such that we have something like
$
  z = x y
$
Then let $x$ and $y$ be functions of $t$. So
$
  z = x(t) y(t)
$
The partial derivatives
$
  (diff z) / (diff x) = y, (diff z) / (diff y) = x
$
By the multivariable chain rule,
$
  (diff z) / (diff t) = x'(t) (diff z) / (diff x) + (diff z) / (diff y) y'(t) = x'(t) y(t) + x(t) y'(t)
$

== Implicit differentiation

It's similar to the single variable implicit differentiation, but remember to
hold the extraneous variables constant in practice.

#example[
  Suppose you have a surface
  $
    3x^2 + 5 y z + z^3 = 0
  $
  And you want a partial derivative $(diff y)/(diff z)$ at some point. You can
  use implicit differentiation by viewing the surface as a level set of some
  larger function $F(x,y,z) = 3x^2 + 5y z + z^3$ where $F(x,y,z) = 0$.

  Now we differentiate both sides with respect to $z$, holding $x$ constant:
  $
    (diff F) / (diff z) = diff / (diff z)(3x^2) + diff / (diff z)(5y z) + diff / (diff z)(z^3) \
    = 0 + (5 (diff y) / (diff z) z + 5y) + 3z^2 = 0
  $
  Then solve for $(diff y)/(diff z)$:
  $
    (diff y) / (diff z) = - (5y + 3z^2) / (5z)
  $
]

== Multivariable chain rule as matrix

Consider $z = f(x,y)$. Then the derivative of $z$ with respect to $x$ and $y$ would be a matrix:
$
  mat((diff z)/(diff x), (diff z)/(diff y))
$

Now suppose the coordinate system changes to
$
  x = 3u - v \
  y = 2v
$
Now suppose we want $z_u$ and $z_v$ at $(x,y) = (3,6)$. Then this is actually
$(u,v) = (2,3)$ in $u v$ coordinates. The partials of $z$ with respect to $u$
and $v$ are just matrix multiplication:
$
  mat(z_u,z_v) = mat(z_x, z_y) mat(x_u,x_v;y_u,y_v)
$
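
A Python sketch of this matrix product at the point above. The notes leave $z$
unspecified, so the surface $z = x^2 y$ is a hypothetical choice made purely
for illustration, as are the function names.

```python
def z_partials_uv(z_x, z_y, jacobian):
    """Row vector (z_x, z_y) times the 2x2 matrix ((x_u, x_v), (y_u, y_v))."""
    (x_u, x_v), (y_u, y_v) = jacobian
    z_u = z_x * x_u + z_y * y_u
    z_v = z_x * x_v + z_y * y_v
    return z_u, z_v

def z_of_uv(u, v):
    """Hypothetical surface z = x^2 * y, written in u, v coordinates."""
    x, y = 3 * u - v, 2 * v
    return x**2 * y

x, y = 3, 6                     # the point (x, y) = (3, 6), i.e. (u, v) = (2, 3)
z_x, z_y = 2 * x * y, x**2      # partials of x^2 * y at that point
jacobian = ((3, -1), (0, 2))    # from x = 3u - v, y = 2v
z_u, z_v = z_partials_uv(z_x, z_y, jacobian)
print(z_u, z_v)
```

Differentiating `z_of_uv` directly in $u$ and $v$ at $(2,3)$ should give the
same two numbers, which is a quick way to check the matrix was set up the
right way round.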

== Differentials

Differentials are about linear approximation. Recall in the single variable
case we use the tangent line approximation and differentials to approximate
functions. In the multivariable case it's the tangent plane approximation and
the directional derivative.

Let $f$ be a function with two inputs and one output, say
$
  f(x,y) = x^2 + x cos(y)
$

Then the tangent plane at $(x_0,y_0)$ is the plane that best approximates the
function at that point. Now we need two slopes instead of one.

Take a look at $(1,pi/2)$. Then the partial derivatives at that point are 2 and
-1. The idea is we start at $(1,pi/2,1)$ and then move a slight nudge in either
the $x$ or $y$ directions. In the $x$ directions, we move by $Delta x$ and get
an increase in $z$ (height) of $2 dot Delta x$.

If we move a slight nudge in the $y$ direction $Delta y$, then our height
should change by $-1 dot Delta y$ (a decrease when $Delta y > 0$).

Then we have a tangent plane approximation of
$
  Delta z approx 2 dot Delta x - 1 dot Delta y
$

And like in single variable, we can replace the $Delta$ with $dif$.
$
  dif z = 2 dot dif x - 1 dot dif y
$
Sidenote: how can we actually have the equation of the tangent plane in terms
of $x$,$y$,$z$? Just note that $x = 1 + Delta x$, $y = pi/2 + Delta y$, and $z
= 1 + Delta z$. So just substitute
$
  (z - 1) = 2(x-1) - 1(y-pi / 2)
$

So in general, the differential version
$
  dif z = m_x dif x + m_y dif y
$

and the tangent plane equation at $(x_0, y_0, z_0)$:
$
  z = z_0 + m_x (x-x_0) + m_y (y-y_0)
$
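
A Python sketch comparing the tangent plane with the actual function near
$(1, pi/2)$ for the example $f(x,y) = x^2 + x cos(y)$. The slopes are the
partial derivatives $f_x = 2x + cos(y)$ and $f_y = -x sin(y)$; the function
names and default arguments are choices made here.

```python
import math

def f(x, y):
    """The example function f(x, y) = x^2 + x cos(y)."""
    return x**2 + x * math.cos(y)

def tangent_plane(x, y, x0=1.0, y0=math.pi / 2):
    """Height of the tangent plane to f at (x0, y0), evaluated at (x, y)."""
    z0 = f(x0, y0)
    m_x = 2 * x0 + math.cos(y0)    # f_x at (x0, y0)
    m_y = -x0 * math.sin(y0)       # f_y at (x0, y0)
    return z0 + m_x * (x - x0) + m_y * (y - y0)

# Nudge both inputs slightly and compare the plane with the true height.
x, y = 1.01, math.pi / 2 + 0.01
print(tangent_plane(x, y), f(x, y))
```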

== Directional derivative

Now take another look at the linear approximation in $z$
$
  Delta z approx m_x Delta x + m_y Delta y
$

The directional derivative reinterprets this as matrix multiplication
$
  Delta z approx mat(m_x,m_y) vec(Delta x, Delta y)
$
This is the same as a dot product, in fact, the dot product of the gradient
given by $vec(m_x,m_y)$ and a tiny movement vector.

== Directional derivative

We mentioned this before, now let's discuss in more detail. If you move by
$Delta x$ and $Delta y$, in $x$ and $y$ directions, then the directional
derivative computes your change in height on the tangent plane.

Consider the question "What is the derivative of $f$ in the direction of the
vector $vec(3,4)$?"

Now consider $m_x (3) + m_y (4) = 3m_x + 4m_y$. This is almost the answer, but
really this is the change in $f$ resulting from a movement in the direction of
$vec(3,4)$. If we want the derivative, we're asking for the slope. We have
rise, now run is $sqrt(3^2 + 4^2) = 5$, so the answer is $(3m_x + 4m_y)/5$.

A more intuitive way is to consider a unit vector $arrow(u) = 1/lr(|<3,4>|)
<3,4>$ that points in the same direction. So now the "run" is simply 1. Clearly
we get the same answer, but we have a good formula now

$
  "Directional derivative" = nabla f dot arrow(u)
$

where $arrow(u)$ is the *unit vector* in our desired direction.

A geometric interpretation is that the directional derivative in a given
direction is just the gradient vector projected in that direction.

To recap:

We discuss two questions: what is the slope in a given direction, and what
direction has the steepest slope?

Let $arrow(F) = nabla f$ be the gradient vector (the vector of our partial
derivatives) and $arrow(v)$ be the movement vector in the $x y$-plane. Let
$arrow(u)$ be a unit vector pointing in the same direction as $arrow(v)$.

The answer to the first question is $nabla f dot arrow(u)$, where
$arrow(u)$ is a unit vector in the given direction.

The answer to the second question can be derived.

$
  arrow(F) dot arrow(u) = lr(|arrow(F)|) lr(|arrow(u)|) cos(theta)
$
and $theta$ is the angle between $arrow(F)$ and $arrow(u)$. So since $cos
theta$ reaches maximum value at $theta = 0$, the maximum possible slope is
actually in the same direction as $arrow(F)$ with a slope equal to the magnitude
of $arrow(F)$.

Takeaways:

- Movement in the direction of the gradient vector gives the "steepest" ascent of the function
- Movement perpendicular to the gradient has slope of 0
- Movement in the opposite direction of the gradient has maximum negative slope (sharpest descent) with same magnitude as gradient
- Any directional derivative in between can be calculated as a projection from the gradient

== Optimization

We spoke previously about the Lagrange multiplier. Now we discuss it in greater
detail.

When optimizing in two dimensions, we either optimize for all of $RR^2$, or on
a constraint in $RR^2$ (such as a curve). For the first case, we use critical
points. For the second, *Lagrange multipliers*.

== Critical points

Critical points occur when the gradient is zero or undefined. Both partials are
zero *or* at least one of them isn't defined.

Essentially, they occur when the tangent plane is flat. We can't just look at
$f''(x)$ like in single variable calculus, but we can take the determinant of
the matrix of second order partials for some sort of multivariable concavity.
It measures how much the pure partial derivatives dominate the mixed partial
derivatives, and they need to dominate to a certain extent such that there is
consistently upward or downward curvature in every direction.

We find the critical points when $m_x = 0$ and $m_y = 0$ or either are
undefined. Then we classify them as follows.

Recall the second derivative test for single variable functions; now consider
the two-variable case.

$
  Dif (x_0,y_0) = det mat(f_(x x) (x_0, y_0), f_(x y) (x_0, y_0); f_(y x) (x_0,y_0), f_(y y) (x_0, y_0)) = f_(x x) (x_0, y_0) f_(y y) (x_0, y_0) - f_(x y) (x_0, y_0)^2
$

- If $Dif > 0$ and $f_(x x) (x_0, y_0) > 0$, then $f(x_0, y_0)$ is a relative minimum.
- If $Dif > 0$ and $f_(x x) (x_0, y_0) < 0$, then $f(x_0, y_0)$ is a relative maximum.
- If $Dif < 0$, then $f(x_0, y_0)$ is neither and it's a saddle point.
- If $Dif = 0$, then the test is inconclusive.
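
The classification rules above as a small Python sketch. The function name and
the returned strings are illustrative; the caller supplies the second partials
already evaluated at the critical point.

```python
def classify_critical_point(f_xx, f_yy, f_xy):
    """Second derivative test from the values of the second partials."""
    d = f_xx * f_yy - f_xy**2    # determinant of the matrix of second partials
    if d > 0 and f_xx > 0:
        return "relative minimum"
    if d > 0 and f_xx < 0:
        return "relative maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive"

# Hypothetical test functions, each with a critical point at the origin:
print(classify_critical_point(2, 2, 0))    # x^2 + y^2
print(classify_critical_point(-2, -2, 0))  # -x^2 - y^2
print(classify_critical_point(2, -2, 0))   # x^2 - y^2
```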

== Lagrange multipliers

We discuss optimization on a restricted curve in our domain.

Idea: we should navigate along the curve and find where the directional
derivative is 0. Recall that this is the same as when the velocity vector is
perpendicular to the gradient. The issue is that doing this directly requires
us to parametrize the curve.

To avoid this, the method of Lagrange multipliers views the (implicit)
constraint equation as a level curve of another surface. If we take the
gradient of that surface everywhere on the level curve, then that gradient is
parallel to the original function's gradient at critical points.

So, we should be able to take the gradients of both the function and the
constraint function, and look for when one is a scalar multiple of the other.

Consider a function $f$ and a constraint $g$. Then we compute $nabla f$ and
$nabla g$, then solve for when $nabla f = lambda nabla g$. Then we can just plug in
points and figure it out.

#exercise[
  Find the highest and lowest points on $f(x,y) = 81x^2 + y^2$ with the
  constraint $4x^2 + y^2 = 9$. Let the second function be $g(x,y)$, and keep in
  mind our constraint is essentially the level set where $g(x,y) = 9$.
]

Intuition: consider a constraint $g(x,y) = x^2 + y^2$, and our constraint is
the level set where $g(x,y) = 25$. Notice, the gradient of $g$ is perpendicular
to its level set at any given point. So, when optimizing on a function $f$ that
is $g$-constrained, we are really looking for where the gradient of $g$ is
parallel to the gradient of $f$. That is why we are using a scalar multiple
$lambda$ to relate them.

Computation:

We have $g(x,y) = 25$. We construct the equation

$
  vec(f_x, f_y) = lambda vec(g_x,g_y)
$

This gives three equations

$
  f_x = lambda g_x \
  f_y = lambda g_y \
  g(x,y) = 25
$

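Taking $f(x,y) = x y$ as the objective (an assumption, chosen to be consistent
with the result below), the first two equations read $y = 2 lambda x$ and
$x = 2 lambda y$. Substituting the second into the first, we find
$
  y = 4 lambda^2 y
$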

If $y != 0$, then $lambda = plus.minus 1/2$. So we're looking for points on the
circle, with radius 5, such that $x = plus.minus y$. This gives 4 points to
consider: $(plus.minus 5/sqrt(2), plus.minus 5/sqrt(2))$.

If $y = 0$, then $x$ is forced to be 0, and $(0,0)$ is not on the circle. So we
ignore it.

Now we just compare our four candidates and find the greatest (or least) for
optimization!

=== Notes from Week 7 section

We have a function $f : RR^n -> RR$ that is subject to a constraint $g : RR^n -> RR^c$, where $c$ is our number of constraints. It's really a vector of $c$ constraints,
$
  g = vec(g_1,g_2,dots.v,g_c)
$

Idea: define the so-called *Lagrangian* $cal(L) = f + (g,lambda)$.

#theorem[
  Suppose $f$ and $g$ are "nice" (partials continuous), there are no redundant constraints, and the problem is not overconstrained ($"Rank" Dif g = c < n$). Then any optimal solution that respects $g = 0$ solves $gradient f = lambda dot Dif g$.
]

= Lecture #datetime(day: 27, year: 2025, month:2).display()

== Volume

Any 3D shape can be built recursively out of atomic objects.

#exercise[
  Derive formulae for the volume of a pyramid and cone.
]

Schley what are you doing???

== Signed area and volume