#import "@youwen/zen:0.1.0": *
#import "@preview/cetz:0.3.1"

#set math.equation(numbering: "(1)")
#show math.equation: it => {
  if it.block and not it.has("label") [
    #counter(math.equation).update(v => v - 1)
    #math.equation(it.body, block: true, numbering: none)#label("")
  ] else {
    it
  }
}

#show: zen.with(
  title: "Math 6A Course Notes",
  author: "Youwen Wu",
  date: "Winter 2025",
  subtitle: [Taught by Nathan Schley],
)

#outline()

= Lecture #datetime(day: 7, month: 1, year: 2025).display()

== Review of fundamental concepts

You can parameterize curves.

#example[Unit circle][
  $ x = cos(t) \ y = sin(t) $
]

For an explicit equation
$ y = f(x) $
parameterize it by setting
$ x = t \ y = f(t) $

Parameterize a line passing through two points $arrow(p)_1$ and $arrow(p)_2$ by
$ arrow(c)(t) = arrow(p)_1 + t (arrow(p)_2 - arrow(p)_1) $

Take the derivative of each component to find the velocity vector. The magnitude of velocity is speed.

#example[
  $ arrow(c)(t) = <5t, sin(t)> \ arrow(v)(t) = <5, cos(t)> $
]

== Polar coordinates

A point in $RR^2$ with Cartesian coordinates $(x,y)$ can instead be written in polar coordinates, using its distance $r$ from the origin and its angle $theta$ about the origin.
$ (x,y) -> (r, theta) $

= Lecture #datetime(day: 9, month: 1, year: 2025).display()

== Vectors

The dot product of two vectors generalizes the notion of size (length) of a point or vector.

#example[
  How far is the point $(x_1, x_2, x_3)$ from the origin? \
  Answer: $sqrt(x_1^2 + x_2^2 + x_3^2)$, the square root of the dot product of the point with itself.
]

#definition[
  For vectors $u$ and $v$, where
  $ v = vec(v_1, v_2, dots.v, v_n), u = vec(u_1, u_2, dots.v, u_n) $
  the dot product is defined as
  $ v dot u = sum_(i=1)^n v_i dot u_i $
]

#proposition[
  The dot product of two vectors is the product of their magnitudes and the cosine of the angle between them.
  $ arrow(v) dot arrow(w) = ||arrow(v)|| dot ||arrow(w)|| cos theta $
]

= Lecture #datetime(day: 23, month: 1, year: 2025).display()

Midterm is next Thursday in class!
== Arclength and curvature

Easy way of finding curvature: reparameterize the curve to have speed 1, then the curvature is the magnitude of the acceleration. If we can't do that then we need some other technique.

Given $arrow(c)(t) = <2t^(-1), 6, 2t>$, find the curvature $kappa(t)$.
$ kappa (t) = (||arrow(c)'(t) times arrow(c)''(t)||) / (||arrow(c)'(t)||^3) $

== Arclength parametrization

Find an arc-length parametrization of $arrow(c)(t) = $.

Let $s = 0$ when $t = 0$ and let $s$ be the arc length traveled along the curve after $t$ seconds. Then we can find $s$ by integrating the curve's speed over $t$.
$ s(t) = integral^t_0 ||arrow(c)'(u)|| dif u $

= Lecture #datetime(day: 12, year: 2025, month: 2).display()

== Chain rule for multivariate functions

We find motivation for the chain rule. Consider a hiker whose path is given by
$ arrow(c) (t) = $
and whose elevation is given by
$ f(x,y) = x dot y $

What does $x'(t)$ represent? Speed in the $x$-direction. Likewise for $y'(t)$.

Say $x'(t) = 3$, $y'(t) = 4$. Then how far did we travel in $t$ seconds?

Suppose our slope in the $x$ direction is given by $m_x = 2$. Suppose the slope in $y$ is $m_y = -2$. In fact $m_x = f_x (x,y)$ and $m_y = f_y (x,y)$ (here $f_k$ is the partial derivative with respect to $k$).

So each unit change in $t$ changes the elevation by $+6$ meters from movement along the $x$-axis and by $-8$ meters from movement along the $y$-axis. So the total change $Delta z$ is given by
$ Delta z = m_x dot Delta x + m_y dot Delta y $
and analogously in calculus land
$ (dif z) / (dif t) = (diff f) / (diff x) dot x'(t) + (diff f) / (diff y) dot y'(t) $ <chain-rule>

In fact @chain-rule is the chain rule.

#fact[
  $ (dif f) / (dif t) = (diff f) / (diff x) dot (diff x) / (diff t) + (diff f) / (diff y) dot (diff y) / (diff t) + (diff f) / (diff z) dot (diff z) / (diff t) $
]

#example[
  Consider $f(x) = x^x$. What is $f'(x)$? We can do this with logarithmic differentiation but we can also do this with the multivariable chain rule.
  $ f(x,y) = $
]

#example[
  Find the derivative $dif/(dif t) (f(x,y))$, where $f(x,y) = x^y$, $x(t) = t$, and $y(t) = 1$. Assume $t > 0$.
]

#example[
  Find the partial derivative $diff/(diff s) f(x,y,z)$ where $f(x,y,z) = x^2 y^2 + z^3$, and
  $ x(s,t) = s t \ y(s, t) = s^2 t \ z(s,t) = s t^2 $
]

== Implicit differentiation

Review from single variable: given an equation $f(x,y) = 0$, we differentiate each term with respect to $x$, collect all the $(dif y)/(dif x)$ terms together, and solve for $(dif y)/(dif x)$ as if it were a variable, obtaining an expression in $x$ and $y$.

We do something similar for more variables. Main idea: extraneous variables are held constant in practice.

Example: consider the surface $3x^2 + 5y z + z^3 = 0$. We want $(diff y)/(diff z)$ at some point. Use implicit differentiation by viewing the surface as a level set of some larger function $F(x,y,z) = 3x^2 + 5y z + z^3$ (the level set part is where $F(x,y,z) = 0$). Differentiate with respect to $z$, holding $x$ constant and applying the product rule to $5 y z$ (really the chain rule @chain-rule):
$ (diff F) / (diff z) = diff / (diff z) (3x^2 + 5 y z + z^3) = 0 + (5 (diff y) / (diff z) z + 5y) + 3z^2 = 0 \ (diff y) / (diff z) = - (5y + 3z^2) / (5z) $

= Lecture #datetime(day: 18, year: 2025, month: 2).display()

== Critical points

When optimizing in 2D, the strategy depends on whether we're
- optimizing for all of $RR^2$ (or a region in $RR^2$)
- optimizing on a constraint (like a curve through $RR^2$)

We find critical points where the tangent plane is "flat": $m_x = 0$ and $m_y = 0$.

We classify critical points using the determinant of the matrix of second partial derivatives.
$ D = f_(x x) f_(y y) - f_(x y)^2 $

- if $D > 0$ and $f_(x x) (x_0, y_0) > 0$, then $f(x_0, y_0)$ is a relative minimum.
- if $D > 0$ and $f_(x x) (x_0, y_0) < 0$, then $f(x_0, y_0)$ is a relative maximum.
- if $D < 0$ then $f(x_0, y_0)$ is neither and we call it a saddle point.
- if $D = 0$ then we don't know (the test is inconclusive).

== Lagrange multipliers

Optimizing along a constraint curve. Idea: navigate along the curve and look for where the directional derivative is zero.
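The "navigate along the curve" idea can be tried numerically. This is only a rough sketch, assuming the objective $f(x,y) = 81x^2 + y^2$ and the constraint $4x^2 + y^2 = 9$ from the example in this section. The parametrization `curve`, the step count, and all function names are made up for illustration.

```python
import math

def f(x, y):
    # Objective from the example: f(x, y) = 81x^2 + y^2
    return 81 * x**2 + y**2

def curve(t):
    # Hypothetical parametrization of the constraint 4x^2 + y^2 = 9
    return 1.5 * math.cos(t), 3.0 * math.sin(t)

def along_curve(t):
    x, y = curve(t)
    return f(x, y)

def derivative_along_curve(t, h=1e-6):
    # Central difference approximating d/dt f(curve(t))
    return (along_curve(t + h) - along_curve(t - h)) / (2 * h)

# Walk once around the curve and record where the derivative changes sign
steps = 3600
flat_points = []
for k in range(steps):
    t0 = 2 * math.pi * (k - 0.5) / steps
    t1 = 2 * math.pi * (k + 0.5) / steps
    if derivative_along_curve(t0) * derivative_along_curve(t1) <= 0:
        flat_points.append(curve(2 * math.pi * k / steps))

values = [f(x, y) for x, y in flat_points]
highest, lowest = max(values), min(values)
```

The sign changes land near $(plus.minus 3/2, 0)$ and $(0, plus.minus 3)$. Lagrange multipliers find the same candidates without needing a parametrization.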
#example[
  Find the highest and lowest points on $f(x,y) = 81x^2 + y^2$ with the constraint $4x^2 + y^2 = 9$.

  For notational purposes, we'll call $g(x,y) = 4x^2 + y^2$ and keep in mind we're looking for $g(x,y) = 9$.

  1. Find the gradients of $f$ and $g$.
  $ vec(f_x,f_y) &= vec(162x,2y) \ vec(g_x,g_y) &= vec(8x,2y) $

  2. We want to find the points where the gradients "align", i.e. we want these vectors to be parallel. Writing $arrow(F)$ and $arrow(G)$ for the gradients of $f$ and $g$, that is:
  $ arrow(F) = lambda arrow(G) \ 162x = 8x dot lambda \ 2y = 2y dot lambda $

  Remember to keep the constraint!
  $ 4x^2 + y^2 = 9 $

  Breaking it down into cases,
  $ 162 = 8lambda => lambda = 81 / 4 "for" x != 0 $
  which implies
  $ 2y (lambda - 1) = 0 \ y = 0 $
  since $lambda - 1$ is nonzero. So if $x$ is nonzero, $y$ must be zero.
  $ 4x^2 = 9 \ x = plus.minus 3 / 2 $

  Now consider when $x = 0$. Then $y = plus.minus 3$.

  So our critical points are $(plus.minus 3/2, 0)$ and $(0, plus.minus 3)$.

  Finally, just plug in these 4 critical points into $f$ and find the biggest/smallest: $f(plus.minus 3/2, 0) = 729/4$ is the maximum and $f(0, plus.minus 3) = 9$ is the minimum.
]

= Speedrun

In this chapter I wrote up notes for the entirety of the course, starting from week 1, ending at week 8, because I skipped 80% of the classes up to the midterm.

== Vector review

We know about functions $y = f(x)$. We can parameterize curves by expressing them as a pair of coordinate functions $(x(t), y(t))$, modeling for example a particle traveling through space with respect to time.

=== Derivative of parameterized curve

Given $(x(t), y(t))$, the derivative is given by
$ (x'(t), y'(t)) $

=== Parameterizing an ellipse

Consider an ellipse
$ x^2 / n + y^2 / m = r^2 $

Then we note that this is just a circle of radius $r$ with $x$ stretched by a factor of $sqrt(n)$ and likewise for $y$ by a factor of $sqrt(m)$. Then we can parameterize the ellipse by
$ (r sqrt(n) cos(theta), r sqrt(m) sin(theta)) $

A sanity check: when $n = 1$, $m = 1$, $r = 1$, we have a unit circle and the parametrization reflects that.

#example[
  Consider the ellipse $x^2/9 + y^2/4 = 16$.
  Here $r = 4$, $sqrt(n) = 3$, and $sqrt(m) = 2$. Then the parametrization is $(12cos(theta), 8sin(theta))$, and this is indeed right: substituting gives $16 cos^2(theta) + 16 sin^2(theta) = 16$.
]

To parameterize a line passing through points $arrow(p)_1$ and $arrow(p)_2$, simply take
$ arrow(c) (t) = arrow(p)_1 + t(arrow(p)_2 - arrow(p)_1) $

Algebraically we can justify this by noting $t=0$ gives $arrow(p)_1$ and $t=1$ gives $arrow(p)_2$.

=== Polar coordinates

Notation: $(r,theta)$ instead of $(x,y)$. Note that this is just highlighting that we're parameterizing a curve in terms of a radius $r$ (also called _modulus_) and argument (angle) $theta$.

To get $r$, see that $r^2 = x^2 + y^2$. For $x > 0$ it follows that $theta = arctan(y/x)$; in the other quadrants, adjust the angle by $pi$.

To plot in terms of $x,y$, note that
$ x = r cos(theta) \ y = r sin(theta) $

=== Vector properties

Many nonrigorous statements about vectors to add to our toolbox.

Two nonzero vectors are parallel if and only if the angle formed between them is 0 (or $pi$, if we count opposite directions as parallel).

Vectors can be combined in linear combinations.

The magnitude of a vector of length $n$ is given by the $n$-dimensional Pythagorean theorem.
$ sqrt(a_1^2 + a_2^2 + dots.c + a_n^2) $

A unit vector is a vector with magnitude 1.

=== Dot product

Dot products are useful for seeing properties about orthogonality, parallelism, and projection.

#definition[
  The dot product takes two vectors and returns a single scalar. It is sometimes called the inner product.
]

We can view the dot product algebraically and geometrically. In the first sense, it's just the sum of the products of each pair of coordinates in the vectors. Let $A,B$ be vectors of length $n$, and $a_i, b_i$ be the $i^"th"$ entry of their respective vectors, then
$ A dot B = a_1 dot b_1 + a_2 dot b_2 + dots.c + a_n dot b_n $

Geometrically, it's the product of the magnitudes of the vectors and the cosine of the angle between them.
$ A dot B = |A| dot |B| dot cos(theta) $
where $theta$ is the angle between the vectors. It's nontrivial to prove this and we don't have time.

Therefore, we know two vectors are parallel (pointing the same way) when $A dot B = |A| dot |B|$, because $cos(theta) = 1$.
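The algebraic and geometric views of the dot product can be compared numerically. The snippet below is just an illustrative sketch. The helper names `dot`, `magnitude`, and `angle_between` and the sample vectors are assumptions, not anything from the course.

```python
import math

def dot(a, b):
    # Algebraic view: sum of products of matching coordinates
    return sum(x * y for x, y in zip(a, b))

def magnitude(a):
    # n-dimensional Pythagorean theorem
    return math.sqrt(dot(a, a))

def angle_between(a, b):
    # Geometric view solved for theta: cos(theta) = (A . B) / (|A| |B|)
    cos_theta = dot(a, b) / (magnitude(a) * magnitude(b))
    # Clamp to guard against tiny floating point overshoot
    return math.acos(max(-1.0, min(1.0, cos_theta)))

A = (1.0, 2.0, 2.0)
B = (2.0, 4.0, 4.0)   # B = 2A, so the two are parallel
C = (1.0, 0.0, 0.0)

theta_ab = angle_between(A, B)
theta_ac = angle_between(A, C)
is_parallel = math.isclose(dot(A, B), magnitude(A) * magnitude(B))
```

Here `A` and `B` satisfy $A dot B = |A| dot |B|$, so the angle between them comes out as 0, while `A` and `C` meet at an angle whose cosine is $1/3$.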
We know two vectors are orthogonal if the dot product is 0, because $cos(pi/2 + pi n)$ is 0.

Dot products are very useful for projection. Pick a unit vector $arrow(u)$. For any vector $arrow(w)$, $arrow(u) dot arrow(w)$ gives the (signed) size of the part of $arrow(w)$ parallel to $arrow(u)$. Also, note that the leftover perpendicular part, $arrow(w) - (arrow(u) dot arrow(w)) arrow(u)$, has magnitude equal to the shortest distance between the tip of $arrow(w)$ and the line through the origin in the direction of $arrow(u)$!

=== Normalizing vectors

Normalizing a vector means obtaining a vector pointing in the same direction, with magnitude 1.

For a nonzero vector $v$, its normalized vector is given by
$ v / (|v|) $
where $|v|$ is the magnitude.

=== Cross product

The cross product $times$ is a binary operation on vectors. For our purposes it's only defined in $RR^3$ (in fact it's defined in some other dimensions, but in general it is not defined).

It produces a third vector perpendicular to both original vectors. Its direction is determined by the right hand rule.

The magnitude of the cross product is given by
$ |a times b| = |a| |b| sin theta $
where $theta$ is the angle between $a$ and $b$. So we know two vectors are parallel if the magnitude of the cross product is 0.

In fact this magnitude is also the area of the parallelogram spanned by the two vectors. A parallelogram with zero area is flat, which is an alternative way to see that a zero cross product implies the vectors are parallel.

We can compute the cross product by taking a determinant.
$ a times b = det mat(i,j,k; a_1, a_2, a_3; b_1,b_2,b_3) $

The cross product is anticommutative, so $a times b = -(b times a)$.

== Vector applications, building geometric intuition

Let's build some geometric intuition for working with vectors, especially in $RR^3$.

=== Moving the equation of a line or plane while maintaining orientation

We consider two cases. If we're working with a parametric equation, then we can just add a vector to the equation to shift everything by said vector.
Otherwise, with an implicit equation, let's consider only the plane (since we need two implicit equations to specify a line, and at that point it's better to solve the system and parameterize). The plane equation $a x + b y + c z = r$ can be shifted $n$ units in the positive $x$, $y$, or $z$ direction by replacing all $x$, $y$, or $z$ with $x - n$ and so on. Then we can just multiply out and collect terms.

=== Moving equation of a line/plane to pass through a specific point

We want to do this without changing direction or orientation. We can just use our technique discussed above for this.

First let's make sure the plane passes through the origin. If we have
$ 3x - 2y + 7z = 12 $
we can just set the right hand side to $0$
$ 3x - 2y + 7z = 0 $

Note that by our technique of shifting the plane or line, we see that the constant on the right side is determined entirely by shifts in space that preserve direction/orientation. So we are sure that setting it to 0 does nothing but move the plane/line through the origin. From there, to pass through a point $(x_0, y_0, z_0)$, replace $x$, $y$, $z$ with $x - x_0$, $y - y_0$, $z - z_0$.

=== Equation of a line through a given point perpendicular to a plane

Let's say we have a point $(5,2,3)$. Let's consider both implicitly defined planes and parametric planes.

Suppose the plane is given by
$ 2x + 3y + 4z = 12 $

Then observe that we need the line to be perpendicular to the plane but it doesn't really matter where the plane is. Recall that we can easily shift a plane around while preserving orientation. So let's just move the plane through the origin again.
$ 2x + 3y + 4z = 0 $

Now note that a vector is perpendicular to the plane when it is perpendicular to every vector lying in this plane. Then note that $vec(2,3,4)$ is one such perpendicular vector. See this by
$ vec(2,3,4) dot vec(x,y,z) = 2x + 3y + 4z = 0 $
for every point $(x,y,z)$ on the plane.

Then we can simply scale our perpendicular vector by a parameter $t$ to obtain a parametric line that's perpendicular to the plane.
Now we can just shift it by our desired point, and it remains orthogonal to the plane while passing through the point (at $t = 0$).
$ vec(5,2,3) + t vec(2,3,4) $

Now consider when we have a parametric equation, say
$ vec(5,0,0) + s vec(2,0,-1) + t vec(0,4,-3) $

Then as long as our line is perpendicular to both of the vectors being multiplied by $s$ and $t$, it's perpendicular to the plane. This is easy to show: just note that the plane is given by the span of the vectors $vec(2,0,-1)$ and $vec(0,4,-3)$, shifted by $vec(5,0,0)$, so any vector perpendicular to both is perpendicular to the entire plane. So just take their cross product to get a desired vector; here it is $vec(4,6,8)$.

=== Distance between point and a plane

Think geometrically. We really want to move along a perpendicular line from the point to the plane (because that's the shortest path between them). We should start on the point, but where do we stop on the plane?

Consider
$ 4x + y + 3z = 1 $
and we want the distance to $(1,1,-5)$.

The perpendicular line passing through the point is
$ vec(1,1,-5) + t vec(4,1,3) $

The line "starts" at the point when $t=0$, so let's find a value of $t$ that makes it stop precisely on the plane.

To do this, simply note that our line is really a parametric equation $(x,y,z)$ where $x = 1 + 4t, y = 1 + t, z = -5 + 3t$. Then we can simply plug these into the equation of the plane and solve for $t$ to get the value of $t$ where the line meets the plane. Here $4(1 + 4t) + (1 + t) + 3(-5 + 3t) = 1$ gives $26t - 10 = 1$, so $t = 11/26$.

The displacement from the point to the plane is then $t vec(4,1,3)$, and its magnitude, $|t| sqrt(26) = 11/sqrt(26)$, is the distance between the point and plane.

=== Area of a parallelogram formed by two vectors in $RR^3$

In $RR^2$ we can take the determinant. In $RR^3$ the determinant is the volume of the parallelepiped. So instead we just take the magnitude of the cross product.

=== Distance from point to line passing through two other points in $RR^2$

Note that there are multiple ways to do this. Let $P = (1,7)$, $A = (1,1)$, and $B = (3,9)$. We want the distance from $P$ to the line through $A$ and $B$.
We could just find the line between them and then use a 2-dimensional version of our point to plane technique (solve for a vector orthogonal to the line, in the direction of $P$, passing through $P$), but we can also work directly with two vectors: $arrow(u) = B - A$ along the line and $arrow(w) = P - A$ from the line to $P$.

In particular, note that the magnitude of the cross product $arrow(u) times arrow(w)$ is $|arrow(u)| dot |arrow(w)| dot sin theta$. So if we want the distance from the tip of $arrow(w)$ to the line spanned by $arrow(u)$, we should do $(|arrow(u) times arrow(w)|)/(|arrow(u)|)$. If instead we want the length of the projection of $arrow(w)$ onto $arrow(u)$, we should do $(arrow(u) dot arrow(w))/(|arrow(u)|)$.

Here $arrow(u) = vec(2,8)$ and $arrow(w) = vec(0,6)$. Treating them as vectors in $RR^3$ with zero third component, the distance is $abs(2 dot 6 - 8 dot 0) / sqrt(2^2 + 8^2) = 12/sqrt(68) = 6/sqrt(17)$.

There are multiple ways to interpret this geometrically.

== Derivative of a curve

What is the derivative of a curve? For a function, we can view the derivative at some point $x$ as the slope of the tangent line. But that doesn't directly give the derivative of a parametric curve traveling through the plane.

However, this is simple. Because our curve is parameterized, each coordinate $x,y,z$ and so on is its own function of $t$. Therefore, we can form another vector by taking the derivative of each coordinate, which gives us a vector of the rates at which each coordinate is changing.

== Arclength parametrization

We want to find an arc length parametrization of a curve. That is, we want to express a curve in terms of how far we've traveled on it.

Idea: let $s=0$ when $t=0$, and let $s$ be the arclength traveled after $t$ seconds. Then we can integrate the curve's speed over $t$ to find the arc length.
$ s(t) = integral_0^t ||arrow(c)'(u)|| dif u $

Then, we can solve for $t$ in terms of $s$, and plug it back into our original vector in terms of $t$, $arrow(c)(t)$. Then its position will be expressed in terms of $s$, $arrow(c)(s)$, and we'll have a parametrization by arc length.

A key notion here is that now the velocity vectors are tangent, but also unit length (since we should imagine that we are always moving at unit speed along the curve at any given point).
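As a small worked check of this recipe, take a circle of radius $R$, $arrow(c)(t) = vec(R cos(t), R sin(t))$, whose speed is the constant $R$. Then $s(t) = R t$, so $t = s/R$. The sketch below (the function names and the value of `R` are assumptions chosen for illustration) checks numerically that the reparametrized curve moves at unit speed.

```python
import math

R = 3.0  # assumed radius for the illustration

def c(t):
    # Original parametrization of the circle
    return (R * math.cos(t), R * math.sin(t))

def arc_length(t, n=10_000):
    # s(t) = integral from 0 to t of |c'(u)| du, via the midpoint rule
    du = t / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * du
        speed = math.hypot(-R * math.sin(u), R * math.cos(u))
        total += speed * du
    return total

def c_by_arclength(s):
    # Solve s = R t for t, then substitute back into c
    return c(s / R)

def speed_by_arclength(s, h=1e-6):
    # Finite-difference estimate of |d/ds c(s)|
    x0, y0 = c_by_arclength(s - h)
    x1, y1 = c_by_arclength(s + h)
    return math.hypot(x1 - x0, y1 - y0) / (2 * h)

s_at_2 = arc_length(2.0)              # should be close to R * 2
unit_speed = speed_by_arclength(1.7)  # should be close to 1
```

For most curves the integral for $s(t)$ has no closed form, so the step "solve for $t$ in terms of $s$" would itself have to be done numerically.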
What direction does the _acceleration_ vector point in, then?

For a parameterized curve $arrow(c)(t)$ with velocity $arrow(v)(t)$ and acceleration $arrow(a)(t)$, the speed is the magnitude $|arrow(v)(t)|$. When the speed is constant, $|arrow(v)(t)|$ doesn't change with time, so $|arrow(v)|^2 = arrow(v) dot arrow(v)$ is constant. Differentiating gives $2 arrow(v) dot arrow(a) = 0$, so the acceleration is perpendicular to the velocity.

A prototypical example: consider uniform circular motion. Then the angular velocity (and hence the speed) is always constant, yet there is always an acceleration vector pointing perpendicular to the tangent velocity vector (the centripetal acceleration).

== Curvature

For a graph $y = f(x)$ in $RR^2$, curvature is closely tied to the second derivative. But this is just a lucky coincidence.

Let's think about the notion of curvature. Somehow, the curvature measures the best-fitting second order approximation of a curve.

Curvature is a measure of concavity with respect to the direction perpendicular to the direction of motion.

We do have a formula
$ kappa(t) = (|arrow(c)'(t) times arrow(c)'' (t)|) / (|arrow(c)'(t)|^3) $

== Building intuition for curvature

Curvature is essentially asking how closely our curve resembles a unit circle at a given point. It follows that a unit circle has curvature 1, and a straight line has curvature 0.

Let's consider a parametric curve
$ arrow(s)(t) = vec(t-sin(t), 1-cos(t)) $

Consider the unit tangent vectors to the curve at some points. We're essentially asking "how much do these tangent vectors change direction?" and considering points arbitrarily close. This is the essence of curvature.

Now let's think about some geometric intuition. Suppose you're moving along a curve. It follows that if the curve is bending very sharply, the tangent vectors are changing direction very fast, in larger increments. Likewise, if the curve is straighter, the tangent vectors change direction more slowly. And when you're traveling on a straight line, the tangent vectors don't change direction at all.

We want a mathematical model of this notion of "changing tangent vectors."
The idea is that we can capture this with some sort of derivative, but with respect to what? If we just want to capture when the tangent vector is changing directions, we clearly want to ignore any change in the actual magnitude of the tangent vector itself, since this has no bearing on the directional change.

Put another way, if you're traveling along the curved path, the speed at which you go (the magnitude of the tangent velocity vector) really doesn't matter with regard to curvature. We only care when the tangent velocity vector changes direction!

So suppose $T$ is the unit tangent vector at each point. We want the rate of change of $T$, its derivative, but _not_ $(dif T)/(dif t)$, with respect to time. This is because we don't really care how _fast_ the tangent vector changes with respect to time. Curvature is about measuring how much the tangent vector changes as we move some arbitrary distance on the curve!

Instead, we really want $(dif T)/(dif s)$, where $s$ is the arc length we've traveled so far (from some arbitrarily chosen starting point). And this makes intuitive sense, because we just want how fast the tangent vector changes direction with respect to the distance we travel on the curve.

Now we may note that we can actually find curvature if we can find an arc-length parametrization of $arrow(s)$! Because an arclength parametrization always has unit speed, its derivative gives the unit tangent vectors at every point, and we can differentiate again with respect to arclength and take the magnitude to obtain the curvature. That is,
$ kappa = abs((dif T) / (dif s)) $

But if we can't find an arclength parametrization, we seem to be out of luck. Let's continue investigating.

Consider a prototypical example:

#example[
  Let $arrow(s)(t) = vec(cos(t) R, sin(t) R)$. We're drawing a circle with radius $R$. Let's differentiate with respect to $t$.
  $ arrow(s)'(t) = vec(-sin(t) R, cos(t) R) $

  But we want unit tangent vectors, so let's normalize it. Call our unit tangent $T$.
  $ T(t) = (arrow(s)'(t)) / (|arrow(s)'(t)|) $

  We have
  $ |arrow(s)'(t)| = lr(|vec(-sin(t) R, cos(t) R)|) \ = sqrt(sin^2(t) R^2 + cos^2(t) R^2) = R $

  Now we have our unit tangent vectors in terms of $t$.
  $ T(t) = (arrow(s)'(t)) / R $

  We take
  $ (dif T) / (dif t) = vec(-cos(t), -sin(t)) $
  and the magnitude of this is just 1.

  We should immediately note that not all cases will be so easy. When taking $|arrow(s)'(t)|$, in general, we have a very disgusting square root that cannot be simplified.

  Now note:
  $ abs((dif T)/(dif s)) = abs((dif T)/(dif t)) / abs((dif s)/(dif t)) = 1 / R $
  since the speed $abs((dif s)/(dif t)) = |arrow(s)'(t)| = R$. So in fact $kappa = 1/R$!
]

The key here is this equation:
$ abs((dif T)/(dif s)) = abs((dif T)/(dif t)) / abs((dif s)/(dif t)) $

Although we didn't have an arclength parametrization of the curve, we note that the rate at which $T$ changes with arclength is the rate at which it changes with respect to time, divided by the speed of the curve to "correct" for the discrepancies introduced by taking the derivative with respect to time! Obviously this is very nonrigorous. But I'm running out of time.

Now if we do a bunch more reasoning and nonsense we can obtain the formula above, but at this point the goal seems to have been reached. We have an intuitive understanding of what curvature should measure.

So recall the formula:
$ kappa(t) = (|arrow(c)'(t) times arrow(c)'' (t)|) / (|arrow(c)'(t)|^3) $

Essentially, it's saying that the area of the parallelogram formed by the curve's velocity and acceleration vectors, divided by the cube of the speed, gives us the curvature. Intuitively we see that the more the acceleration vector diverges from the velocity vector, the more sharply the velocity vector is changing direction, which gives us a notion of curvature. And somehow dividing by the speed cubed is normalizing out any influence due to speed to give us our curvature.

== Quadric surfaces

We really only need to know the identities and derivatives to do some integral hacks.
The quadric surface is the generalization of the conic section to $n$ dimensions.

Now recall that one conic section is the hyperbola. It turns out we can define analogues of the trigonometric functions that parameterize a so-called unit hyperbola instead of the unit circle.

These functions are
$ sinh(x) = (e^x -e^(-x)) / 2 \ cosh(x) = (e^x + e^(-x)) / 2 \ tanh(x) = sinh(x) / cosh(x) $

The derivatives are
$ (dif) / (dif x) sinh(x) = cosh(x) \ (dif) / (dif x) cosh(x) = sinh(x) $

It's pretty easy to show these using their definitions, and to derive the derivative of $tanh$.

== End of weeks 1-4

That was all of the content of week 1 to 4. Now we shift to weeks 5-7, where we studied more about vectors and their derivatives.

== Partial derivatives

These are the slopes of tangent lines for the graph in the direction of the changing variable.

#theorem[Clairaut's theorem][
  Suppose $f : RR^2 -> RR$ is defined on a disk $D$ that contains a point $(a,b)$. If the functions $f_(x y)$ and $f_(y x)$ are continuous on this disk then they are the same.
]

#theorem[Extended Clairaut][
  Suppose $f : RR^2 -> RR$ is defined on a disk $D$ that contains $(a,b)$. If all of the mixed partial derivatives are continuous everywhere in the disk $D$, then the mixed partials are equal regardless of the order of differentiation.
]

== Multivariable chain rule

The product rule actually follows from it. Recall:
$ (f(x) g(x))' = f'(x) g(x) + f(x) g'(x) $

Then instead let's replace $f$ and $g$ with $x$ and $y$, such that we have something like
$ z = x y $

Then let $x$ and $y$ be functions of $t$. So
$ z = x(t) y(t) $

The partial derivatives are
$ (diff z) / (diff x) = y, (diff z) / (diff y) = x $

By the multivariable chain rule,
$ (dif z) / (dif t) = x'(t) (diff z) / (diff x) + (diff z) / (diff y) y'(t) = x'(t) y(t) + x(t) y'(t) $

== Implicit differentiation

It's similar to the single variable implicit differentiation, but remember to hold the extraneous variables constant in practice.
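A quick numerical sanity check of "hold the extraneous variable constant", assuming the surface $3x^2 + 5 y z + z^3 = 0$ used in these notes and the implicit-differentiation result $(diff y)/(diff z) = -(5y + 3z^2)/(5z)$. The sample point and the helper names are made up for illustration.

```python
def y_on_surface(x, z):
    # Solve 3x^2 + 5yz + z^3 = 0 for y (valid when z != 0)
    return -(3 * x**2 + z**3) / (5 * z)

def dy_dz_formula(y, z):
    # Result of implicit differentiation: -(5y + 3z^2) / (5z)
    return -(5 * y + 3 * z**2) / (5 * z)

def dy_dz_numeric(x, z, h=1e-6):
    # Central difference in z while x is held constant
    return (y_on_surface(x, z + h) - y_on_surface(x, z - h)) / (2 * h)

x0, z0 = 1.0, 1.0            # assumed sample point
y0 = y_on_surface(x0, z0)    # the matching y so the point lies on the surface
from_formula = dy_dz_formula(y0, z0)
from_numeric = dy_dz_numeric(x0, z0)
```

The two values should agree to within finite-difference error, which is a cheap way to catch a dropped factor when differentiating by hand.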
#example[
  Suppose you have a surface
  $ 3x^2 + 5 y z + z^3 = 0 $

  And you want a partial derivative $(diff y)/(diff z)$ at some point. You can use implicit differentiation by viewing the surface as a level set of some larger function $F(x,y,z) = 3x^2 + 5y z + z^3$ where $F(x,y,z) = 0$.

  Now we differentiate both sides with respect to $z$, treating $y$ as a function of $z$ and holding $x$ constant:
  $ (diff F) / (diff z) = diff / (diff z)(3x^2) + diff / (diff z)(5y z) + diff / (diff z)(z^3) \ = 0 + (5 (diff y) / (diff z) z + 5y) + 3z^2 = 0 $

  Then solve for $(diff y)/(diff z)$:
  $ (diff y) / (diff z) = - (5y + 3z^2) / (5z) $
]

== Multivariable chain rule as matrix

Consider $z = f(x,y)$. Then the derivative of $z$ with respect to $x$ and $y$ would be a matrix:
$ mat((diff z)/(diff x), (diff z)/(diff y)) $

Now suppose the coordinate system changes to
$ x = 3u - v \ y = 2v $

Now suppose we want $z_u$ and $z_v$ at $(x,y) = (3,6)$. Then this is actually $(u,v) = (2,3)$ in $u v$ coordinates. The partials of $z$ with respect to $u$ and $v$ are just matrix multiplication:
$ mat(z_u,z_v) = mat(z_x, z_y) mat(x_u,x_v;y_u,y_v) $

Here $x_u = 3$, $x_v = -1$, $y_u = 0$, and $y_v = 2$, so $z_u = 3 z_x$ and $z_v = -z_x + 2 z_y$.

== Differentials

Differentials are about linear approximation. Recall in the single variable case we use the tangent line approximation and differentials to approximate functions. In the multivariable case it's the tangent plane approximation and the directional derivative.

Let $f$ be a function with two inputs and one output, say
$ f(x,y) = x^2 + x cos(y) $

Then the tangent plane at $(x_0,y_0)$ is the plane that best approximates the function at that point. Now we need two slopes instead of one.

Take a look at $(1,pi/2)$. Since $f_x = 2x + cos(y)$ and $f_y = -x sin(y)$, the partial derivatives at that point are 2 and -1.

The idea is we start at $(1,pi/2,1)$ and then move a slight nudge in either the $x$ or $y$ directions. In the $x$ direction, we move by $Delta x$ and get an increase in $z$ (height) of $2 dot Delta x$. If we move a slight nudge in the $y$ direction $Delta y$, then our height should increase (decrease) by $-1 dot Delta y$.
Then we have a tangent plane approximation of
$ Delta z approx 2 dot Delta x - 1 dot Delta y $

And like in single variable, we can replace the $Delta$ with $dif$.
$ dif z = 2 dot dif x - 1 dot dif y $

Sidenote: how can we actually have the equation of the tangent plane in terms of $x$, $y$, $z$? Just note that $x = 1 + Delta x$, $y = pi/2 + Delta y$, and $z = 1 + Delta z$. So just substitute
$ (z - 1) = 2(x-1) - 1(y-pi / 2) $

So in general, the differential version is
$ dif z = m_x dif x + m_y dif y $
and the tangent plane equation at $(x_0, y_0, z_0)$ is
$ z = z_0 + m_x (x-x_0) + m_y (y-y_0) $

== Directional derivative

Now take another look at the linear approximation in $z$
$ Delta z approx m_x Delta x + m_y Delta y $

The directional derivative reinterprets this as matrix multiplication
$ Delta z approx mat(m_x,m_y) vec(Delta x, Delta y) $

This is the same as a dot product, in fact, the dot product of the gradient given by $vec(m_x,m_y)$ and a tiny movement vector.

== Directional derivative

We mentioned this before, now let's discuss in more detail.

If you move by $Delta x$ and $Delta y$, in the $x$ and $y$ directions, then the directional derivative computes your change in height on the tangent plane per unit of distance moved.

Consider the question "What is the derivative of $f$ in the direction of the vector $vec(3,4)$?"

Now consider $m_x (3) + m_y (4) = 3m_x + 4m_y$. This is almost the answer, but really this is the change in $f$ resulting from a movement by the vector $vec(3,4)$. If we want the derivative, we're asking for the slope. We have rise, now run is $sqrt(3^2 + 4^2) = 5$, so the answer is $(3m_x + 4m_y)/5$.

A more intuitive way is to consider a unit vector $arrow(u) = 1/lr(|<3,4>|) <3,4>$ that points in the same direction. So now the "run" is simply 1. Clearly we get the same answer, but we have a good formula now
$ "Directional derivative" = nabla f dot arrow(u) $
where $arrow(u)$ is the *unit vector* in our desired direction.
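To make the formula concrete, here is a hedged numerical sketch using $f(x,y) = x^2 + x cos(y)$ at $(1, pi/2)$ from the differentials section, where the partial derivatives are 2 and -1, together with the direction $vec(3,4)$. The helper names are assumptions for illustration.

```python
import math

def f(x, y):
    return x**2 + x * math.cos(y)

def gradient(x, y):
    # Partial derivatives computed by hand: f_x = 2x + cos(y), f_y = -x sin(y)
    return (2 * x + math.cos(y), -x * math.sin(y))

def directional_derivative(x, y, direction):
    # Normalize the direction, then dot it with the gradient
    length = math.hypot(*direction)
    u = (direction[0] / length, direction[1] / length)
    gx, gy = gradient(x, y)
    return gx * u[0] + gy * u[1]

def slope_numeric(x, y, direction, h=1e-6):
    # Rise over run for a small step of length h in the given direction
    length = math.hypot(*direction)
    ux, uy = direction[0] / length, direction[1] / length
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

point = (1.0, math.pi / 2)
slope = directional_derivative(*point, (3.0, 4.0))   # (3*2 + 4*(-1)) / 5
check = slope_numeric(*point, (3.0, 4.0))
```

Both routes give a slope of $2/5$, matching $(3m_x + 4m_y)/5$ with $m_x = 2$ and $m_y = -1$.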
A geometric interpretation is that the directional derivative in a given direction is just the gradient vector projected in that direction.

To recap: We discuss two questions: what is the slope in a given direction, and what direction has the steepest slope?

Let $arrow(F) = nabla f$ be the gradient vector of our partial derivatives and $arrow(v)$ be the movement vector in the $x y$-plane. Let $arrow(u)$ be a unit vector pointing in the same direction as $arrow(v)$.

The answer to the first question is $nabla f dot arrow(u)$, where $arrow(u)$ is a unit vector in the given direction.

The answer to the second question can be derived.
$ arrow(F) dot arrow(u) = lr(|arrow(F)|) lr(|arrow(u)|) cos(theta) $
and $theta$ is the angle between $arrow(F)$ and $arrow(u)$. So since $cos theta$ reaches its maximum value at $theta = 0$ (and $lr(|arrow(u)|) = 1$), the maximum possible slope is actually in the same direction as $arrow(F)$, with a slope equal to the magnitude of $arrow(F)$.

Takeaways:
- Movement in the direction of the gradient vector gives the "steepest" ascent of the function
- Movement perpendicular to the gradient has slope of 0
- Movement in the opposite direction of the gradient has maximum negative slope (sharpest descent) with same magnitude as gradient
- Any directional derivative in between can be calculated as a projection from the gradient

== Optimization

We spoke previously about the Lagrange multiplier. Now we discuss it in greater detail.

When optimizing in two dimensions, we either optimize for all of $RR^2$, or on a constraint in $RR^2$ (such as a curve). For the first case, we use critical points. For the second, *Lagrange multipliers*.

== Critical points

Critical points occur when the gradient is zero or undefined. Both partials are zero *or* at least one of them isn't defined. Essentially, they occur when the tangent plane is flat.

We can't just look at $f''(x)$ like in single variable calculus, but we can take the determinant of the matrix of second order partials for some sort of multivariable concavity.
It measures how much the pure partial derivatives dominate the mixed partial derivatives, and they need to dominate to a certain extent such that there is consistently upward or downward curvature in every direction.

We find the critical points when $m_x = 0$ and $m_y = 0$ or either is undefined. Then we classify them as follows.

Recall the second derivative test for a single variable function, now consider the two-variable case.
$ Dif (x_0,y_0) = det mat(f_(x x) (x_0, y_0), f_(x y) (x_0, y_0); f_(y x) (x_0,y_0), f_(y y) (x_0, y_0)) = f_(x x) (x_0, y_0) f_(y y) (x_0, y_0) - f_(x y) (x_0, y_0)^2 $

- If $Dif > 0$ and $f_(x x) (x_0, y_0) > 0$, then $f(x_0, y_0)$ is a relative minimum.
- If $Dif > 0$ and $f_(x x) (x_0, y_0) < 0$, then $f(x_0, y_0)$ is a relative maximum.
- If $Dif < 0$, then $f(x_0, y_0)$ is neither and it's a saddle point.
- If $Dif = 0$, then we don't know (the test is inconclusive).

== Lagrange multipliers

We discuss optimization on a restricted curve in our domain.

Idea: we should navigate along the curve and find where the directional derivative is 0. Recall that this is the same as when the velocity vector is perpendicular to the gradient. The issue is that this way we always have to parametrize the curve.

To avoid this, the method of Lagrange multipliers views the (implicit) constraint equation as a level curve of another surface. If we take the gradient of that surface everywhere on the level curve, then that gradient is parallel to the original function's gradient at critical points.

So, we should be able to take the gradients of both the function and the constraint function, and look for when one is a scalar multiple of the other.

Consider a function $f$ and a constraint $g$. Then we compute $nabla f$ and $nabla g$, then solve for when $nabla f = lambda nabla g$. Then we can just plug in points and figure it out.

#exercise[
  Find the highest and lowest points on $f(x,y) = 81x^2 + y^2$ with the constraint $4x^2 + y^2 = 9$.
  Let the second function be $g(x,y)$, and keep in mind our constraint is essentially the level set where $g(x,y) = 9$.
]

Intuition: consider a function $g(x,y) = x^2 + y^2$, and our constraint is the level set where $g(x,y) = 25$. Notice, the gradient of $g$ is perpendicular to its level set at any given point. So, when optimizing a function $f$ that is $g$-constrained, we are really looking for where the gradient of $g$ is parallel to the gradient of $f$. That is why we are using a scalar multiple $lambda$ to relate them.

Computation: We have $g(x,y) = 25$. We construct the equation
$ vec(f_x, f_y) = lambda vec(g_x,g_y) $

This gives three equations
$ f_x = lambda g_x \ f_y = lambda g_y \ g(x,y) = 25 $

Taking $f(x,y) = x y$ (the objective consistent with the steps below), the first two equations read $y = 2 lambda x$ and $x = 2 lambda y$. Substituting one into the other, we find
$ y = 4 lambda^2 y $

If $y != 0$, then $lambda = plus.minus 1/2$. So we're looking for points on the circle, with radius 5, such that $x = plus.minus y$. This gives 4 points to consider: $(plus.minus 5/sqrt(2), plus.minus 5/sqrt(2))$.

If $y = 0$, then $x$ is forced to be 0, and $(0,0)$ is not on the circle. So we ignore it.

Now we just compare our four candidates and find the greatest (or least) for optimization!

=== Notes from Week 7 section

We have a function $f : RR^n -> RR$ that is subject to a constraint $g : RR^n -> RR^c$, where $c$ is our number of constraints. It's really a vector of $c$ constraints,
$ g = vec(g_1,g_2,dots.v,g_c) $

Idea: define the so-called *Lagrangian* $cal(L) = f + (g,lambda)$, where $(g, lambda)$ denotes the inner product of $g$ with the vector of multipliers $lambda$.

#theorem[
  If $f$ and $g$ are "nice" (partials continuous), there are no redundant constraints, and it's not overconstrained ($"Rank" Dif g = c < n$), then any optimal solution that respects $g = 0$ solves $gradient f = lambda dot Dif g$.
]

= Lecture #datetime(day: 27, year: 2025, month:2).display()

== Volume

Any 3D shape can be built recursively out of atomic objects.

#exercise[
  Derive formulae for the volume of a pyramid and cone.
]

Schley what are you doing???

== Signed area and volume