\(\newcommand{\R}{\mathbb R }\)
In single variable calculus, we can find critical points in an open interval by checking any point where the derivative is \(0\). The local minima and maxima are a subset of these, and the second derivative test gives us information about which they are. To generalize this, we’ll need to find out how our function is changing in different directions. For example, \(f(x)=x^3\) has a critical point at \(0\), but decreases to the left and increases to the right, so it is not a local min or max. Multivariable functions have many more possible directions than just left or right, and we’ll use our tools from linear algebra to keep track of them.
Given a symmetric \(n\times n\) matrix \(A\), with entries \(a_{ij}\) for \(i,j \in \{1,\ldots, n\}\), we can define a function \(\R^n\to \R\) by sending \[ \mathbf x \mapsto (A\mathbf x)\cdot \mathbf x = \sum_{i,j=1}^n a_{ij}x_ix_j. \]
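For readers who like to experiment, here is a small numerical sketch (using NumPy, with an arbitrarily chosen matrix and vector) confirming that the matrix-vector expression and the double sum define the same function:

```python
import numpy as np

# A symmetric 2x2 matrix and a sample vector, chosen arbitrarily for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, -2.0])

# (Ax) . x, computed with matrix-vector operations.
q1 = (A @ x) @ x

# The same quantity as the double sum of a_ij x_i x_j over all entries.
q2 = sum(A[i, j] * x[i] * x[j] for i in range(2) for j in range(2))

print(q1, q2)  # both print 10.0
```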
A symmetric \(n\times n\) matrix \(A\) is positive definite if \(\mathbf x^T A \mathbf x > 0\) for every nonzero \(\mathbf x\in \R^n\), and nonnegative definite if \(\mathbf x^T A \mathbf x \ge 0\) for every \(\mathbf x\in \R^n\).
In addition, we say that \(A\) is negative definite (respectively, nonpositive definite) if \(-A\) is positive definite (respectively, nonnegative definite), and indefinite if \(\mathbf x^T A \mathbf x\) takes both positive and negative values as \(\mathbf x\) ranges over \(\R^n\).
Recall from linear algebra that the eigenvalues of a matrix \(A\) are the roots of the polynomial \(\det(A-\lambda I)\), called the characteristic polynomial, denoted \(p_A(\lambda)\). For example, if \(A\) is the \(2\times 2\) matrix \[A = \left( \begin{array}{cc} a&b \\ c&d \end{array}\right) \] then its characteristic polynomial is \(p_A(\lambda)=\lambda^2 - (a+d)\lambda + (ad-bc).\) In general the eigenvalues are complex numbers, but if \(A\) is symmetric, then they are always real.
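As an illustration (the matrix below is chosen arbitrarily), one can check numerically that the eigenvalues are exactly the roots of this characteristic polynomial:

```python
import numpy as np

# A sample 2x2 matrix (not symmetric; here the eigenvalues happen to be real anyway).
a, b, c, d = 1.0, 2.0, 3.0, 4.0
A = np.array([[a, b],
              [c, d]])

# Roots of the characteristic polynomial  lambda^2 - (a+d) lambda + (ad - bc).
roots = np.roots([1.0, -(a + d), a * d - b * c])

# Eigenvalues computed directly.
eig = np.linalg.eigvals(A)

print(np.sort(roots), np.sort(eig))  # the two lists agree
```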
If we can find the eigenvalues of a symmetric matrix, then the next theorem lets us determine whether it is positive definite, nonnegative definite, indefinite etc.
The proof relies on the fact that if \(A\) is a symmetric matrix then \[\begin{equation}\label{rayleigh} \text{the smallest eigenvalue of }A = \min_{\{\mathbf u\in \R^n : |\mathbf u|=1\} } \mathbf u^T A\mathbf u. \end{equation}\] We will see that the minimum is attained, by the Extreme Value Theorem.
Let us write \(\lambda_1\) to denote the smallest eigenvalue of \(A\). Then \[\begin{align} \text{ all eigenvalues }>0 &\quad \iff\quad \lambda_1 >0 \nonumber \\ &\quad \iff\quad \mathbf u^T A\mathbf u \ge \lambda_1 >0\text{ for every unit vector }\mathbf u\text{, by $\eqref{rayleigh}$}. \label{r1}\end{align}\] Also, any nonzero \(\mathbf x\in \R^n\) can be written \(\mathbf x = |\mathbf x|\, \mathbf u\) where \(\mathbf u\) is the unit vector in the direction of \(\mathbf x\). With this notation, \[\begin{align*} \text{$\eqref{r1}$ holds} &\iff \mathbf x^T A \mathbf x = |\mathbf x|^2 \mathbf u^TA\mathbf u \ge |\mathbf x|^2\lambda_1>0 \text{ for every nonzero }\mathbf x \nonumber \\ &\ \Longrightarrow A \text{ is positive definite}. \end{align*}\]
Conversely, we claim that if a matrix is positive definite, then all of its eigenvalues are positive. For this, let \(\mathbf v\) be a unit vector such that \(\mathbf v^T A \mathbf v = \displaystyle{ \min_{\{\mathbf u\in \R^n : |\mathbf u|=1\} }} \mathbf u^T A\mathbf u =\) the smallest eigenvalue \(\lambda_1\), by \(\eqref{rayleigh}\). Such a vector \(\mathbf v\) exists, due to the Extreme Value Theorem. If \(A\) is positive definite, it follows directly from the definition of “positive definite” that \(\mathbf v^T A \mathbf v>0\), and hence that the smallest eigenvalue is positive.
The final assertion, about indefinite matrices, follows by similar arguments.

From the definition of eigenvalues, \(\lambda\) is an eigenvalue of \(A\) if and only if \(-\lambda\) is an eigenvalue of \(-A\). As a result, Theorem 1 implies that \(A\) is negative definite \(\iff\) all of its eigenvalues are \(<0\), and nonpositive definite \(\iff\) all of its eigenvalues are \(\le 0\).
Note that we have not said a word about eigenvectors. They appear in the proof of \(\eqref{rayleigh}\), but if all we want to do is check definiteness, then we can do so without finding eigenvectors.
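The characterization \(\eqref{rayleigh}\) is also easy to test numerically. In the sketch below (a NumPy experiment with an arbitrarily chosen \(2\times 2\) symmetric matrix), we sample \(\mathbf u^T A\mathbf u\) over a fine grid of unit vectors and compare the minimum to the smallest eigenvalue:

```python
import numpy as np

# A symmetric matrix whose smallest eigenvalue we want to recover.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # eigenvalues 1 and 3

# Sample u^T A u over a fine grid of unit vectors u = (cos t, sin t).
t = np.linspace(0.0, 2.0 * np.pi, 100001)
U = np.stack([np.cos(t), np.sin(t)])        # each column is a unit vector
values = np.einsum('it,ij,jt->t', U, A, U)  # u^T A u for every sample

print(values.min())                 # close to 1, the smallest eigenvalue
print(np.linalg.eigvalsh(A).min())  # the smallest eigenvalue itself
```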
Even for rather small matrices (say, \(3\times 3\)) it is already hard to find the eigenvalues “by hand”, since doing so involves computing a determinant and solving a cubic equation. The examples we have selected are not so difficult.
Let \[ A = \left( \begin{array}{ccc}1&1&0\\ 1&2&1\\0&1&1 \end{array} \right). \]Determine the definiteness (positive, nonnegative, indefinite, etc.) of the matrix \(A\).
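If you want to check a hand computation for an example like this one, mathematical software can produce the eigenvalues directly. A possible NumPy check (for this matrix, the eigenvalues come out to be \(0\), \(1\), and \(3\), so Theorem 1 says \(A\) is nonnegative definite but not positive definite):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 1.0]])

# eigvalsh is designed for symmetric matrices: it returns real eigenvalues,
# sorted in ascending order.
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)  # approximately [0, 1, 3]
```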
In practice, if one wants to find the eigenvalues of a matrix that is \(3\times 3\) or larger, one normally uses mathematical software. Instead of computing determinants and finding roots of a polynomial, the standard algorithms find eigenvalues by determining a similar upper triangular matrix, using the QR factorization.
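The following is a minimal sketch of that idea, an unshifted QR iteration in NumPy applied to the matrix from the example above. Production algorithms first reduce the matrix to tridiagonal (or Hessenberg) form and use shifts for speed and reliability, so this is an illustration of the principle, not a practical implementation:

```python
import numpy as np

def qr_eigenvalues(A, iterations=500):
    """Unshifted QR iteration: repeatedly factor T = QR and form RQ.

    Each RQ = Q^T (QR) Q is similar to T, so the eigenvalues are preserved,
    and for a symmetric matrix the iterates approach a diagonal matrix.
    """
    T = A.copy()
    for _ in range(iterations):
        Q, R = np.linalg.qr(T)
        T = R @ Q
    return np.sort(np.diag(T))  # diagonal entries approximate the eigenvalues

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 1.0]])
print(qr_eigenvalues(A))  # approximately [0, 1, 3]
```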
When \(n=2\), there is a simple rule for determining the sign of the eigenvalues of a symmetric matrix:
For the symmetric matrix \(A = \left( \begin{array}{cc}\alpha&\beta\\ \beta&\gamma \end{array}\right)\),
If \(\det A = \alpha\gamma - \beta^2<0\), then \(A\) is indefinite.
If \(\det A>0\), then
If \(\det A= 0\), then at least one eigenvalue equals zero, and
We know that if \(\mu_1, \mu_2\) are the roots of the characteristic polynomial, then it can be factored
\[ \lambda^2 - (\alpha+\gamma)\lambda +(\alpha\gamma-\beta^2) = (\lambda-\mu_1)(\lambda-\mu_2) = \lambda^2 - (\mu_1+\mu_2)\lambda +\mu_1\mu_2. \] Comparing the two sides, we find that \[ \mu_1\mu_2 = (\alpha\gamma-\beta^2)=\det A, \text{ and } \mu_1+\mu_2 = \alpha+\gamma. \] Then we easily see that
The easiest way to remember the statement of the theorem may be to remember the idea of its proof, which is that \(\det A\) is the product of the eigenvalues. Thus
\(\det A<0\) if and only if the two eigenvalues are nonzero and have opposite signs (making \(A\) indefinite).
\(\det A>0\) if and only if the two eigenvalues are nonzero and have the same sign. Since \(\alpha+\gamma\) is also the sum of the eigenvalues, the sign is the same as that of \(\alpha+\gamma\) which is the same as the sign of \(\alpha\), because \(\alpha\) and \(\gamma\) must have the same sign to make the determinant positive.
Note that this only applies to symmetric, \(2\times 2\) matrices. Try to determine where we have used both assumptions.
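If you would like to see the rule in action, here is a small NumPy experiment (the helper function is our own illustrative addition) comparing the rule against numerically computed eigenvalues on random symmetric \(2\times 2\) matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(alpha, beta, gamma):
    """Apply the 2x2 rule: look at the sign of det A, then the sign of alpha."""
    det = alpha * gamma - beta ** 2
    if det < 0:
        return 'indefinite'
    if det > 0:
        return 'positive definite' if alpha > 0 else 'negative definite'
    return 'degenerate'  # det == 0: at least one eigenvalue is zero

for _ in range(1000):
    a, b, c = rng.normal(size=3)
    eig = np.linalg.eigvalsh([[a, b], [b, c]])  # sorted ascending
    label = classify(a, b, c)
    if label == 'indefinite':
        assert eig[0] < 0 < eig[1]
    elif label == 'positive definite':
        assert eig[0] > 0
    elif label == 'negative definite':
        assert eig[1] < 0
print('the rule agrees with the eigenvalues')
```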
Just like in single variable calculus, determining local minima and maxima on the interior of a domain is simplified by finding all critical points there. Our definition of a critical point will need to account for the additional dimensions, and our extension of the second derivative test will use all of the second derivatives, in the form of the Hessian.
Suppose that \(S\) is an open subset of \(\R^n\) and that \(f\) is a function \(S\to \R\). A point \(\mathbf a\in S\) is said to be a local minimum point for \(f\) if there exists \(\varepsilon>0\) such that \(f(\mathbf a)\le f(\mathbf x)\) for all \(\mathbf x\in S\) such that \(|\mathbf x-\mathbf a|<\varepsilon\). (Also sometimes called a “local minimum” or even just a “local min”.)
Similarly, \(\mathbf a\in S\) is a local maximum point (or “local maximum”, or “local max”) for \(f\) if there exists \(\varepsilon>0\) such that \(f(\mathbf a)\ge f(\mathbf x)\) for all \(\mathbf x\in S\) such that \(|\mathbf x-\mathbf a|<\varepsilon\).

The name local extremum is given to any point that is either a local min or a local max.
Find all critical points of \(f(x,y) = \frac {x+y}{1+x^2+y^2}\).
Solution
To do this we compute
\[ \partial_x f(x,y) = \frac{1+x^2+y^2 - 2x(x+y)}{(1+x^2+y^2)^2} = \frac{1 - x^2 - 2xy +y^2 }{(1+x^2+y^2)^2}, \] \[ \partial_y f(x,y) = \frac{1+x^2+y^2 - 2y(x+y)}{(1+x^2+y^2)^2} = \frac{1 + x^2 - 2xy -y^2 }{(1+x^2+y^2)^2}. \] A point \((x,y)\) is a critical point if and only if \(\partial_xf= \partial_yf = 0\) at \((x,y)\). Thus, we are led to consider the equations
\[ 1 + x^2 - 2xy -y^2 = 0, \quad 1 - x^2 - 2xy + y^2 = 0. \] Subtracting the two equations, we find that \(x^2=y^2\). Adding the two equations, we find that \(1-2xy=0\). We conclude that the only critical points are
\[ (x,y) = \pm \left( \sqrt{1/2}, \sqrt{1/2} \right). \]

This example illustrates a general feature of the task of finding critical points: it requires solving a system of equations (\(n\) of them in \(\R^n\), that is, \(\partial_j f(\mathbf x) = 0\) for \(j=1,\ldots, n\)), and these are typically nonlinear equations. This means that there is in general no systematic way to find solutions. It may require factoring, recognizing and ignoring factors that are always positive (like \(e^x\)), substituting a variable from one equation to reduce the number of variables, or other tricks. The best way to improve at this is to practice, and a good way to practice is on WeBWorK.
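To double-check a computation like the one above, it can be reassuring to plug the candidate critical points back into the formulas for the partial derivatives. A short NumPy sketch for this example:

```python
import numpy as np

def grad_f(x, y):
    """Gradient of f(x,y) = (x+y)/(1+x^2+y^2), using the formulas derived above."""
    d = (1 + x**2 + y**2) ** 2
    return np.array([(1 - x**2 - 2*x*y + y**2) / d,
                     (1 + x**2 - 2*x*y - y**2) / d])

s = np.sqrt(0.5)
print(grad_f(s, s))      # ~ [0, 0]: a critical point
print(grad_f(-s, -s))    # ~ [0, 0]: a critical point
print(grad_f(1.0, 0.0))  # second component is nonzero: not a critical point
```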
Two proofs
Suppose that \(\mathbf a\) is a local extremum of \(f\). For concreteness, let’s assume that it is a local minimum; for a local maximum, apply the same argument to \(-f\). For a unit vector \(\mathbf u\), consider the function \(g_\mathbf u(s) = f(\mathbf a+s\mathbf u)\). This is defined for all \(s\) in an interval containing the origin. Since \(\mathbf a\) is a local minimum, \(s=0\) is a local minimum point for \(g_\mathbf u\).
Also, the chain rule implies that \(g_\mathbf u\) is differentiable and that \(g_\mathbf u'(0) = \mathbf u \cdot \nabla f(\mathbf a)\). Since \(s=0\) is a local minimum point for \(g_\mathbf u\), single variable calculus tells us that \(g_\mathbf u'(0)=0\), so \(\mathbf u \cdot \nabla f(\mathbf a)=0\). Since this same argument applies to every unit vector \(\mathbf u\), it must be the case that \(\nabla f(\mathbf a)={\bf 0}\).

Just like single variable functions, once we have identified critical points, we may be able to determine if they are local minima, maxima, or saddle points by looking at the second derivatives. And just like the single variable case, the test may not tell us anything, for example when the Hessian has a \(0\) eigenvalue.
Suppose \(f:S\to \R\) is \(C^2\).
If \(\mathbf a\) is a local minimum point for \(f\), then \(\mathbf a\) is a critical point of \(f\) and \(H(\mathbf a)\) is nonnegative definite.
If \(\mathbf a\) is critical point and \(H(\mathbf a)\) is positive definite, then \(\mathbf a\) is a local minimum point.
Two notes:
The proof of the necessary condition is left as an exercise. For hints, see problem 16.
Similar conclusions hold for a local maximum point, if one replaces positive/nonnegative definite by negative/nonpositive definite. To make sense of the indefinite case, we add a definition: a critical point \(\mathbf a\) is a saddle point for \(f\) if the Hessian \(H(\mathbf a)\) is indefinite.

These points are named for a horse’s saddle. When you’re sitting on a horse, you’re at a point that is flat (\(\nabla f(\mathbf a) = \mathbf 0\)), but the saddle goes up in one direction (toward the horse’s head) and down in another direction (towards the sides).
Summarizing the results from the Second Derivative Test:
Suppose that \(f\) is \(C^2\) and \(\nabla f(\mathbf a)=\bf 0\).
If a critical point is nondegenerate, then the Corollary says that by finding all the eigenvalues of the Hessian, we can immediately determine whether it is a local max, local min or saddle point. If it is degenerate, the Corollary doesn’t apply, and it is harder to determine if it is a local max, local min, or saddle point.
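The Corollary’s recipe can be summarized in a short sketch (the helper function below is our own illustrative addition, and the zero test uses a numerical tolerance):

```python
import numpy as np

def classify_critical_point(H):
    """Classify a critical point from the eigenvalues of its Hessian H.

    Follows the second derivative test; returns 'degenerate' when some
    eigenvalue is (numerically) zero and the test is inconclusive.
    """
    eig = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    if np.any(np.isclose(eig, 0.0)):
        return 'degenerate'
    if np.all(eig > 0):
        return 'local minimum'
    if np.all(eig < 0):
        return 'local maximum'
    return 'saddle point'  # eigenvalues of both signs: indefinite Hessian

print(classify_critical_point([[2, 0], [0, 3]]))     # local minimum
print(classify_critical_point([[-16, 0], [0, 12]]))  # saddle point
```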
Find all critical points of \(f(x,y) = x^4+y^4 - 8x^2+4y\), and classify the nondegenerate critical points.
Classifying a critical point means determining whether it is a local minimum, local maximum, or saddle point.
Solution
We first compute
\[ \nabla f(x,y) =\binom{\partial_x f}{\partial_y f} = \binom{4x^3 - 16x}{4y^3+4} = 4\binom{x(x^2-4)}{y^3+1} \] It is then easy to see that \(\nabla f(x,y)= {\bf 0}\) only at the points \((2,-1), (0,-1)\) and \((-2,-1)\).
We next compute
\[ H = \left( \begin{array}{cc}12x^2-16 &0\\0&12y^2 \end{array} \right) \] Thus,
\[ H(-2,-1) = \left( \begin{array}{cc}32 &0\\0&12 \end{array} \right), \quad H(0,-1) = \left( \begin{array}{cc}-16 &0\\0&12 \end{array} \right), \quad H(2,-1) = \left( \begin{array}{cc}32 &0\\0&12 \end{array} \right). \] We conclude from Theorems 2 and 4 that \((\pm 2, -1)\) are local minima, and \((0,-1)\) is a saddle point.

Let \(f(x,y) = x^2 + a y^3 + b y^4\). Show that the origin is a critical point and classify it. The answer will depend on \(a\) and \(b\).
Solution
We first compute \(\nabla f = (2x, 3ay^2+4by^3)\), so the origin is a critical point. To classify it, we need the Hessian \[
H =
\left(
\begin{array}{cc}2 &0\\0&6ay+12by^2
\end{array}
\right).
\] Thus \(H = \left(\begin{array}{cc}2 &0\\0&0\end{array}\right)\) at the origin, for all \(a\) and \(b\), with eigenvalues \(0\) and \(2\). Thus we cannot determine the type of critical point from the second derivative test. We can however say that it is neither a local maximum nor a saddle point.
So the critical point is either a local minimum, or “none of the above”. It is a local minimum only if \(f(x,y)\ge 0\) for all \((x,y)\) close to \((0,0)\). The points \((0, \varepsilon)\) have values approximately \(a\varepsilon^3\) when \(\varepsilon\) is small, so they can be made negative if \(a\neq 0\). When \(a=0\), they have value \(b\varepsilon^4\). Hence, it is a local minimum if \(a=0\) and \(b\ge 0\), and otherwise it is neither a saddle point nor a local extremum.
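A quick numerical sanity check of this case analysis, sampling the test points \((0,\pm\varepsilon)\) used above:

```python
import numpy as np

def f(x, y, a, b):
    return x**2 + a * y**3 + b * y**4

eps = 1e-3

# a != 0: the points (0, eps) and (0, -eps) give values of opposite signs
# near the origin, so the origin is not a local extremum.
print(f(0, eps, a=1, b=1), f(0, -eps, a=1, b=1))

# a = 0, b >= 0: f = x^2 + b y^4 >= 0 everywhere, so the origin is a local min.
print(f(0, eps, a=0, b=1))

# a = 0, b < 0: f(0, eps) < 0, so again the origin is not a local minimum.
print(f(0, eps, a=0, b=-1))
```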
If \(a,b\) are both nonzero, then \((0, -3a/(4b))\) is a second critical point, but we were not asked about other points. Also, if \(a=b=0\), then the entire \(y\)-axis consists of critical points: the function does not care about \(y\) at all.

A typical problem in this section reads: Find all critical points of a function, and determine whether each nondegenerate critical point is a local min, local max, or saddle point,
or more briefly: Find all critical points, and classify all nondegenerate critical points.
We might also ask you to classify degenerate critical points, when possible.
\(f(x,y) = (x^2-y^2)(6-y)\).
\(f(x,y) = y(3-x^2-y^2)\).
\(f(x,y)= x^3+y^2\)
\(f(x,y) = \sin(x)\sin(y)\).
\(f(x,y,z) = x^2 +xz +y^4 + z^2\).
\(f(x,y) = (y-x^2)(x-y)\).
\(f(x,y) = e^{x^3-3y}\).
\(f(x,y) = xy e^{x^2+y^2}\)
\(f(x,y,z) = (x^2+ y^2+z^2)e^{3x^2+2y^2 + z^2}\)
Let \(A\) be a symmetric \(n\times n\) matrix. Use \(\eqref{rayleigh}\) to prove that \[ \text{the largest eigenvalue of }A = \max_{\{\mathbf u\in \R^n : |\mathbf u|=1\} } \mathbf u^T A\mathbf u. \]
Hint
First, show that \(\lambda\) is an eigenvalue for \(A\) if and only if \(-\lambda\) is an eigenvalue for \(-A\).
(An excellent question from Folland, 2.8.4). Let \(f(x,y) = (y-x^2)(y-2x^2)\).
Suppose that \(f:\R^2\to \R\) is a \(C^2\) function, that \(\bf 0\) is a nondegenerate critical point, and that \(f(x,mx)\) has a local minimum at the origin for every \(m\). Show that \(f\) has a local minimum at the origin.
Let \(A\) be a symmetric \(n\times n\) matrix.
If possible, classify the critical points \(\mathbf a\) and \(\mathbf b\).
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Canada License.