2.7: Critical Points

\(\newcommand{\R}{\mathbb R }\)


  1. Symmetric matrices
  2. Critical points
  3. Problems


In single variable calculus, we can find critical points in an open interval by checking any point where the derivative is \(0\). The local minima and maxima are a subset of these, and the second derivative test gives us information about which they are. To generalize this, we’ll need to find out how our function is changing in different directions. For example, \(f(x)=x^3\) has a critical point at \(0\), but decreases to the left and increases to the right, so it is not a local min or max. Multivariable functions have many more possible directions than just left or right, and we’ll use our tools from linear algebra to keep track of them.

Symmetric matrices

Given a symmetric \(n\times n\) matrix \(A\), with entries \(a_{ij}\) for \(i,j \in \{1,\ldots, n\}\), we can define a function \(\R^n\to \R\) by sending \[ \mathbf x \mapsto (A\mathbf x)\cdot \mathbf x = \sum_{i,j=1}^n a_{ij}x_ix_j. \]
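For readers who like to experiment, this identity is easy to check numerically. The following is a quick sketch in Python with NumPy (our own illustration, not part of the text), using the matrix that will reappear in Example 1 below:

```python
import numpy as np

# A symmetric 3x3 matrix (the same one as in Example 1 below)
# and an arbitrary test vector.
A = np.array([[1., 1., 0.],
              [1., 2., 1.],
              [0., 1., 1.]])
x = np.array([1., -2., 3.])

# (Ax) . x, computed with matrix operations...
quadratic_form = (A @ x) @ x

# ...agrees with the double sum of a_ij * x_i * x_j.
double_sum = sum(A[i, j] * x[i] * x[j]
                 for i in range(3) for j in range(3))

assert np.isclose(quadratic_form, double_sum)
print(quadratic_form)  # 2.0
```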

A symmetric \(n\times n\) matrix \(A\) is

    • positive definite if \(\mathbf x^T A\mathbf x > 0\) for every nonzero \(\mathbf x\in \R^n\), and
    • negative definite if \(\mathbf x^T A\mathbf x < 0\) for every nonzero \(\mathbf x\in \R^n\).

In addition, we say that \(A\) is

    • nonnegative definite (or positive semidefinite) if \(\mathbf x^T A\mathbf x \ge 0\) for every \(\mathbf x\in \R^n\), and
    • nonpositive definite (or negative semidefinite) if \(\mathbf x^T A\mathbf x \le 0\) for every \(\mathbf x\in \R^n\).

A matrix \(A\) is indefinite if none of the above holds. Equivalently, \(A\) is indefinite if there exist \(\mathbf x,\mathbf y\in \R^n\) such that \[ \mathbf x^T A\mathbf x < 0 < \mathbf y^T A \mathbf y.\]

Eigenvalues of definite matrices

Recall from linear algebra that the eigenvalues of a matrix \(A\) are the roots of the polynomial \(\det(A-\lambda I)\), called the characteristic polynomial, denoted \(p_A(\lambda)\). For example, if \(A\) is the \(2\times 2\) matrix \[A = \left( \begin{array}{cc} a&b \\ c&d \end{array}\right) \] then its characteristic polynomial is \(p_A(\lambda)=\lambda^2 - (a+d)\lambda + (ad-bc).\) In general the eigenvalues are complex numbers, but if \(A\) is symmetric, then they are always real.

If we can find the eigenvalues of a symmetric matrix, then the next theorem lets us determine whether it is positive definite, nonnegative definite, indefinite, etc.

Suppose that \(A\) is a symmetric matrix. Then \[\begin{align*} A\text{ is positive definite } &\iff \text{ all of its eigenvalues are positive } \\ &\iff \exists\lambda_1>0\text{ such that }\mathbf x^T A\mathbf x \ge \lambda_1|\mathbf x|^2 \text{ for all }\mathbf x\in \R^n \end{align*}\] and \[ A\text{ is nonnegative definite } \iff \text{ all of its eigenvalues are nonnegative.} \] Finally, \[ A\text{ is indefinite } \iff \text{ $A$ has both positive and negative eigenvalues.} \]

The proof relies on the fact that if \(A\) is a symmetric matrix then \[\begin{equation}\label{rayleigh} \text{the smallest eigenvalue of }A = \min_{\{\mathbf u\in \R^n : |\mathbf u|=1\} } \mathbf u^T A\mathbf u. \end{equation}\] We will see that the minimum is attained, by the Extreme Value Theorem.
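The identity \(\eqref{rayleigh}\) is easy to probe numerically. Here is a sketch in Python (our own illustration; random sampling of unit vectors only probes the minimum from above, but it makes the statement concrete):

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [1., 2., 1.],
              [0., 1., 1.]])

# Sample many random unit vectors u and compute u^T A u for each.
rng = np.random.default_rng(0)
U = rng.normal(size=(100_000, 3))
U /= np.linalg.norm(U, axis=1, keepdims=True)
rayleigh_quotients = np.einsum('ni,ij,nj->n', U, A, U)

# The sampled minimum should approach the smallest eigenvalue (here 0).
print(rayleigh_quotients.min())   # slightly above 0
print(np.linalg.eigvalsh(A)[0])   # smallest eigenvalue, approximately 0
```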

Details

Let us write \(\lambda_1\) to denote the smallest eigenvalue of \(A\). Then \[\begin{align} \text{ all eigenvalues }>0 &\quad \iff\quad \lambda_1 >0 \nonumber \\ &\quad \iff\quad \mathbf u^T A\mathbf u \ge \lambda_1 >0\text{ for every unit vector }\mathbf u\text{, by $\eqref{rayleigh}$}. \label{r1}\end{align}\] Also, any nonzero \(\mathbf x\in \R^n\) can be written \(\mathbf x = |\mathbf x|\, \mathbf u\) where \(\mathbf u\) is the unit vector in the direction of \(\mathbf x\). With this notation, \[\begin{align*} \text{$\eqref{r1}$ holds} &\iff \mathbf x^T A \mathbf x = |\mathbf x|^2 \mathbf u^TA\mathbf u \ge |\mathbf x|^2\lambda_1>0 \text{ for every nonzero }\mathbf x \nonumber \\ &\ \Longrightarrow A \text{ is positive definite}. \end{align*}\]

Conversely, we claim that if a matrix is positive definite, then all of its eigenvalues are positive. For this, let \(\mathbf v\) be a unit vector such that \(\mathbf v^T A \mathbf v = \displaystyle{ \min_{\{\mathbf u\in \R^n : |\mathbf u|=1\} }} \mathbf u^T A\mathbf u =\) the smallest eigenvalue \(\lambda_1\), by \(\eqref{rayleigh}\). Such a vector \(\mathbf v\) exists, due to the Extreme Value Theorem. If \(A\) is positive definite, it follows directly from the definition of “positive definite” that \(\mathbf v^T A \mathbf v>0\), and hence that the smallest eigenvalue is positive.

The final assertion, about indefinite matrices, follows by similar arguments.

From the definition of eigenvalues, \(\lambda\) is an eigenvalue of \(A\) if and only if \(-\lambda\) is an eigenvalue of \(-A\). As a result, Theorem 1 implies that \(A\) is negative definite \(\iff\) all of its eigenvalues are \(<0\), and nonpositive definite \(\iff\) all of its eigenvalues are \(\le 0\).

Note that we have not said a word about eigenvectors. They appear in the proof of \(\eqref{rayleigh}\), but if all we want to do is check definiteness, then we can do so without finding eigenvectors.

Even for rather small matrices (say, \(3\times 3\)) it is already hard to find the eigenvalues “by hand”, since doing so involves computing a determinant and solving a cubic equation. The examples we have selected, however, are not so difficult.

Example 1.

Let \[ A = \left( \begin{array}{ccc}1&1&0\\ 1&2&1\\0&1&1 \end{array} \right). \]Determine the definiteness (positive, nonnegative, indefinite, etc.) of the matrix \(A\).

Solution. The characteristic polynomial is \[ \det \left( \begin{array}{ccc}1-\lambda&1&0\\ 1&2-\lambda&1\\0&1&1-\lambda \end{array} \right). \] This is \[ (1-\lambda)^2(2-\lambda) - 2(1-\lambda) = (1-\lambda)(\lambda^2 - 3\lambda) \] and the roots are \(\lambda = 0, 1\), and \(3\). So Theorem 1 tells us that this matrix is nonnegative definite, but not positive definite.
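As a sanity check, mathematical software confirms the computation. Here is a sketch using NumPy’s eigvalsh, which is designed for symmetric matrices and returns the (real) eigenvalues in ascending order:

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [1., 2., 1.],
              [0., 1., 1.]])

print(np.linalg.eigvalsh(A))  # approximately [0., 1., 3.]
# All eigenvalues are >= 0, and one of them is 0, so A is
# nonnegative definite but not positive definite, as Theorem 1 says.
```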

In practice, if one wants to find the eigenvalues of a matrix that is \(3\times 3\) or larger, one normally uses mathematical software. Instead of computing determinants and finding roots of a polynomial, the standard algorithms find eigenvalues by determining a similar upper triangular matrix, using the QR factorization.
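Production algorithms add refinements (Hessenberg reduction, shifts, deflation) that are beyond our scope, but the core idea fits in a few lines. The following toy loop is only an illustration of that idea under simplifying assumptions, not an implementation one should actually use:

```python
import numpy as np

def qr_iteration(A, steps=100):
    # Unshifted QR iteration: factor A = QR, then replace A by RQ.
    # Since RQ = Q^T A Q, each iterate is similar to A (same eigenvalues).
    # For nice symmetric matrices the iterates approach a diagonal
    # matrix whose entries are the eigenvalues.
    A = np.array(A, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(A)
        A = R @ Q
    return np.sort(np.diag(A))

A = [[1., 1., 0.],
     [1., 2., 1.],
     [0., 1., 1.]]
print(qr_iteration(A))  # approximately [0., 1., 3.]
```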

When \(n=2\), there is a simple rule for determining the sign of the eigenvalues of a symmetric matrix:

For the symmetric matrix \(A = \left( \begin{array}{cc}\alpha&\beta\\ \beta&\gamma \end{array}\right)\),

  1. If \(\det A = \alpha\gamma - \beta^2<0\), then \(A\) is indefinite.

  2. If \(\det A>0\), then

    • if \(\alpha>0\), then \(A\) is positive definite, while
    • if \(\alpha<0\), then \(A\) is negative definite.
  3. If \(\det A= 0\), then at least one eigenvalue equals zero, and

    • if \(\alpha>0\), then \(A\) is nonnegative definite
    • if \(\alpha<0\), then \(A\) is nonpositive definite.
Details.

We know that if \(\mu_1, \mu_2\) are the roots of the characteristic polynomial, then it can be factored

\[ \lambda^2 - (\alpha+\gamma)\lambda +(\alpha\gamma-\beta^2) = (\lambda-\mu_1)(\lambda-\mu_2) = \lambda^2 - (\mu_1+\mu_2)\lambda +\mu_1\mu_2. \] Comparing the two sides, we find that \[ \mu_1\mu_2 = (\alpha\gamma-\beta^2)=\det A, \text{ and } \mu_1+\mu_2 = \alpha+\gamma. \] Then we easily see that

    • if \(\det A<0\), then \(\mu_1\) and \(\mu_2\) have opposite signs, so \(A\) is indefinite;
    • if \(\det A>0\), then \(\mu_1\) and \(\mu_2\) have the same sign; moreover \(\alpha\gamma > \beta^2 \ge 0\), so \(\alpha\) and \(\gamma\) also have the same sign, which must agree with the sign of \(\mu_1+\mu_2 = \alpha+\gamma\);
    • if \(\det A=0\), then at least one eigenvalue is zero, and the other equals \(\alpha+\gamma\), whose sign is the sign of \(\alpha\) (since \(\alpha\gamma = \beta^2 \ge 0\)).

The easiest way to remember the statement of the theorem may be to remember the idea of its proof, which is that \(\det A\) is the product of the eigenvalues. Thus \(\det A<0\) means the eigenvalues have opposite signs (indefinite), while \(\det A>0\) means they have the same sign, and that sign is the sign of \(\alpha\).

Note that this only applies to symmetric, \(2\times 2\) matrices. Try to determine where we have used both assumptions.
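For what it is worth, the rule in the theorem is also easy to mechanize. Here is a sketch in Python (the function name, and the handling of the borderline \(\alpha = 0\) case via \(\gamma\), are our own choices):

```python
def classify_2x2(alpha, beta, gamma):
    """Classify the symmetric matrix [[alpha, beta], [beta, gamma]]."""
    det = alpha * gamma - beta ** 2
    if det < 0:
        return "indefinite"
    if det > 0:
        return "positive definite" if alpha > 0 else "negative definite"
    # det == 0: at least one eigenvalue is zero, and the other is
    # alpha + gamma (see the proof above).
    if alpha > 0 or gamma > 0:
        return "nonnegative definite"
    if alpha < 0 or gamma < 0:
        return "nonpositive definite"
    return "zero matrix"  # both nonnegative and nonpositive definite

print(classify_2x2(2, 1, 1))  # det = 1 > 0, alpha > 0: positive definite
print(classify_2x2(1, 2, 1))  # det = -3 < 0: indefinite
print(classify_2x2(1, 1, 1))  # det = 0, alpha > 0: nonnegative definite
```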

Critical points

Just like in single variable calculus, determining local minima and maxima on the interior of a domain is simplified by finding all critical points there. Our definition of a critical point will need to account for the additional dimensions, and our extension of the second derivative test will use all of the second derivatives, in the form of the Hessian.

Suppose that \(S\) is an open subset of \(\R^n\) and that \(f\) is a function \(S\to \R\). A point \(\mathbf a\in S\) is said to be a local minimum point for \(f\) if there exists \(\varepsilon>0\) such that \(f(\mathbf a)\le f(\mathbf x)\) for all \(\mathbf x\in S\) such that \(|\mathbf x-\mathbf a|<\varepsilon\). (Also sometimes called a “local minimum” or even just a “local min”.)

Similarly, \(\mathbf a\in S\) is a local maximum point (or “local maximum”, or “local max”) for \(f\) if there exists \(\varepsilon>0\) such that \(f(\mathbf a)\ge f(\mathbf x)\) for all \(\mathbf x\in S\) such that \(|\mathbf x-\mathbf a|<\varepsilon\).

The name local extremum is given to any point that is either a local min or a local max.

If \(S\) is an open subset of \(\R^n\) and \(f:S\to \R\) is differentiable, then a point \(\mathbf a\in S\) is a critical point if \(\nabla f(\mathbf a)= \mathbf 0\).

Example 2.

Find all critical points of \(f(x,y) = \frac {x+y}{1+x^2+y^2}\).

Solution. To do this, we compute

\[ \partial_x f(x,y) = \frac{1+x^2+y^2 - 2x(x+y)}{(1+x^2+y^2)^2} = \frac{1 - x^2 - 2xy +y^2 }{(1+x^2+y^2)^2}, \] \[ \partial_y f(x,y) = \frac{1+x^2+y^2 - 2y(x+y)}{(1+x^2+y^2)^2} = \frac{1 + x^2 - 2xy -y^2 }{(1+x^2+y^2)^2}. \] A point \((x,y)\) is a critical point if and only if \(\partial_xf= \partial_yf = 0\) at \((x,y)\). Thus, we are led to consider the equations

\[ 1 + x^2 - 2xy -y^2 = 0, \quad 1 - x^2 - 2xy + y^2 = 0. \] Subtracting the two equations, we find that \(x^2=y^2\). Adding the two equations, we find that \(1-2xy=0\), that is, \(xy = \frac 12\). Since \(xy>0\), the variables \(x\) and \(y\) must have the same sign, so \(x = y\) and \(x^2 = \frac 12\). We conclude that the only critical points are

\[ (x,y) = \pm \left( \sqrt{1/2}, \sqrt{1/2} \right). \]
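A computer algebra system reaches the same conclusion. Here is a sketch with SymPy (not a substitute for the reasoning above, but a useful check):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = (x + y) / (1 + x**2 + y**2)

# Critical points: solve grad f = 0.
critical_points = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
print(critical_points)
# Two solutions: x = y = sqrt(2)/2 and x = y = -sqrt(2)/2,
# i.e. (x, y) = +/-(sqrt(1/2), sqrt(1/2)).
```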

This example illustrates a general feature of the task of finding critical points: it requires solving a system of equations (\(n\) of them in \(\R^n\), that is, \(\partial_j f(\mathbf x) = 0\) for \(j=1,\ldots, n\)), and they are typically nonlinear equations. This means that there is in general no systematic way to find solutions. It may require factoring, recognizing and ignoring factors that are always positive like \(e^x\), substituting a variable from one equation to reduce the number of variables, or other tricks. The best way to improve at it is to practice, and a good way to practice is on WeBWorK.

If \(f:S\to \R\) is differentiable, then every local extremum is a critical point.

Two proofs

Suppose that \(\mathbf a\) is a local extremum of \(f\). For concreteness, let’s assume that it is a local minimum; for a local maximum, apply the same argument to \(-f\). For a unit vector \(\mathbf u\), consider the function \(g_\mathbf u(s) = f(\mathbf a+s\mathbf u)\). This is defined for all \(s\) in an interval containing the origin. Since \(\mathbf a\) is a local minimum, \(s=0\) is a local minimum point for \(g_\mathbf u\).

Also, the chain rule implies that \(g_\mathbf u\) is differentiable and that \(g_\mathbf u'(0) = \mathbf u \cdot \nabla f(\mathbf a)\). Since \(s=0\) is a local minimum point of \(g_\mathbf u\), single variable calculus tells us that \(g_\mathbf u'(0)=0\), and hence that \(\mathbf u \cdot \nabla f(\mathbf a)=0\). Since this same argument applies to every unit vector \(\mathbf u\), it must be the case that \(\nabla f(\mathbf a)={\bf 0}\).
For a different approach, we will prove the contrapositive. So, we assume that \(\nabla f(\mathbf a)\ne {\bf 0}\), and we will show that \(\mathbf a\) is not a local extremum. To do this, let \(\mathbf u = \frac {\nabla f(\mathbf a)}{|\nabla f(\mathbf a)|}\). Then according to Taylor’s Theorem, for \(h\) small we have \[ f(\mathbf a+ h \mathbf u) - f(\mathbf a) = \nabla f(\mathbf a) \cdot h \mathbf u + R_{\mathbf a, 1}(h \mathbf u) = h|\nabla f(\mathbf a)|+ R_{\mathbf a, 1}(h \mathbf u) \] and \(\frac 1{|h|}R_{\mathbf a, 1}(h \mathbf u)\to 0\) as \(h\to 0\). It follows that there exists \(h_0>0\) such that the right-hand side is positive for all \(h\in (0,h_0)\) and negative for all \(h\in (-h_0,0)\). Thus \(\mathbf a\) is neither a local minimum nor a local maximum.

Just like single variable functions, once we have identified critical points, we may be able to determine whether they are local minima, maxima, or saddle points by looking at the second derivatives. And just like the single variable case, the test may not tell us anything; for example, when the Hessian has a \(0\) eigenvalue.

Suppose \(f:S\to \R\) is \(C^2\), and let \(H(\mathbf x)\) denote the Hessian matrix of \(f\) at \(\mathbf x\).

  1. If \(\mathbf a\) is a local minimum point for \(f\), then \(\mathbf a\) is a critical point of \(f\) and \(H(\mathbf a)\) is nonnegative definite.

  2. If \(\mathbf a\) is critical point and \(H(\mathbf a)\) is positive definite, then \(\mathbf a\) is a local minimum point.

Two notes:

    • The first statement is a necessary condition and the second is a sufficient condition; neither converse holds. For example, \(f(x,y) = x^4+y^4\) has a local minimum at the origin, where the Hessian is nonnegative definite but not positive definite.
    • In particular, if \(\mathbf a\) is a critical point and \(H(\mathbf a)\) is nonnegative definite but not positive definite, then the theorem by itself does not determine whether \(\mathbf a\) is a local minimum.

Proof of the sufficient condition.
If \(f\) is \(C^2\) and \(\nabla f(\mathbf a)=0\), then \[ f(\mathbf a+ \mathbf h) = f(\mathbf a) +\frac 12 \mathbf h^T H(\mathbf a)\mathbf h + R_{\mathbf a,2}(\mathbf h),\ \ \text{ and }\lim_{{\bf h \to 0}}\frac 1{|\mathbf h|^2}R_{\mathbf a,2}(\mathbf h) = 0, \] by Taylor’s Theorem. We know from Theorem 1 that if \(H(\mathbf a)\) is positive definite, then there exists \(\lambda_1>0\) such that \(\mathbf h^T H(\mathbf a)\mathbf h \ge \lambda_1 |\mathbf h|^2\) for all \(\mathbf h \in \R^n\). Thus \[ f(\mathbf a+\mathbf h)\ge f(\mathbf a) + |\mathbf h|^2\left(\frac 12 \lambda_1 +\frac 1{|\mathbf h|^2}R_{\mathbf a,2}(\mathbf h) \right). \] The definition of limit implies that there exists \(\varepsilon>0\) such that \(\left|\frac 1{|\mathbf h|^2}R_{\mathbf a,2}(\mathbf h)\right|< \frac 14 \lambda_1\) if \(|\mathbf h|<\varepsilon\), so it follows that \[ f(\mathbf a+\mathbf h)\ge f(\mathbf a) +\frac {\lambda_1 }4 |\mathbf h|^2, \qquad \text{ if }|\mathbf h|< \varepsilon . \] This implies that \(\mathbf a\) is a local minimum point, establishing the sufficient condition.

The proof of the necessary condition is left as an exercise. For hints, see problem 16.

Similar conclusions hold for a local maximum point, if one replaces positive/nonnegative definite by negative/nonpositive definite. To make sense of the indefinite case, we add a definition:

A critical point \(\mathbf a\in S\) is a saddle point if \(H(\mathbf a)\) is indefinite.

These points are named for a horse’s saddle. When you’re sitting on a horse, you’re at a point that is flat (\(\nabla f(\mathbf a) = \mathbf 0\)), but the saddle goes up in one direction (toward the horse’s head) and down in another direction (towards the sides). The model example is \(f(x,y) = x^2 - y^2\) at the origin, where the Hessian has eigenvalues \(2\) and \(-2\).

Summarizing the results from the Second Derivative Test:

Suppose that \(f\) is \(C^2\) and \(\nabla f(\mathbf a)=\bf 0\). Then

    • if \(H(\mathbf a)\) is positive definite, then \(\mathbf a\) is a local minimum point;
    • if \(H(\mathbf a)\) is negative definite, then \(\mathbf a\) is a local maximum point;
    • if \(H(\mathbf a)\) is indefinite, then \(\mathbf a\) is a saddle point.

If \(f\) is a \(C^2\) function and \(\mathbf a\) is a critical point of \(f\), we say that a critical point is degenerate if \(\det H(\mathbf a)=0\), and nondegenerate if \(\det H(\mathbf a)\ne 0\).

If a critical point is nondegenerate, then the Corollary says that by finding all the eigenvalues of the Hessian, we can immediately determine whether it is a local max, local min or saddle point. If it is degenerate, the Corollary doesn’t apply, and it is harder to determine if it is a local max, local min, or saddle point.

Example 3.

Find all critical points of \(f(x,y) = x^4+y^4 - 8x^2+4y\), and classify the nondegenerate critical points.

Classifying a critical point means determining whether it is a local minimum, local maximum, or saddle point.

Solution. We first compute

\[ \nabla f(x,y) =\binom{\partial_x f}{\partial_y f} = \binom{4x^3 - 16x}{4y^3+4} = 4\binom{x(x^2-4)}{y^3+1} \] It is then easy to see that \(\nabla f(x,y)= {\bf 0}\) only at the points \((2,-1), (0,-1)\) and \((-2,-1)\).

We next compute

\[ H = \left( \begin{array}{cc}12x^2-16 &0\\0&12y^2 \end{array} \right) \] Thus,

\[ H(-2,-1) = \left( \begin{array}{cc}32 &0\\0&12 \end{array} \right), \quad H(0,-1) = \left( \begin{array}{cc}-16 &0\\0&12 \end{array} \right), \quad H(2,-1) = \left( \begin{array}{cc}32 &0\\0&12 \end{array} \right). \] We conclude from Theorems 2 and 4 that \((\pm 2, -1)\) are local minima, and \((0,-1)\) is a saddle point.
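The classification step is easy to check by machine as well. A sketch with NumPy (the helper function is our own):

```python
import numpy as np

def hessian_f(x, y):
    # Hessian of f(x, y) = x^4 + y^4 - 8x^2 + 4y, computed above.
    return np.array([[12 * x**2 - 16, 0.],
                     [0., 12 * y**2]])

for point in [(-2, -1), (0, -1), (2, -1)]:
    print(point, np.linalg.eigvalsh(hessian_f(*point)))
# (-2, -1) [12. 32.]  both positive: local minimum
# (0, -1)  [-16. 12.] mixed signs:   saddle point
# (2, -1)  [12. 32.]  both positive: local minimum
```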

Example 4.

Let \(f(x,y) = x^2 + a y^3 + b y^4\). Show that the origin is a critical point and classify it. The answer will depend on \(a\) and \(b\).

Solution. We first compute \(\nabla f = (2x, 3ay^2+4by^3)\), so the origin is a critical point. To classify it, we need the Hessian \[ H = \left( \begin{array}{cc}2 &0\\0&6ay+12by^2 \end{array} \right). \] Thus \(H = \left(\begin{array}{cc}2 &0\\0&0\end{array}\right)\) at the origin, for all \(a\) and \(b\), with eigenvalues \(0\) and \(2\). Thus we cannot determine the type of critical point from the second derivative test. We can however say that it is neither a local maximum nor a saddle point.

So the critical point is either a local minimum, or “none of the above”. It is a local minimum only if \(f(x,y)\ge 0\) for all \((x,y)\) close to \((0,0)\). The points \((0, \varepsilon)\) have values approximately \(a\varepsilon^3\) when \(\varepsilon\) is small, so they can be made negative if \(a\neq 0\). When \(a=0\), they have value \(b\varepsilon^4\). Hence, it is a local minimum if \(a=0\) and \(b\ge 0\), and otherwise it is neither a saddle point nor a local extremum.

If \(a\) and \(b\) are both nonzero, then \((0, -3a/(4b))\) is a second critical point, but we were not asked about other points. Also, if \(a=b=0\), then the entire \(y\)-axis consists of critical points: the function does not care about \(y\) at all.
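The dependence on \(a\) and \(b\) can also be seen by sampling \(f\) along the \(y\)-axis near the origin. A quick numerical sketch (the sample values of \(a\) and \(b\) are arbitrary):

```python
import numpy as np

def f(x, y, a, b):
    return x**2 + a * y**3 + b * y**4

eps = 1e-3
for a, b in [(1., 1.), (-1., 1.), (0., 1.), (0., -1.)]:
    # f(0, -eps) and f(0, eps): both must be >= 0 at a local minimum.
    print((a, b), f(0., -eps, a, b), f(0., eps, a, b))
# a != 0:        one of the two values is negative -> not a local min
# a = 0, b > 0:  both values positive (consistent with a local min)
# a = 0, b < 0:  both values negative -> not a local min
```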

Problems

Basic

Find all critical points of a function, and determine whether each nondegenerate critical point is a local min, local max, or saddle point.

or, more briefly: Find all critical points, and classify all nondegenerate critical points.

We might also ask you to classify degenerate critical points, when possible.

  1. \(f(x,y) = (x^2-y^2)(6-y)\).

  2. \(f(x,y) = y(3-x^2-y^2)\).

  3. \(f(x,y)= x^3+y^2\)

  4. \(f(x,y) = \sin(x)\sin(y)\).

  5. \(f(x,y,z) = x^2 +xz +y^4 + z^2\).

  6. \(f(x,y) = (y-x^2)(x-y)\).

  7. \(f(x,y) = e^{x^3-3y}\).

  8. \(f(x,y) = xy e^{x^2+y^2}\)

  9. \(f(x,y,z) = (x^2+ y^2+z^2)e^{3x^2+2y^2 + z^2}\)

Advanced

  10. Let \(A\) be a symmetric \(n\times n\) matrix. Use \(\eqref{rayleigh}\) to prove that \[ \text{the largest eigenvalue of }A = \max_{\{\mathbf u\in \R^n : |\mathbf u|=1\}} \mathbf u^T A\mathbf u. \]

    Hint First, show that \(\lambda\) is an eigenvalue for \(A\) if and only if \(-\lambda\) is an eigenvalue for \(-A\).

  11. (An excellent question from Folland, 2.8.4). Let \(f(x,y) = (y-x^2)(y-2x^2)\).

    • Prove that the origin is a degenerate critical point for \(f\).
    • For any \(m\in \R\), prove that the function \(g_m(x) = f(x,mx)\) has a local minimum at \(x=0\). This says the restriction of \(f\) to any line through the origin has a local minimum at the origin.
    • Prove that \(f\) does not have a local minimum at the origin.
      Hint Draw a picture of the \(xy\) plane and indicate where \(f\) is positive and negative.
  12. Suppose that \(f:\R^2\to \R\) is a \(C^2\) function, that \(\bf 0\) is a nondegenerate critical point, and that \(f(x,mx)\) has a local minimum at the origin for every \(m\). Show that \(f\) has a local minimum at the origin.

  13. Let \(A\) be a symmetric \(n\times n\) matrix.

    • Prove that if any diagonal entry is negative, then \(A\) has a negative eigenvalue.
    • Prove that if any diagonal entry is positive, then \(A\) has a positive eigenvalue.
Hint Let the entries of \(A\) be denoted \((a_{ij})\), and assume that the \(k\)th diagonal entry is negative for some \(k\), that is, \(a_{kk}<0\). Try to find a unit vector \(\mathbf u\) such that \(a_{kk} = \mathbf u^T A \mathbf u\).
  14. Suppose that \(f\) is a \(C^2\) function on \(\R^4\), with exactly two critical points \(\mathbf a, \mathbf b\). Suppose you know that \(f\) attains either its min or its max (but you do not know which), and also that \[ H(\mathbf a) = \left( \begin{array}{cccc} 5&-2&3&1\\ -2&4&-1&0\\ 3&-1&2&1\\ 1&0&1&7\\ \end{array} \right),\qquad H(\mathbf b) = \left( \begin{array}{cccc} 1&2&3&12\\ 2&4&1&2\\ 3&1&-2&5\\ 12&2&5&3 \end{array} \right). \]

If possible, classify the critical points \(\mathbf a\) and \(\mathbf b\).

Hint Somewhere nearby is some fact or idea that makes it possible to solve this problem at a single glance.
  15. Prove that if \(f\) and \(g\) are \(C^2\) functions, then \[ H_{fg} = g H_f + f H_g + \nabla f \, Dg + \nabla g \, Df. \] Here, for any function \(w\), we are writing \(H_w\) to denote the Hessian of \(w\). Note that \(\nabla f\) is a column vector and \(Dg\) is a row vector, so \(\nabla f\, Dg\) is a square matrix. (Similarly \(\nabla g \, Df\).)
  16. Prove the necessary condition in the second derivative test. (See the Second Derivative Test for its statement.)
Hint Prove the contrapositive: if \(H(\mathbf a)\) has a negative eigenvalue \(\lambda <0\), then \(\mathbf a\) is not a local minimum point. To do this, assume that \(\mathbf v\) is an eigenvector for the eigenvalue \(\lambda\), and show that there exists \(s_0>0\) such that \[ f(\mathbf a+s\mathbf v) < f(\mathbf a)\text{ for } 0<|s|<s_0. \]
Different hint If \(\mathbf a\) is a local minimum point for \(f\), then for every vector \(\mathbf h\), the function \(g_\mathbf h(s) = f(\mathbf a+ s\mathbf h)\) has a local minimum at \(s=0\). Thus \(g_\mathbf h''(0)\ge 0\) if it exists. Show that \(g_\mathbf h''(0)\) exists by using the chain rule to compute it, and by expressing \(g_\mathbf h''\) in terms of \(\mathbf h\) and derivatives of \(f\), complete the proof of the theorem.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Canada License.