2.7: Critical Points

$\newcommand{\R}{\mathbb R }$ $\newcommand{\N}{\mathbb N }$ $\newcommand{\Z}{\mathbb Z }$ $\newcommand{\bfa}{\mathbf a}$ $\newcommand{\bfb}{\mathbf b}$ $\newcommand{\bfc}{\mathbf c}$ $\newcommand{\bff}{\mathbf f}$ $\newcommand{\bfg}{\mathbf g}$ $\newcommand{\bfG}{\mathbf G}$ $\newcommand{\bfh}{\mathbf h}$ $\newcommand{\bfu}{\mathbf u}$ $\newcommand{\bfv}{\mathbf v}$ $\newcommand{\bfx}{\mathbf x}$ $\newcommand{\bfp}{\mathbf p}$ $\newcommand{\bfy}{\mathbf y}$ $\newcommand{\ep}{\varepsilon}$

Critical Points

  1. Symmetric matrices
  2. Critical points
  3. Problems

Symmetric matrices

Given a symmetric $n\times n$ matrix $A$, with entries denoted $a_{ij}$ for $i,j=1,\ldots, n$, we often will need to consider the associated quadratic function $$ \bfx \mapsto \bfx^T A \bfx = (A\bfx)\cdot \bfx = \sum_{i,j=1}^n a_{ij}x_ix_j. $$
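For readers who like to experiment, the double sum above is easy to evaluate directly. The short Python sketch below (the helper name `quadratic_form` is ours, not from the notes) computes $\bfx^T A\bfx$ for a matrix stored as a list of rows.

```python
# Hypothetical helper (not from the notes): evaluate the quadratic form
# x^T A x = sum_{i,j} a_ij * x_i * x_j for A given as a list of rows.
def quadratic_form(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A = [[2.0, 1.0],
     [1.0, 3.0]]          # a symmetric 2x2 matrix
x = [1.0, -1.0]
# x^T A x = 2*1 + 1*(-1) + 1*(-1) + 3*1 = 3
print(quadratic_form(A, x))  # -> 3.0
```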

Definitions 1. A symmetric $n\times n$ matrix $A$ is positive definite if $\bfx^T A\bfx > 0$ for every nonzero $\bfx\in \R^n$, and nonnegative definite if $\bfx^T A\bfx \ge 0$ for every $\bfx\in \R^n$.

In addition, we say that $A$ is negative definite if $\bfx^T A\bfx < 0$ for every nonzero $\bfx\in \R^n$, and nonpositive definite if $\bfx^T A\bfx \le 0$ for every $\bfx\in \R^n$.

A matrix $A$ is indefinite if none of the above holds. Equivalently, $A$ is indefinite if there exist $\bfx,\bfy\in \R^n$ such that $$ \bfx^T A\bfx < 0 < \bfy^T A \bfy. $$

Eigenvalues and positive/nonnegative/etc. definiteness

Recall from linear algebra that the eigenvalues of a matrix $A$ are the roots of the polynomial $\det(A-\lambda I)$, called the characteristic polynomial. For example, if $A$ is the $2\times 2$ matrix $$A = \left( \begin{array}{cc} a&b \\ c&d \end{array}\right) $$ then its characteristic polynomial is \begin{align} \det\left[ \left( \begin{array}{cc} a&b \\ c&d \end{array}\right) - \lambda \left( \begin{array}{cc} 1&0 \\ 0&1 \end{array}\right) \right] &= \det \left( \begin{array}{cc} a-\lambda &b \\ c&d-\lambda \end{array}\right) \nonumber\\ &=(a-\lambda)(d-\lambda)-bc \nonumber\\ &= \lambda^2 - (a+d)\lambda + (ad-bc). \end{align} In general the eigenvalues are complex numbers, but if $A$ is symmetric, then they are always real. (In case you never saw this, we will prove it, using calculus rather than linear algebra, in a later homework assignment.)
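In the $2\times 2$ symmetric case ($c=b$), the discriminant of the characteristic polynomial is $(a+d)^2 - 4(ad-b^2) = (a-d)^2+4b^2 \ge 0$, which already shows the eigenvalues are real. A small Python sketch of this computation (the function name is ours):

```python
import math

# For a symmetric 2x2 matrix ((a, b), (b, d)), the characteristic polynomial
# lambda^2 - (a+d)*lambda + (a*d - b*b) has discriminant (a-d)^2 + 4*b^2 >= 0,
# so both roots are real.  Illustrative helper, not from the notes:
def eigenvalues_sym_2x2(a, b, d):
    tr, det = a + d, a * d - b * b
    disc = tr * tr - 4 * det          # equals (a-d)^2 + 4*b^2
    r = math.sqrt(disc)
    return (tr - r) / 2, (tr + r) / 2  # smallest eigenvalue first

lo, hi = eigenvalues_sym_2x2(2.0, 1.0, 2.0)
print(lo, hi)  # eigenvalues of ((2,1),(1,2)) -> 1.0 3.0
```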

If we can find the eigenvalues of a symmetric matrix, then the next theorem lets us determine whether it is positive definite, nonnegative definite, indefinite etc.

Theorem 1. Assume that $A$ is a symmetric matrix. Then \begin{align} A\mbox{ is positive definite } &\iff \mbox{ all its eigenvalues are positive }\nonumber \\ &\iff \exists\lambda_1>0\mbox{ such that }\bfx^T A\bfx \ge \lambda_1|\bfx|^2 \mbox{ for all }\bfx\in \R^n.\nonumber \end{align} and $$ A\mbox{ is nonnegative definite } \iff \mbox{ all its eigenvalues are nonnegative.} $$ Finally, $$ A\mbox{ is indefinite } \iff \mbox{ $A$ has both positive and negative eigenvalues.} $$

The proof relies on the fact that if $A$ is a symmetric matrix then \begin{equation}\label{rayleigh} \mbox{the smallest eigenvalue of }A = \min_{\{\bfu\in \R^n : |\bfu|=1\} } \bfu^T A\bfu. \end{equation} (Note that the min is attained, by the Extreme Value Theorem.) This will be proved in a later assignment. For now we will feel free to use it in the proof below.
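This minimization can be illustrated numerically in two dimensions, where every unit vector has the form $\bfu = (\cos t, \sin t)$. The sketch below (names and sampling scheme are ours) approximates the minimum of $\bfu^T A\bfu$ over the unit circle and recovers the smallest eigenvalue:

```python
import math

# Numerical illustration of the Rayleigh-quotient identity: for a symmetric
# 2x2 matrix ((a,b),(b,d)), the min of u^T A u over unit vectors u equals
# the smallest eigenvalue.  We sample u = (cos t, sin t) on a fine grid.
def rayleigh_min(a, b, d, steps=100000):
    best = float("inf")
    for k in range(steps):
        t = 2 * math.pi * k / steps
        u1, u2 = math.cos(t), math.sin(t)
        q = a*u1*u1 + 2*b*u1*u2 + d*u2*u2   # u^T A u
        best = min(best, q)
    return best

# The smallest eigenvalue of ((2,1),(1,2)) is 1.
print(abs(rayleigh_min(2.0, 1.0, 2.0) - 1.0) < 1e-6)  # -> True
```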

Proof of Theorem 1.

Let us write $\lambda_1$ to denote the smallest eigenvalue of $A$. Then \begin{align} \mbox{ all eigenvalues }>0 &\quad \iff\quad \lambda_1 >0
\nonumber \\ &\quad \iff\quad \bfu^T A\bfu \ge \lambda_1 >0\mbox{ for every unit vector }\bfu\mbox{, by \eqref{rayleigh}}. \label{r1}\end{align} Also, any nonzero $\bfx\in \R^n$ can be written $\bfx = |\bfx|\, \bfu$ where $\bfu$ is the unit vector $\bfu = \frac{\bfx}{|\bfx|}$. With this notation, \begin{align} \mbox{\eqref{r1} holds} &\iff \bfx^T A \bfx = |\bfx|^2 \bfu^TA\bfu \ge |\bfx|^2\lambda_1 \mbox{ for every nonzero }\bfx \nonumber \\ &\ \Longrightarrow A \mbox{ is positive definite}.\nonumber \end{align}

Conversely, we claim that positive definite $\Longrightarrow$ all eigenvalues (including the smallest) are positive. For this, let $\bfv$ be a unit vector such that $\bfv^T A \bfv = \min_{\{\bfu\in \R^n : |\bfu|=1\} } \bfu^T A\bfu = $ the smallest eigenvalue $\lambda_1$, by \eqref{rayleigh}. Such a vector $\bfv$ exists, due to the Extreme Value Theorem. If $A$ is positive definite, it follows directly from the definition of positive definite that $\bfv^T A \bfv>0$, and hence that the smallest eigenvalue is positive.

The final assertion, about indefinite matrices, follows by similar arguments. Details omitted. $\quad \Box$

It is easy to check that $\lambda$ is an eigenvalue of $A$ if and only if $-\lambda$ is an eigenvalue of $-A$. As a result, Theorem 1 implies that $A$ is negative definite iff all its eigenvalues are $<0$, and nonpositive definite iff all of its eigenvalues are $\le 0$.

Note that we have not said a word about eigenvectors. They appear in the proof of \eqref{rayleigh}, but if all we want to do is check positive definiteness etc, then we can do so without finding eigenvectors.

Even for rather small matrices (say, $3\times 3$) it is in general already hard to find the eigenvalues by hand, since doing so involves solving a cubic equation, and this is usually not easy. But it is possible to concoct examples, like the one below, that are not so difficult.

Example 1. Let $$ A := \left( \begin{array}{ccc}1&1&0\\ 1&2&1\\0&1&1 \end{array} \right). $$ Then the characteristic polynomial is $$ \det \left( \begin{array}{ccc}1-\lambda&1&0\\ 1&2-\lambda&1\\0&1&1-\lambda \end{array} \right). $$ If you work it out, this is $$ (1-\lambda)^2(2-\lambda) - 2(1-\lambda) = (1-\lambda)(\lambda^2 - 3\lambda) $$ and the roots are $\lambda = 0, 1$, and $3$. So Theorem 1 tells us that this matrix is nonnegative definite, but not positive definite.
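We can sanity-check Example 1 numerically: the characteristic polynomial should vanish exactly at $\lambda = 0, 1, 3$. The sketch below (the $3\times 3$ determinant helper is ours, written by cofactor expansion along the first row):

```python
# Check Example 1: the characteristic polynomial of A vanishes at 0, 1, 3.
# det3 is our own helper (cofactor expansion along the first row).
def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def char_poly(lam):
    return det3([[1 - lam, 1,       0      ],
                 [1,       2 - lam, 1      ],
                 [0,       1,       1 - lam]])

print([char_poly(lam) for lam in (0, 1, 3)])  # -> [0, 0, 0]
```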

In practice, if one wants to find the eigenvalues of a matrix that is $3\times 3$ or larger, one normally uses mathematical software.

When $n=2$, there is a simple rule for determining the sign of the eigenvalues of a symmetric matrix:

Theorem 2. For the matrix $A = \left( \begin{array}{cc}\alpha&\beta\\ \beta&\gamma \end{array}\right)$,

  1. if $\det A = \alpha\gamma - \beta^2<0$, then $A$ is indefinite.

  2. if $\det A>0$, then both eigenvalues have the same sign as $\alpha$: $A$ is positive definite if $\alpha>0$, and negative definite if $\alpha<0$.

  3. if $\det A= 0$ then at least one eigenvalue equals zero.

In the third case we can easily determine the sign of the second eigenvalue (it is the same as the sign of $\alpha$) and hence whether $A$ is nonnegative or nonpositive definite.

Proof of Theorem 2.

We know that if $\mu_1, \mu_2$ are the roots of the characteristic polynomial, then it can be factored $$ \lambda^2 - (\alpha+\gamma)\lambda +(\alpha\gamma-\beta^2) = (\lambda-\mu_1)(\lambda-\mu_2) = \lambda^2 - (\mu_1+\mu_2)\lambda +\mu_1\mu_2. $$ Comparing the two sides, we find that $$ \mu_1\mu_2 = (\alpha\gamma-\beta^2), \qquad \mu_1+\mu_2 = \alpha+\gamma. $$ Then we easily see that: if $\det A = \mu_1\mu_2 <0$, the eigenvalues have opposite signs, so $A$ is indefinite; if $\det A>0$, the eigenvalues have the same sign, and since $\alpha\gamma > \beta^2 \ge 0$ forces $\alpha$ and $\gamma$ to have the same sign, that common sign agrees with the sign of $\mu_1+\mu_2 = \alpha+\gamma$, hence with the sign of $\alpha$; and if $\det A = 0$, then $\mu_1\mu_2=0$, so at least one eigenvalue equals zero. $\quad\Box$

The easiest way to remember the statement of the theorem may be to remember the idea of its proof, which is that $\det A$ is the product of the eigenvalues. Thus a negative determinant means the eigenvalues have opposite signs, while a positive determinant means they have the same sign.
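The rule in Theorem 2 is mechanical enough to write as code. A sketch (the function name and labels are ours; the degenerate $\det A = 0$ case is simply reported rather than subdivided):

```python
# A sketch of Theorem 2 as code: classify the symmetric matrix
# ((alpha, beta), (beta, gamma)) from det A and the sign of alpha.
def classify_2x2(alpha, beta, gamma):
    det = alpha * gamma - beta * beta
    if det < 0:
        return "indefinite"
    if det > 0:
        return "positive definite" if alpha > 0 else "negative definite"
    # det == 0: at least one eigenvalue is zero
    return "degenerate"

print(classify_2x2(2.0, 1.0, 2.0))   # det = 3 > 0, alpha > 0 -> positive definite
print(classify_2x2(1.0, 2.0, 1.0))   # det = -3 < 0 -> indefinite
```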

Critical points

Assume that $S$ is an open subset of $\R^n$ and that $f$ is a function $S\to \R$.

A point $\bfa\in S$ is said to be a local minimum point for $f$ if there exists $\ep>0$ such that $f(\bfa)\le f(\bfx)$ for all $\bfx\in S$ such that $|\bfx-\bfa|<\ep$. (Also sometimes called a local minimum or even just a local min.)

Similarly, $\bfa\in S$ is a local maximum point (or local maximum, or local max) for $f$ if there exists $\ep>0$ such that $f(\bfa)\ge f(\bfx)$ for all $\bfx\in S$ such that $|\bfx-\bfa|<\ep$.

A point is a local extremum if it is either a local min or a local max.

If $S$ is an open subset of $\R^n$ and $f:S\to \R$ is differentiable, then a point $\bfa\in S$ is a critical point if $\nabla f(\bfa)= {\bf 0}$.

A critical point $\bfa\in S$ is a saddle point if the Hessian matrix $H(\bfa)$ (the $n\times n$ matrix of second partial derivatives $\partial_i\partial_j f(\bfa)$) is indefinite.

Example 2. Find all critical points of $f(x,y) = \frac {x+y}{1+x^2+y^2}$.

To do this we compute $$ \partial_x f(x,y) = \frac{1+x^2+y^2 - 2x(x+y)}{(1+x^2+y^2)^2} = \frac{1 - x^2 - 2xy +y^2 }{(1+x^2+y^2)^2}, $$ $$ \partial_y f(x,y) = \frac{1+x^2+y^2 - 2y(x+y)}{(1+x^2+y^2)^2} = \frac{1 + x^2 - 2xy -y^2 }{(1+x^2+y^2)^2}. $$ A point $(x,y)$ is a critical point iff $\partial_xf= \partial_yf = 0$ at $(x,y)$; that is, iff the equations $$ 1 + x^2 - 2xy -y^2 = 0, \quad 1 - x^2 - 2xy + y^2 = 0 $$ both hold. Subtracting the two equations, we find that $x^2=y^2$. Adding the two equations, we find that $1-2xy=0$. We conclude that the only solutions (that is, the only critical points) are $$ (x,y) = \pm ( \sqrt{1/2}, \sqrt{1/2} ). $$
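As a quick check of this computation, we can evaluate the two partial derivatives (using the formulas derived above) at the claimed critical points; both should vanish up to floating-point error. The helper name is ours:

```python
# Check Example 2: the partial derivatives of f(x,y) = (x+y)/(1+x^2+y^2)
# should vanish at (x, y) = +-(sqrt(1/2), sqrt(1/2)).
def grad_f(x, y):
    denom = (1 + x*x + y*y) ** 2
    fx = (1 - x*x - 2*x*y + y*y) / denom   # d f / d x
    fy = (1 + x*x - 2*x*y - y*y) / denom   # d f / d y
    return fx, fy

s = 0.5 ** 0.5   # sqrt(1/2)
for (x, y) in [(s, s), (-s, -s)]:
    fx, fy = grad_f(x, y)
    print(abs(fx) < 1e-12 and abs(fy) < 1e-12)  # -> True, then True
```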

This problem illustrates a general feature of the task of finding critical points: it requires solving a system of equations ($n$ of them in $\R^n$, that is $\partial_j f(\bfx) = 0$ for $j=1,\ldots, n$) and they are typically nonlinear equations. This means that there is in general no systematic way to find solutions, and so the only technique is: just try to solve it until you figure out a way. Experience helps.

Theorem 3. (first derivative test). If $f:S\to \R$ is differentiable, then every local extremum is a critical point.

Two proofs, both with some small details left for you to check.

Proof 1. Assume that $\bfa$ is a local extremum of $f$. For concreteness, let's assume that it is a local minimum; the case of a local maximum is essentially identical. For a unit vector $\bfu$, consider the function $g_\bfu(s) := f(\bfa+s\bfu)$. This is defined for all $s$ in an interval containing the origin. Check that $s=0$ is a local minimum point for $g_\bfu$. (This is straightforward, if not obvious.) Also, the chain rule implies that $g_\bfu$ is differentiable and that $g_\bfu'(0) = \bfu \cdot \nabla f(\bfa)$. Since $s=0$ is a local minimum of the differentiable one-variable function $g_\bfu$, we have $g_\bfu'(0)=0$, and hence $\bfu \cdot \nabla f(\bfa)=0$. Since this same argument applies to every unit vector $\bfu$, it must be the case that $\nabla f(\bfa)={\bf 0}$. $\quad\Box$

Proof 2. For a different approach, we will prove the contrapositive. So, we assume that $\nabla f(\bfa)\ne {\bf 0}$, and we will show that $\bfa$ is not a local minimum. To do this, let $\bfu = \frac {\nabla f(\bfa)}{|\nabla f(\bfa)|}$. Then according to Taylor's Theorem, for $h$ small we have $$ f(\bfa+ h \bfu) - f(\bfa) = \nabla f(\bfa) \cdot h \bfu + R_{\bfa, 1}(h \bfu) = h|\nabla f(\bfa)|+ R_{\bfa, 1}(h \bfu) $$ and $\frac 1{|h|}R_{\bfa, 1}(h \bfu)\to 0$ as $h\to 0$. It follows (this is a straightforward exercise) that there exists $h_0>0$ such that the right-hand side is positive for all $h\in (0,h_0)$ and negative for all $h\in (-h_0,0)$. Thus $\bfa$ is neither a local minimum nor a local maximum. $\quad\Box$

Theorem 4. (second derivative test)

  1. (necessary condition for a local minimum). If $f:S\to \R$ is $C^2$ and $\bfa$ is a local minimum point for $f$, then $\bfa$ is a critical point of $f$ and $H(\bfa)$ is nonnegative definite.

  2. (sufficient condition for a local minimum). If $f$ is $C^2$, $\bfa$ is a critical point, and $H(\bfa)$ is positive definite, then $\bfa$ is a local minimum point.

Proof of the sufficient condition. If $f$ is $C^2$ and $\nabla f(\bfa)=0$, then $$ f(\bfa+ \bfh) = f(\bfa) +\frac 12 \bfh^T H(\bfa)\bfh + R_{\bfa,2}(\bfh),\ \ \mbox{ and }\lim_{{\bf h \to 0}}\frac 1{|\bfh|^2}R_{\bfa,2}(\bfh) = 0, $$ by Taylor's Theorem. We know from Theorem 1 that if $H(\bfa)$ is positive definite, then there exists $\lambda_1>0$ such that $ \bfh^T H(\bfa)\bfh \ge \lambda_1 |\bfh|^2$ for all $\bfh \in \R^n$. Thus $$ f(\bfa+\bfh)\ge f(\bfa) + |\bfh|^2\left(\frac 12 \lambda_1 +\frac 1{|\bfh|^2}R_{\bfa,2}(\bfh) \right). $$ The definition of limit implies that there exists $\ep>0$ such that $\left|\frac 1{|\bfh|^2}R_{\bfa,2}(\bfh)\right|< \frac 14 \lambda_1$ if $|\bfh|<\ep$, so it follows that $$ f(\bfa+\bfh)\ge f(\bfa) +\frac {\lambda_1 }4 |\bfh|^2, \qquad \mbox{ if }|\bfh|< \ep . $$ This implies that $\bfa$ is a local minimum point, establishing the sufficient condition. $\quad \Box$

The proof of the necessary condition is left as an exercise. For hints, see the Problems.

Similar conclusions hold for a local maximum point, if one replaces positive/nonnegative definite by negative/nonpositive definite. We will summarize in the following statement:

Corollary. Assume that $f$ is $C^2$ and $\nabla f(\bfa)=\bf 0$.
If $H(\bfa)$ is positive definite, then $\bfa$ is a local minimum point;
If $H(\bfa)$ is negative definite, then $\bfa$ is a local maximum point;
If $H(\bfa)$ is indefinite, then $\bfa$ is a saddle point.
If none of the above hold, then we cannot determine the character of the critical point without further thought.

Definition. If $f$ is a $C^2$ function and $\bfa$ is a critical point of $f$, we say that a critical point is degenerate if $\det H(\bfa)=0$, and nondegenerate if $\det H(\bfa)\ne 0$.

If a critical point is nondegenerate, and if we know all the eigenvalues of the Hessian, then we can immediately determine from the Corollary whether it is a local max, local min or saddle point. If it is degenerate, this is harder to determine.

Example 3. Find and, if possible, classify all critical points of $f(x,y) = x^4+y^4 - 8x^2+4y$.

(To classify a critical point means to determine whether it is a local minimum, local maximum, or saddle point.)

We first compute $$ \nabla f(x,y) =\binom{\partial_x f}{\partial_y f} = \binom{4x^3 - 16x}{4y^3+4} = 4\binom{x(x^2-4)}{y^3+1} $$ It is then easy to see that $\nabla f(x,y)= {\bf 0}$ only at the points $(2,-1), (0,-1)$ and $(-2,-1)$.

We next compute $$ H = \left( \begin{array}{cc}12x^2-16 &0\\0&12y^2 \end{array} \right) $$ Thus, $$ H(-2,-1) = \left( \begin{array}{cc}32 &0\\0&12 \end{array} \right), \quad H(0,-1) = \left( \begin{array}{cc}-16 &0\\0&12 \end{array} \right), \quad H(2,-1) = \left( \begin{array}{cc}32 &0\\0&12 \end{array} \right). $$ We conclude from Theorems 2 and 4 that $(\pm 2, -1)$ are local minima, and $(0,-1)$ is a saddle point.
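Since the Hessian in Example 3 is diagonal, its eigenvalues are just the diagonal entries, and the classification can be checked mechanically. A sketch (helper names are ours):

```python
# Check Example 3: the gradient of f(x,y) = x^4 + y^4 - 8x^2 + 4y vanishes
# at the three points found above, and the diagonal Hessian classifies them.
def grad(x, y):
    return (4*x**3 - 16*x, 4*y**3 + 4)

def hessian_diag(x, y):
    return (12*x**2 - 16, 12*y**2)   # H is diagonal for this f

for (x, y) in [(-2, -1), (0, -1), (2, -1)]:
    hxx, hyy = hessian_diag(x, y)
    kind = "local min" if hxx > 0 and hyy > 0 else "saddle"
    print(grad(x, y), kind)
# prints: (0, 0) local min / (0, 0) saddle / (0, 0) local min
```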

Example 4. Let $f(x,y) = x^2 + a y^3 + b y^4$. Show that the origin is a critical point, and if possible classify it. The answer may depend on $a$ and $b$.

To do this, we compute $$ \nabla f = \binom{2x}{3ay^2+4by^3}, \quad H = \left( \begin{array}{cc}2 &0\\0&6ay+12by^2 \end{array} \right). $$ Thus the origin is a critical point, and $H = \left( \begin{array}{cc}2 &0\\0&0 \end{array} \right)$ at the origin, for all $a$ and $b$, with eigenvalues $0$ and $2$. Thus we cannot determine the character of the critical point from the second derivative test. We can however say that it is neither a local maximum nor a saddle point.

So the critical point is either a local minimum, or none of the above. It is a local minimum only if $f(x,y)\ge 0$ for all $(x,y)$ close to $(0,0)$, and it is easy to see that this happens if and only if $a=0 $ and $b\ge 0$.

Conclusion: $(0,0)$ is a critical point, and $$ \begin{cases}\ \ \ \ \ \mbox{it is a local min if }a=0 \mbox{ and }b\ge 0,&\\ \quad\mbox{ otherwise it is neither a saddle point nor a local extremum.}& \end{cases} $$

(If $a,b$ are both nonzero, then $(0, -3a/4b)$ is a second critical point, but we were not asked about that. Also, if $a=b=0$, then the entire $y$-axis consists of critical points.)
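The conclusion of Example 4 can be illustrated numerically: with $a\ne 0$ the cubic term makes $f$ change sign along the $y$-axis arbitrarily close to the origin, while with $a=0$, $b\ge 0$ the function is nonnegative. A sketch (parameter values are ours):

```python
# Numerical check of Example 4: f(x,y) = x^2 + a*y^3 + b*y^4.
# With a = 1, b = 1, f changes sign along the y-axis near the origin,
# so (0,0) is not a local extremum; with a = 0, b = 1, f >= 0 everywhere,
# consistent with a local minimum at the origin.
def f(x, y, a, b):
    return x*x + a*y**3 + b*y**4

print(f(0, -0.01, 1, 1) < 0 < f(0, 0.01, 1, 1))        # sign change -> True
print(all(f(0, y, 0, 1) >= 0 for y in (-0.01, 0.01)))  # -> True
```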

Problems

More questions may be added later.

Basic skills

Find all critical points of the function $f = \cdots$, and determine whether each nondegenerate critical point is a local min, local max, or saddle point.
(or more briefly Find all critical points, and classify all nondegenerate critical points.)
We might also ask: classify degenerate critical points as well, when possible.
Some examples:

  1. $f(x,y) = (x^2-y^2)(6-y)$.

  2. $f(x,y) = y(3-x^2-y^2)$.

  3. $f(x,y) = \sin(x)\sin(y)$.

  4. $f(x,y,z) = x^2 +xz +y^4 + z^2$.

  5. $f(x,y) = (y-x^2)(x-y)$.

  6. $f(x,y) = e^{x^3-3y}$.

  7. $f(x,y) = xy e^{x^2+y^2}$

  8. $f(x,y,z) = (x^2+ y^2+z^2)e^{3x^2+2y^2 + z^2}$

Other questions

  1. Let $A$ be a symmetric $n\times n$ matrix. Use \eqref{rayleigh} to prove that $$ \mbox{the largest eigenvalue of }A = \max_{\{\bfu\in \R^n : |\bfu|=1\} } \bfu^T A\bfu. $$ Hint. First check that $\lambda$ is an eigenvalue for $A$ if and only if $-\lambda$ is an eigenvalue for $-A$.

  2. (an excellent question from Folland). Let $f(x,y) = (y-x^2)(y-2x^2)$. Show that the restriction of $f$ to every line through the origin has a local minimum at the origin, but that the origin is not a local minimum point for $f$.

  3. Assume that $f:\R^2\to \R$ is a $C^2$ function, that $\bf 0$ is a nondegenerate critical point, and that the restriction of $f$ to any line through the origin has a local minimum at the origin. (See the above problem for precise statement.)
    Prove that $f$ has a local minimum at the origin.

  4. Let $A$ be a symmetric $n\times n$ matrix.

  5. Assume that $f$ is a $C^2$ function on $\R^4$, with exactly two critical points $\bfa, \bfb$. Suppose you know that $f$ attains either its min or its max (but you do not know which), and also that $$ H(\bfa) = \left( \begin{array}{cccc} 5&-2&3&1\\ -2&4&-1&0\\ 3&-1&2&1\\ 1&0&1&7\\ \end{array} \right),\qquad H(\bfb) = \left( \begin{array}{cccc} 1&2&3&12\\ 2&4&1&2\\ 3&1&-2&5\\ 12&2&5&3 \end{array} \right). $$
    If possible, classify the critical points $\bfa$ and $\bfb$. Do not use a calculator or a computer.
    Hint. Somewhere nearby is some fact or idea that makes it possible to solve this problem at a single glance.

  6. Prove the necessary condition in the second derivative test. (See Theorem 4 for its statement.)
    Hint. Prove the contrapositive: if $H(\bfa)$ has a negative eigenvalue $\lambda <0$, then $\bfa$ is not a local minimum point. To do this, assume that $\bfv$ is an eigenvector for the eigenvalue $\lambda$, and show that there exists $s_0>0$ such that $$ f(\bfa+s\bfv) < f(\bfa)\mbox{ for } 0<|s|<s_0. $$

  7. Same problem, different hint.
    Hint. If $\bfa$ is a local minimum point for $f$, then for every vector $\bfh$, the function $g_\bfh(s) := f(\bfa+ s\bfh)$ has a local minimum at $s=0$. Thus $g_\bfh''(0)\ge 0$ if it exists. Show that $g_\bfh''(0)$ exists by using the chain rule to compute it, and then, by expressing $g_\bfh''$ in terms of $\bfh$ and derivatives of $f$, complete the proof of the theorem.
