2.5: Higher Order Derivatives

$\newcommand{\R}{\mathbb R }$ $\newcommand{\N}{\mathbb N }$ $\newcommand{\Z}{\mathbb Z }$ $\newcommand{\bfa}{\mathbf a}$ $\newcommand{\bfb}{\mathbf b}$ $\newcommand{\bfc}{\mathbf c}$ $\newcommand{\bff}{\mathbf f}$ $\newcommand{\bfg}{\mathbf g}$ $\newcommand{\bfG}{\mathbf G}$ $\newcommand{\bfh}{\mathbf h}$ $\newcommand{\bfu}{\mathbf u}$ $\newcommand{\bfx}{\mathbf x}$ $\newcommand{\bfp}{\mathbf p}$ $\newcommand{\bfy}{\mathbf y}$ $\newcommand{\ep}{\varepsilon}$

Higher Order Derivatives

  1. second-order derivatives
  2. $k$th-order derivatives
  3. higher derivatives and the chain rule
  4. Some notation
  5. Problems

Second-order derivatives

Assume that $S\subset\R^n$ is open. If $f:S\to \R$ is a function of class $C^1$, it could happen that the partial derivatives of $f$ are themselves differentiable, so that we can define $$ \frac{\partial}{\partial x_j}(\frac{\partial}{\partial x_i}f) $$ for all $i,j = 1,\ldots, n$. These are second-order partial derivatives of $f$. They are denoted in a variety of ways, including $$ \frac{\partial^2 f}{\partial x_j \partial x_i}, \quad f_{x_i \, x_j}, \quad \partial_j\partial_i f,\quad \partial_{ji} f $$ and more. If $i=j$ then we also may write $$ \frac{\partial^2 f}{\partial x_i^2}, \quad f_{x_i \, x_i}, \quad \partial_i^2 f,\quad \partial_{ii} f, \ldots $$ A second-order partial derivative $\frac{\partial^2 f}{\partial x_i \partial x_j} $ is said to be mixed if $i\ne j$ and pure otherwise.

Definition 1. We say that $f$ is $C^2$ (or sometimes of class $C^2$) in $S$ if the second-order partial derivatives exist and are continuous everywhere in $S$.

Theorem 1. Assume that $S$ is an open subset of $\R^n$. If $f:S\to \R$ is $C^2$, then

$$ \frac{\partial^2 f}{\partial x_i \partial x_j} \ = \ \frac{\partial^2 f}{\partial x_j \partial x_i} $$

for all $i,j=1,\ldots, n$, everywhere in $S$.

Proof.

For notational simplicity, we will prove this for a function of $2$ variables. This implies the general case, since when we compute $\frac{\partial^2 f}{\partial x_i \partial x_j}$ or $\frac{\partial^2 f}{\partial x_j \partial x_i} $ at a particular point, all the variables except $x_i$ and $x_j$ are frozen, so that $f$ can be considered (for that computation) as a function of $x_i$ and $x_j$ alone.

Given $\bfx = (x,y) \in S\subset\R^2$, we will show that \begin{equation}\label{clair} \lim_{h\to 0} \frac 1 {h^2} \big[ f(x+h, y+h) -f(x, y+h) - f(x+h, y) +f(x, y) \big] = \frac{\partial^2 f}{\partial x \partial y} (x,y) \end{equation} and that the same limit also equals $\frac{\partial^2 f}{\partial y \partial x} (x,y)$. This will both prove the theorem and tell us what mixed second-order partial derivatives mean.

Temporarily fix $h\in \R$ such that $B(2|h| , \bfx)\subset S$, and let $$ \phi(s) := f(x+h, s) -f(x, s). $$ Then, using the $1$-dimensional Mean Value Theorem twice, \begin{align} f(x+h, y+h) -&f(x, y+h) - f(x+h, y) +f(x, y) & \nonumber \\ &= \phi(y+h) - \phi(y)& \nonumber \\ &= h\phi'(y+\theta_1 h)& \mbox{for some }\theta_1\in (0,1) \nonumber \\ &= h \Big[ \frac{\partial f}{\partial y}(x+h, y+\theta_1 h) - \frac{\partial f}{\partial y}(x, y+\theta_1 h) \Big] \nonumber \\ &= h \Big[ h\big( \frac{\partial }{\partial x} \frac{\partial f}{\partial y}\big)(x+\theta_2 h, y+\theta_1 h) \Big] & \mbox{for some }\theta_2\in (0,1) \nonumber \\ &= h^2 \frac{\partial^2 f}{\partial x \partial y}(x+\theta_2 h, y+\theta_1 h). \nonumber \end{align} Since this holds for all sufficiently small $h$ (with $\theta_1$ and $\theta_2$ depending on $h$, but always in the interval $(0,1)$) and $(x+\theta_2 h, y+\theta_1 h)\to (x,y)$ as $h\to 0$, this implies \eqref{clair}. On the other hand, by going through the same argument, but with the roles of $x$ and $y$ reversed, we find that the limit in \eqref{clair} also equals $\frac{\partial^2 f}{\partial y \partial x} (x,y)$. This completes the proof. $\quad\Box$
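The limit identity \eqref{clair} is easy to check for a concrete smooth function with a computer algebra system. Here is a sketch using sympy; the test function $f(x,y) = x^3y^2 + xy + y^3$ is an arbitrary choice of ours.

```python
import sympy as sp

x, y, h = sp.symbols('x y h')
f = x**3*y**2 + x*y + y**3           # an arbitrary smooth test function

# the second difference quotient from the proof of Theorem 1
quot = (f.subs({x: x + h, y: y + h}) - f.subs(y, y + h)
        - f.subs(x, x + h) + f) / h**2

# its limit as h -> 0 should be the mixed partial f_xy
lim = sp.limit(quot, h, 0)
print(sp.simplify(lim - sp.diff(f, x, y)))  # 0
```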

Example 1. It can happen that a function $f$ has second-order partial derivatives at every point, but that at some points, $\frac{\partial^2 f}{\partial x \, \partial y} \ne \frac{\partial^2 f}{\partial y \, \partial x}$.

We will not worry about this much, because in this class we will mostly encounter functions of class $C^2$ (or better). But if you are interested in an example, consider $$ f(x,y) = \begin{cases} \frac{x^3y}{x^2+y^2}&\mbox{ if }(x,y)\ne (0,0)\\ 0&\mbox{ if }(x,y)=(0,0) . \end{cases} $$ Then one can check that $$ \partial_x f(x,y) = \begin{cases} \frac{x^4y+ 3x^2y^3}{(x^2+y^2)^2}&\mbox{ if }(x,y)\ne (0,0)\\ 0&\mbox{ if }(x,y)=(0,0), \end{cases} $$ $$ \partial_y f(x,y) = \begin{cases} \frac{x^5 -x^3y^2}{(x^2+y^2)^2}&\mbox{ if }(x,y)\ne (0,0)\\ 0&\mbox{ if }(x,y)=(0,0). \end{cases} $$ (In computing partial derivatives at $(0,0)$, one should use the definition of the partial derivative as a limit, which however is easy in these cases.) In particular, $\partial_x f(0,y)=0$ for all $y$, and $\partial_yf(x,0) = x$ for all $x$. It follows that $$ \partial_y\partial_x f(0,0) = 0 \ne 1 = \partial_x\partial_y f(0,0). $$
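One can confirm these computations symbolically. The sympy sketch below checks the first-order partials along the axes and then computes the two mixed partials at the origin via the limit definition (using $\partial_x f(0,0) = \partial_y f(0,0) = 0$, which also follows from the definition).

```python
import sympy as sp

x, y, h = sp.symbols('x y h', real=True)
f = x**3*y/(x**2 + y**2)   # the formula away from the origin

# first-order partials away from (0,0)
fx = sp.diff(f, x)
fy = sp.diff(f, y)

# along the axes: f_x(0,y) = 0 and f_y(x,0) = x
print(sp.simplify(fx.subs(x, 0)))   # 0
print(sp.simplify(fy.subs(y, 0)))   # x

# mixed second partials at the origin, via the limit definition,
# using f_x(0,0) = f_y(0,0) = 0:
dyx = sp.limit((fx.subs({x: 0, y: h}) - 0)/h, h, 0)   # d/dy of f_x at (0,0)
dxy = sp.limit((fy.subs({x: h, y: 0}) - 0)/h, h, 0)   # d/dx of f_y at (0,0)
print(dyx, dxy)  # 0 1
```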

Higher-order partial derivatives

In general, we can keep on differentiating partial derivatives as long as successive partial derivatives continue to exist.

Definition 2. We say that $f$ is $C^k$ (or sometimes of class $C^k$) in $S$ if all $k$th-order partial derivatives of $f$ exist and are continuous everywhere in $S$.

If $f$ is $C^k$, then when computing partial derivatives up to order $k$, we do not need to worry about the sequence in which the partial derivatives are computed.

Theorem 2. Assume that $S$ is an open subset of $\R^n$ and that $f:S\to \R$ is $C^k$. For any integers $i_1,\ldots, i_k$ between $1$ and $n$, if $j_1,\ldots, j_k$ is a reordering of $i_1,\ldots, i_k$, then $$ \frac{\partial}{\partial x_{i_k} }\cdots \frac{\partial}{\partial x_{i_1} }f \ = \ \frac{\partial}{\partial x_{j_k} }\cdots \frac{\partial}{\partial x_{j_1} }f $$ everywhere in $S$.

Proof. (sketch)

It suffices to show that \begin{equation}\label{rearr} \frac{\partial}{\partial x_{i_k} }\cdots \frac{\partial}{\partial x_{i_1} }f \ = \ (\frac{\partial}{\partial x_n })^{\alpha_n}\cdots (\frac{\partial}{\partial x_1 })^{\alpha_1}f \end{equation} where $\alpha_j\ge 0$ denotes the number of times that $j$ appears among the indices $i_1,\ldots, i_k$, for $j=1,\ldots, n$. (Thus $\alpha_1+\ldots + \alpha_n = k$.) In other words, we aim to show that we can rearrange the indices $i_1,\ldots, i_k$ in increasing order, so that the rearranged indices $j_1,\ldots, j_k$ satisfy $j_1\le j_2\le \cdots \le j_k$.

We prove \eqref{rearr} by induction. For $k=2$, it follows from Theorem 1. Suppose it holds for $k-1$, and consider $\frac{\partial}{\partial x_{i_k} }\cdots \frac{\partial}{\partial x_{i_1} }f$. Without writing out all the indices, which would be messy, the point is that by repeatedly rearranging in increasing order different sequences of $k-1$ partial differentiations (the first $k-1$ partial derivatives, then the last $k-1$, then the first $k-1$ again) we can arrange that all of the partial derivatives occur in increasing order, and according to the induction hypothesis, all these manipulations leave the right- and left-hand sides of \eqref{rearr} equal.

For example, if $f:\R^3\to \R$ is $C^5$, then

$$ \frac{\partial}{\partial x } \frac{\partial}{\partial y } \frac{\partial}{\partial z } \frac{\partial}{\partial y } \frac{\partial}{\partial x }f \ = \
\frac{\partial^2}{\partial x^2 } \frac{\partial^2}{\partial y^2 } \frac{\partial}{\partial z } f \ = \ \frac{\partial}{\partial z } \frac{\partial^2}{\partial y^2 } \frac{\partial^2}{\partial x^2 } f = \mbox{etc.} $$

Example 2. Compute $$ \frac{\partial^3}{\partial x^3 } \frac{\partial^{17}}{\partial y^{17} } \frac{\partial^{41}}{\partial z^{41} } f(x,y,z) \qquad \mbox{ for }f(x,y,z) = x^2 \arctan( y^2+z \cos(e^{\sin (yz)})). $$

To do this, note that $f$ is a composition of sums and products of infinitely differentiable functions, so $f$ is $C^k$ on all of $\R^3$, for every $k$. Since it is easy to see that $\frac{\partial^3 f}{\partial x^3}=0$, Theorem 2 allows us to compute $$ \frac{\partial^3}{\partial x^3 } \frac{\partial^{17}}{\partial y^{17} } \frac{\partial^{41}}{\partial z^{41} } f(x,y,z) = \frac{\partial^{17}}{\partial y^{17} } \frac{\partial^{41} }{\partial z^{41} } \frac{\partial^3}{\partial x^3 } f(x,y,z) = 0. $$
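This can be confirmed in sympy, which also illustrates why rearranging the order of differentiation was worthwhile: three $x$-derivatives are cheap, while seventeen $y$-derivatives of the arctangent term would be hopeless by hand.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2*sp.atan(y**2 + z*sp.cos(sp.exp(sp.sin(y*z))))

# f is quadratic in x, so three x-derivatives already give 0,
# and hence so does any higher-order derivative that includes them
print(sp.diff(f, x, 3))  # 0
```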

Remark. The conclusion of Theorem 2 can fail to hold if $f$ is not of class $C^k$.

Higher derivatives and the chain rule

Suppose that $f:\R^n\to \R$ and $\bfg:\R^m\to \R^n$ are functions of class $C^2$, and consider the composite function $\phi = f\circ \bfg:\R^m\to \R$.

The chain rule is easily seen to imply that $\phi$ is $C^2$. We can write all second partial derivatives of $\phi$ in terms of first and second partial derivatives of $f$ and $\bfg$, but it is easy to make mistakes, so one has to be careful.

Example 3. Assume that $f:\R^3\to\R$ and $\bfg:\R^2\to \R^3$ are both $C^2$. Compute $\frac{\partial^2\phi}{\partial x^2}$, for $\phi = f\circ \bfg$.

Before getting started, for reasons that will become clear, it is useful to write down what the chain rule says about $\partial_x (w\circ \bfg)$, where $w:\R^3\to \R$ is $C^1$. It says: \begin{equation}\label{wd} \partial_x (w\circ \bfg) = \partial_1 w \ \partial_x g_1 + \partial_2 w \ \partial_x g_2 + \partial_3 w \ \partial_x g_3. \end{equation} where partial derivatives of $w$ are evaluated at $\bfg(\bfx)$. That is, $\partial_j w$ really means $(\partial_j w)\circ \bfg$. Also, $g_1,g_2, g_3$ denote the components of $\bfg$, as usual.

To compute a second derivative of $\phi$, we must start by differentiating once. We use the general rule \eqref{wd}, with $w$ replaced by $f$. We will carefully keep track of which functions are composite functions, since we need to pay attention to this when we differentiate a second time. This gives \begin{equation}\label{phi1} \partial_x \phi = \partial_x (f\circ \bfg) = ((\partial_1 f)\circ\bfg) \ \partial_x g_1 + ((\partial_2 f)\circ\bfg) \ \partial_x g_2 + ((\partial_3 f)\circ\bfg) \ \partial_x g_3. \end{equation} Now we have to differentiate again. Each term $(\partial_j f)\circ\bfg$ can be differentiated by using the general rule \eqref{wd}, with $w$ replaced by $\partial_j f$, for $j = 1,2,3$. This gives \begin{equation}\label{wd2} \partial_x((\partial_j f)\circ \bfg) = \partial_1 \partial_j f \ \partial_x g_1 + \partial_2 \partial_j f \ \partial_x g_2 + \partial_3 \partial_j f \ \partial_x g_3. \end{equation} where again, the derivatives of $\partial_j f$ (that is, second derivatives of $f$) are evaluated at $\bfg(\bfx)$.

Now we can differentiate both sides of \eqref{phi1}, using the product rule and formula \eqref{wd2} where necessary, to find (after some bookkeeping) that $$ \partial_x\partial_x \phi = \sum_{i,j=1}^3 \partial_i\partial_j f \ \partial_x g_i \, \partial_x g_j + \sum_{i=1}^3 \partial_i f \, \partial_x \partial_x g_i. $$ where first and second derivatives of $f$ are evaluated at $\bfg(\bfx)$.
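The resulting formula can be spot-checked symbolically. In the sympy sketch below, the concrete choices of $f$ (a function of three variables) and $\bfg$ (a map $\R^2\to\R^3$) are arbitrary test functions of our own:

```python
import sympy as sp

x, y, u, v, w = sp.symbols('x y u v w')
f = u*v*w + sp.exp(u)                  # test f: R^3 -> R, arbitrary choice
g = [x**2 + y, sp.sin(x*y), x*y**2]    # test g: R^2 -> R^3, arbitrary choice

sub = dict(zip((u, v, w), g))
phi = f.subs(sub)                      # phi = f o g

U = (u, v, w)
lhs = sp.diff(phi, x, 2)
# the formula: sum of second derivatives of f times products of first
# derivatives of g, plus first derivatives of f times second derivatives of g
rhs = (sum(sp.diff(f, U[i], U[j]).subs(sub)*sp.diff(g[i], x)*sp.diff(g[j], x)
           for i in range(3) for j in range(3))
       + sum(sp.diff(f, U[i]).subs(sub)*sp.diff(g[i], x, 2) for i in range(3)))
print(sp.simplify(lhs - rhs))  # 0
```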

Note that although we have called this an Example, it is close to the general case.
Note also that the preliminary step of writing down \eqref{wd} becomes unnecessary once you are thoroughly fluent with the chain rule and can effortlessly carry out computations like \eqref{wd2}.

Remark. In carrying out computations like this, people often write $\partial_j f$ instead of $(\partial_j f)\circ \bfg$ in formulas like \eqref{phi1}. This is okay (in fact we will do it below), as long as you remember that these derivatives of $f$ are still evaluated at $\bfg(\bfx)$, and take that into account when you differentiate them again.

Example 4. Assume that $f:\R^2\to \R$ is a $C^2$ function, and define $\bfg: \R^2\to \R^2$ by $$ \bfg(r,\theta) = (r\cos \theta, r\sin \theta). $$ Let $\phi = f\circ \bfg$, so that $\phi(r,\theta)= f(r\cos\theta, r\sin\theta)$, and compute $\partial_r\partial_\theta \phi$ in terms of $r,\theta$, and derivatives of $f$. In general, if $w(x,y)$ is a $C^2$ function of 2 variables, then \begin{align} \partial_r (w\circ \bfg) &= \partial_x w \, \cos\theta + \partial_y w \ \sin \theta \label{pr} \\ \partial_\theta (w\circ \bfg) &= -\partial_x w \, r\sin\theta + \partial_y w \ r\cos \theta \label{ptheta} \end{align} where derivatives of $w$ are evaluated at $(r\cos\theta, r\sin \theta)$. Thus \eqref{pr} implies that $$ \partial_r \phi = \partial_x f \, \cos\theta + \partial_y f \ \sin \theta $$ where derivatives of $f$ are evaluated at $(r\cos\theta, r\sin \theta)$. Now we differentiate with respect to $\theta$. We have to use \eqref{ptheta} twice (with $w$ replaced by $\partial_x f$ and $\partial_y f$) when differentiating derivatives of $f$, since they are composed with $\bfg$. Doing this, and assembling the results, we end up with $$ \partial_\theta\partial_r \phi = (-\partial_x \partial_x f \ + \ \partial_y \partial_y f) r\sin \theta \, \cos \theta \ +\partial_x \partial_y f (r\cos^2\theta - r\sin^2\theta) -\partial_x f \sin\theta + \partial_y f \cos\theta, $$ where derivatives of $f$ are evaluated at $(r\cos\theta,r\sin \theta)$.

(If you are asked a question like this on a test, you do not need to write so many words, just to get the right answer.)
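You can also check a computation like this with a computer algebra system. The sympy sketch below compares the formula for $\partial_\theta\partial_r\phi$ derived above against direct differentiation, for an arbitrary test function of our choosing:

```python
import sympy as sp

x, y, r, th = sp.symbols('x y r theta')
f = x**3*y + sp.sin(x*y)     # an arbitrary C^2 test function of our choosing

sub = {x: r*sp.cos(th), y: r*sp.sin(th)}
phi = f.subs(sub)            # phi(r, theta) = f(r cos(theta), r sin(theta))

fx, fy = sp.diff(f, x), sp.diff(f, y)
fxx, fyy, fxy = sp.diff(f, x, 2), sp.diff(f, y, 2), sp.diff(f, x, y)

# the formula derived above, with derivatives of f
# evaluated at (r cos(theta), r sin(theta))
formula = ((-fxx + fyy)*r*sp.sin(th)*sp.cos(th)
           + fxy*(r*sp.cos(th)**2 - r*sp.sin(th)**2)
           - fx*sp.sin(th) + fy*sp.cos(th)).subs(sub)

# direct computation of d_theta d_r phi should agree
print(sp.simplify(sp.diff(phi, r, th) - formula))  # 0
```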

Some Notation

Multi-index notation for higher derivatives

If $f$ is a $C^k$ function of $n$ variables, then a $k$th-order partial derivative of $f$ can always be written in the form $$ (\frac{\partial}{\partial x_1 })^{\alpha_1}(\frac{\partial}{\partial x_{2} })^{\alpha_{2}}\cdots (\frac{\partial}{\partial x_n })^{\alpha_n}f $$ where $\alpha_j\ge 0$ for every $j$, and $\alpha_1+\ldots+\alpha_n = k$.
It is convenient to introduce the more concise notation $$ \partial^\alpha f := (\frac{\partial}{\partial x_1 })^{\alpha_1}(\frac{\partial}{\partial x_{2} })^{\alpha_{2}}\cdots (\frac{\partial}{\partial x_n })^{\alpha_n}f, \quad\mbox{ for }\alpha = (\alpha_1,\ldots, \alpha_n). $$ An $\alpha$ of this sort (that is, $\alpha = (\alpha_1,\ldots, \alpha_n)$ with every $\alpha_j$ a nonnegative integer) is called a multi-index. For a multi-index $\alpha$, the sum $\alpha_1+\ldots +\alpha_n$ is called the order of $\alpha$.

Thus, if $\alpha$ is a multi-index of order $k$, then $\partial^\alpha f$ is a partial derivative of order $k$.

The order of a multi-index is sometimes denoted $|\alpha|$. Note that this is not the same as the Euclidean norm of $\alpha$.

Example 5. For $f(x,y,z) = x^7 e^{yz}$, compute $\partial^\alpha f$ for $\alpha = (4,1,5)$.

This is just another way of asking us to compute $\frac{\partial^4}{\partial x^4}\frac{\partial}{\partial y}\frac{\partial^5}{\partial z^5}f$. Also, we can differentiate in any order we please. So, it is not hard to see that $$ \frac{\partial^5}{\partial z^5}f = x^7 y^5e^{yz}, $$ so $$ \frac{\partial^4} {\partial x^4} \frac{\partial^5} {\partial z^5} f = 7\cdot6\cdot5\cdot4\cdot x^3 y^5 e^{yz} = 840 x^3 y^5e^{yz}. $$ Then finally $$ \partial^\alpha f = \frac{\partial}{\partial y}\frac{\partial^4}{\partial x^4}\frac{\partial^5}{\partial z^5}f = 840x^3( 5y^4 e^{yz} + y^5 ze^{yz}) = 840x^3y^4e^{yz}(5+yz). $$
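A quick sympy check of this answer (the `diff` call takes symbol/count pairs, one pair per component of $\alpha$):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**7*sp.exp(y*z)

# partial^alpha f for alpha = (4, 1, 5); by Theorem 2 the order of
# differentiation does not matter
result = sp.diff(f, x, 4, y, 1, z, 5)
expected = 840*x**3*y**4*sp.exp(y*z)*(5 + y*z)
print(sp.simplify(result - expected))  # 0
```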

The Hessian matrix

If $f$ is a $C^2$ function of $n$ variables, then it is customary to arrange the second derivatives of $f$ into an $n\times n$ matrix, called the Hessian matrix (or sometimes just the Hessian) and denoted $H$: $$ H = \left(\begin{array}{ccc} \partial_1\partial_1 f & \cdots & \partial_n\partial_1 f \\ \vdots & \ddots & \vdots \\ \partial_1\partial_n f & \cdots & \partial_n\partial_n f \end{array} \right) $$ If we write $H(\bfa)$, it means that all of the entries of $H$ are evaluated at the point $\bfa$ (which should be a point in the domain of $f$).
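sympy ships a `hessian` helper that builds exactly this matrix (for a $C^2$ function the matrix is symmetric, so the two index conventions agree). A quick illustration with an arbitrary test function of our choosing:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3*y + sp.exp(x*y)       # arbitrary C^2 test function

H = sp.hessian(f, (x, y))      # the 2x2 matrix of second partials
print(H)

# H(a): evaluate every entry at the point a = (1, 0)
print(H.subs({x: 1, y: 0}))    # Matrix([[0, 4], [4, 1]])
```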

Problems

Basic skills

Combining the chain rule and higher-order partial derivatives. Questions of this sort are overwhelmingly likely to appear on one or more tests. Here are some examples.

  1. Assume that $f:\R^2\to \R$ is a $C^2$ function, and define $\bfg: \R^2\to \R^2$ by $$ \bfg(s,t) = (2st, s^2-t^2) $$ Let $\phi = f\circ \bfg$, so that $\phi(s,t)= f(2st, s^2-t^2)$, and compute $\partial_s\partial_s \phi$ in terms of $s, t$, and derivatives of $f$.

  2. Assume that $f:\R^2\to \R$ is a $C^2$ function, and define $\bfg: \R^2\to \R^2$ by $$ \bfg(r,t) = (r \cosh t, r\sinh t) $$ Let $\phi = f\circ \bfg$, and compute $\frac{\partial^2 \phi}{\partial r\, \partial t}$ in terms of $r, t$, and derivatives of $f$.
    (Recall that $$ \cosh t = \frac 12(e^t+e^{-t}), \quad \sinh t = \frac 12(e^t-e^{-t}). $$ They are very easy to differentiate: they satisfy $\cosh' = \sinh$ and $\sinh' = \cosh$.)

  3. Assume that $f:\R^2\to \R$ is a $C^2$ function, and for $(r,s)\in \R^2$ such that $s\ne 0$, define $\phi = f(rs, \frac rs)$. Compute $\frac{\partial^2 \phi}{\partial r^2}$ in terms of $r,s$, and derivatives of $f$.

  4. Assume that $f:\R^2\to \R$ is a $C^2$ function, and for $(x,y,z)\in \R^3$ such that $z\ne 0$, define $\phi = f(xz, \frac yz)$. Compute $\frac{\partial^2 \phi}{\partial z^2}$ in terms of $x,y,z$, and derivatives of $f$.

  5. Assume that $g:\R^n\to \R$ and $f:\R\to \R$ are $C^2$ functions, and let $\phi = f\circ g:\R^n\to \R$. Express an arbitrary second-order partial derivative $\frac{\partial^2\phi}{\partial x_i\partial x_j}$ in terms of derivatives of $f$ and $g$. (Since $f$ is a function of a single variable, you can write $f'$ and $f''$ to denote its first and second derivatives.)

  6. Assume that $f:\R^2\to \R$ is a $C^2$ function. Fix $\bfa$ and $\bfh$ in $\R^2$, and define $\phi:\R\to \R$ by $$ \phi(s) = f(\bfa + s \bfh). $$ Express $\phi''(s)$ in terms of $f, s,\bfa$, and $\bfh$.
    You can also consider the same question for $f:\R^n\to \R$, with $\bfa, \bfh\in \R^n$. The extra difficulties are mainly notational and not very hard.

  7. Assume that $f:\R^2\to \R$ is a $C^3$ function. Fix $\bfa$ and $\bfh$ in $\R^2$, and define $\phi:\R\to \R$ by $$ \phi(s) = f(\bfa + s \bfh). $$ Express $\phi'''(s)$ in terms of $f, s,\bfa$, and $\bfh$.

You also should be sufficiently familiar with multi-index notation to answer questions like the following:

  1. For $f(x,y,z) = e^{2x+3y+4z}$, and for a multi-index $\alpha= (\alpha_1,\alpha_2,\alpha_3)$, compute $\partial^\alpha f$.

  2. For $f(x,y,z) = y^2\cos(x^2 z)$, compute $\partial^\alpha f$ for $\alpha = (1,2,1)$.

  3. Use multi-index notation to express your answer to the above question about $\phi'''$, where $\phi(s) = f(\bfa+s\bfh)$.

More advanced questions

  1. Let $f(x,y) = |x|(x^2+y^2)^{1/2} =(x^4+x^2y^2)^{1/2}$. Compute the second partial derivatives of $f$ at $(0,0)$, if they exist, and determine whether $\partial^2 f/\partial x\, \partial y = \partial^2 f/\partial y\, \partial x$ at $(0,0)$.

  2. Assume that $f:\R^n\to \R$ and $\bfg = (g_1,\ldots, g_n):\R^m\to \R^n$ are both $C^2$, and let $\phi = f\circ \bfg$. Prove that for all $i,j\in \{1,\ldots, m\}$, \begin{equation}\label{D2cr} \partial_i \partial_j \phi \ = \ \sum_{k,l=1}^n\partial_k \partial_l f \ \partial_i g_k \partial_j g_l + \sum_{k=1}^n\partial_k f \ \partial_i \partial_j g_k \end{equation} where the first and second derivatives of $f$ are evaluated at $\bfg(\bfx)$.

  3. Above you were asked to compute $\phi''(s)$, where $\phi(s) = f(\bfa+s\bfh)$. If you have not done so already, express the answer using matrix-vector notation (including also the notation $H$ for the Hessian matrix of $f$).
