2.1: Differentiation

$\newcommand{\R}{\mathbb R }$ $\newcommand{\N}{\mathbb N }$ $\newcommand{\Z}{\mathbb Z }$ $\newcommand{\bfa}{\mathbf a}$ $\newcommand{\bfb}{\mathbf b}$ $\newcommand{\bff}{\mathbf f}$ $\newcommand{\bfu}{\mathbf u}$ $\newcommand{\bfx}{\mathbf x}$ $\newcommand{\bfy}{\mathbf y}$ $\newcommand{\bfm}{\mathbf m}$ $\newcommand{\bfh}{\mathbf h}$ $\newcommand{\ep}{\varepsilon}$

Differentiation of real-valued functions

  1. Differentiability and the gradient
  2. Partial derivatives
  3. Differentiability vs. partial differentiability
  4. Directional derivatives, and the meaning of the gradient
  5. Problems

First, a remark.
In high school, a function $\ell:\R\to \R$ is called linear if it has the form $\ell(x) = mx+b$. One could make the same definition for functions $\R^n\to \R^m$; then a function $\ell:\R^n\to \R^m$ is linear if it has the form $\ell(\bfx) = M\bfx+\bfb$, where $M$ is an $m\times n$ matrix and $\bfb\in \R^m$.

In linear algebra, by contrast, a function $\ell:\R^n\to \R^m$ is called linear if it has the form \begin{equation}\label{linear.def} \ell(\bfx)= M\bfx, \mbox{ where }M \mbox{ is an }m\times n \mbox{ matrix.} \end{equation}

Thus, these two definitions of the word linear do not coincide.

In this class, we will adopt the linear algebra usage: for us, linear means \eqref{linear.def}. Another way of saying this is: a function $\ell:\R^n\to \R^m$ is linear if $$ \ell(a\bfx+b\bfy) = a \ell (\bfx) + b \ell(\bfy)\quad\mbox{ for all }a,b\in \R\mbox{ and } \bfx,\bfy \in \R^n. $$ If a function has the form $f(\bfx) = M\bfx + \bfb$, we may say that it is affine. We may also sometimes call it a first-order polynomial or a polynomial of degree 1. (You are also free to refer to a constant as a polynomial of degree $0$, if you like...)
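The distinction is easy to see numerically. The following Python sketch (the matrix $M$, the shift $\bfb$, and the test vectors are arbitrary made-up data) checks the superposition property for a linear map and shows that it fails for an affine one:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 2))   # an arbitrary 3x2 matrix: a map R^2 -> R^3
c = rng.standard_normal(3)        # an arbitrary constant shift

linear = lambda v: M @ v          # linear in the linear-algebra sense
affine = lambda v: M @ v + c      # "linear" in the high-school sense

x, y = rng.standard_normal(2), rng.standard_normal(2)
a, b = 2.0, -3.0

# superposition holds for the linear map, but fails for the affine one
print(np.allclose(linear(a*x + b*y), a*linear(x) + b*linear(y)))  # True
print(np.allclose(affine(a*x + b*y), a*affine(x) + b*affine(y)))  # False
```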

Differentiability and the gradient

A quick review

Assume that $S$ is an open subset of $\R^n$, and consider a function $f:S\to \R$.

Our first goal is to define what it means for $f$ to be differentiable. Since our definition should generalize what we know from functions of a single variable, let's first recall how that goes.

The familiar definition is this: for $f: (a,b)\to \R$, if $x\in (a,b)$ and \begin{equation}\label{der1} \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} \mbox{ exists}, \end{equation} then we say that $f$ is differentiable at $x$, and we say that the above limit is the derivative of $f$ at $x$. We then write $f'(x)$ to denote the derivative of $f$ at $x$.

An equivalent definition is: If there exists a number $m$ such that \begin{equation}\label{der2} \lim_{h\to 0} \frac{ f(x+h) - f(x)-mh}{h} = 0, \end{equation} then we say that $f$ is differentiable at $x$, and that the number $m$ is the derivative of $f$ at $x$, denoted $f'(x)$.

Another way of writing the same thing is: If there exists a number $m$ and a function $E(h)$ such that \begin{equation}\label{der3} f(x+h) = f(x) + mh +E(h), \quad\mbox{ and }\lim_{h\to 0} \frac {E(h)}{h}=0, \end{equation} then we say that $f$ is differentiable at $x$, and that the number $m$ is the derivative of $f$ at $x$, denoted $f'(x)$.

The equivalent definition \eqref{der2} can be understood as follows: temporarily fix $x$, view $h$ as a variable, and view $f(x+h) - f(x)$ as a function of $h$. Then $f$ is differentiable at $x$ if the linear function (of $h$, with $x$ fixed) $ mh$ approximates $f(x+h)-f(x)$, with errors that are smaller than linear as $h\to 0$. When this holds, $f'(x)=m$.

To see that these definitions are all the same, note that \begin{align} &\lim_{h\to 0} \frac{f(x+h) - f(x)}{h} = m\nonumber \\ &\qquad \iff \qquad \lim_{h\to 0} \frac{ f(x+h) - f(x)-mh}{h} = 0 \nonumber \\ &\qquad \iff \qquad E(h) := f(x+h) - f(x)-mh \quad \mbox{ satisfies } \lim_{h\to 0}\frac{E(h)}{h}=0 \nonumber \\ &\qquad \iff \qquad \mbox{\eqref{der3} holds} . \nonumber \end{align}
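Here is a small numerical sketch of the error-term point of view, for the made-up example $f(x) = x^2$ at $x = 1$, where $m = f'(1) = 2$ and $E(h) = h^2$:

```python
f = lambda x: x**2
x, m = 1.0, 2.0                 # the point, and the candidate slope f'(1) = 2

for h in [0.1, 0.01, 0.001]:
    E = f(x + h) - f(x) - m*h   # error of the linear approximation; here E(h) = h^2
    print(h, E / h)             # E(h)/h = h -> 0, as the second definition requires
```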

Definition of differentiability and the gradient

Suppose that $S$ is an open subset of $\R^n$, and consider a function $f:S\to \R$.

We will define differentiability in a way that clearly generalizes definition \eqref{der2}. Informally, the idea is that $f$ is differentiable at a point $\bfx\in S$ if $f$ can be approximated near $\bfx$ by a linear map $\R^n\to \R$, with errors that are smaller than linear near $\bfx$.

To make this precise, we will suppose that $\bfx\in S$ is fixed, and we will consider the function $\bfh \mapsto f(\bfx+\bfh)-f(\bfx)$ of a variable $\bfh\in \R^n$. Still informally, we want $$ f\mbox{ is differentiable at }\bfx \quad \iff \quad f(\bfx+\bfh) - f(\bfx)\approx\mbox{ a linear function of }\bfh $$ for $\bfh$ near $\bf 0$. In general, we know from linear algebra that if $\ell$ is a linear function of $\bfh\in \R^n$, then $\ell$ can be written in the form $$ \ell(\bfh) = \bfm \cdot \bfh\qquad\mbox{ for some vector }\bfm\in \R^n. $$

So, combining all these, we are led to the following basic definition:

Definition. Assume that $f$ is a function $S\to \R$, where $S$ is an open subset of $\R^n$. We say that $f$ is differentiable at a point $\bfx\in S$ if there exists a vector $\bfm\in \R^n$ such that \begin{equation}\label{der.Rn} \lim_{\bfh \to {\bf 0}}\frac{ f(\bfx+\bfh) - f(\bfx) - \bfm \cdot \bfh}{|\bfh|} = 0. \end{equation} When this holds, we say that the vector $\bfm$ (which is uniquely determined by condition \eqref{der.Rn}) is the gradient of $f$ at $\bfx$, which is denoted $\nabla f(\bfx)$. Thus, the gradient $\nabla f$ (when it exists) is characterized by the property that $$ \lim_{\bfh \to {\bf 0}}\frac{ f(\bfx+\bfh) - f(\bfx) - \nabla f(\bfx) \cdot \bfh} {|\bfh|} = 0. $$

We can also write a definition in the style of \eqref{der3} above: $f$ is differentiable at $\bfx$ if there exists a vector $\bfm$ such that \begin{equation}\label{der.Rnb} f(\bfx+\bfh) = f(\bfx) + \bfm\cdot \bfh + E(\bfh),\quad\mbox{ where }\lim_{\bfh\to {\bf 0}} \frac{E(\bfh)}{| \bfh|} = 0. \end{equation} When this holds, we define $\nabla f(\bfx) = \bfm$.

These definitions can be understood as follows: temporarily fix $\bfx$, view $ \bfh$ as a variable, and view $f(\bfx+\bfh) - f(\bfx)$ as a function of $\bfh$. Then $f$ is differentiable at $\bfx$ if the linear function (of $\bfh$, with $\bfx$ fixed) $ {\bf m}\cdot \bfh$ approximates $f(\bfx+\bfh)-f(\bfx)$, with errors that are smaller than linear as $\bfh\to {\bf 0}$. When this holds, $\nabla f(\bfx)={\bf m}$.

We will soon see that there is an easy way of finding out what the gradient of $f$ must be, if it exists. First, let's do one example the hard way. This will help us to appreciate the theorems that will be proved shortly.

Example 1. Consider $f:\R^2\to \R$ defined by $$ f(x,y) = (x-1)^3(y+2). $$ Is $f$ differentiable at the origin? To check, let's write a vector $\bfh$ in the form $\bfh = (h, k)$. Then \begin{align} f((0,0) + (h,k)) &= f(h,k)\nonumber \\ &= (h-1)^3(k+2) \nonumber \\ &= h^3k + 2h^3 - 3h^2k - 6h^2 + 3hk + 6h - k - 2 \nonumber \\ &=-2 + (6h-k) + (h^3k -3h^2k + 3hk + 2 h^3 - 6 h^2) \nonumber\\ &=f(0,0) + (6, -1)\cdot (h,k) + E(\bfh)\nonumber \\ &\qquad \mbox{ for }E(\bfh):= h^3k -3h^2k + 3hk + 2 h^3 - 6 h^2 \nonumber \end{align} and it is straightforward to check that $\lim_{\bfh \to \bf 0} E(\bfh)/|\bfh| = 0$. So $f$ is differentiable at the origin and $\nabla f(0,0) = (6, -1)$.
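If you want to check the algebra (or delegate it), a computer algebra system can reproduce this. A sympy sketch; the final line encodes the observation that every term of $E$ has total degree at least $2$, which is why $E(\bfh)/|\bfh|\to 0$:

```python
import sympy as sp

h, k = sp.symbols('h k', real=True)
f = lambda x, y: (x - 1)**3 * (y + 2)

F = sp.expand(f(h, k))                    # f((0,0) + (h,k)), expanded
E = sp.expand(F - f(0, 0) - (6*h - k))    # subtract the constant and linear parts
print(E)                                  # the higher-order terms from Example 1

# every monomial of E has total degree >= 2, so |E(h,k)| <= C|(h,k)|^2 near 0
print(all(sum(mono) >= 2 for mono in sp.Poly(E, h, k).monoms()))   # True
```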

The following important property of differentiability generalizes a familiar fact about functions of a single variable.

Theorem 1. Assume that $f:S\to \R$, where $S$ is an open subset of $\R^n$, and that $\bfx \in S$. If $f$ is differentiable at $\bfx$, then $f$ is continuous at $\bfx$.

Proof

Let $f$ be differentiable at $\bfx$. Then according to \eqref{der.Rnb}, $$ f(\bfx+\bfh) - f(\bfx) = \nabla f(\bfx)\cdot \bfh + E(\bfh),\quad \mbox{ where }\frac {E(\bfh)}{|\bfh|} \to 0 \mbox{ as }\bfh \to {\bf 0}. $$ The Squeeze Theorem implies that $$ \lim_{\bfh\to {\bf 0}} E(\bfh) = \lim_{\bfh\to {\bf 0}} \, |\bfh| \, \frac {E(\bfh)}{|\bfh|} = 0, $$ and it is clear that $\lim_{\bfh\to {\bf 0}} \nabla f(\bfx)\cdot \bfh = 0$. So the Limit Law for addition implies that $\lim_{\bfh\to{\bf 0}} \left( f(\bfx+\bfh) - f(\bfx) \right) = 0$, which says that $f$ is continuous at $\bfx$. $\quad \Box$

Partial derivatives

Recall that we have defined ${\bf e}_j$ to be the unit vector in $\R^n$ in the $j$th coordinate direction. Thus $$ {\bf e}_1 = (1, 0, \ldots ,0), \quad {\bf e}_2 = (0,1, \ldots ,0), \quad \ldots,\quad {\bf e}_n = (0, 0, \ldots ,1). $$ If $f$ is a function defined on an open subset $S\subset \R^n$, then at a point $\bfx\in S$, we define $$ \frac{\partial f}{\partial x_j} (\bfx) := \lim_{h\to 0} \frac {f(\bfx+h {\bf e}_j) - f(\bfx)}h . $$ This is called the $j$th partial derivative of $f$, the partial derivative of $f$ in the $x_j$ direction, or the partial derivative of $f$ with respect to $x_j$. To see what it means, let's consider a function $f$ of $2$ or $3$ variables. In this case we usually write $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, and $\frac{\partial f}{\partial z}$ instead of $\frac{\partial f}{\partial x_j}$, $j=1,2,3$. The definition says that for a function $f$ of $2$ variables, $$ \frac{\partial f}{\partial x} (x,y) = \lim_{h\to 0} \frac {f(x+h, y) - f(x,y)}h, \qquad \frac{\partial f}{\partial y} (x,y) = \lim_{h\to 0} \frac {f(x, y+h) - f(x,y)}h. $$ So for example, $\frac{\partial f}{\partial x}$ is computed by freezing $y$, that is, considering it to be a constant, and differentiating with respect to the $x$ variable.

In other words, if we want to compute $\frac{\partial f}{\partial x}$ at a point $(x,y)$, we can temporarily define a function $g(x) = f(x,y)$, that is, $f$ with the $y$ variable frozen. Then $$ \frac{\partial f}{\partial x} (x,y) = g'(x). $$ (Both sides of the above equality are limits, so it means that the limit on the left-hand side exists if and only if the limit on the right-hand side exists, and when they exist, they are equal.) Similarly, $$ \frac{\partial f}{\partial y} (x,y) = g'(y) \qquad\mbox{ for }g(y) = f(x,y), \mbox{ with }x \mbox{ "frozen"}. $$

Example 2. Consider $f:\R^2\to \R$ defined by $$ f(x,y) = (x-1)^3(y+2) $$ (that is, the same function as in Example 1). According to what we have said, to compute $\frac{\partial f}{\partial x}$, we consider $y$ to be constant, and differentiate with respect to $x$. Thus $$ \frac{\partial f}{\partial x} = 3(x-1)^2(y+2). $$ Similarly, $$ \frac{\partial f}{\partial y} = (x-1)^3. $$ If we are interested in the partial derivative at a particular point, say $(x,y)= (0,0)$, we just substitute to find that $$ (\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y})(0,0) = (6, -1). $$
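These computations are easy to delegate to a computer algebra system. A sympy sketch of Example 2:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = (x - 1)**3 * (y + 2)

fx, fy = sp.diff(f, x), sp.diff(f, y)    # differentiate with the other variable frozen
print(fx)                                            # 3*(x - 1)**2*(y + 2)
print(fy)                                            # (x - 1)**3
print(fx.subs({x: 0, y: 0}), fy.subs({x: 0, y: 0}))  # 6 -1
```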

For functions of 3 or more variables, the principles are exactly the same.

Note that partial derivatives at a point $\bfx$ only tell us about the behaviour of $f$ on lines through $\bfx$ parallel to the coordinate axes. For example,

Example 3. Let $f_1:\R^3\to \R$ be the function defined by $$ f_1(x,y,z) = \begin{cases}0 &\mbox{ if }xyz=0 \\ 1 &\mbox{ if not}. \end{cases} $$ Since $xyz=0$ if and only if at least one component $x,y,z$ equals zero, we can see that $f_1=0$ on the union of the $xy$-plane, the $yz$-plane, and the $xz$-plane. It is straightforward to check that all partial derivatives exist at the origin, and in fact that $$ \frac{\partial f_1}{\partial x}=\frac{\partial f_1}{\partial y} =\frac{\partial f_1}{\partial z} = 0 \mbox{ at }(0,0,0). $$ Similarly, let's define $$ f_2(x,y,z) = \begin{cases}0 &\mbox{ if }xyz=0 \\ \cos(x-2 yz)e^{xz} &\mbox{ if not}. \end{cases} $$ Although the behaviour of $f_2$ is moderately complicated if all components $x,y,z$ are nonzero, the partial derivatives of $f_2$ at $(0,0,0)$ do not see this, and again, all partial derivatives of $f_2$ at $(0,0,0)$ exist and equal $0$.
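The point of Example 3 is easy to see numerically. A short Python sketch for $f_1$:

```python
def f1(x, y, z):
    return 0.0 if x * y * z == 0 else 1.0

# along the coordinate axes f1 is identically 0, so every difference
# quotient defining a partial derivative at the origin equals 0 ...
for t in [0.1, 0.01, 0.001]:
    print(f1(t, 0, 0), f1(0, t, 0), f1(0, 0, t))   # 0.0 0.0 0.0

# ... but along the diagonal, f1 = 1 at points arbitrarily close to the
# origin, so f1 is not even continuous there
for t in [0.1, 0.01, 0.001]:
    print(f1(t, t, t))                             # 1.0
```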

Notation. It is important to know that partial derivatives are often written in many ways. For example, $$ \partial_j f, \quad \partial_{x_j} f, \quad f_{x_j}, \quad f_j $$ are all alternate ways of writing $\frac{\partial f}{\partial x_j}$. Similarly, $$ \partial_x f, \quad f_x $$ are alternate ways of writing $\frac{\partial f}{\partial x}$, with corresponding notation for $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial z}$.

Differentiability vs. partial differentiability

We will now investigate the relationship between differentiability and partial differentiability.

Theorem 2. Let $f$ be a function $S\to \R$, where $S$ is an open subset of $\R^n$. If $f$ is differentiable at a point $\bfx\in S$, then $\frac{\partial f}{\partial x_j}$ exists at $\bfx$ for all $j=1,\ldots, n$, and in addition, $$ \nabla f(\bfx) = (\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n})(\bfx). $$ That is, the partial derivatives are the components of the gradient vector.

The converse is not true; it can happen that all partial derivatives $\frac{\partial f}{\partial x_j}$ exist at $\bfx$ but that $f$ is not differentiable at $\bfx$. This is the case for the functions $f_1$ and $f_2$ in Example 3. These functions cannot be differentiable at the origin, since differentiability implies continuity (by Theorem 1) and these functions are not continuous at the origin. But as we have noted, all partial derivatives exist at $(0,0,0)$.

Proof of Theorem 2

Assume that $f$ is differentiable at $\bfx\in S$. Consider any $j \in \{1,\ldots, n\}$, and define $$ g(\bfh) = \frac { f(\bfx + \bfh) - f(\bfx) - \nabla f(\bfx)\cdot \bfh}{|\bfh|} $$ whenever $\bfh \ne {\bf 0}$ and $\bfx+\bfh \in S$. Then \begin{equation}\label{ddd1} \lim_{\bfh\to {\bf 0}} g(\bfh) = 0 \end{equation} by the definition of differentiability. It follows that for any $j\in \{1,\ldots,n\}$, \begin{equation}\label{ddd2} \lim_{h\to 0} g(h {\bf e}_j) = 0. \end{equation} (One of the problems for this section is to give a detailed proof that \eqref{ddd2} follows from \eqref{ddd1}.) This says that \begin{equation}\label{ddd3} \lim_{h\to 0} \frac { f(\bfx + h {\bf e}_j ) - f(\bfx) - \nabla f(\bfx)\cdot (h {\bf e}_j) }{|h|} = 0. \end{equation} It follows that \begin{align}\label{ddd4} 0 &= \lim_{h\to 0} \frac { f(\bfx + h {\bf e}_j ) - f(\bfx) - \nabla f(\bfx)\cdot (h{\bf e}_j )} {h} \\ &= \lim_{h\to 0} \frac { f(\bfx + h {\bf e}_j ) - f(\bfx)}{h} - \nabla f(\bfx)\cdot{\bf e}_j \nonumber \end{align} (Another problem for this section is to give a detailed proof that \eqref{ddd4} follows from \eqref{ddd3}.) This says that $\frac{\partial f}{\partial x_j}(\bfx)$ exists and equals $\nabla f(\bfx)\cdot {\bf e}_j$, which is what we had to prove. $\quad \Box$

If you need to determine whether a function $f$ is differentiable at a point $\bfx$, then Theorem 2 can simplify your life. It tells you the only possible candidate for the gradient: if $f$ is differentiable at $\bfx$, then $\nabla f(\bfx)$ must be the vector of partial derivatives, so there is only one vector $\bfm$ that can possibly satisfy the definition \eqref{der.Rn}.

Example 4. Consider the function $f:\R^2\to \R$ defined by $$ f(x,y) = \begin{cases}\frac{y^3-x^8y}{x^6+y^2}&\mbox{ if }(x,y)\ne (0,0) \\ 0&\mbox{ if }(x,y) = (0,0). \end{cases} $$ Is $f$ differentiable at $(0,0)$?

Since $f(x,0)= 0$ for all $x$ and $f(0,y)= y$ for all $y$, it is clear that the partial derivatives at $(0,0)$ exist and are given by $$ \frac{\partial f}{\partial x}(0,0) = 0, \qquad \frac{\partial f}{\partial y}(0,0) = 1. $$ So if $\nabla f(\bf 0)$ exists, it must equal $(0,1)$, and to check differentiability, we must check whether $$ 0 = \lim_{(x,y)\to (0,0)} \frac{ f(x,y) - f(0,0) - (0,1)\cdot(x,y)}{\sqrt{x^2+y^2}} = \lim_{(x,y)\to (0,0)} \frac{ f(x,y) - y}{\sqrt{x^2+y^2}} \ . $$ This is a problem in Section 1.2, and also in this section. It is not an easy limit, but it would be much harder to determine differentiability if we did not know that the only possible candidate for $\nabla f(0,0)$ is the vector ${\bf m}=(0,1)$.
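A numerical experiment cannot settle this problem, but it can suggest what to expect. The sketch below samples the relevant quotient along several curves approaching the origin, including $y = x^3$, which keeps the denominator $x^6+y^2$ as small as possible:

```python
import numpy as np

def f(x, y):
    return 0.0 if x == 0 and y == 0 else (y**3 - x**8 * y) / (x**6 + y**2)

def Q(x, y):   # the quotient whose limit at (0,0) decides differentiability
    return (f(x, y) - y) / np.hypot(x, y)

# approach the origin along y = x, y = x^2, and y = x^3
for t in [0.1, 0.01, 0.001]:
    print(Q(t, t), Q(t, t**2), Q(t, t**3))   # all columns appear to tend to 0
```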

So Theorem 2 can simplify some complicated problems. The next theorem is even better in this respect; it makes it easy to check differentiability of many functions of several variables, by simply using calculus and basic properties of continuous functions.

Theorem 3. Assume $f$ is a function $S\to \R$ for some open $S\subset\R^n$. If all partial derivatives of $f$ exist and are continuous at every point of $S$, then $f$ is differentiable at every point of $S$.

Theorem 3 motivates the following definition.

Definition 1. A function $f:S\to \R$ is said to be of class $C^1$, or simply $C^1$ for short, if all partial derivatives of $f$ exist and are continuous at every point of $S$ (and thus, according to Theorem 3, $f$ is differentiable everywhere in $S$).

This proof will be omitted in the lectures.

Proof of Theorem 3

For notational simplicity, we will present the proof when $n=2$. The idea is exactly the same in the general case.

Let $\bfx$ be any point in $S$. Since $S$ is open, there exists $r>0$ such that $\bfx+\bfh \in S$ whenever $|\bfh|<r$. Below, we will always assume that $|\bfh|<r$.

Consider a vector $\bfh = (h,k)$, and write $\bfx = (x,y)$. We start by writing \begin{align} f(\bfx +\bfh) - f(\bfx) &= f(x+h, y+k) - f(x,y) \label{ppp1} \\ & = [f(x+h,y+k) - f(x+h,y)] + [f(x+h,y)- f(x,y)]. \nonumber \end{align} Let's temporarily write $g(x) = f(x,y)$. Then \begin{align} f(x+h,y)- f(x,y) &= g(x+h) - g(x) \nonumber \\ &= h \, g'(x+\theta_1 h)\quad\mbox{ for some }\theta_1\in (0,1) \nonumber \end{align} by the $1$-dimensional Mean Value Theorem from MAT137. Rewriting this in terms of partial derivatives of $f$, it says that $$ f(x+h,y)- f(x,y) = h \frac{\partial f}{\partial x}(x+\theta_1 h, y)\mbox{ for some }\theta_1 \in (0,1). $$ A very similar argument shows that $$ f(x+h, y+k) - f(x+h, y) = k \frac{\partial f}{\partial y}(x+ h, y+\theta_2 k)\mbox{ for some }\theta_2 \in (0,1). $$ Combining these with \eqref{ppp1}, we see that $$ f(\bfx +\bfh) = f(\bfx) + (h,k)\cdot (\frac{\partial f}{\partial x},\frac{\partial f}{\partial y})(x,y) + E(h,k), $$ where $$ E(h,k) = h \left( \frac{\partial f}{\partial x}(x+\theta_1 h, y) -\frac{\partial f}{\partial x}(x,y) \right) + k \left( \frac{\partial f}{\partial y}(x+h, y+\theta_2 k) - \frac{\partial f}{\partial y}(x,y) \right). $$ Finally, since $|h| \le |\bfh | = \sqrt{h^2+k^2}$ and $|k|\le |\bfh|$, it follows that $$ \frac {|E(\bfh)|} {|\bfh|} \le \left| \frac{\partial f}{\partial x}(x+\theta_1 h, y) -\frac{\partial f}{\partial x}(x,y) \right| + \left| \frac{\partial f}{\partial y}(x+h, y+\theta_2 k) - \frac{\partial f}{\partial y}(x,y) \right|. $$ The right-hand side tends to zero as $\bfh\to {\bf 0}$, by our assumption that the partial derivatives are continuous. (This is a straightforward $\delta-\ep$ argument.) The Squeeze Theorem then implies that $$ \lim_{\bfh\to {\bf 0}} \frac {E(\bfh)} {|\bfh|} = 0 , $$ which proves the differentiability of $f$ at $\bfx$. Since $\bfx$ was an arbitrary point of $S$, this completes the proof. $\quad \Box$

Example 5. Let $f(x,y) = (x-1)^3(y+2)$. At which points of $\R^2$ is $f$ differentiable?

In Example 1, we proved that $f$ is differentiable at $(0,0)$, by using the definition of differentiability. That was a moderate amount of work, and it only told us about the point $(0,0)$. Now let's use Theorem 3 instead. We have already computed $$ \frac{\partial f}{\partial x} = 3(x-1)^2(y+2) \qquad \frac{\partial f}{\partial y} = (x-1)^3. $$ These are both continuous everywhere on $\R^2$, so Theorem 3 implies that $f$ is differentiable everywhere in $\R^2$, and that $\nabla f(x,y) = (3(x-1)^2(y+2), (x-1)^3)$.
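A symbolic gradient can also be cross-checked against numerical difference quotients. Here is a sketch (the test point is arbitrary):

```python
import numpy as np
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = (x - 1)**3 * (y + 2)
grad = sp.lambdify((x, y), sp.Matrix([sp.diff(f, x), sp.diff(f, y)]))
fn = sp.lambdify((x, y), f)

p = np.array([0.3, -1.2])      # an arbitrary test point
h = 1e-6                       # step for central difference quotients
num = np.array([(fn(p[0]+h, p[1]) - fn(p[0]-h, p[1])) / (2*h),
                (fn(p[0], p[1]+h) - fn(p[0], p[1]-h)) / (2*h)])
print(num, grad(*p).ravel())   # the two gradients should agree closely
```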

Example 6. Let $f:\R^n\to \R$ be a polynomial of degree $d$. At which points of $\R^n$ is $f$ differentiable?

To answer this, note that if we freeze all variables except $x_j$, then what is left is a polynomial function of $x_j$ (whose coefficients are polynomials involving all the other variables). When we differentiate this, we get a polynomial function of $x_j$ of lower degree. When we remember that the coefficients of this polynomial are themselves polynomials involving the other variables, we see that $$ \frac{\partial f}{\partial x_j} \mbox{ exists, and is a polynomial of degree }\le d-1. $$ (To see how this works in practice, consider a concrete example, such as Example 5 above.) Since this is true for every $j$, and since polynomials are clearly continuous in all of $\R^n$, Theorem 3 implies that $f$ is differentiable everywhere in $\R^n$.
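A sympy sketch of this degree-drop, for one made-up polynomial of degree $5$:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
p = x**3*y*z + 2*x*y**2 - 5*z**4 + 7          # a made-up polynomial of degree 5

print(sp.Poly(p, x, y, z).total_degree())     # 5
for v in (x, y, z):
    q = sp.diff(p, v)
    # each partial derivative is again a polynomial, of degree <= 4
    print(q, sp.Poly(q, x, y, z).total_degree())
```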

Directional derivatives, and the meaning of the gradient

A direction in $\R^n$ is naturally represented by a unit vector. (Recall that in general a vector has a direction and a magnitude; if we are only interested in directions, we can just consider vectors with magnitude equal to $1$, i.e. unit vectors).

Thus, given a unit vector $\bfu$ and a point $\bfx\in \R^n$, the point $\bfx+ h \bfu$ is the point reached by starting at $\bfx$ and traveling a distance $h$ in the direction $\bfu$. So $f(\bfx+h\bfu) - f(\bfx)$ represents the change in $f$ if we start at $\bfx$ and move a distance $h$ in the direction $\bfu$. These considerations motivate the following definition:

If $\bfu \in \R^n$ is a unit vector, then we define \begin{align}\label{dir.der} \partial_{\bfu}f(\bfx) &:= \lim_{h\to 0} \frac{f(\bfx+h\bfu) - f(\bfx)}{h} \\ &= \mbox{ the directional derivative of }f\mbox{ at }\bfx \mbox{ in the direction }\bfu, \nonumber \end{align} whenever the limit exists.

Based on what we have said above (and our knowledge of first-year calculus), we can see that $\partial_\bfu f(\bfx)$ represents the instantaneous rate of change of $f$ if we move in the direction $\bfu$ through the point $\bfx$.

By comparing the definitions of directional derivative and partial derivative, we immediately see that for any $j\in \{1,\ldots, n\}$, $$ \frac{\partial f}{\partial x_j} = \partial_{{\bf e}_j} f $$ where $\bf e_j$ is the unit vector in the $j$th coordinate direction, as usual.

Theorem 4. If $f$ is differentiable at a point $\bfx$, then $\partial_{\bfu}f(\bfx)$ exists for every unit vector $\bfu$, and moreover $$ \partial_{\bfu}f(\bfx) = \bfu \cdot \nabla f(\bfx). $$

The proof of this is almost exactly like the proof of Theorem 2 above; this is not surprising, since partial derivatives are just a special case of directional derivatives, as noted above.

In fact the proof does not require $\bfu$ to be a unit vector; this is only needed for the interpretation of $\partial_{\bfu}f$ as a directional derivative. For any vector $\bf v$, whether or not it is a unit vector, it is generally true that if $f$ is differentiable at a point $\bfx$, then \begin{equation}\label{qd} \lim_{h\to 0} \frac{ f(\bfx + h {\bf v}) - f(\bfx)}{h} = {\bf v} \cdot \nabla f(\bfx). \end{equation} The proof again is essentially the same as that of Theorem 2. See the exercises.
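Here is a quick numerical sanity check of Theorem 4, using the function from Examples 1 and 2 (where we found $\nabla f(0,0) = (6,-1)$) and one arbitrarily chosen unit vector:

```python
import numpy as np

f = lambda p: (p[0] - 1)**3 * (p[1] + 2)   # the function from Example 1
x0 = np.array([0.0, 0.0])
grad = np.array([6.0, -1.0])               # nabla f(0,0), computed in Example 2

u = np.array([3.0, 4.0]) / 5.0             # an arbitrary unit vector
for h in [1e-2, 1e-4, 1e-6]:
    diff_quot = (f(x0 + h*u) - f(x0)) / h  # difference quotient in direction u
    print(diff_quot, u @ grad)             # both columns approach 2.8
```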

The converse of Theorem 4 is not true. One can find functions $f$ such that at some point $\bfx$, the directional derivative $\partial _\bfu f(\bfx)$ exists for every unit vector $\bfu$, but $f$ is not differentiable at $\bfx$. See the Problems below for some examples.

Example 7. Assume that $S$ is an open subset of $\R^n$ and that $f:S\to \R$ is differentiable. At a point $\bfx\in S$, determine the direction $\bfu^*$ in which $f$ is increasing most rapidly, in the sense that $$ \partial_{\bfu^*}f(\bfx) = \max \{ \partial_\bfu f(\bfx) : \bfu\mbox{ a unit vector} \}. $$

By Theorem 4 and basic properties of the dot product, we know that for any unit vector $\bfu$, \begin{equation}\label{qp2} \partial_\bfu f(\bfx) = \bfu \cdot \nabla f(\bfx) = |\bfu| \, |\nabla f(\bfx)| \, \cos \theta = |\nabla f(\bfx)| \, \cos \theta, \end{equation} where $\theta$ is the angle between $\nabla f(\bfx)$ and $\bfu$.

We consider two cases:

Case 1. If $\nabla f(\bfx) = {\bf 0}$ then $\partial_\bfu f(\bfx)= 0$ for all $\bfu$, so every unit vector maximizes (and minimizes) $\partial_\bfu f(\bfx)$.

Case 2. If $\nabla f(\bfx) \ne {\bf 0}$, then according to \eqref{qp2} we have to choose $\bfu$ so that $\cos\theta$ is as large as possible. This happens when $\cos \theta = 1$, that is, when $\bfu$ is the unit vector pointing in the same direction as $\nabla f(\bfx)$. That is, the directional derivative is maximized in the direction \begin{equation}\label{fp} \bfu^* = \frac{\nabla f(\bfx)} {|\nabla f(\bfx)|}. \end{equation}

This discussion, which we summarize below, tells us a basic fact of the utmost importance: what the gradient of a scalar function means. If you remember only one thing from MAT237, it should be this:

Fundamental Principle: when it is not equal to zero, $\nabla f(\bfx)$ points in the direction in which $f$ is increasing most rapidly at $\bfx$.

Example 8. Let $f(x,y,z) = \frac 12 x^2 + \frac 14 y^4 + \frac 16 z^6$. Find the direction in which $f$ is increasing most rapidly, at the point $(1,1,1)$.

To answer this, we simply use formula \eqref{fp}. We compute \begin{align} \nabla f(x,y,z) = (x, y^3, z^5) &\quad\Rightarrow \quad \nabla f(1,1,1) = (1,1,1)\nonumber \\ &\quad \Rightarrow \frac 1{\sqrt 3}(1,1,1) \mbox{ is the direction of fastest increase.}\nonumber \end{align}
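As a sanity check (a sketch, not a proof), we can sample many random unit vectors $\bfu$ and confirm that $\bfu\cdot \nabla f(1,1,1)$ is largest for $\bfu$ nearly parallel to $\nabla f(1,1,1)$:

```python
import numpy as np

grad = np.array([1.0, 1.0, 1.0])    # nabla f(1,1,1) for f = x^2/2 + y^4/4 + z^6/6

rng = np.random.default_rng(1)
us = rng.standard_normal((100000, 3))
us /= np.linalg.norm(us, axis=1, keepdims=True)   # random unit vectors

best = us[np.argmax(us @ grad)]     # the sampled direction maximizing u . grad
print(best)                         # close to (1,1,1)/sqrt(3) ~ (0.577, 0.577, 0.577)
print(grad / np.sqrt(3))
```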

Example 9. Let $f(x,y,z) = xyz^2$. Find the direction in which $f$ is decreasing most rapidly, at the point $(1,1,1)$.

If $f$ is increasing most rapidly in the direction $\bfu^* = \nabla f(\bfx)/|\nabla f(\bfx)|$, then it is decreasing most rapidly in the direction $\bfu_* := -\bfu^*$. Indeed, it is easy to see that $\partial_{\bfu_*}f(\bfx) = -|\nabla f(\bfx)|$, which is the smallest possible value of any directional derivative of $f$ at $\bfx$.

Thus for $f(x,y,z) = xyz^2$, we have \begin{align} \nabla f(x,y,z) = (yz^2, xz^2, 2xyz) &\quad\Rightarrow \quad \nabla f(1,1,1) = (1,1,2)\nonumber \\ &\quad \Rightarrow - \frac 1{\sqrt 6}(1,1,2) \mbox{ is the direction of fastest decrease.}\nonumber \end{align}

Problems

More problems will probably be added later.

Basic skills

  1. Determine all points where a function $f = \cdots$ is differentiable, and determine $\nabla f$ at all such points.

    Problems like this are normally solved by using Theorem 3 and basic properties of continuous functions, which often allow us to recognize partial derivatives as continuous.

  2. Given a function $f = \cdots$, find the direction in which $f$ is increasing/decreasing most rapidly at the point $\bfx = \cdots$. For example, consider any of the functions in the above question, at the point $(1,1)$ or $(1,1,1)$, depending on the dimension.

  3. If $f$ is a complicated function (such as in Example 4), determining whether $f$ is differentiable at a given point, which might involve using the definition of differentiability rather than Theorem 3, is too hard to be a Basic Skill. But determining the directional derivatives at a point (using the definition of directional derivative) can still count as a Basic Skill, if the computations are not incredibly complicated.

More advanced questions

  1. Let $$ f(x,y) =\begin{cases} \frac{x^2y}{x^2+y^2}&\mbox{ if }(x,y)\ne (0,0) \\ 0&\mbox{ if }(x,y) = (0,0) . \end{cases} $$ Determine whether all directional derivatives of $f$ exist at the origin, and whether $f$ is differentiable there.

  2. Define $f:\R^2\to \R$ by $$ f(x,y) = \begin{cases} \frac{x^3 y}{x^4+y^2} &\mbox{ if }(x,y)\ne (0,0)\\ 0&\mbox{ if }(x,y) = (0,0) \end{cases} $$ Determine whether all directional derivatives of $f$ exist at the origin, and whether $f$ is differentiable there.

  3. Consider the function $f:\R^2\to \R$ defined by $$ f(x,y) = \begin{cases}\frac{y^3-x^8y}{x^6+y^2}&\mbox{ if }(x,y)\ne (0,0) \\ 0&\mbox{ if }(x,y) = (0,0). \end{cases} $$ Prove that $$ \lim_{(x,y)\to (0,0)} \frac{ f(x,y) - y}{\sqrt{x^2+y^2}} = 0, $$ and conclude that $f$ is differentiable at $(0,0)$ (see Example 4).

  4. A function $f:\R^n\to \R$ is said to be homogeneous of degree $d$ if $$ f(\lambda \bfx) = \lambda^d f(\bfx)\mbox{ for every nonzero } \bfx\in \R^n \mbox{ and every }\lambda>0. $$ The same definition holds if the domain of $f$ is $\R^n\setminus \{ {\bf 0}\}$.

    Prove that if $f$ is differentiable away from the origin and homogeneous of degree $d$, then for every unit vector $\bfu$, the directional derivative $\partial_\bfu f$ is homogeneous of degree $d-1$.

    (In particular this implies that all partial derivatives are homogeneous of degree $d-1$, since partial derivatives are a special case of directional derivatives.)

    Hint: To get started, note that for any $\bfx\ne {\bf 0}$ and $\lambda>0$, \begin{align} \partial_\bfu f(\lambda \bfx) &= \lim_{h\to 0} \frac{f(\lambda\bfx+ h \bfu) - f(\lambda\bfx)}h\nonumber \\ &= \lim_{h\to 0} \frac{f(\lambda(\bfx+ \frac h \lambda \bfu)) - f(\lambda\bfx)}h\nonumber . \end{align}

  5. Give a detailed proof of \eqref{qd}, and hence of Theorem 4 (by small modifications of the proof of Theorem 2).

  6. Suppose that $S$ is an open subset of $\R^n$ that contains the origin, and that $g$ is a function $S\to \R$. Prove that if $\lim_{\bfx\to {\bf 0}} g(\bfx) = L$, then $\lim_{h\to 0} g(h\bfu) = L$ for every unit vector $\bfu$.
    (This was used in the proof of Theorem 2. You have done things like this before and can find things like this in the notes somewhere, but unless this seems obvious to you, it is good practice to write out the proof. You should be able to do it in a few minutes, supplying all relevant definitions from memory. This would be considered an easy proof question for a test.)

  7. Suppose that $(a,b)\subset \R$ is an open interval that contains the origin, and that $g$ is a function $(a,b)\to \R$. Prove that $$ \lim_{h\to 0} \frac{g(h)}{|h|} = 0 \qquad \Longleftrightarrow \qquad \lim_{h\to 0} \frac{g(h)}{h} = 0. $$ (This was also used in the proof of Theorem 2.)
