2.1 Differentiation of Real-valued Functions

\(\renewcommand{\R}{\mathbb R }\)


  1. Differentiability in \(\R\)
  2. Differentiability in \(\R^n\) and the gradient
  3. Partial derivatives
  4. Differentiability vs. partial differentiability
  5. Directional derivatives and the meaning of the gradient
  6. Problems


Differentiability in \(\R\)

Suppose that \(S\) is an open subset of \(\R^n\), and let \(f\) be a function \(f:S\to \R\). Our first goal is to define what it means for \(f\) to be differentiable. Since our definition should generalize what we know from functions of a single variable, let’s first recall how that goes.

For \(f: (a,b)\to \R\), if \(x\in (a,b)\) and \[\begin{equation}\label{der1} \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} \text{ exists}, \end{equation}\] then we say that \(f\) is differentiable at \(x\), and we say that the above limit is the derivative of \(f\) at \(x\). We write \(f'(x)\) to denote the derivative of \(f\) at \(x\).

Note that our domain is an open set. To generalize this to \(f: S\to \R\) with \(S \subseteq \R^n\), we will need to replace \(h\) with a vector, and use the vector length in the denominator. But this won’t be enough. Consider \(f: \R^2\to \R\) defined by \(f(x,y)=x\). This function should be differentiable. Writing \(\mathbf h=(h,k)\), we have \[f(\mathbf x+\mathbf h)-f(\mathbf x)=h,\] and \(|\mathbf h|=\sqrt{h^2+k^2}\), but \(\displaystyle \lim_{\mathbf h\to \mathbf 0} \frac h{\sqrt{h^2+k^2}}\) does not exist, as we can see by looking at the values along the lines \(k=mh\). Just as with completeness, we will need to look at alternative characterizations of differentiability to find one that generalizes to \(\R^n\) in a way that makes sense.
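To see concretely why the naive quotient has no limit, here is a quick numerical sketch in Python (not part of the argument; the function `quotient` below is just the expression from the text). Approaching the origin along the line \(k=mh\), the quotient is constant in \(h\) but its value depends on the slope \(m\):

```python
import math

# The quotient h / sqrt(h^2 + k^2) from the text, for (h, k) != (0, 0).
def quotient(h, k):
    return h / math.sqrt(h**2 + k**2)

# Approach the origin along k = m*h: the quotient equals 1/sqrt(1 + m^2),
# which depends on m, so the two-variable limit cannot exist.
for m in [0.0, 1.0, 2.0]:
    values = [quotient(h, m * h) for h in [1e-2, 1e-4, 1e-6]]
    print(m, values)
```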

If the derivative at \(x\) is \(m\), then \[\lim_{h\to 0} \frac{ f(x+h) - f(x)-mh}{h} = 0,\] and conversely, if this limit equals \(0\) for some real number \(m\), then the derivative at \(x\) is \(m\). Thus, an equivalent definition is
If there exists a number \(m\) such that \[\begin{equation}\label{der2} \lim_{h\to 0} \frac{ f(x+h) - f(x)-mh}{h} = 0, \end{equation}\] then we say that \(f\) is differentiable at \(x\), and that the number \(m\) is the derivative of \(f\) at \(x\), denoted \(f'(x)\).

The equivalent definition \(\eqref{der2}\) can be understood as follows: temporarily fix \(x\), treat \(h\) as a variable, and view \(f(x+h) - f(x)\) as a function of \(h\). Then \(f\) is differentiable at \(x\) if \(f(x+h)-f(x)\) is approximately \(mh\), with an error that is smaller than linear as \(h\to 0\). When this holds, \(f'(x)=m\).

It’s rare that \(f(x+h)-f(x)=mh\) exactly. Instead, it will be \(mh+E(h)\) for some error term \(E(h)\), with \(\lim_{h\to 0} \frac{E(h)}h=0\). Thus, a third way of defining the derivative is:
If there exists a number \(m\) and a function \(E(h)\) such that \[\begin{equation}\label{der3} f(x+h) = f(x) + mh +E(h), \quad\text{ and }\lim_{h\to 0} \frac {E(h)}{h}=0, \end{equation}\] then we say that \(f\) is differentiable at \(x\), and that the number \(m\) is the derivative of \(f\) at \(x\), denoted \(f'(x)\).

To see that these definitions are all the same, note that \[\begin{align*} &\lim_{h\to 0} \frac{f(x+h) - f(x)}{h} = m \\ &\qquad \iff \qquad \lim_{h\to 0} \frac{ f(x+h) - f(x)-mh}{h} = 0 \\ &\qquad \iff \qquad f(x+h) - f(x)-mh =E(h) \quad \text{ satisfies } \lim_{h\to 0}\frac{E(h)}{h}=0 \\ &\qquad \iff \qquad \text{$\eqref{der3}$ holds} . \end{align*}\]

Differentiability in \(\R^n\) and the gradient

Suppose that \(S\) is an open subset of \(\R^n\) and consider a function \(f:S\to \R\).

We will define differentiability in a way that generalizes definition \(\eqref{der2}\). The idea is that \(f\) is differentiable at a point \(\mathbf x\in S\) if \(f\) can be approximated near \(\mathbf x\) by a linear map \(\R^n\to \R\), with errors that are smaller than linear near \(\mathbf x\).

To make this precise, we will suppose that \(\mathbf x\in S\) is fixed, and we will consider the function \(\mathbf h \mapsto f(\mathbf x+\mathbf h)-f(\mathbf x)\) of a variable \(\mathbf h\in \R^n\).

Remark. The notation \(a \mapsto b\) denotes the function that sends \(a\) to \(b\).

We want the definition to say: \(f\) is differentiable at \(\mathbf x\) if and only if \(f(\mathbf x+\mathbf h) - f(\mathbf x)\) is approximately a linear function of \(\mathbf h\), for \(\mathbf h\) near \(\mathbf 0\).

Recall from linear algebra:
A function \(\ell:\R^n\to \R^m\) is called linear if it has the form \[\begin{equation}\label{linear.def} \ell(\mathbf x)= M\mathbf x, \text{ where }M \text{ is an }m\times n \text{ matrix.} \end{equation}\]

Another way of saying this is: a function \(\ell:\R^n\to \R^m\) is linear if \[ \ell(a\mathbf x+b\mathbf y) = a \ell (\mathbf x) + b \ell(\mathbf y)\quad\text{ for all }a,b\in \R\text{ and } \mathbf x,\mathbf y \in \R^n. \] If a function has the form \(f(\mathbf x) = M\mathbf x + \mathbf b\), we will say that it is affine. We may also sometimes call it a first-order polynomial or a polynomial of degree 1.

In general, we know from linear algebra that if \(\ell:\R^n\to\R\) is a linear function, then \(\ell\) can be written in the form \[ \ell(\mathbf h) = \mathbf m \cdot \mathbf h\qquad\text{ for some vector }\mathbf m\in \R^n. \]

So, combining all these, we are led to the following basic definition:

Suppose that \(f\) is a function \(S\to \R\), where \(S\) is an open subset of \(\R^n\). We say that \(f\) is differentiable at a point \(\mathbf x\in S\) if there exists a vector \(\mathbf m\in \R^n\) such that \[\begin{equation}\label{der.Rn} \lim_{\mathbf h \to {\mathbf 0}}\frac{ f(\mathbf x+\mathbf h) - f(\mathbf x) - \mathbf m \cdot \mathbf h}{|\mathbf h|} = 0. \end{equation}\] When this holds, we say that the vector \(\mathbf m\) is the gradient of \(f\) at \(\mathbf x\), and we denote it by \(\nabla f(\mathbf x)\).

Note that \(\nabla f(\mathbf x)\) is uniquely determined by condition \(\eqref{der.Rn}\). Thus, the gradient \(\nabla f\) (when it exists) is characterized by the property that \[ \lim_{\mathbf h \to {\mathbf 0}}\frac{ f(\mathbf x+\mathbf h) - f(\mathbf x) - \nabla f(\mathbf x) \cdot \mathbf h} {|\mathbf h|} = 0. \]

We can also write a definition in the style of \(\eqref{der3}\):

The function \(f:S\to \R\) is differentiable at \(\mathbf x\) if there exists a vector \(\mathbf m\) such that \[\begin{equation}\label{der.Rnb} f(\mathbf x+\mathbf h) = f(\mathbf x) + \mathbf m\cdot \mathbf h + E(\mathbf h),\quad\text{ where }\lim_{\mathbf h\to {\mathbf 0}} \frac{E(\mathbf h)}{| \mathbf h|} = 0. \end{equation}\] When this holds, we define \(\nabla f(\mathbf x) = \mathbf m\).

These definitions can be understood as follows: temporarily fix \(\mathbf x\), view \(\mathbf h\) as a variable, and view \(f(\mathbf x+\mathbf h) - f(\mathbf x)\) as a function of \(\mathbf h\). Then \(f\) is differentiable at \(\mathbf x\) if the linear function \(\mathbf m\cdot \mathbf h\) approximates \(f(\mathbf x+\mathbf h)-f(\mathbf x)\), with errors that are smaller than linear as \(\mathbf h\to {\mathbf 0}\).

We will soon see that there is a simple computational way of finding out what the gradient of \(f\) must be, if it exists. First, let’s do one example the hard way. This will help us to appreciate the theorems that will be proved shortly.

Example 1.

Consider \(f:\R^2\to \R\) defined by \[ f(x,y) = (x-1)^3(y+2). \] Is \(f\) differentiable at the origin?

Solution To check, let’s write a vector \(\mathbf h\) in the form \(\mathbf h = (h, k)\). Then \[\begin{align*} f((0,0) + (h,k)) &= f(h,k) \\ &=-2 + (6h-k) + (h^3k -3h^2k + 3hk + 2 h^3 - 6 h^2) \\ &=f(0,0) + (6, -1)\cdot (h,k) + E(\mathbf h), \end{align*}\] where \(E(\mathbf h)= h^3k -3h^2k + 3hk + 2 h^3 - 6 h^2\). Since every term of \(E(\mathbf h)\) has degree at least \(2\) in \((h,k)\), there is a constant \(C\) such that \(|E(\mathbf h)|\le C|\mathbf h|^2\) when \(|\mathbf h|\le 1\), and hence \[\lim_{\mathbf h \to \mathbf 0} \frac{E(\mathbf h)}{\sqrt{h^2+k^2}} = 0.\] So \(f\) is differentiable at the origin and \(\nabla f(0,0) = (6, -1)\).
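The claim that \(E(\mathbf h)/|\mathbf h|\to 0\) can also be probed numerically. The Python sketch below (an illustration, not a proof) evaluates the quotient from the definition of differentiability for the candidate gradient \((6,-1)\) and shows it shrinking with \(|\mathbf h|\):

```python
import math

# f from Example 1 and the candidate gradient at the origin.
def f(x, y):
    return (x - 1)**3 * (y + 2)

grad = (6.0, -1.0)

def error_ratio(h, k):
    # E(h,k) / |(h,k)|, the quotient from the definition of differentiability
    E = f(h, k) - f(0.0, 0.0) - (grad[0]*h + grad[1]*k)
    return E / math.sqrt(h**2 + k**2)

# The ratio shrinks roughly linearly in |h|, consistent with E being
# quadratic and higher order in (h, k).
for t in [1e-1, 1e-3, 1e-5]:
    print(t, error_ratio(t, t))
```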

The next important property of differentiability generalizes a familiar fact about functions of a single variable.

Assume that \(f:S\to \R\), where \(S\) is an open subset of \(\R^n\), and that \(\mathbf x \in S\). If \(f\) is differentiable at \(\mathbf x\), then \(f\) is continuous at \(\mathbf x\).
Let \(f\) be differentiable at \(\mathbf x\). Then according to \(\eqref{der.Rnb}\), \[ f(\mathbf x+\mathbf h) - f(\mathbf x) = \nabla f(\mathbf x)\cdot \mathbf h + E(\mathbf h),\quad \text{ where }\frac {E(\mathbf h)}{|\mathbf h|} \to 0 \text{ as }\mathbf h \to {\mathbf 0}. \] The limit law for multiplication implies that \[ \lim_{\mathbf h\to {\mathbf 0}} E(\mathbf h) = \lim_{\mathbf h\to {\mathbf 0}} \, |\mathbf h| \, \frac {E(\mathbf h)}{|\mathbf h|} = 0, \] and it is clear that \(\lim_{\mathbf h\to {\mathbf 0}} \nabla f(\mathbf x)\cdot \mathbf h = 0\). So the limit law for addition implies that \(\lim_{\mathbf h\to{\mathbf 0}} f(\mathbf x+\mathbf h) - f(\mathbf x) = 0\), which says that \(f\) is continuous at \(\mathbf x\).

Partial derivatives

Recall that we have defined \({\mathbf e}_j\) to be the unit vector in \(\R^n\) in the \(j\)th coordinate direction. Thus \[ {\mathbf e}_1 = (1, 0, \ldots ,0), \quad {\mathbf e}_2 = (0,1, \ldots ,0), \quad \ldots,\quad {\mathbf e}_n = (0, 0, \ldots ,1). \]

If \(f\) is a function defined on an open subset \(S\subseteq \R^n\), then at a point \(\mathbf x\in S\), we define \[ \frac{\partial f}{\partial x_j} (\mathbf x) = \lim_{h\to 0} \frac {f(\mathbf x+h {\mathbf e}_j) - f(\mathbf x)}h . \] This is called the \(j\)th partial derivative of \(f\), the partial derivative of \(f\) in the \(x_j\) direction, or the partial derivative of \(f\) with respect to \(x_j\).

To see what it means, let’s consider a function \(f\) of \(2\) variables. In this case we usually write \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\), instead of \(\frac{\partial f}{\partial x_j}\) for \(j=1\) or \(2\). The definition says that \[\begin{align*} \frac{\partial f}{\partial x} (x,y) &= \lim_{h\to 0} \frac {f(x+h, y) - f(x,y)}h \\ \frac{\partial f}{\partial y} (x,y) &= \lim_{h\to 0} \frac {f(x, y+h) - f(x,y)}h. \end{align*}\] So for example, \(\frac{\partial f}{\partial x}\) is computed by “freezing \(y\)” – that is, considering it to be a constant – and differentiating with respect to the variable \(x\).

In other words, if we want to compute \(\frac{\partial f}{\partial x}\) at a point \((x,y)\), we can temporarily define a function \(g(x) = f(x,y)\), that is, \(f\) with the \(y\) variable “frozen.” Then \[ \frac{\partial f}{\partial x} (x,y) = g'(x). \] Both sides of the above equality are limits, so it means that the limit on the left-hand side exists if and only if the limit on the right-hand side exists, and when they exist, they are equal. Similarly, \[ \frac{\partial f}{\partial y} (x,y) = g'(y) \qquad\text{ for }g(y) = f(x,y) \text{ with }x \text{ ``frozen''}. \]

Example 2.

Consider \(f:\R^2\to \R\) defined by \[ f(x,y) = (x-1)^3(y+2), \] the same function as in Example 1. What are the partial derivatives of \(f(x,y)\)?

Solution According to what we have said, to compute \(\frac{\partial f}{\partial x}\), we consider \(y\) to be constant, and differentiate with respect to \(x\). Thus \[ \frac{\partial f}{\partial x} = 3(x-1)^2(y+2). \] Similarly, \[ \frac{\partial f}{\partial y} = (x-1)^3. \] If we are interested in the partial derivative at a particular point, say \((x,y)= (0,0)\), we just substitute to find that \[ (\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y})(0,0) = (6, -1). \]
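As a sanity check on the analytic computation, central finite differences approximate the partial derivatives obtained by "freezing" the other variable. This is a numerical sketch under the usual caveat that finite differences only approximate the limit in the definition:

```python
# f from Example 2; its partials at (0,0) were computed to be (6, -1).
def f(x, y):
    return (x - 1)**3 * (y + 2)

def partial_x(f, x, y, h=1e-6):
    # difference quotient with y "frozen"
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # difference quotient with x "frozen"
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

print(partial_x(f, 0.0, 0.0), partial_y(f, 0.0, 0.0))  # close to 6 and -1
```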

For functions of 3 or more variables, the principles are exactly the same.

Note that partial derivatives at a point \(\mathbf x\) only tell us about the behaviour of \(f\) on lines through \(\mathbf x\) parallel to the coordinate axes. For example,

Example 3.

Let \(f_1:\R^3\to \R\) be the function defined by \[ f_1(x,y,z) = \begin{cases}0 &\text{ if }xyz=0 \\ 1 &\text{ otherwise}. \end{cases} \] Since \(xyz=0\) if and only if at least one component \(x,y,z\) equals zero, we can see that \(f_1=0\) on the union of the \(xy\)-plane, the \(yz\)-plane, and the \(xz\)-plane. It is straightforward to check that all partial derivatives exist at the origin, and in fact that \[ \frac{\partial f_1}{\partial x}=\frac{\partial f_1}{\partial y} =\frac{\partial f_1}{\partial z} = 0 \text{ at }(0,0,0). \] Similarly, let’s define \[ f_2(x,y,z) = \begin{cases}0 &\text{ if }xyz=0 \\ \cos(x-2 yz)e^{xz} &\text{ otherwise}. \end{cases} \] Although the behaviour of \(f_2\) is complicated if all components \(x,y,z\) are nonzero, the partial derivatives of \(f_2\) at \((0,0,0)\) do not “see” this behaviour, and again, all partial derivatives of \(f_2\) at \((0,0,0)\) exist and equal \(0\).
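A short numerical sketch makes the point vivid: the difference quotients of \(f_1\) along the coordinate axes are identically zero, yet \(f_1\) takes the value \(1\) at points arbitrarily close to the origin off the coordinate planes.

```python
def f1(x, y, z):
    # the function f_1 from Example 3
    return 0.0 if x * y * z == 0 else 1.0

h = 1e-8

# Difference quotients along the coordinate axes vanish, so all partial
# derivatives at the origin exist and equal 0 ...
print((f1(h, 0, 0) - f1(0, 0, 0)) / h)  # 0.0
print((f1(0, h, 0) - f1(0, 0, 0)) / h)  # 0.0
print((f1(0, 0, h) - f1(0, 0, 0)) / h)  # 0.0

# ... yet f1 = 1 arbitrarily close to the origin off the coordinate planes,
# so f1 is not continuous (hence not differentiable) at (0,0,0).
print(f1(h, h, h))  # 1.0
```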

Notation. It is important to know that partial derivatives are often written in many ways. For example, \(\partial_j f, \partial_{x_j} f,\) and \(f_{x_j}\) are all alternate ways of writing \(\frac{\partial f}{\partial x_j}\) so that they fit on one line. Similarly, \(\partial_x f,\) and \(f_x\) are alternate ways of writing \(\frac{\partial f}{\partial x}\), with corresponding notation for \(\frac{\partial f}{\partial y}\), \(\frac{\partial f}{\partial z}\).

Differentiability vs. partial differentiability

We will now investigate the relationship between differentiability and partial differentiability.

Let \(f\) be a function \(S\to \R\), where \(S\) is an open subset of \(\R^n\). If \(f\) is differentiable at a point \(\mathbf x\in S\), then \(\frac{\partial f}{\partial x_j}\) exists at \(\mathbf x\) for all \(j=1,\ldots, n\), and in addition, \[ \nabla f(\mathbf x) = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)(\mathbf x). \]

That is, the partial derivatives are the components of the gradient vector.

The converse is not true! It can happen that all partial derivatives \(\frac{\partial f}{\partial x_j}\) exist at \(\mathbf x\) but that \(f\) is not differentiable at \(\mathbf x\). This is the case for the functions \(f_1\) and \(f_2\) in Example 3. These functions cannot be differentiable at the origin, since differentiability implies continuity (by Theorem 1) and these functions are not continuous at the origin. But as we have noted, all partial derivatives exist at \((0,0,0)\).

Details for Theorem 2

Assume that \(f\) is differentiable at \(\mathbf x\in S\). Consider any \(j \in \{1,\ldots, n\}\), and define \[ g(\mathbf h) = \frac { f(\mathbf x + \mathbf h) - f(\mathbf x) - \nabla f(\mathbf x)\cdot \mathbf h}{|\mathbf h|} \] whenever \(\mathbf h \ne {\mathbf 0}\) and \(\mathbf x+\mathbf h \in S\). Then \[\begin{equation}\label{ddd1} \lim_{\mathbf h\to {\mathbf 0}} g(\mathbf h) = 0 \end{equation}\] by the definition of differentiability. It follows, see problem 9, that for any \(j\in \{1,\ldots,n\}\), \[\begin{equation}\label{ddd2} \lim_{h\to 0} g(h {\mathbf e}_j) = 0. \end{equation}\]

This says that \[\begin{equation}\label{ddd3} \lim_{h\to 0} \frac { f(\mathbf x + h {\mathbf e}_j ) - f(\mathbf x) - \nabla f(\mathbf x)\cdot (h {\mathbf e}_j) }{|h|} = 0. \end{equation}\] It follows that \[\begin{align}\label{ddd4} 0 &= \lim_{h\to 0} \frac { f(\mathbf x + h {\mathbf e}_j ) - f(\mathbf x) - \nabla f(\mathbf x)\cdot (h{\mathbf e}_j )} {h} \\ &= \lim_{h\to 0} \frac { f(\mathbf x + h {\mathbf e}_j ) - f(\mathbf x)}{h} - \nabla f(\mathbf x)\cdot{\mathbf e}_j. \nonumber \end{align}\]

This says that \(\frac{\partial f}{\partial x_j}(\mathbf x)\) exists and equals \(\nabla f(\mathbf x)\cdot {\mathbf e}_j\), which is what we had to prove. You should try to fill in the details of \(\eqref{ddd1} \implies \eqref{ddd2}\) and of \(\eqref{ddd3} \implies \eqref{ddd4}\) in the last two problems of this section.

If you need to determine whether a function \(f\) is differentiable at a point \(\mathbf x\), then Theorem 2 can simplify your life. It tells you

  • if any partial derivatives \(\frac {\partial f}{\partial x_j}\) do not exist at \(\mathbf x\), then \(f\) is not differentiable there, and

  • if all partial derivatives exist, then the vector \({\mathbf m} =(\partial f/\partial x_1, \ldots, \partial f/\partial x_n)\) is the only possible vector that can work in the definition \(\eqref{der.Rn}\) of differentiability.

Example 4.

Consider the function \(f:\R^2\to \R\) defined by \[ f(x,y) = \begin{cases}\frac{y^3-x^8y}{x^6+y^2}&\text{ if }(x,y)\ne (0,0) \\ 0&\text{ if }(x,y) = (0,0). \end{cases} \] Is \(f\) differentiable at \((0,0)\)?

Solution Since \(f(x,0)= 0\) for all \(x\) and \(f(0,y)= y\) for all \(y\), the partial derivatives at \((0,0)\) exist and are \[ \frac{\partial f}{\partial x}(0,0) = 0, \qquad \frac{\partial f}{\partial y}(0,0) = 1. \] So if \(\nabla f(\mathbf 0)\) exists, it must equal \((0,1)\). To check differentiability, we must check whether \[\begin{align*} 0 &= \lim_{(x,y)\to (0,0)} \frac{ f(x,y) - f(0,0) - (0,1)\cdot(x,y)}{\sqrt{x^2+y^2}} \\ &= \lim_{(x,y)\to (0,0)} \frac{ f(x,y) - y}{\sqrt{x^2+y^2}}. \end{align*}\] This is problem 13 in Section 1.2. It is not an easy limit, but it would be much harder to determine differentiability if we did not know that the only possible candidate for \(\nabla f(0,0)\) is the vector \({\mathbf m}=(0,1)\).
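Before attempting the limit by hand, it can be reassuring to probe it numerically. The sketch below evaluates the quotient \((f(x,y)-y)/\sqrt{x^2+y^2}\) along several curves through the origin; the small values are evidence (not a proof) that the limit is \(0\), consistent with \(\nabla f(0,0)=(0,1)\):

```python
import math

def f(x, y):
    # the function from Example 4
    if (x, y) == (0.0, 0.0):
        return 0.0
    return (y**3 - x**8 * y) / (x**6 + y**2)

def ratio(x, y):
    # the quotient whose limit at (0,0) decides differentiability
    return (f(x, y) - y) / math.sqrt(x**2 + y**2)

# Probe along y = x, y = x^3, and x = y^3; the ratio is small in every case.
for t in [1e-1, 1e-2, 1e-3]:
    print(t, ratio(t, t), ratio(t, t**3), ratio(t**3, t))
```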

Theorem 2 can be used to simplify some complicated problems. The next theorem is even better in this respect: it makes it simple to check differentiability of many functions of several variables, by using single variable derivatives and properties of continuous functions.

Suppose \(f\) is a function \(S\to \R\) for some open \(S\subseteq\R^n\). If all partial derivatives of \(f\) exist and are continuous at every point of \(S\), then \(f\) is differentiable at every point of \(S\).

This theorem motivates the following definition.

A function \(f:S\to \R\) is said to be continuously differentiable or of class \(C^1\) (or simply \(C^1\) for short) if all partial derivatives of \(f\) exist and are continuous at every point of \(S\).

Thus by Theorem 3, any \(C^1\) function is differentiable everywhere in \(S\).

Details for Theorem 3

For notational simplicity, we will present the proof when \(n=2\). The idea is exactly the same in the general case.

Let \(\mathbf x\) be any point in \(S\). Since \(S\) is open, there exists \(r>0\) such that \(\mathbf x+\mathbf h \in S\) whenever \(|\mathbf h|<r\). Below, we will always assume that \(|\mathbf h|<r\).

Consider a vector \(\mathbf h = (h,k)\). We start by writing \[\begin{align} f(\mathbf x +\mathbf h) - f(\mathbf x) &= f(x+h, y+k) - f(x,y) \label{ppp1} \\ & = [f(x+h,y+k) - f(x+h,y)] + [f(x+h,y)- f(x,y)] \nonumber \end{align}\] Let’s temporarily write \(g(x) = f(x,y)\). Then \[\begin{align*} f(x+h,y)- f(x,y) &= g(x+h) - g(x) \\ &= h \, g'(x+\theta_1 h)\quad\text{ for some }\theta_1\in (0,1) \end{align*}\] by the single variable Mean Value Theorem from MAT 137. Rewriting this in terms of partial derivatives of \(f\), it says that \[ f(x+h,y)- f(x,y) = h \frac{\partial f}{\partial x}(x+\theta_1 h, y)\text{ for some }\theta_1 \in (0,1). \] A very similar argument shows that \[ f(x+h, y+k) - f(x+h, y) = k \frac{\partial f}{\partial y}(x+ h, y+\theta_2 k)\text{ for some }\theta_2 \in (0,1). \] Combining these with \(\eqref{ppp1}\), we see that \[ f(\mathbf x +\mathbf h) = f(\mathbf x) + (h,k)\cdot (\frac{\partial f}{\partial x},\frac{\partial f}{\partial y})(x,y) + E(h,k), \] where \[ E(h,k) = h \left( \frac{\partial f}{\partial x}(x+\theta_1 h, y) -\frac{\partial f}{\partial x}(x,y) \right) + k \left( \frac{\partial f}{\partial y}(x+h, y+\theta_2 k) - \frac{\partial f}{\partial y}(x,y) \right) \] Finally, since \(|h| \le |\mathbf h | = \sqrt{h^2+k^2}\) and \(|k|\le |\mathbf h|\), it follows that

\[ \left| \frac {E(\mathbf h)} {|\mathbf h|} \right| \le \left| \frac{\partial f}{\partial x}(x+\theta_1 h, y) -\frac{\partial f}{\partial x}(x,y) \right| + \left| \frac{\partial f}{\partial y}(x+h, y+\theta_2 k) - \frac{\partial f}{\partial y}(x,y) \right|. \] The right-hand side tends to zero as \(\mathbf h \to \mathbf 0\), by an \((\varepsilon, \delta)\) argument using our assumption that the partial derivatives are continuous. Hence \[ \lim_{\mathbf h\to {\mathbf 0}} \frac {E(\mathbf h)} {|\mathbf h|} = 0 , \] which proves the differentiability of \(f\) at \(\mathbf x\). Since \(\mathbf x\) was an arbitrary point of \(S\), this completes the proof.

Example 5.

Let \(f(x,y) = (x-1)^3(y+2)\). At which points of \(\R^2\) is \(f\) differentiable?

Solution In Example 1, we proved that \(f\) is differentiable at \((0,0)\), by using the definition of differentiability. That was a moderate amount of work, and it only told us about the point \((0,0)\). Now let’s use Theorem 3 instead. We have already computed \[ \frac{\partial f}{\partial x} = 3(x-1)^2(y+2) \qquad \frac{\partial f}{\partial y} = (x-1)^3. \] These are both continuous everywhere on \(\R^2\), so Theorem 3 implies that \(f\) is differentiable everywhere in \(\R^2\), and that \(\nabla f(x,y) = (3(x-1)^2(y+2), (x-1)^3)\).

Example 6.

Let \(f:\R^n\to \R\) be a polynomial of total degree \(d\). At which points of \(\R^n\) is \(f\) differentiable?

Solution To answer this, note that if we “freeze” all variables except \(x_j\), then what is left is a polynomial function of \(x_j\) (whose coefficients are polynomials involving all the other variables). When we differentiate this, we get a polynomial function of \(x_j\) of lower degree. When we remember that the coefficients of this polynomial are themselves polynomials involving the other variables, we see that \[ \frac{\partial f}{\partial x_j} \text{ exists, and is a polynomial of degree }\le d-1. \] Since this is true for every \(j\), and since polynomials are continuous in all of \(\R^n\), Theorem 3 implies that polynomials are differentiable everywhere in \(\R^n\). To see how this works in practice, consider a concrete example, such as Example 5 above.

Contrast this with the naive, incorrect definition of differentiability considered at the start of this section. The correct definition eventually shows that polynomials are differentiable, and leads us towards other useful concepts, like \(C^1\), whereas the naive definition would make even \(f(x,y)=x\) fail to be differentiable. Although it looks more complicated, the correct version does two important things that we look for from mathematical definitions: it includes the functions that we intuitively believe it should, and it leads us to new interesting properties.

Directional derivatives and the meaning of the gradient

A direction in \(\R^n\) is naturally represented by a unit vector. In general a vector has a direction and a magnitude; if we are only interested in directions, we can just consider vectors with magnitude equal to \(1\), i.e. unit vectors.

Thus, given a unit vector \(\mathbf u\) and a point \(\mathbf x\in \R^n\), the point \(\mathbf x+ h \mathbf u\) is the point reached by starting at \(\mathbf x\) and traveling a distance \(h\) in the direction \(\mathbf u\). So \(f(\mathbf x+h\mathbf u) - f(\mathbf x)\) represents the change in \(f\) if we start at \(\mathbf x\) and move a distance \(h\) in the direction \(\mathbf u\). This motivates the following definition:

If \(\mathbf u \in \R^n\) is a unit vector, then we define the directional derivative of \(f\) at \(\mathbf x\) in the direction \(\mathbf u\) to be \[\begin{equation} \label{dir.der} \partial_{\mathbf u}f(\mathbf x) = \lim_{h\to 0} \frac{f(\mathbf x+h\mathbf u) - f(\mathbf x)}{h}, \end{equation}\] whenever the limit exists.

Based on our knowledge of first-year calculus, we can see that \(\partial_\mathbf u f(\mathbf x)\) represents the instantaneous rate of change of \(f\) as we move in the direction \(\mathbf u\) through the point \(\mathbf x\).

By comparing the definitions of directional derivative and partial derivative, we see that for any \(j\in \{1,\ldots, n\}\), \[ \frac{\partial f}{\partial x_j} = \partial_{{\mathbf e}_j} f \] where \(\mathbf e_j\) denotes the unit vector in the \(j\)th coordinate direction.

If \(f\) is differentiable at a point \(\mathbf x\), then \(\partial_{\mathbf u}f(\mathbf x)\) exists for every unit vector \(\mathbf u\), and moreover \[ \partial_{\mathbf u}f(\mathbf x) = \mathbf u \cdot \nabla f(\mathbf x). \]

The proof of this is almost exactly like the proof of Theorem 2. This is not surprising, since partial derivatives are just a special case of directional derivatives.

The proof does not require \(\mathbf u\) to be a unit vector; this is only needed for the interpretation of \(\partial_{\mathbf u}f\) as a directional derivative. For any vector \(\mathbf v\), whether or not it is a unit vector, it is generally true that if \(f\) is differentiable at a point \(\mathbf x\), then \[\begin{equation}\label{qd} \lim_{h\to 0} \frac{ f(\mathbf x + h {\mathbf v}) - f(\mathbf x)}{h} = {\mathbf v} \cdot \nabla f(\mathbf x). \end{equation}\] The proof again is essentially the same as that of Theorem 2. It is left as an exercise.
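Theorem 4 can be checked numerically for a concrete function. The sketch below uses the function \(f(x,y)=(x-1)^3(y+2)\) from Example 1, whose gradient at the origin is \((6,-1)\), and compares a difference quotient from the definition \(\eqref{dir.der}\) with \(\mathbf u\cdot\nabla f(\mathbf x)\):

```python
import math

# f(x,y) = (x-1)^3 (y+2), with gradient (6, -1) at the origin (Example 1).
def f(x, y):
    return (x - 1)**3 * (y + 2)

def dir_deriv(f, x, y, u, h=1e-6):
    # one-sided difference quotient from the definition of the
    # directional derivative
    return (f(x + h*u[0], y + h*u[1]) - f(x, y)) / h

u = (1 / math.sqrt(2), 1 / math.sqrt(2))
approx = dir_deriv(f, 0.0, 0.0, u)
exact = u[0] * 6 + u[1] * (-1)   # u . grad f(0,0), as in Theorem 4
print(approx, exact)             # the two values agree closely
```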

The converse of Theorem 4 is not true. One can find functions \(f\) such that at some point \(\mathbf x\), the directional derivative \(\partial _\mathbf u f(\mathbf x)\) exists for every unit vector \(\mathbf u\), but \(f\) is not differentiable at \(\mathbf x\). See the exercises below for some examples.

Example 7.

Assume that \(S\) is an open subset of \(\R^n\) and that \(f:S\to \R\) is differentiable. At a point \(\mathbf x\in S\), determine the direction \(\mathbf u^*\) in which \(f\) is increasing most rapidly, in the sense that \[ \partial_{\mathbf u^*}f(\mathbf x) = \max \{ \partial_\mathbf u f(\mathbf x) : \mathbf u\text{ a unit vector} \}. \]

Solution By Theorem 4 and basic properties of the dot product, we know that for any unit vector \(\mathbf u\), \[\begin{equation}\label{qp2} \partial_\mathbf u f(\mathbf x) = \mathbf u \cdot \nabla f(\mathbf x) = |\mathbf u| \, |\nabla f(\mathbf x)| \, \cos \theta = |\nabla f(\mathbf x)| \, \cos \theta, \end{equation}\] where \(\theta\) is the angle between \(\nabla f(\mathbf x)\) and \(\mathbf u\).

We consider two cases:

Case 1. If \(\nabla f(\mathbf x) = \mathbf 0\) then \(\partial_\mathbf u f(\mathbf x)= 0\) for all \(\mathbf u\), so every unit vector maximizes (and minimizes) \(\partial_\mathbf u f(\mathbf x)\).

Case 2. If \(\nabla f(\mathbf x) \ne \mathbf 0\), then according to \(\eqref{qp2}\) we have to choose \(\mathbf u\) so that \(\cos\theta\) is as large as possible. This happens when \(\cos \theta = 1\), that is, when \(\mathbf u\) is the unit vector pointing in the same direction as \(\nabla f(\mathbf x)\). That is, the directional derivative is maximized in the direction \[\begin{equation}\label{fp} \mathbf u^* = \frac{\nabla f(\mathbf x)} {|\nabla f(\mathbf x)|}. \end{equation}\]

This example tells us what the gradient of a real-valued function means. If you remember only one thing from MAT 237, it should be this:

If it is not zero, \(\nabla f(\mathbf x)\) points in the direction where \(f\) has the greatest increase.

This is the principle that underlies gradient descent in machine learning, determines seam carving for image resizing, and will play an important role in the rest of this course.

Example 8.

Let \(f(x,y,z) = \frac 12 x^2 + \frac 14 y^4 + \frac 16 z^6\). Find the direction in which \(f\) has the greatest increase at the point \((1,1,1)\).

Solution To answer this, we use formula \(\eqref{fp}\). We compute \[\begin{align*} \nabla f(x,y,z) = (x, y^3, z^5) &\quad\Rightarrow \quad \nabla f(1,1,1) = (1,1,1) \\ &\quad \Rightarrow \frac 1{\sqrt 3}(1,1,1) \text{ is the direction of greatest increase.} \end{align*}\]
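The maximality of the gradient direction can be illustrated numerically. Using the gradient \((1,1,1)\) at \((1,1,1)\) from Example 8, the sketch below samples many random unit vectors \(\mathbf u\) and confirms that \(\mathbf u\cdot\nabla f\) never exceeds \(|\nabla f|\), the value attained at \(\mathbf u^*=\nabla f/|\nabla f|\):

```python
import math, random

grad = (1.0, 1.0, 1.0)          # gradient of f at (1,1,1), from Example 8
norm = math.sqrt(sum(g * g for g in grad))
u_star = tuple(g / norm for g in grad)

# Sample random unit vectors; by (qp2), u . grad = |grad| cos(theta),
# so no sampled direction can beat the gradient direction u_star.
random.seed(0)
best = -float("inf")
for _ in range(10000):
    v = [random.gauss(0, 1) for _ in range(3)]
    n = math.sqrt(sum(x * x for x in v))
    u = [x / n for x in v]
    best = max(best, sum(ui * gi for ui, gi in zip(u, grad)))

print(best, norm)  # best stays below norm, approaching it for u near u_star
```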

Example 9.

Let \(f(x,y,z) = xyz^2\). Find the direction in which \(f\) is decreasing most rapidly at the point \((1,1,1)\).

Solution If \(f\) has the greatest decrease in the direction \(\mathbf u\), then \(-f\) is increasing most rapidly in the direction \(\mathbf u\). By linearity of the dot product and gradient, \(\mathbf u \cdot \nabla(-f)=-\mathbf u \cdot \nabla(f)\), so \(f\) is decreasing most rapidly in the direction of \(-\nabla f\).

Thus for \(f(x,y,z) = xyz^2\), we have \[\begin{align*} \nabla f(x,y,z) = (yz^2, xz^2, 2xyz) &\quad\Rightarrow \quad \nabla f(1,1,1) = (1,1,2) \\ &\quad \Rightarrow \frac {-1}{\sqrt 6}(1,1,2) \text{ is the direction of fastest decrease.} \end{align*}\]

Problems

Basic

  1. Determine all points where a function \(f\) is differentiable, and determine \(\nabla f\) at those points.

Problems like this are normally solved by using Theorem 3 and properties of continuous functions which allow us to recognize partial derivatives as continuous. Examples might be

  • \(f(x,y) = xy \cos(y^2-2x) \qquad\) for \((x,y)\in \R^2\).
  • \(f(x,y,z) = x^2e^{y/z} \qquad\) for \((x,y,z)\in \R^3\) such that \(z\ne 0\).
  • \(f(x,y,z) = |(x,y,z)| = \sqrt{x^2+y^2+z^2} \qquad\) for \((x,y,z)\in \R^3.\)
  • \(f(x,y) = ( \sin^2 x + \sin^2 y)^{1/2}\qquad\) for \((x,y)\in \R^2\).
  2. Given a function \(f\), find the direction in which \(f\) is increasing/decreasing most rapidly at the point \(\mathbf x\). For example, consider any of the functions in the above question, at the point \((1,1)\) or \((1,1,1)\), depending on the dimension.

  3. If \(f\) is a complicated function like Example 4, determining whether \(f\) is differentiable at a given point using the definition rather than using Theorem 3 is difficult. But determining the directional derivatives at a point using their definition is not. For example

    • Let \[ f(x,y) =\begin{cases} \frac{x^2y}{x^2+y^2}&\text{ if }(x,y)\ne (0,0) \\ 0&\text{ if }(x,y) = (0,0) . \end{cases} \] For the unit vector \(\mathbf u = \frac1{\sqrt2}(1, -1)\), determine \(\partial_\mathbf u f({\mathbf 0})\).
    • Let \[ f(x,y,z) =\begin{cases} \frac{xyz}{x^2+y^2+z^2}&\text{ if }(x,y,z)\ne (0,0,0) \\ 0&\text{ if }(x,y,z) = (0,0,0) . \end{cases} \] For the unit vector \(\mathbf u = \frac 13(1, 2, -2)\), determine \(\partial_\mathbf u f({\mathbf 0})\).

Advanced

  4. Let \[ f(x,y) =\begin{cases} \frac{x^2y}{x^2+y^2}&\text{ if }(x,y)\ne (0,0) \\ 0&\text{ if }(x,y) = (0,0) . \end{cases} \]
    • For any unit vector \(\mathbf u = (u_1, u_2)\), determine \(\partial_\mathbf u f({\mathbf 0})\).

    • Prove that \(f\) is not differentiable at the origin.

      Hint If \(f\) were differentiable at the origin, then necessarily \(\partial_\mathbf u f({\mathbf 0}) = u_1\partial_1 f({\mathbf 0}) + u_2 \partial_2 f({\mathbf 0})\).

  5. Define \(f:\R^2\to \R\) by \[ f(x,y) = \begin{cases} \frac{x^3 y}{x^4+y^2} &\text{ if }(x,y)\ne (0,0)\\ 0&\text{ if }(x,y) = (0,0) \end{cases} \]
  • Is \(f\) continuous at \((0,0)\)? Determine the answer using material from Section 1.2.

  • Show that for every unit vector \({\mathbf u} = (u_1, u_2)\), the directional derivative \(\partial_{\mathbf u}f(0,0)\) exists and equals zero.

  • Prove that \(f\) is not differentiable at \((0,0)\).

    Hint Consider \(\dfrac { f(t, t^2) - f(0,0) }{|(t,t^2)|}\) for small values of \(t\). Also, note that if \(|t|<1\), then \(|t| \le |(t, t^2)| \le \sqrt 2 |t|\).

  6. Consider the function \(f:\R^2\to \R\) defined by \[ f(x,y) = \begin{cases}\frac{y^3-x^8y}{x^6+y^2}&\text{ if }(x,y)\ne (0,0) \\ 0&\text{ if }(x,y) = (0,0). \end{cases} \] Prove that
    • all partial derivatives of \(f\) exist everywhere in \(\R^2\),
    • at least one partial derivative of \(f\) is not continuous at \((0,0)\),
    • \(f\) is differentiable at \((0,0)\).
      Hint See Example 4.
  7. A function \(f:\R^n\to \R\) is said to be homogeneous of degree \(d\) if \[ f(\lambda \mathbf x) = \lambda^d f(\mathbf x)\text{ for every nonzero } \mathbf x\in \R^n \text{ and every }\lambda>0. \] The same definition holds if the domain of \(f\) is \(\R^n\setminus \{ {\mathbf 0}\}\).

Prove that if \(f\) is differentiable away from the origin and homogeneous of degree \(d\), then for every unit vector \(\mathbf u\), the directional derivative \(\partial_\mathbf u f\) is homogeneous of degree \(d-1\).

In particular this implies that all partial derivatives are homogeneous of degree \(d-1\), since partial derivatives are a special case of directional derivatives.

Hint To get started, note that for any \(\mathbf x\ne {\mathbf 0}\) and \(\lambda>0\), \[\begin{align*} \partial_\mathbf u f(\lambda \mathbf x) &= \lim_{h\to 0} \frac{f(\lambda\mathbf x+ h \mathbf u) - f(\lambda\mathbf x)}h \\ &= \lim_{h\to 0} \frac{f(\lambda(\mathbf x+ \frac h \lambda \mathbf u)) - f(\lambda\mathbf x)}h. \end{align*}\]

  8. Give a detailed proof of \(\eqref{qd}\), and hence of Theorem 4 (by small modifications of the proof of Theorem 2.)

  9. Suppose that \(S\) is an open subset of \(\R^n\) that contains the origin, and that \(g\) is a function \(S\to \R\). Prove that if \(\lim_{\mathbf x\to {\mathbf 0}} g(\mathbf x) = L,\) then \(\displaystyle\lim_{h\to 0} g(h\mathbf u) = L\) for every unit vector \(\mathbf u\).

This was used in the proof of Theorem 2. It is good practice to write out the proof. You should be able to supply all relevant definitions from memory. This would be considered an easy proof question for a test.

  10. Suppose that \(a<0<b\), and that \(g : (a,b)\to \R\). Prove that \[ \lim_{h\to 0} \frac{g(h)}{|h|} = 0 \qquad \Longleftrightarrow \qquad \lim_{h\to 0} \frac{g(h)}{h} = 0. \] This was also used in the proof of Theorem 2.


Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Canada License.