2.2: Differentiation (continued)

$\newcommand{\R}{\mathbb R }$ $\newcommand{\N}{\mathbb N }$ $\newcommand{\Z}{\mathbb Z }$ $\newcommand{\bfa}{\mathbf a}$ $\newcommand{\bfb}{\mathbf b}$ $\newcommand{\bff}{\mathbf f}$ $\newcommand{\bfg}{\mathbf g}$ $\newcommand{\bfh}{\mathbf h}$ $\newcommand{\bfu}{\mathbf u}$ $\newcommand{\bfx}{\mathbf x}$ $\newcommand{\bfy}{\mathbf y}$ $\newcommand{\ep}{\varepsilon}$

Differentiation (continued)

  1. Differentiation of vector-valued functions
  2. Some important special cases
  3. The differential
  4. Problems

Differentiation of vector-valued functions

Definition of the derivative

Suppose that $S$ is an open subset of $\R^n$, and consider a vector-valued function $\bff:S\to \R^m$.

The idea of differentiability is essentially the same as in the case of real-valued functions: $\bff$ is differentiable at a point $\bfa\in S$ if $\bff$ can be approximated by a linear map $\R^n\to \R^m$ near $\bfa$, with errors that are smaller than linear.

In order to write this down, we recall that it is natural to represent a linear map $\R^n\to \R^m$ by an $m\times n$ matrix, which acts on (column) vectors by matrix multiplication. Explicitly, an $m\times n$ matrix $$ M = \left(\begin{array}{ccc} M_{11}& \cdots &M_{1n}\\ M_{21}& \cdots &M_{2n}\\ \vdots&\ddots&\vdots\\ M_{m1}& \cdots &M_{mn} \end{array} \right) $$ represents the linear map $\ell:\R^n\to \R^m$ defined by $$ \ell(\bfx)= M\bfx \in \R^m. $$

We also remember from linear algebra that every linear map from $\R^n$ to $\R^m$ can be represented by matrix-vector multiplication, as above.

Since it is natural to represent linear maps in this way, we will write the derivative at a point $\bfa\in S\subset \R^n$ of a vector-valued function $\bff: S\to \R^m$ as an $m\times n$ matrix.

Once we have accepted this point, the definition is very much like things we have already seen.

Definition Assume that $S$ is an open subset of $\R^n$. Given a function $\bff:S\to \R^m$, we say that $\bff$ is differentiable at a point $\bfa\in S$ if there exists an $m\times n$ matrix $M$ such that \begin{equation}\label{diffnm} \bff(\bfa+\bfh) = \bff(\bfa) + M \bfh + {\bf E}(\bfh), \qquad\mbox{ where }\lim_{\bfh\to \bf 0}\frac{{\bf E}(\bfh)}{|\bfh|} = {\bf 0}\in \R^m. \end{equation} When this holds, we say that $M$ is the derivative of $\bff$ at $\bfa$, and we write $M = D\bff(\bfa)$. $M$ is also sometimes called the Jacobian matrix.

Recall also that for a vector-valued function ${\bf E}(\bfh) = (E_1(\bfh), \ldots, E_m(\bfh))$ of a variable $\bfh\in \R^n$, $$ \lim_{\bfh\to \bf 0} \frac {{\bf E}(\bfh)}{|\bfh|} = {\bf 0} \mbox{ if and only if } \lim_{\bfh\to \bf 0} \frac {E_j(\bfh)}{|\bfh|} = 0 \mbox{ for all }j\in \{1,\ldots, m \}, $$ as we know from Theorem 3 in Section 1.2.

The definition \eqref{diffnm} can also be written: there exists an $m\times n$ matrix $M$ such that $$ \lim_{\bfh\to {\bf 0}}\frac{\bff(\bfa+\bfh) - \bff(\bfa) - M \bfh}{|\bfh|} = {\bf 0}. $$
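To see the definition in action, here is a small numerical sketch (assuming NumPy is available; the function $\bff(x,y) = (x^2,\ xy)$, the point $\bfa = (1,2)$, and the matrix $M$ below are illustrative choices, not from the text). The ratio $|{\bf E}(\bfh)|/|\bfh|$ should tend to $0$ as $\bfh\to{\bf 0}$:

```python
import numpy as np

def f(x):
    # f(x, y) = (x^2, x*y), a map from R^2 to R^2
    return np.array([x[0]**2, x[0] * x[1]])

a = np.array([1.0, 2.0])
# candidate derivative M = Df(a), computed by hand from the partial derivatives
M = np.array([[2.0, 0.0],
              [2.0, 1.0]])

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * np.array([1.0, -1.0])
    E = f(a + h) - f(a) - M @ h                    # the error term E(h)
    print(np.linalg.norm(E) / np.linalg.norm(h))   # shrinks to 0 with t
```

Here one can check by hand that ${\bf E}(\bfh) = (h_1^2,\ h_1 h_2)$, so the printed ratio is exactly $t$ for this choice of $\bfh$.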

We will see soon that at places where $\bff$ is differentiable, there are easy expressions for the entries of $D\bff(\bfa)$ in terms of partial derivatives of the components of $\bff$.

The derivative vs. partial derivatives

Recall that if forced to specify whether a vector is a row- or a column-vector, our default rule is that every vector is a column vector unless explicitly stated otherwise (even if we write it in a way that makes it look like a row vector, such as $\bfx = (x,y,z)$.)

Applying this rule, if we write out all the components of \eqref{diffnm}, it becomes \begin{equation}\label{diffnm.comp} \left( \begin{array}{c} f_1(\bfa+\bfh)\\ \vdots\\ f_m(\bfa+\bfh) \end{array} \right) = \left( \begin{array}{c} f_1(\bfa)\\ \vdots\\ f_m(\bfa) \end{array} \right) + \left(\begin{array}{ccc} M_{11}& \cdots &M_{1n}\\ \vdots&\ddots&\vdots\\ M_{m1}& \cdots &M_{mn} \end{array} \right) \left( \begin{array}{c} h_1\\ \vdots\\ h_n \end{array} \right) + \left( \begin{array}{c} E_1(\bfh)\\ \vdots\\ E_m(\bfh) \end{array} \right) , \end{equation} where all the components of ${\bf E}(\bfh)$ are smaller than linear as $\bf h\to 0$, by which we mean as usual that $\lim_{\bfh\to \bf 0}\frac{ E_j(\bfh)} {|\bfh|} = 0$ for every $j$.

By writing out the $j$th row of \eqref{diffnm.comp}, we can see that $\bff:S\to \R^m$ is differentiable at $\bfa$ if and only if, for every $j\in \{1,\ldots, m\}$, \begin{equation}\label{diffnm.row} f_j(\bfa+\bfh) = f_j(\bfa) + \left(\begin{array}{ccc} M_{j1}& \cdots &M_{jn} \end{array} \right) \left( \begin{array}{c} h_1\\ \vdots\\ h_n \end{array} \right) + E_j(\bfh), \end{equation} where $$ \lim_{\bf h\to 0} \frac{E_j(\bfh)}{|\bfh|} = 0. $$ Comparing \eqref{diffnm.row} to the definition of differentiability of a real-valued function, we see that it is equivalent to the assertion that the $j$th component $f_j$ of $\bff$ is differentiable at $\bfa$, for every $j\in \{1,\ldots, m\}$. Moreover, from what we already know about differentiation of real-valued functions, we know that if \eqref{diffnm.row} holds, then it must be the case that $$ \left(\begin{array}{ccc} M_{j1}& \cdots &M_{jn} \end{array} \right) \ = \ \left(\begin{array}{ccc} \partial_1 f_j(\bfa)& \cdots &\partial_n f_j(\bfa) \end{array} \right). $$ By combining this with things we already know about differentiation of scalar functions, we arrive at the following conclusions.

Theorem 1. Suppose that $S$ is an open subset of $\R^n$. Then a function $\bff :S\to \R^m$ is differentiable at a point $\bfa\in S$ if and only if the component functions $f_j$ are differentiable at $\bfa$ for every $j\in \{1,\ldots, m\}$. Moreover, $$ D\bff(\bfa) = \left(\begin{array}{ccc} \partial_1 f_1& \cdots &\partial_n f_1\\ \partial_1 f_2& \cdots &\partial_n f_2\\ \vdots&\ddots&\vdots\\ \partial_1 f_m& \cdots &\partial_n f_m \end{array} \right) $$ (where all the partial derivatives are evaluated at $\bfa$). Furthermore, if all partial derivatives $\partial_i f_j$ (for $i=1,\ldots, n$ and $j=1,\ldots, m$) exist and are continuous in $S$, then $\bff$ is differentiable in $S$.

As in the case of scalar functions, this theorem very often provides the easiest way to check differentiability of a vector-valued function: compute all partial derivatives of all components and see where they exist and where they are all continuous. In many cases, the answer to both questions is everywhere.
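In practice one can also sanity-check a hand-computed Jacobian matrix against difference quotients $\partial_i f_j(\bfa) \approx (f_j(\bfa + t{\bf e}_i) - f_j(\bfa))/t$. A minimal sketch, assuming NumPy; the test function $\bff(x,y) = (xy,\ x+y^2)$ is an illustrative choice:

```python
import numpy as np

def numerical_jacobian(F, a, t=1e-6):
    """Approximate the m x n matrix DF(a) by forward difference quotients:
    column i is (F(a + t*e_i) - F(a)) / t, approximating partial_i F."""
    a = np.asarray(a, dtype=float)
    Fa = np.asarray(F(a), dtype=float)
    cols = []
    for i in range(a.size):
        e = np.zeros(a.size)
        e[i] = 1.0
        cols.append((np.asarray(F(a + t * e)) - Fa) / t)
    return np.column_stack(cols)

# f(x, y) = (x*y, x + y^2), so by Theorem 1: Df = [[y, x], [1, 2y]]
f = lambda x: np.array([x[0] * x[1], x[0] + x[1]**2])
a = np.array([2.0, 3.0])
exact = np.array([[3.0, 2.0],
                  [1.0, 6.0]])
print(np.allclose(numerical_jacobian(f, a), exact, atol=1e-4))   # True
```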

As with scalar functions, we say that $\bff$ is of class $C^1$ in $S$ (or sometimes just $\bff$ is $C^1$) if all partial derivatives of all components of $\bff$ exist and are continuous everywhere in $S$.

Example 1. Consider the function $\bff:\R^3\to \R^2$ defined by $$ \bff(x,y,z) = \binom{f_1(x,y,z)}{f_2(x,y,z)} = \binom{ |x| +z }{ |y-1| +xz } $$ Note that $f_1$ is not differentiable when $x=0$, and $f_2$ is not differentiable when $y=1$. Taking these into account, the matrix of partial derivatives at a point $(x,y,z)$ is given by $$ \left( \begin{array}{ccc}\frac x{|x|} &0&1\\ z&\frac{y-1}{|y-1|}&x \end{array} \right)\quad\mbox{ if }x\ne 0\mbox{ and }y\ne 1 $$ The entries of this matrix are all continuous everywhere in $$S = \{(x,y,z)\in \R^3 : x\ne 0\mbox{ and }y\ne 1\},$$ which is an open set, so we conclude that $\bff$ is differentiable everywhere in this set, and that the derivative $D\bff(x,y,z)$ is given by the above matrix.

Some important special cases

The special case $m=1$

We can view the case of scalar functions as the $m=1$ special case of $\R^m$-valued functions. Then, according to what we have said, for $f:\R^n\to \R^1$, we should view $Df(\bfa)$ as the $1\times n$ matrix -- if you like, a row vector -- with entries \begin{equation}\label{Df.scalar} Df(\bfa) = (\partial_1 f(\bfa), \ldots, \partial_n f(\bfa)). \end{equation} We can recognize that this is basically the same as the gradient (although now we are insisting that $Df$ is a row vector, which we did not do before with $\nabla f$; see the pedantic remark below for more about this).

This is the common exception to our convention that, by default, we identify vectors as column vectors. The reason this exception occurs is that $Df(\bfa)$ arises not exactly as an ordinary element of $\R^n$ (a column vector), but instead as a (representation of a) linear map $\R^n\to \R$, and this is naturally a $1\times n$ matrix or row vector.

A pedantic remark.

For a differentiable scalar function $f:S\to \R$, we have defined the derivative $Df(\bfx)$ as the $1\times n$ matrix (row vector) $M$ such that $$ \lim_{\bfh\to {\bf 0}}\frac{ f(\bfx+\bfh) - f(\bfx) - M\bfh}{|\bfh|} = 0. $$ We have also defined the gradient $\nabla f(\bfx)$ as the vector ${\bf m}$ such that $$ \lim_{\bfh\to {\bf 0}}\frac{ f(\bfx+\bfh) - f(\bfx) - {\bf m}\cdot \bfh}{|\bfh|} = 0. $$ These definitions are almost identical. The difference is that, whereas $Df(\bfx) = M$ is a row vector, according to our conventions, $\nabla f(\bfx) = {\bf m}$ is a column vector (because $\bfh$ is a column vector, so $\bf m$ should be the same kind of vector for the dot product $\bf m \cdot \bfh$ to make sense). In particular, the derivative $Df$ is the row vector given by \eqref{Df.scalar} above, whereas $\nabla f$ is the column vector $\nabla f = Df^T$.
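In matrix-library terms, the remark is just a statement about array shapes (a NumPy sketch; the function $f(x,y)=x^2 y$ and the point $(1,2)$ are illustrative choices): $Df$ is a $1\times n$ array, $\nabla f = Df^T$ is $n\times 1$, and both produce the same number when applied to $\bfh$.

```python
import numpy as np

# f(x, y) = x^2 * y at a = (1, 2): partial_x f = 2xy = 4, partial_y f = x^2 = 1
Df   = np.array([[4.0, 1.0]])   # the derivative: a 1 x 2 matrix (row vector)
grad = Df.T                     # the gradient: a 2 x 1 column vector

h = np.array([[0.1], [0.2]])    # an increment h, written as a column vector

print(Df @ h)                   # matrix-vector product Df h, a 1 x 1 array
print((grad.T @ h).item())      # the same number, via grad^T h
```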

The special case $n=1$

Another special case that arises often is the case $n=1$, when the domain is $1$-dimensional. For $(a,b)\subset \R$, a function $\bff:(a,b)\to \R^m$ has the form $$ \bff(t) = \left( \begin{array}{c} f_1(t)\\ \vdots\\ f_m(t) \end{array} \right) . $$ Then if $\bff$ is differentiable at a point $c\in (a,b)$, its derivative is an $m\times 1$ matrix, of the form $$ \bff'(c) \ = \ \ \left( \ \begin{array}{c} f_1'(c)\\ \vdots\\ f_m'(c) \end{array} \ \right) . $$

Here are some geometric interpretations.

  1. A parametrized curve is the image of a function $\bff:(a,b)\to \R^m$.

  2. If $\bff$ is differentiable and $\bff'(t)\ne {\bf 0}$, then $\bff'(t)$ is a vector that is tangent to the parametrized curve at $\bff(t)$.

  3. Any (nonzero) multiple of $\bff'(t)$ is also tangent to the parametrized curve at $\bff(t)$. In particular, $\frac{\bff'(t)}{|\bff'(t)|}$ is a unit tangent vector (still assuming that $\bff'(t)\ne {\bf 0}$.)

  4. If we fix $t$ and let $h$ vary, then the straight line parametrized by $\bff(t) + h\bff'(t)$ is the tangent line to the curve at $\bff(t)$.

  5. If $\bff(t)$ represents the position of a particle at time $t$, then $|\bff'(t)|$ is the speed of the particle at time $t$, and $\bff'(t)$ is the velocity vector of the particle at time $t$. The unit tangent $\frac{\bff'(t)}{|\bff'(t)|}$ is the direction of motion at time $t$.

These are at least partly illustrated by the following example.

Example 2. Define $\bff:\R\to \R^2$ by $$ \bff(t) = t\binom{\cos t}{\sin t} . $$ We know from precalculus that for every $t$, $\bff(t)$ is a point whose distance from the origin is $|t|$ (that is, $|\bff(t)|=|t|$), such that the line from the origin to $\bff(t)$ forms an angle of $t$ radians with the positive $x$-axis. The image of this curve in the $xy$-plane is shown below, for $t\in (-\pi/4, 6\pi)$.

[Figure: the spiral curve $\bff(t) = t(\cos t, \sin t)$ in the $xy$-plane, for $t\in (-\pi/4, 6\pi)$.]

We can see that the components of $\bff$ are differentiable everywhere. Thus $\bff$ is differentiable everywhere, and $$ \bff'(t) = \binom{\cos t - t \sin t}{\sin t + t \cos t} . $$ The definition of the derivative says that if we fix $t$ and consider $\bff(t+h)$ as a function of $h$, then $$ \bff(t+h) \approx \bff(t) + h \bff'(t) \qquad\mbox{ for }h\mbox{ small} $$ in the sense that the error is smaller than linear. (That is, ${\bf E}(h)/h\to 0$ as $h\to 0$.)
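The approximation $\bff(t+h)\approx \bff(t)+h\bff'(t)$ for this spiral can be checked numerically (a sketch assuming NumPy; the base point $t = 2\pi$ is an illustrative choice):

```python
import numpy as np

f  = lambda t: t * np.array([np.cos(t), np.sin(t)])
df = lambda t: np.array([np.cos(t) - t * np.sin(t),
                         np.sin(t) + t * np.cos(t)])

t = 2 * np.pi
for h in [1e-1, 1e-2, 1e-3]:
    E = f(t + h) - (f(t) + h * df(t))   # the error E(h)
    print(np.linalg.norm(E) / h)        # smaller than linear: the ratio -> 0
```

At $t = 2\pi$ the velocity vector is $\bff'(2\pi) = (1,\ 2\pi)$, so the tangent line there is far from horizontal even though the point lies on the positive $x$-axis.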

This can be seen in the picture below, which shows (in red) the sets \begin{equation}\label{tl} \{ \bff(t) + h \bff'(t), \ \ -1 < h< 1\} \end{equation} for several choices of $t$, together with the same curve as above,

[Figure: the same spiral, together with tangent line segments (in red) at several points.]

Above, in red, the set \eqref{tl} for $t = 0, 2\pi$ and $4\pi+\pi/2$, along with the curve defined above. From these pictures one can see that, near the point of tangency, each tangent line closely approximates the curve, as the definition of the derivative predicts.

Example 3. The picture below shows a portion of the curve in $\R^3$ parametrized by $\bff(t) = (t\cos t, t\sin t, t)$, together with a segment of a tangent line $$ \{ \bff(t) + h \bff'(t) : -1< h < 1\} $$ for a particular choice of $t$. This is similar to Example 2.

[Figure: a portion of the curve $(t\cos t, t\sin t, t)$ in $\R^3$, with a segment of a tangent line.]

The differential

Our discussion of the differential followed very closely the brief discussion in Section 2.2 of Folland's Advanced Calculus.

We discussed this for real-valued functions rather than vector-valued functions.

The main points are:

  1. The definition: Given a differentiable function $f:S\to \R$, where $S$ is an open subset of $\R^n$, at a point $\bfa\in S$ we define $df\big|_{\bfa}$ to be the linear map $\R^n\to \R$ given by \begin{equation}\label{df.def} df\big|_{\bfa}(\bfh) = \nabla f(\bfa)\cdot \bfh. \end{equation} This is also sometimes written $df(\bfa;\bfh)$. We often also simply neglect to mention the point $\bfa\in S$ and write $df(\bfh)$.

  2. Notation for differentials. It is common to write \begin{equation}\label{df.notation} df = \frac{\partial f}{\partial x_1}dx_1+\cdots +\frac{\partial f}{\partial x_n}dx_n \end{equation} or for example in $3$ dimensions, $$ df = \frac{\partial f}{\partial x}dx +\frac{\partial f}{\partial y}dy +\frac{\partial f}{\partial z}dz. $$ It is simplest just to think of this as conventional notation, but it can also be defended as reasonable, if one chooses.
    One important reason for introducing the differential is that this notation will later allow us to state things like the chain rule in useful ways.

    Rationale for the notation \eqref{df.notation}: Formula \eqref{df.notation} makes sense if we declare that $dx_j$ is the linear function $\R^n\to \R$ defined by $$ dx_j(\bfh) = h_j. $$ Why is this reasonable? Well, let $g(\bfx) := \bfx\cdot {\bf e}_j = x_j$. Then one can check from the definition \eqref{df.def} that $dg(\bfh) = h_j$. But since $g(\bfx) = x_j$, it is not a great stretch to write this as $dx_j(\bfh) = h_j$.

  3. Linear approximation. The definition of the differential implies that if $f$ is differentiable at $\bfa$, then $$ f(\bfa+\bfh) \approx f(\bfa) + df\big|_{\bfa}(\bfh)\mbox{ for $\bfh$ small}. $$ This can be used to compute approximate values of functions.
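The three points above fit together in a few lines of code (a sketch assuming NumPy; indices are 0-based, and the example $f(x,y) = x^2 y$ is an illustrative choice): $dx_j$ picks out the $j$th component of $\bfh$, and $df\big|_{\bfa}$ is the corresponding combination of the partial derivatives at $\bfa$.

```python
import numpy as np

def dx(j):
    """The linear functional dx_j : R^n -> R with dx_j(h) = h_j (0-indexed)."""
    return lambda h: h[j]

def differential(grad_f_a):
    """df|_a(h) = sum_j partial_j f(a) * dx_j(h), i.e. grad f(a) . h"""
    return lambda h: sum(c * dx(j)(h) for j, c in enumerate(grad_f_a))

# f(x, y) = x^2 * y at a = (1, 2): grad f(a) = (2xy, x^2) = (4, 1)
df_a = differential([4.0, 1.0])
h = np.array([0.1, 0.2])
print(df_a(h))                  # approximates f(a + h) - f(a)
print(np.dot([4.0, 1.0], h))    # the same number, as a dot product
```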

Example 4. Let $f(x,y,z) := x y \sin(x^2 z)$, and estimate the numerical value of $f(1.1,1.9,3)$.

Solution. Let's write $(1.1, 1.9, 3) = \bfa + \bfh$ for $$ \bfa = ( 1, 2, \pi)\quad\mbox{ and }\quad\bfh = (.1, -.1, 3-\pi) \approx (.1, -.1, -.14). $$ Note that it is easy to compute $f$ and $df$ at $\bfa$. Indeed, $f(\bfa)= 0$ and $\nabla f(\bfa)= (-4\pi , 0,-2 )$, so $$ f(1.1, 1.9, 3) \approx 0 + (-4\pi , 0,-2 )\cdot (.1, -.1, -.14) \approx -1.26+ .28 = -.98. $$ For comparison, using a calculator we find that $f(1.1, 1.9, 3) \approx -0.98067$, so the approximation was off by less than 1%.
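The estimate above is easy to replay numerically (a sketch assuming NumPy); the gradient below is the hand-computed one from the solution, and here $\bfh$ uses the exact value $3-\pi$ rather than the rounded $-.14$.

```python
import numpy as np

f = lambda x, y, z: x * y * np.sin(x**2 * z)

a = (1.0, 2.0, np.pi)
h = np.array([0.1, -0.1, 3 - np.pi])

grad_f_a = np.array([-4 * np.pi, 0.0, -2.0])   # grad f(a), computed by hand

approx = f(*a) + grad_f_a @ h   # f(a) = 0, so this is just grad f(a) . h
exact  = f(1.1, 1.9, 3.0)
print(approx, exact)            # approx is about -0.973, exact about -0.981
```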

Problems

Basic skills

  1. Determine all points where a function $\bff = \cdots$ is differentiable, and determine $D \bff$ at all such points.

  2. Let $\bff:\R\to \R^n$ be defined by $\ldots$, and consider the curve parametrized by $\bff$.

  3. Use the differential to compute the approximate value of the function $f = \ldots$ at the point $\ldots$.

More advanced questions

  1. Assume that $S$ is an open subset of $\R^n$, and that $\bff, \bfg:S\to \R^m$ are functions of class $C^1$ in $S$. Prove that $\phi := \bff\cdot \bfg:S\to \R$ is of class $C^1$, and that \begin{equation}\label{pr} \nabla\phi = \nabla(\bff\cdot \bfg) = (D\bff)^T \bfg + (D\bfg)^T \bff \end{equation} where ${\ }^T$ denotes transpose.
    This fact is useful to remember, whether or not you do the exercise.
    Note: In \eqref{pr} above, $\nabla \phi, \bff$ and $\bfg$ are all column vectors. What is the corresponding formula for the row vector $D\phi$ (still assuming, as usual, that $\bff$ and $\bfg$ are column vectors)?

  2. Now assume that $S$ is an open subset of $\R^n$, and that $\bff, \bfg:S\to \R^m$ are functions in $S$ that are differentiable at a point $\bfa\in S$. Prove that $\phi := \bff\cdot \bfg:S\to \R$ is differentiable at $\bfa$, and that \eqref{pr} holds at $\bfa$.

    The difference between these two questions is that Theorem 1 is relevant for the first one, whereas for the second, since our assumptions only give us information about differentiability at the point $\bfa$, all we can use is the definition of differentiable and of the derivative matrix.
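Before attempting the proofs, the product-rule identity \eqref{pr} can be checked numerically (a sketch assuming NumPy; the particular $\bff$, $\bfg$, and base point are illustrative choices). This does not replace a proof, but it is a good way to catch sign or transpose errors.

```python
import numpy as np

f = lambda x: np.array([x[0] * x[1], x[1]**2])        # f : R^2 -> R^2
g = lambda x: np.array([np.sin(x[0]), x[0] + x[1]])   # g : R^2 -> R^2

def jac(F, a, t=1e-6):
    """Forward-difference approximation of the Jacobian matrix DF(a)."""
    Fa = F(a)
    n = len(a)
    return np.column_stack([(F(a + t * np.eye(n)[i]) - Fa) / t
                            for i in range(n)])

a = np.array([0.7, -1.3])
phi = lambda x: np.array([f(x) @ g(x)])               # phi = f . g, scalar-valued

lhs = jac(phi, a).ravel()                             # gradient of phi at a
rhs = jac(f, a).T @ g(a) + jac(g, a).T @ f(a)         # (Df)^T g + (Dg)^T f
print(np.allclose(lhs, rhs, atol=1e-4))               # True
```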
