2.3 The Chain Rule

\(\renewcommand{\R}{\mathbb R }\)


  1. The Chain Rule
  2. Important special cases
  3. Some examples
  4. Proof of the Chain Rule (Optional)
  5. Differentiate, then substitute
  6. Problems


The Chain Rule

The chain rule from single variable calculus has a direct analogue in multivariable calculus, where the derivative of each function is replaced by its Jacobian matrix, and multiplication is replaced with matrix multiplication. As usual, we have generalized open intervals to open sets.

Suppose that \(S\) and \(T\) are open subsets of \(\R^n\) and \(\R^m\), and that we are given functions \(\mathbf g: S\to \R^m\) and \(\mathbf f:T\to \R^\ell\). Suppose also that \(\mathbf a\in S\) is a point such that \(\mathbf g(\mathbf a)\in T\); thus \(\mathbf f\circ \mathbf g(\mathbf x) = \mathbf f (\mathbf g(\mathbf x))\) is well-defined for all \(\mathbf x\) close to \(\mathbf a\).

If \(\mathbf g\) is differentiable at \(\mathbf a\) and \(\mathbf f\) is differentiable at \(\mathbf g(\mathbf a)\), then the composite function \(\mathbf f\circ \mathbf g\) is differentiable at \(\mathbf a\), and \[\begin{equation}\label{cr1} D(\mathbf f\circ\mathbf g)(\mathbf a) = [D\mathbf f(\mathbf g(\mathbf a))] \ [D\mathbf g(\mathbf a)]. \end{equation}\]

It may be helpful to write out \(\eqref{cr1}\) in terms of components and partial derivatives. We can write \(\mathbf f\) as a function of variables \(\mathbf y = (y_1,\ldots, y_m)\in \R^m\), and \(\mathbf g\) as a function of \(\mathbf x = (x_1,\ldots, x_n)\in \R^n\). Then for each \(k=1,\dots, \ell\) and \(j=1,\ldots, n\), \(\eqref{cr1}\) is the same as \[\begin{equation}\label{crcoord} \frac {\partial }{\partial x_j} (f_k\circ \mathbf g)(\mathbf a) =\sum_{i=1}^m \frac{\partial f_k}{\partial y_i}(\mathbf g(\mathbf a)) \ \frac{\partial g_i}{\partial x_j}(\mathbf a). \end{equation}\] This can be checked by writing out both sides of \(\eqref{cr1}\) — the left-hand side is the \((k,j)\) component of the matrix \(D(\mathbf f\circ \mathbf g)(\mathbf a)\), and the right-hand side is the \((k,j)\) component of the matrix product \([D\mathbf f(\mathbf g(\mathbf a))] \ [D\mathbf g(\mathbf a)]\).

The chain rule is also sometimes written in the following way: We write \({\bf u} = (u_1,\ldots, u_\ell)\) to denote a typical point in \(\R^\ell\). As above, we write \(\mathbf x = (x_1,\ldots, x_n)\) and \(\mathbf y = (y_1,\ldots, y_m)\) to denote typical points in \(\R^n\) and \(\R^m\). If we suppose that the \(\mathbf x, \mathbf y\) and \(\bf u\) variables are related by \[ \mathbf y = \mathbf g(\mathbf x), \qquad {\bf u} = \mathbf f(\mathbf y) = \mathbf f(\mathbf g(\mathbf x)), \] then it is traditional to write, for example, \(\dfrac{\partial u_k}{\partial x_j}\) to denote the infinitesimal change in the \(k\)th component of \(\mathbf u\) in response to an infinitesimal change in \(x_j\), that is, \(\frac{\partial u_k}{\partial x_j} = \frac{\partial } {\partial x_j}( f_k\circ \mathbf g)\). Using this notation, and with similar interpretations for \(\frac{\partial u_k}{\partial y_i}\) and \(\frac{\partial y_i}{\partial x_j}\), we can write the chain rule in the form \[\begin{equation}\label{cr.trad} \frac{\partial u_k}{\partial x_j} = \frac{\partial u_k}{\partial y_1}\frac{\partial y_1}{\partial x_j} + \cdots +\frac{\partial u_k}{\partial y_m}\frac{\partial y_m}{\partial x_j} \end{equation}\] for \(k=1,\dots, \ell\) and \(j=1,\ldots, n\). We emphasize that this is just a rewriting of the chain rule in suggestive notation, and its actual meaning is identical to that of \(\eqref{crcoord}\).
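To see the matrix multiplication in \(\eqref{cr1}\) at work in a small concrete case (the functions below are arbitrary choices, just for illustration), take \(n=m=2\), \(\ell=1\), \(\mathbf g(x_1,x_2) = (x_1^2+x_2, \ x_1x_2)\), and \(f(y_1,y_2) = y_1y_2\). On one hand, \(f\circ \mathbf g(x_1,x_2) = x_1^3x_2 + x_1x_2^2\), so differentiating directly gives \[ D(f\circ\mathbf g)(\mathbf x) = \left( 3x_1^2x_2+x_2^2 \quad\ x_1^3+2x_1x_2\right). \] On the other hand, \[ [Df(\mathbf g(\mathbf x))]\,[D\mathbf g(\mathbf x)] = \left( x_1x_2 \quad\ x_1^2+x_2 \right) \left(\begin{array}{cc} 2x_1 & 1 \\ x_2 & x_1 \end{array}\right) = \left( 3x_1^2x_2+x_2^2 \quad\ x_1^3+2x_1x_2\right), \] in agreement with \(\eqref{cr1}\).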

Important special cases

The case \(\ell=1\).

We most often apply the chain rule to compositions \(f\circ \mathbf g\), where \(f\) is a real-valued function. In this case, formula \(\eqref{cr1}\) simplifies to \[\begin{equation}\label{cr.scalar} D(f\circ\mathbf g)(\mathbf a) = [Df(\mathbf g(\mathbf a))] \ [D\mathbf g(\mathbf a)]. \end{equation}\] where \(Df\) is a \(1\times m\) matrix, that is, a row vector, and \(D(f\circ \mathbf g)\) is a \(1\times n\) matrix, also a row vector (but with length \(n\)). Also, the alternate notation \(\eqref{cr.trad}\) simplifies to \[ \frac{\partial u}{\partial x_j} = \frac{\partial u}{\partial y_1}\frac{\partial y_1}{\partial x_j} + \cdots +\frac{\partial u}{\partial y_m}\frac{\partial y_m}{\partial x_j}\ \quad \text{ for }j=1,\ldots, n. \]

The general form \(\eqref{cr1}\) of the chain rule says that for a vector function \(\mathbf f\), every component \(f_k\) satisfies \(\eqref{cr.scalar}\), for \(k=1,\ldots, \ell\).

The case \(\ell=n=1\).

Specializing still more, a case that arises often is \(\mathbf g:\R \to \R^m\) and \(f:\R^m\to \R\). Then \(f\circ \mathbf g\) is a function \(\R\to \R\), and the chain rule states that \[\begin{align}\label{crsc1} \frac{d}{dt} (f\circ \mathbf g)(t) &= \sum_{j=1}^m \frac{\partial f}{\partial x_j}(\mathbf g(t)) \frac{d g_j}{dt}(t) \ = \ \nabla f(\mathbf g(t)) \cdot \mathbf g'(t). \end{align}\] If we write this in the condensed notation used in \(\eqref{cr.trad}\) above, with variables \(u\) and \(\mathbf x = (x_1,\ldots, x_m)\) and \(t\) related by \(u = f(\mathbf x)\) and \(\mathbf x = \mathbf g(t)\), then we get \[ \frac{d u}{dt} = \frac {\partial u}{\partial x_1}\frac{d x_1}{dt} +\cdots + \frac {\partial u}{\partial x_m}\frac{d x_m}{dt}. \] This means the same as \(\eqref{crsc1}\), but you may find that it is easier to remember when written this way.
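As a quick illustration of \(\eqref{crsc1}\) (with functions chosen arbitrarily), take \(f(x_1,x_2) = x_1x_2\) and \(\mathbf g(t) = (t^2, e^t)\). Then \(\nabla f(\mathbf g(t)) = (e^t, t^2)\) and \(\mathbf g'(t) = (2t, e^t)\), so \(\eqref{crsc1}\) gives \[ \frac d{dt} f(\mathbf g(t)) = (e^t, t^2)\cdot (2t, e^t) = 2te^t + t^2e^t, \] which agrees with differentiating \(f(\mathbf g(t)) = t^2e^t\) directly.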

See Example 2 below for an illustration of this special case.

Some examples

Example 1: Polar coordinates. Suppose we have a function \(f:\R^2\to \R\), and we would like to know how it changes with respect to the distance from the origin or the angle around the origin; that is, what its derivatives are in polar coordinates. Let \[ S = \{ (r,\theta)\in \R^2 : r\geq0 \} \] and define \(\mathbf g:S\to \R^2\) by \(\mathbf g(r,\theta) =(r\cos \theta, r\sin \theta).\)

If \((x,y) = \mathbf g(r,\theta)\), then geometrically \(r\) is the distance between \((x,y)\) and the origin, and \(\theta\) is the angle between the positive \(x\)-axis and the line from the origin to \((x,y)\).


Now suppose that we are given a function \(f:\R^2\to \R\). Let’s write \(\phi\) to denote the composite function \(\phi = f\circ \mathbf g\), so \[ \phi(r,\theta) = f(r\cos\theta, r\sin\theta). \] Then \[ D\mathbf g = \left( \begin{array}{rr} \cos \theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{array} \right), \] so the Chain Rule says that \[ D\phi = \big( \partial_r \phi \ \ \ \partial_\theta \phi\big) = \big(\partial_x f \ \ \ \partial_y f\big) \left( \begin{array}{rr} \cos \theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{array} \right), \] where we have to remember that \(\partial_x f\) and \(\partial_y f\) are evaluated at \(\mathbf g(r,\theta)\). We can write this out in more detail as \[\begin{align*} \partial_r \phi = \partial_r (f\circ \mathbf g) &= \partial_x f(r\cos\theta,r\sin\theta) \cos \theta + \partial_y f(r\cos\theta,r\sin\theta) \sin \theta , \\ \partial_\theta \phi = \partial_\theta (f\circ \mathbf g) &= -\partial_x f(r\cos\theta,r\sin\theta)\, r\sin \theta + \partial_y f(r\cos\theta,r\sin\theta)\, r\cos \theta. \end{align*}\]

If we use the notation \(\eqref{cr.trad}\), writing \(u = f(x,y)\) with \((x,y) = (r\cos\theta, r\sin\theta)\), then the chain rule takes the form \[\begin{align*} \frac {\partial u} {\partial r} &= \frac{ \partial u}{\partial x} \frac{ \partial x}{\partial r} + \frac{ \partial u}{\partial y} \frac{ \partial y}{\partial r} \\ &= \frac{ \partial u}{\partial x}\ \cos \theta + \frac{ \partial u}{\partial y}\ \sin \theta \\ \frac {\partial u} {\partial \theta} &= \frac{ \partial u}{\partial x} \frac{ \partial x}{\partial \theta} + \frac{ \partial u}{\partial y} \frac{ \partial y}{\partial \theta} \\ &=- \frac{ \partial u}{\partial x} \ r \sin \theta + \frac{ \partial u}{\partial y} \ r\cos\theta. \end{align*}\]

The formulas for \(\partial_r\phi\) and \(\frac{\partial u}{\partial r}\) mean exactly the same thing; the differences are only choices of notation. One significant difference is that we have written, for example, \(\partial_x f(r\cos\theta, r\sin \theta)\) in the first and \(\frac{\partial u}{\partial x}\) in the second, and similarly for the \(y\) derivatives. In particular, in the first formula, we have explicitly indicated that derivatives of \(f\) should be evaluated at \(\mathbf g(r,\theta)\), whereas in the second, we have left it to the reader to figure this out.
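As a concrete check of these formulas, take \(f(x,y) = x^2+y^2\) (an arbitrary choice), so that \(\phi(r,\theta) = r^2\cos^2\theta + r^2\sin^2\theta = r^2\). Here \(\partial_x f = 2x\) and \(\partial_y f = 2y\), so evaluating at \((x,y) = (r\cos\theta, r\sin\theta)\), \[ \partial_r \phi = 2r\cos\theta\,\cos\theta + 2r\sin\theta\,\sin\theta = 2r, \qquad \partial_\theta \phi = -2r\cos\theta\, r\sin\theta + 2r\sin\theta\, r\cos\theta = 0, \] exactly as we find by differentiating \(\phi(r,\theta) = r^2\) directly.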

Example 2: Motion of a particle. Let \(\mathbf g:\R\to \R^n\) be a differentiable function, and consider \[ \phi(t) = |\mathbf g(t)| = f(\mathbf g(t))\qquad\text{ for }\quad f(\mathbf x) = |\mathbf x|. \] If we think of \(\mathbf g(t)\) as being the position at time \(t\) of a particle that is moving around in \(\R^n\), then \(\phi(t)\) is the particle’s distance from the origin at time \(t\).

Exercise: If you have not already done it, check that \(f\) is differentiable everywhere except at the origin, and that \[ \nabla f(\mathbf x) = \frac{\mathbf x}{|\mathbf x|}\qquad\text{ for }\mathbf x\ne {\bf 0}. \]

Geometric meaning of the formula for \(\nabla f\). Recall what we know about \(\nabla f(\mathbf x)\): it is a vector that points in the direction where \(f\) has the greatest increase near \(\mathbf x\), and its magnitude is the rate of increase in that direction. At a point \(\mathbf x\), if we want to increase the distance from the origin as quickly as possible, we should move directly away from the origin, that is, in the direction of \(\mathbf x\). So \(\nabla f(\mathbf x)\) should have the form \(\lambda \mathbf x\) for some \(\lambda > 0\). And if we move in this direction, our distance from the origin will increase at exactly a rate of 1 unit of distance per unit of distance travelled. So the rate of increase is \(1\), and thus \(|\nabla f(\mathbf x)|\) should equal \(1\). Putting these together, we conclude that \(\nabla f(\mathbf x)\) should equal \(\dfrac {\mathbf x}{|\mathbf x|}\). Of course, this is not a proof, but it may be helpful for our understanding.
Then using the chain rule \(\eqref{crsc1}\) and properties of the dot product, we find that \[ \frac d{dt} |\mathbf g(t)| = \frac{\mathbf g(t)}{|\mathbf g(t)|}\cdot \mathbf g'(t) = |\mathbf g'(t)| \cos \theta \] where \(\theta\) is the angle between \(\frac{\mathbf g(t)}{|\mathbf g(t)|}\) and \(\mathbf g'(t)\). This says that the rate at which distance from \({\bf 0 }\) is changing equals the speed \(|\mathbf g'(t)|\) at which the particle is moving multiplied by the cosine of the angle between the direction of motion and the direction pointing exactly away from the origin.
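For instance (a made-up example, just for illustration), if the particle moves on a circle of radius \(R\) at constant angular speed, say \(\mathbf g(t) = (R\cos\omega t, R\sin\omega t)\), then \(\mathbf g'(t) = (-R\omega\sin\omega t, R\omega\cos\omega t)\) is orthogonal to \(\mathbf g(t)\). So \(\cos\theta = 0\) in the formula above, and \(\frac d{dt}|\mathbf g(t)| = 0\), consistent with the fact that the distance from the origin is constantly equal to \(R\).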

Example 3: Homogeneous functions.

A function \(f:\R^n\to \R\) is said to be homogeneous of degree \(\alpha\) if \[ f(\lambda \mathbf x) = \lambda^\alpha f(\mathbf x)\quad\text{ for all }\mathbf x\ne{\bf 0}\text{ and }\lambda>0. \] These were introduced in one of the problems in Section 2.1. For example, the monomial \(x^ay^bz^c\) is homogeneous of degree \(\alpha = a+b+c\). The definition also applies if the domain of \(f\) does not include the origin, and for the present discussion, it does not matter whether or not \(f({\bf 0})\) is defined.

An interesting fact about homogeneous functions can be proved using the chain rule: if \(f\) is differentiable and homogeneous of degree \(\alpha\), then \(\nabla f(\mathbf x) \cdot \mathbf x = \alpha f(\mathbf x)\) for every \(\mathbf x \neq \mathbf 0\).
To see this, let \(f:\R^n\to \R\) be differentiable and homogeneous of degree \(\alpha\). Fix a nonzero vector \(\mathbf x\in \R^n\), and define functions \(\mathbf g:(0,\infty)\to \R^n\) and \(h:(0,\infty)\to \R\) by \[ \mathbf g(\lambda) = \lambda \mathbf x, \qquad\qquad h(\lambda) = f(\mathbf g(\lambda)) = f(\lambda \mathbf x). \] We know that \(h(\lambda) = \lambda^\alpha f(\mathbf x)\), so \[ h'(\lambda) = \alpha \lambda^{\alpha-1} f(\mathbf x). \] On the other hand, we can compute \(h' = (f\circ \mathbf g)'\) using the chain rule. This leads to \[ h'(\lambda) = \nabla f(\lambda \mathbf x) \cdot \mathbf x \] since \(\mathbf g'(\lambda)=\mathbf x\). Setting \(\lambda = 1\) and equating the two expressions for \(h'(1)\), we find that \[ \nabla f(\mathbf x) \cdot \mathbf x = \alpha f(\mathbf x). \] Since \(\mathbf x\) was an arbitrary nonzero element of \(\R^n\), this holds at every such point.
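For example, the monomial \(f(x,y) = x^2y\) is homogeneous of degree \(3\), and indeed \[ \nabla f(x,y)\cdot (x,y) = (2xy, \, x^2)\cdot (x,y) = 2x^2y + x^2y = 3f(x,y), \] in agreement with the identity above.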

Example 4: Level sets and the gradient

Next we use the chain rule to prove a basic property of the gradient.
Suppose that \(S\) is an open subset of \(\R^n\) and that \(f:S\to \R\) is differentiable at \(\mathbf a\). Then \(\nabla f(\mathbf a)\) is orthogonal to the level set of \(f\) that passes through \(\mathbf a\).

We will first explain more precisely what this means. Let \[\begin{equation}\label{lsnot} c = f(\mathbf a), \qquad \text{ and } \quad C = \{ \mathbf x \in S : f(\mathbf x) = c\}. \end{equation}\] Thus \(C\) is the level set of \(f\) that passes through \(\mathbf a\). The theorem says that \[\begin{equation}\label{lsg2} \nabla f(\mathbf a)\cdot {\bf v} = 0\qquad\text{ for every vector $\bf v$ that is tangent to $C$ at $\mathbf a$.} \end{equation}\]

Note, if \(\nabla f(\mathbf a)= {\bf 0}\), then \(\nabla f(\mathbf a)\cdot {\bf v}=0\) for every \(\bf v\), and \(\eqref{lsg2}\) is trivial. So we will assume that \(\nabla f(\mathbf a)\ne {\bf 0}.\) This assumption has some interesting and relevant geometric consequences that we will discuss in detail later.

To prove that \(\eqref{lsg2}\) is true, we need to say what we mean by “every vector \(\bf v\) that is tangent to \(C\) at \(\mathbf a\).” To do this, consider any open interval \(I\subseteq \R\) such that \(0\in I\), and let \(\gamma:I \to \R^n\) be a parameterized curve such that \[\begin{equation} \gamma(t)\in C\quad \text{ for all }t\in I, \qquad \gamma(0) = \mathbf a,\qquad \text{ and } \gamma'(0) \text{ exists}. \label{gamma1}\end{equation}\] We now define:
The vector \(\mathbf v\) is tangent to \(C\) at \(\mathbf a\) if \(\mathbf v = \gamma'(0)\) for some curve \(\gamma\) satisfying \(\eqref{gamma1}\).

So to prove \(\eqref{lsg2}\), we must show that if \(\gamma\) is any differentiable curve satisfying \(\eqref{gamma1}\), then \[\begin{equation} \gamma'(0)\cdot \nabla f(\mathbf a) = 0. \label{lsg3}\end{equation}\]

Details for \(\eqref{lsg3}\)

Fix an open interval \(I\subseteq \R\) containing \(0\) and a curve \(\gamma\) satisfying \(\eqref{gamma1}\). Write \(h(t) = f\circ \gamma(t)\). By the definition of the level set \(C\), the assumption that \(\gamma(t)\in C\) for all \(t\in I\) means that \(h(t) = f(\gamma(t))=c\) for all \(t\in I\).

Thus \[ h'(t)= 0\text{ for all }t, \qquad\text{ and in particular, }\quad h'(0)=0. \] On the other hand, we can compute \(h'(0)\) using the chain rule, to find that \[ 0 = h'(0) = (f\circ \gamma)'(0) = \nabla f(\gamma(0)) \cdot \gamma'(0) = \nabla f(\mathbf a)\cdot \gamma'(0). \] This proves \(\eqref{lsg3}\).
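For a concrete illustration, take \(f(x,y,z) = x^2+y^2+z^2\) and \(\mathbf a = (1,0,0)\), so that \(C\) is the unit sphere. The curve \(\gamma(t) = (\cos t, \sin t, 0)\) satisfies \(\eqref{gamma1}\), with \(\gamma'(0) = (0,1,0)\), and \[ \nabla f(\mathbf a)\cdot \gamma'(0) = (2,0,0)\cdot(0,1,0) = 0, \] as the theorem predicts: the tangent vector \(\gamma'(0)\) is orthogonal to \(\nabla f(\mathbf a)\), which points radially outward from the sphere.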

In a way this discussion is incomplete. It would be better if we could say that \[\begin{equation}\label{tv2} {\mathbf v}\text{ is tangent to } C \text{ at } \mathbf a \qquad \iff \qquad \nabla f(\mathbf a)\cdot {\bf v} = 0. \end{equation}\] So far we have only proved that the implication \(\Longrightarrow\) holds. Although we have not proved it, in fact \(\eqref{tv2}\) is true. We will return to this point later.

Tangent plane to a level set

Example 4 motivates a definition that will be useful for discussing the geometry of the derivative.

Suppose \(S\) is an open subset of \(\R^3\) and that \(f:S\to \R\) is a function that is differentiable at a point \(\mathbf a\in S\). Let \(C\) denote the level set of \(f\) that passes through \(\mathbf a\). Assume also that \(\nabla f(\mathbf a)\ne 0.\) The tangent plane to \(C\) at \(\mathbf a\) is
\[\begin{equation}\label{tp.def} \{ \mathbf x \in \R^3 : (\mathbf x - \mathbf a)\cdot \nabla f(\mathbf a) = 0 \}. \end{equation}\]

Using \(\eqref{tv2}\), this definition states that a point \(\mathbf x\) belongs to the tangent plane to \(C\) at \(\mathbf a\) if and only if \(\mathbf x\) has the form \(\mathbf x = \mathbf a+{\bf v}\), where \(\bf v\) is tangent to the level set at \(\mathbf a\).

Example 5.

Find the tangent plane to the surface \[ C = \{ (x,y,z)\in \R^3 : x^2 - 2xy +4yz - z^2 = 2\} \] at the point \(\mathbf a =(1,1,1)\).

Solution. The surface is the level set through \(\mathbf a\) of \(f(x,y,z) = x^2 - 2xy + 4yz - z^2\) (note that \(\mathbf a\in C\), since \(1 - 2 + 4 - 1 = 2\)). First, we compute \(\nabla f(x,y,z) = (2x-2y, -2x+4z, 4y-2z)\), so \(\nabla f(\mathbf a)=(0,2,2)\). Then we apply the definition of the tangent plane to \(C\) at \(\mathbf a\): \[\{ (x,y,z)\in \R^3 : (x-1, y-1, z-1)\cdot (0,2, 2) = 0\}= \{ (x,y,z)\in \R^3 : y+z = 2 \}. \]

Note, if we had said “find the equation for the tangent plane” instead of “find the tangent plane”, then the answer would have been “\(y+z=2\)”.

Proof of the Chain Rule (optional)

Here we sketch the proof of the chain rule.

Recall the assumptions:

  • \(S\) and \(T\) are open subsets of \(\R^n\) and \(\R^m\) respectively,
  • \(\mathbf g:S\to \R^m\) is differentiable at \(\mathbf a \in S\).
  • \(\mathbf f:T\to \R^\ell\) is differentiable at \(\mathbf b = \mathbf g(\mathbf a)\in T\).

The definition of differentiability involves error terms which we typically write as \(\mathbf E({\bf h})\). In this proof we have to keep track of several different error terms, so we will use subscripts to distinguish between them. For example, we will write \(\mathbf E_{\mathbf g, \mathbf a}( {\bf h})\) to denote the error term for \(\mathbf g\) near the point \(\mathbf a\).

It turns out to make the notation easier to write \(M\) instead of \(D\mathbf g(\mathbf a)\) and \(N\) instead of \(D\mathbf f(\mathbf g(\mathbf a)) = D\mathbf f(\mathbf b)\). Thus, \(M\) is the (unique) \(m\times n\) matrix such that \[\begin{equation}\label{dga} \mathbf g(\mathbf a +{\bf h}) = \mathbf g(\mathbf a ) + M \mathbf h + \mathbf E_{\mathbf g, \mathbf a}({\bf h})\qquad\text{ where } \lim_{\mathbf h \to \mathbf 0}\frac {\mathbf E_{\mathbf g, \mathbf a}({\bf h})}{|\bf h|} = \mathbf 0, \end{equation}\] and similarly, \(N\) is characterized by \[\begin{equation}\label{dfb} \mathbf f(\mathbf b +{\bf k}) = \mathbf f(\mathbf b ) + N {\bf k} + \mathbf E_{\mathbf f, \mathbf b}({\bf k})\qquad\text{ where } \lim_{\bf k \to \bf0}\frac {\mathbf E_{\mathbf f, \mathbf b}({\bf k})}{|\bf k|} = \mathbf 0. \end{equation}\] Using \(\eqref{dga}\), we find that \[\begin{align} \mathbf f(\mathbf g(\mathbf a +{\bf h})) &= \mathbf f\Big( \overbrace{\mathbf g(\mathbf a) }^{\mathbf b}+ \overbrace{M \, {\bf h} + \mathbf E_{\mathbf g, \mathbf a}({\bf h}) }^{\bf k}\Big) \\ &\overset{\eqref{dfb}}= \mathbf f(\overbrace {\mathbf g(\mathbf a)}^\mathbf b) + N( \overbrace{M \, {\bf h} + \mathbf E_{\mathbf g, \mathbf a}({\bf h}) }^{\bf k}) + \mathbf E_{\mathbf f, \mathbf b}({\bf k})\\ &= \mathbf f(\mathbf g(\mathbf a)) + NM{\bf h} \ + \ N \mathbf E_{\mathbf g, \mathbf a}({\bf h}) +\mathbf E_{\mathbf f, \mathbf b}({\bf k}) \end{align}\] We can rewrite this as \[\begin{multline} \mathbf f\circ \mathbf g(\mathbf a+{\mathbf h}) = \mathbf f\circ \mathbf g(\mathbf a) + N M\, {\bf h} + \mathbf E_{\mathbf f\circ \mathbf g, \mathbf a}(\mathbf h),\\ \qquad\text{ where } \mathbf E_{\mathbf f\circ \mathbf g, \mathbf a}(\mathbf h) : = N\mathbf E_{\mathbf g, \mathbf a}({\bf h}) + \mathbf E_{\mathbf f, \mathbf b}({\bf k}), \qquad {\bf k} = M{\bf h}+ \mathbf E_{\mathbf g, \mathbf a}(\bf h). \nonumber \end{multline}\] Since \(N M = D\mathbf f(\mathbf g(\mathbf a)) D\mathbf g(a)\), this will imply the chain rule, after we verify that \[\begin{equation}\label{cr.proof} \lim_{\bf h\to \bf0} \frac 1{|\bf h|} \mathbf E_{\mathbf f\circ \mathbf g, \mathbf a}(\mathbf h) = \bf0. \end{equation}\]

Details for \(\eqref{cr.proof}\). This is even more optional than the rest of the proof, but if you are interested, the verification follows.

We consider separate pieces of the error term one after another. We will be terse.

First, since \(N\) is a fixed matrix, there exists a number \(C\) such that \(|N {\bf v}| \le C |\bf v|\) for all \(\bf v\in \R^m\). Since \(\mathbf E_{\mathbf g, \mathbf a}({\bf h})\) is a vector in \(\R^m\), it follows that \[\begin{equation}\label{cr.p1} \frac 1{|\bf h|} |N\mathbf E_{\mathbf g, \mathbf a}({\bf h})| \le \frac C{|\bf h|} |\mathbf E_{\mathbf g, \mathbf a}({\bf h})| \to 0 \text{ as }{\bf h}\to \bf0. \end{equation}\] Also, \[ \frac 1{|\bf h|} |\mathbf E_{\mathbf f, \mathbf b}({\bf k})| = \frac {|\bf k|} {|\bf h|} \frac 1{|\bf k|} |\mathbf E_{\mathbf f, \mathbf b}({\bf k})| . \] Since \(M\) is a fixed matrix and \(\lim_{\bf h\to \bf0} \frac{\mathbf E_{\mathbf g, \mathbf a}(\mathbf h)}{|\bf h|} = \bf0\), one can check that there exists some constant \(D\) such that \[ |{\bf k}| \le D |{\bf h}| \qquad \text{ whenever } |{\bf h}| \text{ is sufficiently small}. \] It follows that \({\bf k}\to \bf0\) as \(\bf h \to \bf0\), and hence that \[\begin{equation}\label{cr.p2} \lim_{\mathbf h \to {\bf 0}}\frac 1{|\bf h|} |\mathbf E_{\mathbf f, \mathbf b}({\bf k})| \le \lim_{\bf k \to {\bf 0}}D \frac 1{|\bf k|} |\mathbf E_{\mathbf f, \mathbf b}({\bf k})| = 0. \end{equation}\] Finally, we deduce \(\eqref{cr.proof}\) by combining \(\eqref{cr.p1}\) and \(\eqref{cr.p2}\) with the triangle inequality.

Differentiate, then substitute

With the chain rule, it is common to get tripped up by ambiguous notation. For example, suppose we are given \(f:\R^3\to \R\), which we will write as a function of variables \((x,y,z)\). Further assume that \(\mathbf G:\R^2\to \R^3\) is a function of variables \((u,v)\), of the form \[ \mathbf G(u,v) = (u, v, g(u,v)) \qquad\text{ for some }g:\R^2\to \R. \] Let’s write \(\phi = f\circ \mathbf G\). Then a routine application of the chain rule tells us that \[ ( \partial_u \phi \ \ \ \partial_v \phi ) \ = \ \left(\begin{array}{ccc} \frac{\partial f}{\partial x}\circ \mathbf G & \ \frac{\partial f}{\partial y}\circ \mathbf G &\ \frac{\partial f}{\partial z}\circ \mathbf G \end{array}\right) \left(\begin{array}{cc} 1&0\\ 0&1 \\ \frac{\partial g}{\partial u}& \frac{\partial g}{\partial v} \end{array}\right) \] For simplicity, considering only the \(u\) derivative, this says that

\[ \frac{\partial \phi}{\partial u}(u,v) = \frac{\partial f}{\partial x} (u,v,g(u,v)) + \frac{\partial f}{\partial z}(u,v,g(u,v)) \frac {\partial g}{\partial u}(u,v). \] This is perfectly correct but a little complicated. After all, since \(x=u\) and \(y=v\), it might be simpler to write \(\mathbf G\) as a function of \(x\) and \(y\) rather than \(u\) and \(v\), ie \(\mathbf G(x,y) = (x,y,g(x,y))\). Then we would write \[ \phi(x,y) = f(\mathbf G(x,y)) = f(x,y,g(x,y)), \] and by changing \((u,v)\) to \((x,y)\), our formula for the derivative becomes \[\begin{equation} \frac{\partial \phi}{\partial x}(x,y) = \frac{\partial f}{\partial x} (x,y,g(x,y)) + \frac{\partial f}{\partial z}(x,y,g(x,y)) \frac {\partial g}{\partial x}(x,y). \label{wo05}\end{equation}\] However, this is a little ambiguous, since if someone sees the expression \[\begin{equation} \frac{\partial f}{\partial x} (x,y,g(x,y)) \label{wo1}\end{equation}\] they can be legitimately confused about whether it means

  • first compute the partial derivative with respect to \(x\), then substitute \(z=g(x,y)\), OR

  • first substitute \(z=g(x,y)\), then compute the partial derivative with respect to \(x\).
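To see that the two readings genuinely disagree, take \(f(x,y,z) = xz\) and \(g(x,y) = x+y\) (a made-up example, just for illustration). With the first reading, \(\frac{\partial f}{\partial x} = z\), and substituting \(z = g(x,y) = x+y\) gives \(x+y\). With the second reading, \(f(x,y,g(x,y)) = x(x+y)\), whose partial derivative with respect to \(x\) is \(2x+y\).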

The ambiguity could be resolved by using more parentheses to indicate the order of operations. It would be clear if we write

  • \(\left(\dfrac{\partial f}{\partial x}\right) (x,y,g(x,y))\) to mean “first differentiate then substitute,” and
  • \(\dfrac{\partial }{\partial x} \left(f (x,y,g(x,y))\right)\) to mean “first substitute then differentiate.”

Writers almost never do this, possibly because expressions with so many parentheses are hard to parse quickly and look clunky. Omitting the parentheses is similar to writing \(5+2\cdot 3\) and hoping that our reader does not interpret this to mean \((5+2)\cdot3\): we need to establish a convention, and in this case the first interpretation (differentiate, then substitute) is the conventional one. The second interpretation is exactly what we called \(\frac{\partial \phi}{\partial x}\). Expressions like \(\eqref{wo1}\) can be confusing, and \(\eqref{wo05}\) is only correct if the reader is able to figure out exactly what it means.

One way to avoid this problem is to write the derivatives of \(f\) as \(\partial_1 f\), \(\partial_2 f\), etc., instead of \(\frac{\partial f}{\partial x}\) or \(\partial_x f\). Then \(\eqref{wo05}\) looks like \[\begin{equation}\label{last} \partial_1\phi(x,y) = \partial_1 f(x,y,g(x,y)) + \partial_3 f(x,y,g(x,y)) \partial_1g(x,y), \end{equation}\] and this is correct and unambiguous, though still a little awkward. It can be written more simply as \[ \partial_1\phi= \partial_1 f + \partial_3 f\partial_1g, \] as long as we trust our readers to figure out that derivatives of \(g\) are evaluated at \((x,y)\) and derivatives of \(f\) at \((x,y,g(x,y))\).
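Returning to the illustrative choice \(f(x,y,z) = xz\), \(g(x,y) = x+y\) used above, formula \(\eqref{last}\) gives \[ \partial_1\phi(x,y) = \partial_1 f(x,y,x+y) + \partial_3 f(x,y,x+y)\,\partial_1 g(x,y) = (x+y) + x\cdot 1 = 2x+y, \] which matches the answer obtained by substituting first, as it should, since \(\phi(x,y) = x(x+y)\).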

One can also get into more serious trouble, for example as follows. Let’s continue to write \(g\) as a function of \((x,y)\) rather than \((u,v)\), and let’s also write \(w\) as the name of the variable that is the output of the function \(f\), that is, \(w = f(x,y,z)\). Then we can write \[ w = f(x,y,z) \qquad \text{ where } \qquad z = g(x,y). \] Suppose we want to know about rates of change in \(w\) in response to infinitesimal or small changes in \(x\), always restricting our attention to the set of points where \(z=g(x,y)\). Since \(z\) depends on \(x\), we have to use the chain rule. If we use the notation \(\eqref{cr.trad}\), we might write \[ \frac{\partial w}{\partial x} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial x}+ \frac{\partial w}{\partial y}\frac{\partial y}{\partial x}+ \frac{\partial w}{\partial z}\frac{\partial z}{\partial x}. \] But it is clear that \(\frac{\partial x}{\partial x}= 1\), and that \(\frac{\partial y}{\partial x} = 0\), because \(x\) and \(y\) are unrelated. Thus the above equation reduces to \[\begin{equation} \frac{\partial w}{\partial x} = \frac{\partial w}{\partial x} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial x}. \label{cr.example}\end{equation}\] It follows that \[\begin{equation}\label{wrong} \frac{\partial w}{\partial z}\frac{\partial z}{\partial x} = 0. \end{equation}\] This is worse than ambiguous — it is wrong! For example, suppose that \(f(x,y,z) = z\) and \(g(x,y) = x\). Then it is clear that \(\frac{\partial w}{\partial z}\frac{\partial z}{\partial x} =1\), showing that \(\eqref{wrong}\) cannot be true.

The problem is that here we have written \(\frac{\partial w}{\partial x}\) to mean two different things: on the left-hand side, it is \(\partial_1 \phi(x,y)\), and on the right-hand side it is \(\partial_1 f(x,y,g(x,y))\), using notation from \(\eqref{last}\). But if we insist on using the notation \(\eqref{cr.trad}\), then there is no simple way of distinguishing between these two different things.

Summary of this discussion

You can never go wrong if you apply the chain rule correctly and carefully — after all, it’s a theorem. But bad choices of notation can lead to ambiguity or mistakes. You should be aware of this when you are

  • using the chain rule,
  • explaining some application of the chain rule to someone (eg, writing up the solution of a problem), or
  • reading discussions that use the chain rule, particularly if they use notation like \(\eqref{cr.trad}\).

On the other hand, shorter and more elegant formulas are often easier for the mind to absorb. When there is no chance of confusion, this can be a reason to prefer them over complicated formulas that spell out every nuance in mind-numbing detail.

Problems

Basic

Questions involving the chain rule will appear on homework, on at least one Term Test, and on the Final Exam. Such questions may also involve additional material that we have not yet studied, such as higher-order derivatives. You will also see the chain rule in MAT 244 (Ordinary Differential Equations) and APM 346 (Partial Differential Equations). If the questions here do not give you enough practice, you can easily make up additional questions of a similar character. You can also find questions of this sort in any book on multivariable calculus.

  1. Compute derivatives using the chain rule. For example,
  • Suppose that \(f:\R^3\to \R\) is of class \(C^1\), and consider the function \(\phi:\R^2\to \R\) defined by \[ \phi(x,y) = f(x^2-y, xy, x\cos y) \] Express \(\partial_x\phi\) and \(\partial_y \phi\) in terms of \(x,y\) and partial derivatives of \(f\).

  • Suppose that \(f:\R^2\to \R\) is of class \(C^1\), and consider the function \(\phi:\R^3\to \R\) defined by \[ \phi(x,y,z) = f(x^2-yz, xy+\cos z) \]Express partial derivatives of \(\phi\) with respect to \(x,y,z\) in terms of \(x,y,z\) and partial derivatives of \(f\).

  • Suppose that \(f:\R^2\to \R\) is of class \(C^1\). Let \(S = \{(r,s)\in \R^2 : s\ne 0\}\), and for \((r,s)\in S\), define \(\phi(r,s) = f(rs, r/s)\). Find formulas for \(\partial_r\phi\) and \(\partial_s\phi\) in terms of \(r,s\) and derivatives of \(f\).

  • Suppose that \(f:\R^2\to \R\) is of class \(C^1\). Let \(S = \{(x,y,z)\in \R^3 : z\ne 0\}\), and for \((x,y,z)\in S\), define \(\phi(x,y,z) = f(xy, y/z)\). Find formulas for partial derivatives of \(\phi\) in terms of \(x,y,z\) and partial derivatives of \(f\).

  2. Use the chain rule to find relations between different partial derivatives of a function. For example:

  • Suppose that \(f:\R\to \R\) is of class \(C^1\), and that \(u = f(x^2+y^2+z^2)\). Prove that \[ x\partial_y u - y \partial_x u = 0. \]
  • Suppose that \(f:\R^2\to \R\) is of class \(C^1\), and that \(u = f(x^2+y^2+z^2, y+ z)\). Prove that \[ (y-z)\partial_x u - x \partial_y u + x \partial_z u = 0. \]

  3. Find the tangent plane to the set \(\ldots\) at the point \(\mathbf a = \ldots\). For example:

    • Find the tangent plane to \(\{ (x,y,z)\in \R^3 : x^2e^{y/(z^2+1)} = 4\}\) at the point \(\mathbf a = (2,0,1)\)

Advanced

  1. Let \(q:\R^n\to \R\) be the (quadratic) function defined by \(q(\mathbf x) = |\mathbf x|^2\). Determine \(\nabla q\) (either by differentiating, or by remembering material from one of the tutorials.)
  2. Suppose that \(\mathbf x:\R\to \R^3\) is a function that describes the trajectory of a particle. Thus \(\mathbf x(t)\) is the particle’s position at time \(t\).
  • If \(\mathbf x\) is differentiable, then we will write \({\bf v}(t) = \mathbf x'(t)\), and we say that \({\bf v}(t)\) is the velocity vector at time \(t\).
  • Similarly, if \(\bf v\) is differentiable, then we will write \({\bf a}(t)= {\bf v}'(t)\), and we say that \({\bf a}(t)\) is the acceleration vector at time \(t\).
  • We also say that \(|{\bf v}(t)|\) is the speed at time \(t\).

Prove that the speed is constant if and only if \({\bf a}(t)\cdot {\bf v}(t) = 0\) for all \(t\).

For the next three exercises, let \(M^{n\times n}\) denote the space of \(n\times n\) matrices. We write a typical element of \(M^{n\times n}\) as a matrix \(X\) with entries: \[ X = \left( \begin{array}{ccc} x_{11} & \cdots & x_{1n}\\ \vdots & \ddots & \vdots\\ x_{n1} & \cdots & x_{nn} \end{array}\right) \] Define the function \(\det:M^{n\times n}\to \R\) by saying that \(\det(X)\) is the determinant of the matrix. We can view this as a function of the variables \(x_{11},\ldots, x_{nn}\).

  3. For \(2\times 2\) matrices, compute \[ \frac{\partial}{\partial x_{11}} \det(X), \quad \frac{\partial}{\partial x_{12} }\det(X), \quad \frac{\partial}{\partial x_{21} }\det(X), \quad \frac{\partial}{\partial x_{22} }\det(X). \]

  4. Now consider \(n\times n\) matrices for an arbitrary positive integer \(n\). Let \(I\) denote the \(n\times n\) identity matrix. Thus, in terms of the variables \((x_{ij})\), \(I\) corresponds to \[ x_{ij}=\begin{cases}1&\text{ if }i=j\\ 0&\text{ if }i\ne j \end{cases} \] For every \(i\) and \(j\), compute \[ \frac{\partial}{\partial x_{ij}} \det(I). \] This means: the derivative of the determinant function, evaluated at the identity matrix.

    Hint. There are two cases: \(i=j\) and \(i\ne j\).

From the definition of partial derivatives, to compute \(\frac{\partial}{\partial x_{11}}\det(I)\), you have to look at \[ \lim_{h\to 0} \frac 1 h \left[ \det \left( \begin{array}{ccccc} 1+h & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{array}\right) \ - \ \det I\right] \]
  5. Now suppose that \(X(t)\) is a “differentiable curve in the space of matrices”, in other words, that \[ X(t) = \left( \begin{array}{ccc} x_{11}(t) & \cdots & x_{1n}(t) \\ \vdots & \ddots & \vdots\\ x_{n1}(t) & \cdots & x_{nn}(t) \end{array}\right) \] where \(x_{ij}(t)\) is a differentiable function of \(t\in \R\), for all \(i,j\). Also suppose that \(X(0) = I\).

Use the chain rule and the above exercise to find a formula for \(\left. \frac d{dt} \det(X(t))\right|_{t=0}\) in terms of \(x_{ij}'(0)\), for \(i,j=1,\ldots, n\).

  6. Here we sketch a proof of the Chain Rule that may be a little simpler than the proof presented above. To simplify the set-up, let’s assume that \(\mathbf g:\R\to \R^n\) and \(f:\R^n\to \R\) are both functions of class \(C^1\). Define \(\phi = f\circ \mathbf g\). Thus \(\phi\) is a function \(\R\to \R\). Your goal is to compute its derivative at a point \(t\in \R\). To simplify still further, let’s assume that \(n=2\). Let’s write \(\mathbf g(t) = (x(t), y(t))\). Then \[\begin{align*} \phi(t+h)-\phi(t) &= f(\mathbf g(t+h)) - f(\mathbf g(t)) \\ &= f( x(t+h), y(t+h)) - f(x(t),y(t)) \\ &= [f( x(t+h), y(t+h)) - f(x(t+h),y(t))] \\ &\qquad \qquad\qquad+ [f(x(t+h),y(t)) -f(x(t),y(t))] . \\ \end{align*}\] Starting from the above, mimic the proof of Theorem 3 in Section 2.1 to show that \[ \phi'(t) = \lim_{h\to 0}\frac 1 h(\phi(t+h)-\phi(t) ) \text{ exists and equals } \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt}, \] where the partial derivatives of \(f\) on the right-hand side are evaluated at \((x(t),y(t)) = \mathbf g(t)\).


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Canada License.