2.6: Taylor’s Theorem

\(\renewcommand{\R}{\mathbb R }\)


  1. Taylor’s Theorem in one variable
  2. Taylor’s Theorem in higher dimensions
  3. The Quadratic Case
  4. The General Case
  5. Computing Taylor polynomials
  6. Problems


Taylor’s Theorem in one variable

Recall from MAT 137, the one dimensional Taylor polynomial gives us a way to approximate a \(C^k\) function with a polynomial.

Suppose that \(I\subseteq \R\) is an open interval and that \(f:I\to \R\) is a function of class \(C^k\) on \(I\). For a point \(a\in I\), the \(k\)th order Taylor polynomial of \(f\) at \(a\) is the unique polynomial of order at most \(k\), denoted \(P_{a,k}(x)\), such that \[\begin{align*} f(a) &= P_{a,k}(0) \\ f'(a) &= P'_{a,k}(0) \\ \vdots & \qquad \vdots \\ f^{(k)}(a) &= P^{(k)}_{a,k}(0). \end{align*}\]

Since the \(j\)th derivative of a polynomial evaluated at \(0\) gives the \(j\)th coefficient times \(j!\), we can show that \[\begin{align} P_{a,k}(x) &= f(a) + f'(a)x + f''(a)\frac {x^2}{2} + \cdots + f^{(k)}(a)\frac {x^k}{k!} \label{ttr1}\\ &= \sum_{j=0}^k f^{(j)}(a)\frac {x^j}{j!} \nonumber \end{align}\]
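The recipe in \(\eqref{ttr1}\) is easy to automate. The following sketch (using sympy, with \(f = e^x\) as an illustrative example of our own choosing) computes \(P_{a,k}\) symbolically:

```python
import sympy as sp

x, h = sp.symbols("x h")

def taylor_poly(f, a, k):
    """kth-order Taylor polynomial P_{a,k}(h) = sum_j f^(j)(a) h^j / j!."""
    return sum(sp.diff(f, x, j).subs(x, a) * h**j / sp.factorial(j)
               for j in range(k + 1))

# Example: f(x) = e^x at a = 0, k = 3 gives 1 + h + h^2/2 + h^3/6.
P = taylor_poly(sp.exp(x), 0, 3)
print(sp.expand(P))
```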

Taylor’s Theorem guarantees that \(P_{a,k}(h)\) is a very good approximation of \(f(a+h)\) for small \(h\), and that the quality of the approximation increases as \(k\) increases.

Suppose that \(I\subseteq \R\) is an open interval and that \(f:I\to \R\) is a function of class \(C^k\) on \(I\). Let \(a\in I\) and \(h\in \R\) such that \(a+h\in I\), let \(P_{a,k}(h)\) denote the \(k\)th-order Taylor polynomial at \(a\), and define the remainder, \(R_{a,k}(h)\), to be \(f(a+h) - P_{a,k}(h).\) Then \[ \lim_{h\to 0}\frac{R_{a,k}(h)}{h^k} = 0. \]
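As a numerical sanity check of the theorem (our own illustration, with \(f = e^x\), \(a = 0\), \(k = 2\)), the ratio \(R_{a,k}(h)/h^k\) should shrink as \(h\to 0\):

```python
import math

def remainder_ratio(f, derivs_at_a, a, h, k):
    """Compute R_{a,k}(h) / h^k, where derivs_at_a = [f'(a), ..., f^(k)(a)]."""
    P = f(a) + sum(d * h**j / math.factorial(j)
                   for j, d in enumerate(derivs_at_a, start=1))
    return (f(a + h) - P) / h**k

# f = exp, a = 0, k = 2: every derivative of exp at 0 equals 1.
ratios = [abs(remainder_ratio(math.exp, [1.0, 1.0], 0.0, 10.0**-m, 2))
          for m in (1, 2, 3)]
print(ratios)  # decreasing toward 0, roughly like h/6
```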

When \(k=1\), we have \(P_{a,1}(x)=f(a)+f'(a)x\), and so \[R_{a,1}(h)=f(a+h)-f(a)-f'(a)h.\] Our alternative definition of the derivative tells us that \(\displaystyle{\lim_{h\to 0}\frac{R_{a,1}(h)}{h}} = 0.\) Next, we will show that this extends to higher values of \(k\). Then we will generalize Taylor polynomials to give approximations of multivariable functions, provided their partial derivatives all exist and are continuous up to some order.

The case \(k=2\).

In this case, Taylor’s Theorem relies on the following lemma.

Suppose that \(I\subseteq \R\) is an open interval and that \(f:I\to \R\) is a function of class \(C^2\) on \(I\). For \(a\in I\) and \(h\in \R\) such that \(a+h\in I\), there exists some \(\theta\in (0,1)\) such that \[\begin{equation}\label{ttlr} f(a+h) = f(a) + hf'(a) + \frac {h^2}2 f''(a+\theta h). \end{equation}\]

This can be considered to be a second-order Mean Value Theorem.
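To make the lemma concrete, one can locate a valid \(\theta\) numerically. The sketch below (our own choice of example: \(f = e^x\), \(a = 0\), \(h = 1\)) solves \(e = 1 + 1 + \tfrac12 e^\theta\) by bisection:

```python
import math

# For f = exp, a = 0, h = 1 the lemma reads e = 1 + 1 + (1/2) e^theta,
# i.e. e^theta = 2(e - 2).  Locate theta in (0,1) by bisection.
def g(theta):
    return math.exp(theta) - 2 * (math.e - 2)

lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
theta = (lo + hi) / 2
print(theta)  # roughly 0.362, comfortably inside (0, 1)
```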

This lemma implies the \(k=2\) case of Taylor’s Theorem, since we have \[\begin{align*} R_{a,2}(h) &= f(a+h) - \left[ f(a) + h f'(a) +\frac {h^2}2f''(a)\right] \\ &= \frac {h^2}2 \left[ f''(a+\theta h) - f''(a)\right]. \end{align*}\] Thus \[ \frac{R_{a,2}(h)}{h^2} = \frac 12\left[ f''(a+\theta h) - f''(a)\right] \] which tends to \(0\) as \(h\to 0\), since \(f''\) is continuous by assumption.

The proof of Lemma 2 is harder.

Details (optional)

Step 1.

First, to make things as easy as possible, let’s suppose that \(f'(a)=0\) and that \(h\) is a point such that \(f(a+h)= f(a)\). We will consider the general case later. Then the statement we want to prove, see \(\eqref{ttlr}\), reduces to the following: \[\begin{equation}\label{sc1} \text{ if }f(a) = f'(a) = f(a+h) = 0, \ \ \text{ then } \ \exists \theta\in (0,1) \text{ such that }f''(a+\theta h) = 0. \end{equation}\]

We will establish this using Rolle’s Theorem, which we recall is a special case of the single variable Mean Value Theorem. It implies that if \(g\) is differentiable on an interval \((c,d)\), and if both \(a\) and \(a+h\) are points in \((c,d)\) such that \(g(a)=g(a+ h)\), then there exists \(\alpha\in (0,1)\) such that \(g'(a+\alpha h)=0\).

Now, since \(f(a) = f(a+h)\), Rolle’s Theorem implies that there is some \(\theta_1\in (0,1)\) such that \(f'(a+\theta_1 h) = 0\).

Next, note that \(f'(a) = 0\) by hypothesis, and we have just shown that \(f'(a+\theta_1h)= 0\). So we can apply Rolle’s Theorem again, this time to \(f'\), and with \(\theta_1 h\) in place of \(h\), to find that there exists some \(\theta_2\in (0,1)\) such that \(f''(a+ \theta_2\theta_1 h)= 0\). If we define \(\theta = \theta_2\theta_1\), then this is exactly \(\eqref{sc1}\). So we have finished Step 1.

Step 2: the general case

Now given \(f\) of class \(C^2\) in \(I\) and points \(a\) and \(a+h \in I\), we want to modify \(f\) to reduce to the special case from Step 1. We start by defining \[ g_1(x) = f(x) - f(a) - (x-a) f'(a). \] Then \(g_1(a) = g_1'(a) = 0\), but \(g_1(a+h) \ne 0\) in general. To fix this, we define \[ g_2(x) = g_1(x) - \left(\frac {x-a}h\right)^2 g_1(a+h). \] Then \[ g_2(a) = g_1(a)= 0, \quad g'_2(a) = g'_1(a)= 0, \quad g_2(a+h) = 0, \] and \(g_2\) is \(C^2\). Then by applying Step 1 to \(g_2\), we find that there exists some \(\theta\in (0,1)\) such that \(g_2''(a+\theta h) = 0\). Writing out what this means in terms of \(f\) shows that \(\eqref{ttlr}\) holds.
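The reduction can be checked symbolically. The sketch below (with \(f = e^x\) as an arbitrary smooth stand-in of our own choosing) verifies \(g_2(a) = g_2'(a) = g_2(a+h) = 0\):

```python
import sympy as sp

x, a, h = sp.symbols("x a h")
f = sp.exp(x)  # an arbitrary C^2 stand-in; any smooth f works the same way

# g1 kills the value and first derivative at a; g2 also kills the value at a+h.
g1 = f - f.subs(x, a) - (x - a) * sp.diff(f, x).subs(x, a)
g2 = g1 - ((x - a) / h)**2 * g1.subs(x, a + h)

checks = [g2.subs(x, a), sp.diff(g2, x).subs(x, a), g2.subs(x, a + h)]
print([sp.simplify(c) for c in checks])  # [0, 0, 0]
```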

The general case (very much optional!)

For completeness, we outline the proof of Taylor’s Theorem for \(k\ge 3\).


First we need the following generalization of Lemma 2.

Suppose that \(I\subseteq \R\) is an open interval and that \(f:I\to \R\) is a function of class \(C^k\) on \(I\). For every \(a\in I\) and \(h\in \R\) such that \(a+h\in I\), there exists \(\theta\in (0,1)\) such that \[ f(a+h) = f(a) + h f'(a) + \frac {h^2}{2}f''(a) + \cdots + \frac {h^{k-1}}{(k-1)!} f^{(k-1)}(a) + \frac {h^k}{k!} f^{(k)}(a+\theta h). \]

Once this is known, it follows that \[ \frac 1{h^k}R_{a,k}(h) = \frac 1{k!}\left[ f^{(k)}(a+\theta h) - f^{(k)}(a)\right], \] and the right-hand side tends to \(0\) as \(h\to 0\), since \(f^{(k)}\) is continuous. The proof of this lemma is similar in spirit to the basic version, but more complicated. It too can be broken into \(2\) steps:

Step 1.

The special case when \(f(a) = f'(a) = \cdots = f^{(k-1)}(a) = f(a+h)=0\). Then we must show the existence of some \(\theta\in (0,1)\) such that \(f^{(k)}(a+\theta h)=0\).

Step 2.

Reduction of the general case to Step 1. One does this by defining \[ g_1(x) = f(x) - \left[f(a) + (x-a)f'(a) + \cdots + \frac{(x-a)^{k-1}}{(k-1)!}f^{(k-1)}(a)\right], \] (note that this is exactly \(R_{a,k-1}(x-a)\) – not a coincidence!) and \[ g_2(x) = g_1(x) - \left(\frac {x-a} h\right)^k g_1(a+h). \] This implies the conclusion. Write out the details if you’d like!

Taylor’s Theorem in higher dimensions

Suppose that \(S\subseteq \R^n\) is an open set and that \(f:S\to \R\) is a function of class \(C^k\) on \(S\). For a point \(\mathbf a\in S\), the \(k\)th order Taylor polynomial of \(f\) at \(\mathbf a\) is the unique polynomial of order at most \(k\), denoted \(P_{\mathbf a,k}(\mathbf h)\), such that \[\begin{align}\label{tkRn} f(\mathbf a) &= P_{\mathbf a,k}({\bf 0}) \\ \partial^\alpha f(\mathbf a) &= \partial^\alpha P_{\mathbf a,k}({\bf 0})\ \ \ \text{ for all partial derivatives of order up to }k.\nonumber \end{align}\]

One of the main difficulties with the theory is just the notation needed to write down explicit formulas for \(P_{\mathbf a,k}(\mathbf h)\). They will require \(\binom {n+k}k\) terms. For example, with \(n=2\) and \(k=3\), there are \(10\) terms (the value at the point, \(2\) first derivatives, \(3\) second derivatives, and \(4\) third derivatives). This makes notation either very complicated, or simple but incomprehensible. For this reason we will focus on the case of quadratic Taylor polynomials, \(k=2\), which is the most important after linear approximation, and the simplest. First we will state the general result, which guarantees that \(P_{\mathbf a,k}(\mathbf h)\) is a very good approximation of \(f(\mathbf a+\mathbf h)\), and that the quality of the approximation increases as \(k\) increases.

Suppose that \(S\subseteq \R^n\) is an open set and that \(f:S\to \R\) is a function of class \(C^k\) on \(S\). Let \(\mathbf a\in S\) and \(\varepsilon>0\) such that \(B(\mathbf a;\varepsilon)\subseteq S\). For any \(\mathbf h\in B(\mathbf 0;\varepsilon)\), let \(P_{\mathbf a,k}(\mathbf h)\) denote the \(k\)th-order Taylor polynomial at \(\mathbf a\), and define the remainder, \(R_{\mathbf a,k}(\mathbf h)\), to be \(f(\mathbf a+\mathbf h) - P_{\mathbf a,k}(\mathbf h).\) Then \[ \lim_{\mathbf h\to {\bf 0}}\frac{R_{\mathbf a,k}(\mathbf h)}{|\mathbf h|^k} = 0. \]

The main ideas are outlined in the case \(k=2\).

The Quadratic Case

A formula for \(P_{\mathbf a, 2}(\mathbf h)\)

According to the definition we have given, the second order Taylor polynomial \(P_{\mathbf a, 2}(\mathbf h)\) of \(f\) at \(\mathbf a\) is the quadratic polynomial such that \(f(\mathbf a) = P_{\mathbf a, 2}({\bf 0})\), and all first and second partial derivatives of \(P_{\mathbf a, 2}(\mathbf h)\) at \(\bf h = 0\) equal the first and second partial derivatives of \(f\) at \(\mathbf a\). Note that \(\mathbf h\) represents a “small” vector, that is, \(\mathbf h=\mathbf x-\mathbf a\) for some \(\mathbf x\) near \(\mathbf a\). Using the gradient and Hessian, we can write these terms in condensed notation:

\[\begin{align} P_{\mathbf a, 2}(\mathbf h) &= f(\mathbf a) + \sum_{i=1}^n h_i \partial_i f(\mathbf a) +\frac 12 \sum_{i,j=1}^n h_i h_j \partial_i\partial_j f(\mathbf a) \nonumber \\ & = f(\mathbf a) + \nabla f(\mathbf a)\cdot \mathbf h + \frac 12 (H(\mathbf a) \mathbf h)\cdot \mathbf h \label{Pa2} \\ & = f(\mathbf a)+\nabla f(\mathbf a)\cdot\mathbf h +\frac{1}{2}\mathbf h^t H(\mathbf a)\mathbf h\nonumber \end{align}\] where \(H(\mathbf a)\) denotes the Hessian.

The proof is exercise 5.
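As an illustration of formula \(\eqref{Pa2}\), the following sketch computes \(P_{\mathbf a,2}\) via the gradient and Hessian for a sample function of our own choosing, \(f(x,y) = e^x\sin y\) at \(\mathbf a = (0,\pi/2)\):

```python
import sympy as sp

x, y, h1, h2 = sp.symbols("x y h1 h2")
hvec = sp.Matrix([h1, h2])

f = sp.exp(x) * sp.sin(y)      # sample function (our own choice)
a = {x: 0, y: sp.pi / 2}       # expansion point a = (0, pi/2)

grad = sp.Matrix([sp.diff(f, v) for v in (x, y)]).subs(a)
Hess = sp.hessian(f, (x, y)).subs(a)

# P_{a,2}(h) = f(a) + grad f(a) . h + (1/2) h^T H(a) h
P2 = sp.expand(f.subs(a) + (grad.T * hvec)[0]
               + sp.Rational(1, 2) * (hvec.T * Hess * hvec)[0])
print(P2)  # equals 1 + h1 + h1**2/2 - h2**2/2
```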

In view of its importance, we restate Taylor’s Theorem in the case \(k=2\) (including some extra details that we did not mention in the general case).

Suppose that \(S\subseteq \R^n\) is an open set and that \(f:S\to \R\) is a function of class \(C^2\) on \(S\). Then for \(\mathbf a\in S\) and \(\mathbf h\in \R^n\) such that the line segment connecting \(\mathbf a\) and \(\mathbf a+\mathbf h\) is contained in \(S\), there exists \(\theta\in (0,1)\) such that \[\begin{equation}\label{ttlr2} f(\mathbf a+\mathbf h) = f(\mathbf a) + \nabla f(\mathbf a)\cdot \mathbf h + \frac 12 (H(\mathbf a+\theta\mathbf h) \mathbf h)\cdot \mathbf h. \end{equation}\] As a result (see exercise 7), \[\begin{equation}\label{tt2} \lim_{\mathbf h\to {\bf 0}}\frac{R_{\mathbf a,2}(\mathbf h)}{|\mathbf h|^2} = 0, \quad \text{ for }R_{\mathbf a,2}(\mathbf h) = f(\mathbf a+\mathbf h) - P_{\mathbf a,2}(\mathbf h) \end{equation}\] where the formula for \(P_{\mathbf a,2}(\mathbf h)\) is given in \(\eqref{Pa2}\).

Sketch of the proof

See exercise 8 for the details.

The General Case

For completeness, we state the formula for the \(k\)th order Taylor polynomial, for arbitrary \(k\in \mathbb N\).

First, recall that in Section 2.5 we introduced the notation \[\begin{equation}\label{mix} \partial^\alpha f = \left(\frac{\partial}{\partial x_1 }\right)^{\alpha_1}\left(\frac{\partial}{\partial x_{2} }\right)^{\alpha_{2}}\cdots \left(\frac{\partial}{\partial x_n }\right)^{\alpha_n}f, \end{equation}\] where \(\alpha\) is a multi-index; that is, \(\alpha\) has the form \((\alpha_1,\ldots, \alpha_n)\), where each \(\alpha_j\) is a nonnegative integer. For such a multi-index, we will also use the notation \[ \alpha! = \alpha_1!\alpha_2! \cdots \alpha_n!\ , \qquad \mathbf h^\alpha = h_1^{\alpha_1}h_2^{\alpha_2}\ldots h_n^{\alpha_n}. \] With this notation, the Taylor polynomial of order \(k\) has the formula \[\begin{equation}\label{tkRn2} P_{\mathbf a, k}(\mathbf h) = \sum_{\{\alpha : |\alpha|\le k\}} \frac {\mathbf h^\alpha}{\alpha!} {\partial^\alpha f(\mathbf a)}. \end{equation}\] Recall that \(|\alpha| = \alpha_1+\ldots + \alpha_n\) is the order of the multi-index \(\alpha\). Thus the formula involves all derivatives of order up to \(k\), including the value at the point, when \(\alpha = (0,\ldots, 0)\).
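Multi-indices are easy to enumerate by machine. The sketch below counts the \(\binom{n+k}{k}\) terms mentioned earlier and computes \(\alpha!\):

```python
from itertools import product
from math import comb, factorial

def multi_indices(n, k):
    """All multi-indices alpha in n variables with |alpha| <= k."""
    return [alpha for alpha in product(range(k + 1), repeat=n)
            if sum(alpha) <= k]

def multi_factorial(alpha):
    """alpha! = alpha_1! * alpha_2! * ... * alpha_n!"""
    out = 1
    for a in alpha:
        out *= factorial(a)
    return out

print(len(multi_indices(2, 3)), comb(5, 3))   # 10 terms when n = 2, k = 3
print(multi_factorial((2, 0, 1)))             # 2! * 0! * 1! = 2
```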

As in the quadratic case, the idea of the proof of Taylor’s Theorem is

  1. Define \(\phi(s) = f(\mathbf a+s\mathbf h)\).

  2. Apply the \(1\)-dimensional Taylor’s Theorem or formula \(\eqref{ttlr}\) to \(\phi\).

  3. Use the chain rule and induction to express the resulting facts about \(\phi\) in terms of \(f\). This is the hard part of the proof and involves showing that \[ \phi^{(j)}(s) = \sum_{\{\alpha : |\alpha| = j\}} \frac{j!}{\alpha!}\, \mathbf h^\alpha\, \partial^\alpha f(\mathbf a+s\mathbf h) \quad\text{ for every }j\le k. \]
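The identity \(\phi^{(j)}(s) = \sum_{|\alpha|=j} \frac{j!}{\alpha!}\mathbf h^\alpha \partial^\alpha f(\mathbf a+s\mathbf h)\) can be verified symbolically in small cases. Here is a check with \(f(x,y) = x^3y^2\), \(\mathbf a = \mathbf 0\), and \(j = 2\) (all our own choices for illustration):

```python
import sympy as sp
from itertools import product
from math import factorial

x, y, s, h1, h2 = sp.symbols("x y s h1 h2")
f = x**3 * y**2                       # a small polynomial test case
phi = f.subs({x: h1 * s, y: h2 * s})  # phi(s) = f(a + s h) with a = 0

j = 2
lhs = sp.diff(phi, s, j)

# Right-hand side: sum over multi-indices alpha with |alpha| = j.
rhs = sp.Integer(0)
for alpha in product(range(j + 1), repeat=2):
    if sum(alpha) != j:
        continue
    coeff = factorial(j) // (factorial(alpha[0]) * factorial(alpha[1]))
    d = f
    for v, m in zip((x, y), alpha):
        for _ in range(m):
            d = sp.diff(d, v)
    d = d.subs({x: h1 * s, y: h2 * s})
    rhs += coeff * h1**alpha[0] * h2**alpha[1] * d

print(sp.simplify(lhs - rhs))  # 0
```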

Computing Taylor polynomials

While the \(k\)th order Taylor polynomial can always be computed using formula \(\eqref{tkRn2}\), in practice it is a laborious task and you will not be asked to use this formula for any \(k>2\). Instead, there is a trick to computing higher order Taylor polynomials that avoids having to compute all the partial derivatives \(\partial^\alpha f(\mathbf a)\) appearing in \(\eqref{tkRn2}\).

Recall from definition \(\eqref{tkRn}\) and the following theorem that the Taylor polynomial \(P_{\mathbf a,k}(\mathbf h)\) is a polynomial of order at most \(k\) such that \[f(\mathbf a+\mathbf h)=P_{\mathbf a, k}(\mathbf h) + R_{\mathbf a,k}(\mathbf h)\] where \[\lim_{\mathbf h\to\mathbf 0} \frac{R_{\mathbf a,k}(\mathbf h)}{|\mathbf h|^k}=0.\]

It can be shown that if \(Q(\mathbf h)\) is any polynomial of degree \(k\) or lower such that \(f(\mathbf a+\mathbf h)=Q(\mathbf h) + R(\mathbf h)\) and \(\lim_{\mathbf h\to\mathbf 0} R(\mathbf h)/|\mathbf h|^k=0\), then \(Q(\mathbf h)=P_{\mathbf a,k}(\mathbf h)\). In other words, there is a unique best \(k\)th order approximation to \(f\), which is why we have been writing the Taylor polynomial instead of a Taylor polynomial. Instead of using the standard formula and computing all \(k\)th order and lower partial derivatives, we can look for any polynomial \(Q(\mathbf h)\) that satisfies these properties. This is only useful if we have a good idea for a guess, which we will get by using our knowledge of one variable Taylor polynomials. When a multivariable function is built out of simpler one-variable functions, we can manipulate the one variable Taylor polynomials as demonstrated in the example below.

Example 1.

Compute \(P_{\mathbf a,5}(\bf h)\) for \(\mathbf a=(0,0,0)\) and \[ f(x,y,z)=e^{x^2-yz^2}\cos(xz+y^2). \]

Solution To do this, recall the Taylor expansions \[ e^s=1+s+\frac{s^2}{2}+\frac{s^3}{3!}+\frac{s^4}{4!}+\frac{s^5}{5!}+\cdots \] and \[ \cos(t)=1-\frac{t^2}{2!}+\frac{t^4}{4!}+\cdots. \]

We want to compute \[ f(\mathbf a+\mathbf h)=f(h_1,h_2,h_3)=e^{h_1^2-h_2h_3^2} \cos\left(h_1h_3+h_2^2 \right) \] so let’s substitute \(s=h_1^2-h_2h_3^2\) into the Taylor expansion for \(e^s\) and \(t=h_1h_3 + h_2^2\) into the Taylor expansion for \(\cos(t)\) and only keep track of terms which have total degree in the \(h_i\)’s of 5 or lower: \[ e^{h_1^2-h_2h_3^2}= 1+\left(h_1^2-h_2h_3^2\right) +\frac{\left(h_1^2-h_2h_3^2\right)^2}{2} +\cdots = 1+h_1^2-h_2h_3^2+\frac{h_1^4}{2}-h_1^2h_2h_3^2+\cdots \] \[ \cos(h_1h_3+h_2^2)= 1-\frac{\left(h_1h_3+h_2^2\right)^2}{2}+\cdots = 1-\frac{h_1^2h_3^2+h_2^4}{2}-h_1h_3h_2^2+\cdots \]

Multiplying these and again only keeping track of terms of total degree \(\leq 5\) in the \(h_i\)’s, \[\begin{align*} f(h_1,h_2,h_3)&= \left(1+h_1^2-h_2h_3^2+\frac{h_1^4}{2}-h_1^2h_2h_3^2+\cdots\right) \left(1-\frac{h_1^2h_3^2+h_2^4}{2}-h_1h_3h_2^2+\cdots\right) \\ &=1+h_1^2-h_2h_3^2+\frac{h_1^4}{2}-h_1^2h_2h_3^2 -\frac{h_1^2h_3^2+h_2^4}{2}-h_1h_3h_2^2+\cdots \\ &= Q(\mathbf h)+R(\mathbf h) \end{align*}\] where \[ Q(\mathbf h)=1+h_1^2-h_2h_3^2+\frac{h_1^4}{2}- \frac{h_1^2h_3^2+h_2^4}{2}-h_1h_3h_2^2-h_1^2h_2h_3^2 \] and \(R(\mathbf h)\) contains all the remaining terms of degree 6 or higher we have been ignoring in this computation. Since only degree 6 or higher terms appear in \(R(\mathbf h)\), \[ \lim_{\mathbf h\to{\bf 0}} \frac{R(\mathbf h)}{|\mathbf h|^5} = 0 \] and therefore \[ P_{{\bf0},5}(\mathbf h)=1+h_1^2-h_2h_3^2+\frac{h_1^4}{2}- \frac{h_1^2h_3^2+h_2^4}{2}-h_1h_3h_2^2-h_1^2h_2h_3^2 \] by the uniqueness of Taylor polynomials.
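The hand computation in Example 1 can be double-checked with a computer algebra system: replacing \(\mathbf h\) by \(t\mathbf h\) turns the multivariable expansion into a one-variable series in \(t\), whose terms of degree \(\le 5\) must reproduce \(Q(\mathbf h)\). A sympy sketch:

```python
import sympy as sp

h1, h2, h3, t = sp.symbols("h1 h2 h3 t")
f = sp.exp(h1**2 - h2 * h3**2) * sp.cos(h1 * h3 + h2**2)

# Replace h_i by t*h_i; the coefficient of t^j collects all degree-j terms.
ft = f.subs({h1: t * h1, h2: t * h2, h3: t * h3})
P5 = sp.expand(sp.series(ft, t, 0, 6).removeO().subs(t, 1))

Q = (1 + h1**2 - h2 * h3**2 + h1**4 / 2
     - (h1**2 * h3**2 + h2**4) / 2 - h1 * h3 * h2**2 - h1**2 * h2 * h3**2)
print(sp.simplify(P5 - Q))  # 0
```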

Before deciding that the computation in Example 1 is complicated, try to use formula \(\eqref{tkRn2}\) directly by computing all partial derivatives up to order \(5\) at \(\mathbf 0\). There are \(\binom 85=56\) of them. Is the construction from single variable functions more or less work?

Problems

Basic

You will be asked to compute the second-order Taylor polynomial \(P_{\mathbf a, 2}\) of a function at a point \(\mathbf a\). These questions ask you to apply the formula \(\eqref{Pa2}\) directly.

Recall that the formula for \(P_{\mathbf a, 2}\) combines the zeroth-order approximation (the value of the function), the first-order correction (using the gradient), and the second-order correction (using the Hessian), all evaluated at \(\mathbf a\).

For example:

  1. Compute the second-order Taylor polynomial of \(f(x,y) = \frac{xy +y^2}{1+\cos^2 x}\) at \(\mathbf a = (0,2)\).

  2. Compute the second-order Taylor polynomial of \(f(x,y,z) = xy^2e^{z^2}\) at the point \(\mathbf a = (1,1,1)\).

You will also need to compute a higher order Taylor polynomial \(P_{\mathbf a, k}\) of a function at a point. Questions of this type involve using your knowledge of one variable Taylor polynomials to compute a higher order Taylor polynomial.

For example:

  3. Compute the fifth-order Taylor polynomial of \(f(x,y) = \frac{xy +y^2}{1-xy}\) at \(\mathbf a = (0,0)\).

  4. Compute the fourth-order Taylor polynomial of \(f(x,y,z) = xy^2e^{z^2}\) at the point \(\mathbf a = (1,1,1)\).

Advanced

Here you will practice proof writing by filling in details or completing proofs from this section.

5. Complete the proof of Lemma 2.

Hint: A quadratic polynomial can always be written in the form \[ q(\mathbf h) = \frac 12 (A \mathbf h)\cdot \mathbf h + \mathbf b \cdot \mathbf h + c, \] where \(A\) is a symmetric \(n\times n\) matrix with entries \((a_{ij})\), \(\mathbf b\in \R^n\), and \(c\in \R\). One can then differentiate \(q\) and see what conditions the coefficients \(A,\mathbf b, c\) must satisfy in order for all derivatives of order up to \(2\) at \(\mathbf h = \bf 0\) to equal the corresponding derivatives of \(f\) at \(\mathbf a\).

  6. Complete the proof of Theorem 3 by proving formula \(\eqref{ttt}\).

  7. Use \(\eqref{ttlr2}\) to prove \(\eqref{tt2}\).

  8. Prove that the formula in \(\eqref{tkRn2}\) satisfies \(\eqref{tkRn}\).

Hint: Since you have to differentiate \(P_{\mathbf a, k}\) many times, you may find it easier to write it as a function of \(\mathbf x\) rather than \(\mathbf h\). This is just a notational change. When differentiating a function of \(\mathbf h\), we have to interpret \[ \partial^\alpha = \left(\frac{\partial}{\partial h_1 }\right)^{\alpha_1}\left(\frac{\partial}{\partial h_{2} }\right)^{\alpha_{2}}\cdots \left(\frac{\partial}{\partial h_n }\right)^{\alpha_n}, \] whereas for a function of \(\mathbf x\), \(\partial^\alpha\) is understood exactly as in \(\eqref{mix}\). The key point is that if \(\alpha = (\alpha_1,\ldots, \alpha_n)\) and \(\beta = (\beta_1,\ldots, \beta_n)\) are multi-indices, then \[ \text{ for }g^\beta(\mathbf x) = \mathbf x^\beta,\quad \partial^\alpha g^\beta({\bf 0}) = \begin{cases} \alpha! &\text{ if }\alpha = \beta \\ 0&\text{ if not. } \end{cases} \]


Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Canada License.