2.4: The Mean Value Theorem

\(\renewcommand{\R}{\mathbb R }\)

The Mean Value Theorem

The Mean Value Theorem of single variable calculus tells us that if we connect two points \((a, f(a))\) and \((b, f(b))\) on the graph of a differentiable function \(f\) with a straight line \(\ell\), then there is a point \(c\in (a,b)\) where the tangent line is parallel to \(\ell\), i.e. \[ f'(c)=\frac{f(b)-f(a)}{b-a}.\] To generalize this to higher dimensions, we must rewrite it as \((b-a)f'(c)=f(b)-f(a)\) so that we don’t divide by vectors. Then after replacing the derivative with the gradient and multiplication with the dot product, we have

Suppose that \(f\) is a real-valued differentiable function defined on an open set \(S\subseteq\R^n\). For two points \(\mathbf a,\mathbf b\in S\), let \(L_{\mathbf a, \mathbf b}\) denote the line segment that connects them. If \(L_{\mathbf a,\mathbf b}\subseteq S\), then there exists \(\mathbf c\in L_{\mathbf a,\mathbf b}\) such that \[ f(\mathbf b)-f(\mathbf a) = (\mathbf b-\mathbf a)\cdot \nabla f(\mathbf c). \]

For \(t\in [0,1]\), let \(\gamma(t) = t\mathbf b+(1-t)\mathbf a\), the straight line segment from \(\mathbf a\) to \(\mathbf b\). Note that \[ L_{\mathbf a,\mathbf b} = \{ \gamma(t) : 0\le t \le 1 \}. \] Next, define \[ \phi(t) = f(\gamma(t)). \] According to the single-variable Mean Value Theorem from MAT137, there exists \(\tau\in (0,1)\) such that \[ \frac{\phi(1) - \phi(0)}{1-0} = \phi'(\tau). \] The Chain Rule implies that \(\phi'(\tau) = \nabla f(\gamma(\tau))\cdot (\mathbf b-\mathbf a)\). So if we define \(\mathbf c = \gamma(\tau)\), then \(\mathbf c\in L_{\mathbf a,\mathbf b}\), and since \(\phi(0)=f(\mathbf a)\) and \(\phi(1) = f(\mathbf b)\), the above identity becomes \[ f(\mathbf b) - f(\mathbf a) = \nabla f(\mathbf c)\cdot(\mathbf b-\mathbf a). \]
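The construction in this proof can be tried numerically. The following is a minimal sketch, not part of the notes: the function \(f(x,y)=x^3+y^2\), the endpoints, and the helper name `mean_value_point` are all chosen here for illustration. It locates \(\tau\) by bisection on \(\phi'(t)-(\phi(1)-\phi(0))\), which works in this example because that quantity changes sign on \([0,1]\); this is an assumption that need not hold for other choices of \(f\).

```python
def f(x, y):
    # Illustrative choice of f, not taken from the notes.
    return x**3 + y**2

def grad_f(x, y):
    # Gradient of the f above, computed by hand.
    return (3 * x**2, 2 * y)

def gamma(t, a, b):
    # Point on the segment from a to b: t*b + (1 - t)*a.
    return tuple(t * bi + (1 - t) * ai for ai, bi in zip(a, b))

def mean_value_point(a, b, tol=1e-12):
    """Find tau in [0, 1] with grad f(gamma(tau)) . (b - a) = f(b) - f(a).

    Uses bisection, so it assumes g below changes sign between 0 and 1.
    """
    direction = tuple(bi - ai for ai, bi in zip(a, b))
    target = f(*b) - f(*a)

    def g(t):
        # phi'(t) - (phi(1) - phi(0)), with phi'(t) given by the Chain Rule.
        grad = grad_f(*gamma(t, a, b))
        return sum(gi * di for gi, di in zip(grad, direction)) - target

    lo, hi = 0.0, 1.0
    if g(lo) * g(hi) > 0:
        raise ValueError("no sign change on [0, 1]; bisection does not apply")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    tau = 0.5 * (lo + hi)
    return tau, gamma(tau, a, b)

a, b = (0.0, 0.0), (1.0, 1.0)
tau, c = mean_value_point(a, b)
lhs = f(*b) - f(*a)
rhs = sum(gi * (bi - ai) for gi, ai, bi in zip(grad_f(*c), a, b))
print(tau, c, lhs, rhs)
```

Here \(\phi(t)=t^3+t^2\), so \(\tau\) solves \(3\tau^2+2\tau=2\), and the point \(\mathbf c=\gamma(\tau)\) lies strictly between \(\mathbf a\) and \(\mathbf b\) on the segment.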

Note that we require the line \(L_{\mathbf a, \mathbf b}\) to be contained in the domain of \(f\), and that it must be a line segment, not an arbitrary path. Try to explain where the fact that it is a line segment is used.

Some consequences

The single variable Mean Value Theorem is used to prove things like

if \(|f'(t)|\le M\) for all \(t\) in an interval \((a,b)\), then the slope between any two points on the graph of \(f\) is at most \(M\), that is, \[ |f(t) - f(s)| \le M|t-s| \text{ for all }s,t\in (a,b). \]
if \(f'(t)=0\) for all \(t\) in an interval \((a,b)\), then \(f\) is constant on \((a,b)\).

In this section we will show how the Mean Value Theorem can be used to prove similar facts in higher dimensions.

Since it was important that the domain of \(f\) contained an entire line segment between \(\mathbf a\) and \(\mathbf b\), we will name those sets where this holds for any two points.

A set \(S\subseteq \R^n\) is said to be convex if, for every \(\mathbf a, \mathbf b\in S\), the line segment \(L_{\mathbf a,\mathbf b}\) is contained in \(S\). That is, \[ \forall \mathbf a,\mathbf b\in S,\forall s\in [0,1], \qquad s\mathbf b+ (1-s)\mathbf a \in S. \]

In other words, if \(S\) is convex, then the geometric assumption in the Mean Value Theorem is satisfied for every pair of points \(\mathbf a\) and \(\mathbf b\) in \(S\).
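The definition can be probed numerically by sampling points on a segment and testing membership. The sketch below is only illustrative: the helper `segment_stays_inside`, the two sets, and the endpoints are hypothetical choices made here. Sampling can show that a set is not convex by finding a point of a segment outside it, but it can never prove convexity.

```python
def segment_stays_inside(contains, a, b, samples=101):
    # Check s*b + (1 - s)*a for equally spaced s in [0, 1].
    for k in range(samples):
        s = k / (samples - 1)
        point = tuple(s * bi + (1 - s) * ai for ai, bi in zip(a, b))
        if not contains(point):
            return False
    return True

def in_disc(p):
    # Closed unit disc in R^2.
    return p[0] ** 2 + p[1] ** 2 <= 1.0

def in_L_shape(p):
    # The square [-1, 1]^2 with its open upper-right quadrant removed.
    x, y = p
    in_square = abs(x) <= 1.0 and abs(y) <= 1.0
    return in_square and not (x > 0 and y > 0)

a, b = (1.0, -0.5), (-0.5, 1.0)
disc_ok = segment_stays_inside(in_disc, (-0.9, 0.0), (0.9, 0.0))
l_shape_ok = segment_stays_inside(in_L_shape, a, b)
print(disc_ok, l_shape_ok)
```

Both endpoints \(\mathbf a\) and \(\mathbf b\) belong to the L-shaped set, but the middle of the segment passes through the removed quadrant, so that set is not convex.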

Example 1.

A ball \(B(\mathbf p; r)\) is convex.

The proof is in Section 1.5, where we proved that \(B(\mathbf p; r)\) is path-connected. Since the path we described was the line segment between points, this showed it is also convex.

Examples 2.

Here are a number of other examples of convex sets. The proofs are exercises.

A solid ellipsoid is convex. This is a set of the form \[ S = \{ \mathbf x \in \R^n : (x_1/a_1)^2+ \cdots + (x_n/a_n)^2 \le 1\} \] where \(a_1,\ldots, a_n\) are nonzero constants.
An intersection of convex sets is convex. A union of convex sets may not be convex; try drawing an example of this in \(\R^2\).
Any subspace of \(\R^n\) is convex. Recall that a subspace is nonempty, and closed under vector addition and scalar multiplication. In particular, the range and the nullspace of a linear transformation are both convex.
If \(L:\R^n \to \R^m\) is an affine function, i.e., a function of the form \[ L(\mathbf x) = A\mathbf x + \mathbf b \]where \(A\) is an \(m\times n\) matrix and \(\mathbf b\in \R^m\), and if \(S\) is a convex subset of \(\R^n\), then the image \(L(S)\) is convex.

Suppose that \(S\) is an open, convex subset of \(\R^n\). Let \(f:S\to \R\) be a differentiable function, such that there exists \(M\ge 0\) with \(|\nabla f(\mathbf x)|\le M\) for all \(\mathbf x\in S\). Then for every \(\mathbf a, \mathbf b\in S\), \[ f(\mathbf b)- f(\mathbf a) \le M |\mathbf b - \mathbf a|. \]

This generalizes the first single variable application that we gave.

Fix any \(\mathbf a,\mathbf b\in S\). The Mean Value Theorem implies that there exists some \(\mathbf c\in L_{\mathbf a,\mathbf b}\subseteq S\) such that \[ f(\mathbf b) - f(\mathbf a) = \nabla f(\mathbf c)\cdot (\mathbf b - \mathbf a). \]Since both sides are real numbers, we have \[f(\mathbf b) - f(\mathbf a) \leq |f(\mathbf b) - f(\mathbf a)| = |\nabla f(\mathbf c)\cdot (\mathbf b - \mathbf a)|.\] The Cauchy-Schwarz inequality then implies that \[ |\nabla f(\mathbf c)\cdot (\mathbf b - \mathbf a)| \leq |\nabla f(\mathbf c)| \ |\mathbf b - \mathbf a|. \] Our hypotheses imply that \(|\nabla f(\mathbf c)|\le M\), so combining these inequalities gives the conclusion of the theorem.
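As a sanity check of this bound, here is a small numerical experiment. It is a sketch under assumptions made here, not part of the notes: the function \(f(x,y)=\sin x+\cos y\) is an illustrative choice, for which \(|\nabla f|^2=\cos^2 x+\sin^2 y\le 2\), so \(M=\sqrt 2\) works on the convex set \(S=\R^2\). The random sample and its seed are arbitrary.

```python
import math
import random

def f(x, y):
    # Illustrative function with |grad f| <= sqrt(2) everywhere on R^2.
    return math.sin(x) + math.cos(y)

M = math.sqrt(2.0)

rng = random.Random(0)  # fixed seed so the run is reproducible
worst_ratio = 0.0
for _ in range(10000):
    a = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    b = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    dist = math.hypot(b[0] - a[0], b[1] - a[1])
    if dist == 0.0:
        continue  # skip the degenerate case a == b
    ratio = abs(f(*b) - f(*a)) / dist
    worst_ratio = max(worst_ratio, ratio)

print(worst_ratio, M)
```

The largest observed ratio \(|f(\mathbf b)-f(\mathbf a)|/|\mathbf b-\mathbf a|\) should never exceed \(M\); a random sample cannot prove the theorem, but a violation would indicate a mistake in the bound on the gradient.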

Suppose that \(S\) is an open, convex subset of \(\R^n\) and that \(f:S\to \R\) is a differentiable function. If \(\nabla f(\mathbf x )=\mathbf 0\) for every \(\mathbf x\in S\), then \(f\) is constant on \(S\).

This generalizes the second application we gave.

Let \(\mathbf a \in S\) and let \(c = f(\mathbf a)\). Applying Theorem 2 with \(M=0\), once to the pair \(\mathbf a, \mathbf b\) and once with the roles of \(\mathbf a\) and \(\mathbf b\) exchanged, we have for every \(\mathbf b\in S\), \[ |f(\mathbf b) - c| = |f(\mathbf b)-f(\mathbf a)| \le 0\cdot |\mathbf b-\mathbf a| = 0. \]Thus \(f(\mathbf b) = c\) for every \(\mathbf b\in S\).

In this example, the hypothesis that \(S\) is convex is much stronger than necessary, and can be replaced by a weaker geometric condition.

Suppose that \(S\) is an open, path-connected subset of \(\R^n\) and that \(f:\R^n\to \R\) is a function that is differentiable in \(S\). If \(\nabla f(\mathbf x )=\mathbf 0\) for every \(\mathbf x\in S\), then \(f\) is constant on \(S\).

The proof uses an \((\varepsilon, \delta)\) argument to apply Theorem 3 to a small ball around each point on a path.

We need to show that if \(\mathbf a, \mathbf b\) are any two points in \(S\), then \(f(\mathbf a) = f(\mathbf b)\). So, fix any \(\mathbf a,\mathbf b\). By the hypothesis of path-connectedness, there exists \(\gamma:[0,1]\to S\) that is continuous such that \(\gamma(0)=\mathbf a\) and \(\gamma(1)=\mathbf b\).

Define \(\phi(s) = f(\gamma(s))\). We will show that \(\phi(s)\) is constant, and hence \(\phi(0)=\phi(1)\). Note that we cannot use the chain rule, since we only know that \(\gamma\) is continuous, not differentiable, and even if it were differentiable, we do not know its derivative.

Fix \(s\in [0,1]\). Since \(S\) is open, there exists \(\varepsilon>0\) such that \(B(\gamma(s); \varepsilon)\subseteq S\). Since \(\gamma\) is continuous, there exists \(\delta>0\) such that if \(|h|<\delta\) and \(s+h\in [0,1]\), then \(\gamma(s+h) \in B(\gamma(s);\varepsilon)\).

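Now we can apply Theorem 3 to \(B(\gamma(s); \varepsilon)\), since it is a convex open set on which \(\nabla f = \mathbf 0\) everywhere, hence \(f(\mathbf x) = f(\gamma(s))\) for every \(\mathbf x\in B(\gamma(s); \varepsilon)\). In particular, for all \(h\) with \(|h|<\delta\) and \(s+h\in [0,1]\), \[ \phi(s+h) = f(\gamma(s+h)) = f(\gamma(s)) = \phi(s). \] Thus \(\phi\) is constant on a neighbourhood of every \(s\in [0,1]\). By compactness, \([0,1]\) is covered by finitely many such neighbourhoods, and since \([0,1]\) is an interval they can be ordered so that consecutive ones overlap. The constant values therefore agree, and \(\phi\) is constant on \([0,1]\).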
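The geometry behind this proof can be illustrated on a set that is path-connected but not convex. The sketch below uses choices made here for illustration only: the open annulus \(S=\{1<|\mathbf x|<3\}\), the half-circle path \(\gamma(t)=(2\cos\pi t, 2\sin\pi t)\), and the radius \(\varepsilon=1\). By the triangle inequality, every ball \(B(\gamma(s);1)\) lies inside \(S\). The code checks that the straight segment from \(\gamma(0)\) to \(\gamma(1)\) leaves \(S\), while consecutive sample points on the path are less than \(\varepsilon\) apart, so each one lies in the ball around the previous one.

```python
import math

def in_annulus(p):
    # Open annulus 1 < |x| < 3 in R^2: path-connected but not convex.
    r = math.hypot(p[0], p[1])
    return 1.0 < r < 3.0

def gamma(t):
    # Half circle of radius 2 from (2, 0) to (-2, 0), staying inside the annulus.
    return (2.0 * math.cos(math.pi * t), 2.0 * math.sin(math.pi * t))

a, b = gamma(0.0), gamma(1.0)

# The straight segment from a to b passes through the hole of the annulus.
segment = [((1 - k / 100) * a[0] + (k / 100) * b[0],
            (1 - k / 100) * a[1] + (k / 100) * b[1]) for k in range(101)]
segment_inside = all(in_annulus(p) for p in segment)

# Along the path, consecutive sample points are closer than eps = 1,
# so each lies in the ball of radius eps around the previous one.
eps = 1.0
n = 20
path = [gamma(k / n) for k in range(n + 1)]
steps = [math.hypot(path[k + 1][0] - path[k][0], path[k + 1][1] - path[k][1])
         for k in range(n)]
chain_ok = all(in_annulus(p) for p in path) and max(steps) < eps

print(segment_inside, chain_ok, max(steps))
```

In this example the Mean Value Theorem cannot be applied directly to \(\mathbf a\) and \(\mathbf b\), but the result for convex sets applies inside each ball, and the overlapping balls carry the constant value along the path.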

Problems

Basic

You will need to recognize and apply the Mean Value Theorem, and apply the definition of a convex set.

Suppose that \(f:\R^n\to \R\) is a \(C^1\) function and that there exists a vector \({\bf v}\in \R^n\) such that \[ {\bf v}\cdot \nabla f(\mathbf x) = 0\qquad\text{ for all }\mathbf x\in \R^n. \] Prove that for every \(\mathbf x \in \R^n\) and every \(t\in \R\), \[ f(\mathbf x + t{\bf v}) = f(\mathbf x). \] That is, moving along a vector that is orthogonal to the gradient does not change the value of the function (like walking in a circle around a hill).
Prove that every convex set is path-connected.
Draw a picture of each of the following sets and determine whether it is convex:

\(S = \{ (x,y)\in \R^2 : (x/2)^2+ (y/3)^2 \le 1\}\).
\(S = \{ (x,y)\in \R^2 : (x/2)^2- (y/3)^2 \le 1\}\).
\(S = \{ (x,y)\in \R^2 : y \ge e^{x} \}\).
\(S = \{ (x,y)\in \R^2 : x < e^{-y^2} \}\).
\(S = \{ (x,y)\in \R^2 : xy <1 \}\).
\(S = \{ (x,y)\in \R^2 : y> k - x/k^2 \text{ for all }k\in \mathbb N \}\).

Advanced

Assume that \(S\) is an open subset of \(\R^2\), and that \(f:S\to \R\) is a differentiable function such that \(\partial_1 f = 0\) everywhere in \(S\).
- If \(S\) is convex, is it true that \(f\) depends only on the \(y\) variable, in other words, that \(f(x,y)= f(x', y)\) whenever \((x,y)\) and \((x',y)\) belong to \(S\)?
- Same question if \(S\) is not convex. As an example, try \[ S = \{ (x,y)\in \R^2 : 2x^2< y <1+x^2 \}. \]

Assume that \(S\) is a convex subset of \(\R^n\) and that \(f:\R^n\to \R^m\) is an affine function, i.e., of the form \[ f(\mathbf x) = A \mathbf x + \mathbf b \]for some \(m\times n\) matrix \(A\) and some \(\mathbf b\in \R^m\). Prove that \(f(S) = \{ f(\mathbf x) : \mathbf x \in S\}\) is convex.
Prove that if \(S_1, S_2, \ldots,\) are convex sets, then
- \(S_1\cap S_2\) is convex.
- \(S_1\cap S_2\cap \cdots \cap S_k\) is convex for any \(k\ge 3\).
- \(\cap_{k=1}^\infty S_k\) is convex. Recall that a point \(\mathbf x\) belongs to \(\cap_{k=1}^\infty S_k\) if and only if it belongs to \(S_k\) for every \(k \in \mathbb N\).
Prove that a set \(S\) of the form \[ S = \{ \mathbf x \in \R^n : (x_1/a_1)^2+ \cdots + (x_n/a_n)^2 \le 1\} \] is convex, where \(a_1,\ldots, a_n\) are nonzero constants.

Hint
Combine one of the exercises above with the fact that the unit ball in \(\R^n\) is convex.
Let \(g:\R^n\to [0,\infty)\) be a function that is homogeneous of degree \(1\), and such that \(g(\mathbf x+\mathbf y) \le g(\mathbf x) + g(\mathbf y)\) for all \(\mathbf x, \mathbf y\in \R^n\). Prove that \[ \{ \mathbf x\in \R^n : g(\mathbf x) < 1 \} \]is convex.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Canada License.