2.4: The Mean Value Theorem

$\newcommand{\R}{\mathbb R }$ $\newcommand{\N}{\mathbb N }$ $\newcommand{\Z}{\mathbb Z }$ $\newcommand{\bfa}{\mathbf a}$ $\newcommand{\bfb}{\mathbf b}$ $\newcommand{\bfc}{\mathbf c}$ $\newcommand{\bff}{\mathbf f}$ $\newcommand{\bfg}{\mathbf g}$ $\newcommand{\bfG}{\mathbf G}$ $\newcommand{\bfh}{\mathbf h}$ $\newcommand{\bfu}{\mathbf u}$ $\newcommand{\bfx}{\mathbf x}$ $\newcommand{\bfp}{\mathbf p}$ $\newcommand{\bfy}{\mathbf y}$ $\newcommand{\ep}{\varepsilon}$


  1. The Mean Value Theorem
  2. Some consequences
  3. Problems

The Mean Value Theorem

Theorem 1 (the Mean Value Theorem). Assume that $f$ is a real-valued function of class $C^1$ defined on an open set $S\subset\R^n$. For two points $\bfa,\bfb\in S$, let $L_{\bfa, \bfb}$ denote the line segment that connects them. If $L_{\bfa,\bfb}\subset S$, then there exists $\bfc\in L_{\bfa,\bfb}$ such that $$ f(\bfb)-f(\bfa) = (\bfb-\bfa)\cdot \nabla f(\bfc) $$
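For example, take $S=\R^2$, $f(x,y) = x^2+3y$, $\bfa=(0,0)$ and $\bfb=(1,1)$. Then $f(\bfb)-f(\bfa) = 4$ and $\nabla f(x,y) = (2x,3)$. Every point of $L_{\bfa,\bfb}$ has the form $(s,s)$ with $0\le s\le 1$, and $$ (\bfb-\bfa)\cdot\nabla f(s,s) = 2s+3, $$ which equals $4$ exactly when $s=\tfrac12$. So the conclusion of the theorem holds with $\bfc = (\tfrac12,\tfrac12)$, and in this example that is the only possible choice of $\bfc$.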

Proof.

For $s\in [0,1]$, let $\gamma(s) := s\bfb+(1-s)\bfa$. Note that $$ L_{\bfa,\bfb} = \{ \gamma(s) : 0\le s \le 1 \}. $$ Next, define $$ \phi(s) := f(\gamma(s)). $$ Since $\gamma'(s) = \bfb-\bfa$ and $f$ is $C^1$, the Chain Rule implies that $\phi$ is continuous on $[0,1]$ and differentiable on $(0,1)$, with $$ \phi'(s) = \nabla f(\gamma(s))\cdot (\bfb-\bfa). $$ So according to the single-variable Mean Value Theorem from MAT137, there exists $\sigma\in (0,1)$ such that $$ \frac{\phi(1) - \phi(0)}{1-0} = \phi'(\sigma) = \nabla f(\gamma(\sigma))\cdot (\bfb-\bfa). $$ So if we define $\bfc := \gamma(\sigma)$, then $\bfc\in L_{\bfa,\bfb}$, and since $\phi(0)=f(\bfa)$ and $\phi(1) = f(\bfb)$, the above identity becomes $$ f(\bfb) - f(\bfa) = \nabla f(\bfc)\cdot(\bfb-\bfa). \qquad \qquad\Box $$
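The proof finds $\bfc$ by applying the 1d Mean Value Theorem to $\phi$, so $\sigma$ can be approximated numerically as a zero of $g(s) := \phi'(s) - (\phi(1)-\phi(0))$. Since $\int_0^1 g(s)\,ds = 0$, a continuous $g$ that is not identically zero must change sign on $[0,1]$. The Python sketch below is illustrative only: the function name `mvt_point`, the grid size and the tolerance are assumptions, and $f$ and $\nabla f$ are supplied by hand. It scans a grid for a sign change of $g$ and then bisects; it can miss a root if $g$ changes sign twice between neighbouring grid points.

```python
import math


def mvt_point(f, grad_f, a, b, n_grid=1000, tol=1e-12):
    """Approximate sigma in [0, 1] and c = gamma(sigma) on L_{a,b} with
    f(b) - f(a) = grad_f(c) . (b - a).

    f and grad_f are plain Python callables on lists of floats (an assumption
    of this sketch, not part of the notes).
    """
    d = [bi - ai for ai, bi in zip(a, b)]  # b - a

    def gamma(s):
        # gamma(s) = s*b + (1-s)*a, as in the proof of Theorem 1
        return [s * bi + (1 - s) * ai for ai, bi in zip(a, b)]

    target = f(b) - f(a)  # phi(1) - phi(0)

    def g(s):
        # Chain Rule: phi'(s) = grad f(gamma(s)) . (b - a)
        return sum(gi * di for gi, di in zip(grad_f(gamma(s)), d)) - target

    s_prev, g_prev = 0.0, g(0.0)
    if g_prev == 0.0:
        return 0.0, gamma(0.0)
    for k in range(1, n_grid + 1):
        s = k / n_grid
        g_s = g(s)
        if g_s == 0.0:
            return s, gamma(s)
        if g_prev * g_s < 0:
            # Bisection on [s_prev, s], where g changes sign
            lo, hi = s_prev, s
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            sigma = 0.5 * (lo + hi)
            return sigma, gamma(sigma)
        s_prev, g_prev = s, g_s
    raise ValueError("no sign change of g found on the grid")


def f_example(p):
    # Illustrative choice: f(x, y) = e^x + y^2
    return math.exp(p[0]) + p[1] ** 2


def grad_example(p):
    return [math.exp(p[0]), 2 * p[1]]


sigma, c = mvt_point(f_example, grad_example, [0.0, 0.0], [1.0, 0.0])
print(sigma, c)
```

For this $f$, with $\bfa=(0,0)$ and $\bfb=(1,0)$, one computes by hand that $\phi(s) = e^s$, so $\sigma = \ln(e-1)\approx 0.541$; the numerical search should agree.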

Some consequences

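The $1d$ Mean Value Theorem, familiar from MAT137, is used to prove facts such as: if $f'=0$ everywhere on an interval, then $f$ is constant on that interval; and if $|f'|\le M$ on an interval $I$, then $|f(b)-f(a)|\le M|b-a|$ for all $a,b\in I$.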

In this section we will show how the Mean Value Theorem can be used to prove similar facts in higher dimensions.

First, we introduce a class of sets on which the Mean Value Theorem is particularly useful.

Definition 1: A set $S\subset \R^n$ is said to be convex if, for every $\bfa, \bfb\in S$, the line segment $L_{\bfa,\bfb}$ is contained in $S$. That is, $$ \forall \bfa,\bfb\in S,\forall s\in [0,1], \qquad s\bfb+ (1-s)\bfa \in S. $$

In other words, if $S$ is convex, then the geometric assumption in the Mean Value Theorem is satisfied for every pair of points $\bfa$ and $\bfb$ in $S$.
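Convexity cannot be confirmed by finitely many checks, but a single segment that leaves $S$ disproves it. The Python sketch below tests the defining condition $s\bfb+(1-s)\bfa\in S$ for every pair of sample points and a grid of values of $s$, given a membership test for $S$. The helper `find_convexity_violation` and the two example sets are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def find_convexity_violation(in_S, points, m=20):
    """Look for a, b in `points` and s = k/m (0 <= k <= m) such that
    s*b + (1-s)*a is NOT in S.

    `in_S` is a membership test returning True/False, and every entry of
    `points` is assumed to lie in S. Returning None does NOT prove that S is
    convex; it only means no counterexample was found among the finitely many
    segments and values of s that were tried.
    """
    for a, b in combinations(points, 2):
        for k in range(m + 1):
            s = k / m
            x = tuple(s * bi + (1 - s) * ai for ai, bi in zip(a, b))
            if not in_S(x):
                return a, b, s
    return None


def in_disk(x):
    # Open unit disk B(1, 0) in R^2 (a convex set)
    return x[0] ** 2 + x[1] ** 2 < 1


def in_annulus(x):
    # Annulus {1 <= |x| <= 2} in R^2 (not convex)
    return 1 <= x[0] ** 2 + x[1] ** 2 <= 4


print(find_convexity_violation(in_disk, [(0.5, 0.0), (-0.5, 0.3), (0.0, -0.9), (0.6, 0.6)]))
print(find_convexity_violation(in_annulus, [(1.5, 0.0), (-1.5, 0.0), (0.0, 1.5)]))
```

For the disk the search finds nothing, as expected; for the annulus it reports two points whose connecting segment passes through the hole.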

Example 1. A ball $B(r,\bfp)$ is convex.

The proof below is essentially copied from Section 1.5, where we proved that $B(r,\bfp)$ is path-connected. As you can see, the proof we gave there actually shows that it is convex.

Proof. Fix $\bfa, \bfb\in B(r,\bfp)$ and write $$ \gamma(s) = \bfa + s(\bfb - \bfa) = (1-s) \bfa + s\bfb . $$ We have to show that $|\gamma(s) - \bfp|<r$ for all $s\in [0,1]$. This is the case, because for $s\in [0,1]$, \begin{align*} |\gamma(s) - \bfp| &= |(1-s) \bfa + s\bfb - \bfp| &\mbox{(definition of $\gamma(s)$)}\\ &= |(1-s) (\bfa-\bfp) + s(\bfb - \bfp)| &\mbox{(rewrite)}\\ &\le |(1-s) (\bfa-\bfp)| + |s(\bfb - \bfp)| &\mbox{(triangle ineq.)}\\ &= (1-s)\, |\bfa-\bfp| + s\, |\bfb - \bfp| &\mbox{(since $s, 1-s\ge 0$)}\\ &< (1-s) r + s r = r &\mbox{(since $\bfa, \bfb \in B(r,\bfp)$)}. \end{align*}

Thus $B(r,\bfp)$ is convex. $\qquad\qquad \Box$

Examples 2. Here are a number of other examples of convex sets. The proofs are exercises.

Theorem 2. Assume that $S$ is an open, convex subset of $\R^n$ and that $f:\R^n\to \R$ is a function that is differentiable in $S$, and moreover that there exists $M\ge 0$ such that $|\nabla f(\bfx)|\le M$ for all $\bfx\in S$. Then for every $\bfa, \bfb\in S$, $$ |f(\bfb)- f(\bfa)| \le M |\bfb - \bfa|. $$

This is very similar to a standard application of the 1d Mean Value Theorem.

Proof.

Fix any $\bfa,\bfb\in S$. Since $S$ is convex, $L_{\bfa,\bfb}\subset S$, so the Mean Value Theorem implies that there exists some $\bfc\in L_{\bfa,\bfb}$ such that $$ f(\bfb) - f(\bfa) = \nabla f(\bfc)\cdot (\bfb - \bfa). $$ Then Cauchy's inequality implies that $$ |f(\bfb) - f(\bfa)| = |\nabla f(\bfc)\cdot (\bfb - \bfa)| \le |\nabla f(\bfc)| \ |\bfb - \bfa|. $$ Our hypotheses imply that $|\nabla f(\bfc)|\le M$, so the conclusion of the theorem follows. $\quad \Box$
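Theorem 2 can be sanity-checked numerically. The sketch below uses the illustrative function $f(x,y)=\sin x+\cos y$ on the convex set $S=\R^2$, for which $\nabla f = (\cos x, -\sin y)$ and hence $|\nabla f|\le \sqrt2 =: M$. The helper `check_lipschitz_bound`, the sampling box $[-5,5]^2$ and the seed are assumptions made for this example. It samples random pairs $\bfa,\bfb$ and records the largest ratio $|f(\bfb)-f(\bfa)|/(M|\bfb-\bfa|)$, which Theorem 2 says never exceeds $1$. A random search like this is a check, not a proof.

```python
import math
import random

# Bound on |grad f| for f(x, y) = sin x + cos y, since cos^2 x + sin^2 y <= 2
M = math.sqrt(2)


def f(x, y):
    # Illustrative example function
    return math.sin(x) + math.cos(y)


def check_lipschitz_bound(n_pairs=10_000, seed=0):
    """Return the largest ratio |f(b) - f(a)| / (M |b - a|) over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_pairs):
        a = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        b = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        dist = math.hypot(b[0] - a[0], b[1] - a[1])
        if dist == 0.0:
            continue  # the inequality is trivial when a = b
        ratio = abs(f(*b) - f(*a)) / (M * dist)
        worst = max(worst, ratio)
    return worst


print(check_lipschitz_bound())  # Theorem 2 predicts a value <= 1
```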

Theorem 3. Assume that $S$ is an open, convex subset of $\R^n$ and that $f:\R^n\to \R$ is a function that is differentiable in $S$. If $\nabla f(\bfx )={\bf 0}$ for every $\bfx\in S$, then $f$ is constant on $S$.

This is the multi-variable version of a familiar theorem from first-year calculus: if $f'=0$ everywhere on an interval, then $f$ is constant on that interval. (Recall that the proof of that theorem uses the 1d version of the Mean Value Theorem.)

Proof. Fix some $\bfa \in S$ and let $c = f(\bfa)$. Apply Theorem 2 with $M=0$ to find that for every $\bfb\in S$, $$ |f(\bfb) - c| = |f(\bfb)-f(\bfa)| \le 0\cdot |\bfb-\bfa| = 0. $$ Thus $f(\bfb) = c$ for every $\bfb\in S$. $\quad \Box$

In fact, the hypothesis of convexity is much stronger than necessary: it can be replaced by the much weaker geometric condition that $S$ be path-connected.

Theorem 4. Assume that $S$ is an open, path-connected subset of $\R^n$ and that $f:\R^n\to \R$ is a function that is differentiable in $S$. If $\nabla f(\bfx )={\bf 0}$ for every $\bfx\in S$, then $f$ is constant on $S$.

The proof is not very difficult, but it is slightly sneaky.

Proof.

We need to show that if $\bfa, \bfb$ are any two points in $S$, then $f(\bfa) = f(\bfb)$. So, fix any $\bfa,\bfb\in S$. Since $S$ is path-connected, there exists a continuous function $\gamma:[0,1]\to S$ such that $\gamma(0)=\bfa$ and $\gamma(1)=\bfb$.

Define $\phi(s) = f(\gamma(s))$. We will show that $\phi'(s)=0$ for every $s\in (0,1)$. Note that we cannot use the chain rule, since we only know that $\gamma$ is continuous, not differentiable.

To do this, fix $s\in (0,1)$. Since $S$ is open, there exists $\ep>0$ such that $B(\ep,\gamma(s))\subset S$. Since $\gamma$ is continuous, there exists $\delta>0$ such that if $|h|<\delta$, then $s+h\in (0,1)$ and $|\gamma(s+h)-\gamma(s)|<\ep$. In other words, $$ |h|<\delta \quad\Rightarrow\quad \gamma(s+h)\in B(\ep,\gamma(s)). $$ However, $B(\ep,\gamma(s))$ is a convex open set on which $\nabla f = {\bf 0}$ everywhere, so Theorem 3 implies that $f(\bfx) = f(\gamma(s))$ for every $\bfx\in B(\ep, \gamma(s))$. In particular, it follows that $$ |h|<\delta \quad \Rightarrow \quad \phi(s+h) - \phi(s) = f(\gamma(s+h)) - f(\gamma(s)) = 0. $$ Thus the difference quotient $\frac{\phi(s+h)-\phi(s)}{h}$ equals $0$ whenever $0<|h|<\delta$, so $\phi'(s)=0$. Since $s$ was arbitrary, we conclude that $\phi'=0$ everywhere in $(0,1)$. Moreover, $\phi$ is continuous on $[0,1]$, since $\gamma$ is continuous and $f$ is differentiable, hence continuous, in $S$. Finally, the $1$-d Mean Value Theorem implies that for some $\sigma\in(0,1)$, $$ f(\bfb) - f(\bfa) = \phi(1)-\phi(0) = \phi'(\sigma) = 0. \qquad\qquad \Box $$
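Some geometric hypothesis on $S$ is needed, however. For example, let $S = B(1,{\bf 0})\cup B(1,(3,0))\subset\R^2$, an open set that is not path-connected, and define $f:\R^2\to\R$ by $f(\bfx) = 0$ if $x_1<3/2$ and $f(\bfx)=1$ if $x_1\ge 3/2$. Every point of $S$ has a neighbourhood on which $f$ is constant, so $f$ is differentiable in $S$ with $\nabla f = {\bf 0}$ there; nevertheless $f$ takes both values $0$ and $1$ on $S$.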

Problems

Basic skills

There are not really any Basic Skills connected to the material in this section.

More advanced questions

  1. (This question was discussed in Tutorial 5.) Assume that $S$ is an open subset of $\R^2$, and that $f:S\to \R$ is a differentiable function such that $\partial_1 f = 0$ everywhere in $S$.

  2. Assume that $f:\R^n\to \R$ is a $C^1$ function and that there exists a vector ${\bf v}\in \R^n$ such that $$ {\bf v}\cdot \nabla f(\bfx) = 0\qquad\mbox{ for all }\bfx\in \R^n. $$ Prove that for every $\bfx \in \R^n$ and every $t\in \R$, $$ f(\bfx + t{\bf v}) = f(\bfx). $$

  3. Prove that every convex set is path-connected. (This should be easy.)

  4. Draw a picture of the following sets and determine whether they are convex

  5. Assume that $S$ is a convex subset of $\R^n$ and that $f:\R^n\to \R^m$ is a function of the form $$ f(\bfx) = A \bfx + \bfb $$ for some $m\times n$ matrix $A$ and some $\bfb\in \R^m$.
    Prove that $$ f(S) := \{ f(\bfx) : \bfx \in S\} $$ is convex.

  6. Prove that if $S_1, S_2, \ldots, $ are convex sets, then

  7. Prove that a set $S$ of the form $$ S = \{ \bfx \in \R^n : (x_1/a_1)^2+ \cdots + (x_n/a_n)^2 \le 1\} $$ is convex, where $a_1,\ldots, a_n$ are nonzero constants. Hint: a relatively easy way to do this is by combining one of the exercises above and the fact that the unit ball in $\R^n$ is convex.

  8. Let $g:\R^n\to [0,\infty)$ be a function that is homogeneous of degree $1$, and such that $g(\bfx+\bfy) \le g(\bfx) + g(\bfy)$ for all $\bfx, \bfy\in \R^n$. Prove that $$ \{ \bfx\in \R^n : g(\bfx) < 1 \} $$ is convex.
    Recall, homogeneous of degree $1$ means that $g(\lambda \bfx) = \lambda g(\bfx)$ for all $\bfx\in \R^n$ and $\lambda>0$.
