\newcommand{\R}{\mathbb R } \newcommand{\N}{\mathbb N } \newcommand{\Z}{\mathbb Z } \newcommand{\bfa}{\mathbf a} \newcommand{\bfb}{\mathbf b} \newcommand{\bfc}{\mathbf c} \newcommand{\bff}{\mathbf f} \newcommand{\bfg}{\mathbf g} \newcommand{\bfG}{\mathbf G} \newcommand{\bfh}{\mathbf h} \newcommand{\bfu}{\mathbf u} \newcommand{\bfx}{\mathbf x} \newcommand{\bfp}{\mathbf p} \newcommand{\bfy}{\mathbf y} \newcommand{\ep}{\varepsilon}
Theorem 1 (the Mean Value Theorem). Assume that f is a real-valued function of class C^1 defined on an open set S\subset\R^n. For two points \bfa,\bfb\in S, let L_{\bfa, \bfb} denote the line segment that connects them. If L_{\bfa,\bfb}\subset S, then there exists \bfc\in L_{\bfa,\bfb} such that f(\bfb)-f(\bfa) = (\bfb-\bfa)\cdot \nabla f(\bfc).
Proof. For s\in [0,1], let \gamma(s) := s\bfb+(1-s)\bfa. Note that L_{\bfa,\bfb} = \{ \gamma(s) : 0\le s \le 1 \}. Next, define \phi(s) := f(\gamma(s)). Since f is C^1 (in particular continuous) on S and \gamma is continuous, \phi is continuous on [0,1]. Moreover \gamma'(s) = \bfb-\bfa, so the Chain Rule implies that \phi is differentiable on (0,1), with \phi'(s) = \nabla f(\gamma(s))\cdot (\bfb-\bfa). According to the single-variable Mean Value Theorem from MAT137, there exists \sigma\in (0,1) such that \frac{\phi(1) - \phi(0)}{1-0} = \phi'(\sigma) = \nabla f(\gamma(\sigma))\cdot (\bfb-\bfa). So if we define \bfc := \gamma(\sigma), then \bfc\in L_{\bfa,\bfb}, and since \phi(0)=f(\bfa) and \phi(1) = f(\bfb), the above identity becomes f(\bfb) - f(\bfa) = \nabla f(\bfc)\cdot(\bfb-\bfa). \qquad \qquad\Box
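The proof locates \bfc by finding a zero of \phi'(s) - (\phi(1)-\phi(0)) in (0,1). The Python sketch below does this numerically by bisection for a hypothetical example, f(x,y) = x^2+y^2 with \bfa = (0,0) and \bfb = (1,2). There \phi(s) = 5s^2, so analytically \sigma = 1/2 and \bfc = (1/2, 1). The helper names (dot, mvt_point) are illustrative only. Bisection is an extra assumption that goes beyond the theorem: it needs the function to change sign on [0,1], which the Mean Value Theorem does not promise in general (it does hold here, because \phi is convex).

```python
# Sketch: locate the point c of Theorem 1 numerically by bisection.
# Hypothetical example: f(x, y) = x^2 + y^2, a = (0, 0), b = (1, 2).

def dot(u, v):
    """Dot product of two vectors given as tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


def f(p):
    x, y = p
    return x ** 2 + y ** 2


def grad_f(p):
    x, y = p
    return (2 * x, 2 * y)


def mvt_point(f, grad_f, a, b, tol=1e-12, max_iter=200):
    """Return (sigma, c) with c = gamma(sigma) on the segment L_{a,b} and
    f(b) - f(a) approximately equal to grad_f(c) . (b - a).

    Assumes g(s) = phi'(s) - (phi(1) - phi(0)) changes sign on [0, 1];
    the Mean Value Theorem guarantees a zero of g, not a sign change.
    """
    diff = f(b) - f(a)                                  # phi(1) - phi(0)
    direction = tuple(bi - ai for ai, bi in zip(a, b))  # b - a = gamma'(s)

    def gamma(s):
        return tuple(s * bi + (1 - s) * ai for ai, bi in zip(a, b))

    def g(s):
        # phi'(s) computed by the Chain Rule, minus the average slope of phi
        return dot(grad_f(gamma(s)), direction) - diff

    lo, hi = 0.0, 1.0
    if g(lo) * g(hi) > 0:
        raise ValueError("g does not change sign on [0, 1]; bisection does not apply")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid  # a zero of g lies in [lo, mid]
        else:
            lo = mid  # a zero of g lies in [mid, hi]
        if hi - lo < tol:
            break
    sigma = (lo + hi) / 2
    return sigma, gamma(sigma)


a, b = (0.0, 0.0), (1.0, 2.0)
sigma, c = mvt_point(f, grad_f, a, b)
print(sigma, c)
```

The printed \sigma should agree with 1/2 up to the bisection tolerance, and \nabla f(\bfc)\cdot(\bfb-\bfa) should then match f(\bfb)-f(\bfa) = 5.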
The 1d Mean Value Theorem, familiar from MAT137, is used to prove facts such as:
if a function f has the property that f'(t)=0 for all t in an interval (a,b), then f is constant on (a,b).
if a function f has the property that |f'(t)|\le M for all t in an interval (a,b), then the slope of the secant line through any two points of the graph of f has absolute value at most M. More precisely, |f'(t)|\le M\mbox{ for all }t\in (a,b) \quad\Rightarrow \quad |f(t) - f(s)| \le M|t-s| \mbox{ for all }s,t\in (a,b).
In this section we will show how the Mean Value Theorem can be used to prove similar facts in higher dimensions.
First, we introduce a class of sets on which the Mean Value Theorem is particularly useful.
Definition 1: A set S\subset \R^n is said to be convex if, for every \bfa, \bfb\in S, the line segment L_{\bfa,\bfb} is contained in S. That is, \forall \bfa,\bfb\in S,\forall s\in [0,1], \qquad s\bfb+ (1-s)\bfa \in S.
In other words, if S is convex, then the geometric assumption in the Mean Value Theorem is satisfied for every pair of points \bfa and \bfb in S.
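Definition 1 also suggests a quick numerical experiment. The Python sketch below (all function names are hypothetical) samples pairs \bfa,\bfb\in S and values s\in[0,1] and looks for a point s\bfb+(1-s)\bfa that falls outside S. A witness found this way proves that S is not convex, but finding none proves nothing: random sampling can only refute convexity. It is tried on the ball B(1,{\bf 0}) in \R^2 and on the union of two disjoint disks B(1,(-2,0))\cup B(1,(2,0)).

```python
import math
import random


def find_nonconvexity_witness(in_set, sample_point, trials=10_000, seed=0):
    """Search for a, b in S and s in [0, 1] with s*b + (1 - s)*a not in S.

    in_set(p) -> bool tests membership in S; sample_point(rng) draws a random
    candidate point. Returns a witness (a, b, s), or None if none was found.
    A witness disproves convexity; None does NOT prove convexity.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = sample_point(rng), sample_point(rng)
        if not (in_set(a) and in_set(b)):
            continue  # Definition 1 only constrains pairs of points of S
        s = rng.random()
        p = tuple(s * bi + (1 - s) * ai for ai, bi in zip(a, b))
        if not in_set(p):
            return a, b, s
    return None


def in_ball(p, r=1.0, center=(0.0, 0.0)):
    """Membership in the open ball B(r, center) of R^2."""
    return math.dist(p, center) < r


def in_two_disks(p):
    """Membership in B(1, (-2, 0)) union B(1, (2, 0)), two disjoint disks."""
    return in_ball(p, 1.0, (-2.0, 0.0)) or in_ball(p, 1.0, (2.0, 0.0))


def uniform_in_box(rng, half_width=3.0):
    """Uniform random point in the square [-half_width, half_width]^2."""
    return (rng.uniform(-half_width, half_width), rng.uniform(-half_width, half_width))


print(find_nonconvexity_witness(in_ball, uniform_in_box))
print(find_nonconvexity_witness(in_two_disks, uniform_in_box))
```

For the ball no witness should turn up, consistent with Example 1; for the two disks, any sampled pair with one point in each disk has a segment crossing the gap between them, so a witness is found quickly.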
Example 1. A ball B(r,\bfp) is convex.
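The proof below is essentially copied from Section 1.5, where we proved that B(r,\bfp) is path-connected. As you can see, the proof we gave there actually shows that it is convex. Fix \bfa,\bfb\in B(r,\bfp) and s\in[0,1]. By the triangle inequality, |s\bfb+(1-s)\bfa - \bfp| = |s(\bfb-\bfp) + (1-s)(\bfa-\bfp)| \le s|\bfb-\bfp| + (1-s)|\bfa-\bfp| < sr + (1-s)r = r, so s\bfb+(1-s)\bfa\in B(r,\bfp).
Thus B(r,\bfp) is convex. \qquad\qquad \Box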
Examples 2. Here are a number of other examples of convex sets. The proofs are exercises.
A solid ellipsoid is convex. By this we mean a set of the form S = \{ \bfx \in \R^n : (x_1/a_1)^2+ \cdots + (x_n/a_n)^2 \le 1\} where a_1,\ldots, a_n are nonzero constants.
An intersection of convex sets is convex. (A union of convex sets need not be convex; you can easily convince yourself of this by drawing a picture.)
A subspace of \R^n is convex. In particular, the range and the nullspace of a matrix are both convex.
If L:\R^n \to \R^m is a function of the form L(\bfx) = A\bfx + \bfb\qquad\mbox{ where }A\mbox{ is an }m\times n\mbox{ matrix, and }\bfb\in \R^m and if S is a convex subset of \R^n, then L(S) := \{L(\bfx) : \bfx\in S\}\quad\mbox{ is convex}.
Theorem 2. Assume that S is an open, convex subset of \R^n and that f:\R^n\to \R is a function that is differentiable in S, and moreover that there exists M\ge 0 such that |\nabla f(\bfx)|\le M for all \bfx\in S. Then for every \bfa, \bfb\in S, |f(\bfb)- f(\bfa)| \le M |\bfb - \bfa|.
This is very similar to a standard application of the 1d Mean Value Theorem.
Proof. Fix any \bfa,\bfb\in S. Since S is convex, L_{\bfa,\bfb}\subset S, so the Mean Value Theorem implies that there exists some \bfc\in L_{\bfa,\bfb} such that f(\bfb) - f(\bfa) = \nabla f(\bfc)\cdot (\bfb - \bfa). (The proof of Theorem 1 only needs f to be differentiable on S, not C^1, so it applies here.) Then Cauchy's inequality implies that |f(\bfb) - f(\bfa)| = |\nabla f(\bfc)\cdot (\bfb - \bfa)| \le |\nabla f(\bfc)| \ |\bfb - \bfa|. Our hypotheses imply that |\nabla f(\bfc)|\le M, so the conclusion of the theorem follows. \quad \Box
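As a numerical sanity check of Theorem 2 (a sketch, not a proof), take the hypothetical example f(x,y) = \sin x + \cos y on S = \R^2. Here \nabla f(x,y) = (\cos x, -\sin y), so |\nabla f|\le \sqrt 2 everywhere and Theorem 2 gives |f(\bfb)-f(\bfa)|\le \sqrt 2\,|\bfb-\bfa|. The Python snippet below samples random pairs of points in a square and records the largest observed ratio |f(\bfb)-f(\bfa)|/|\bfb-\bfa|, which should never exceed M=\sqrt 2. The sampling box, trial count, and seed are arbitrary choices.

```python
import math
import random


def f(p):
    """Hypothetical example: f(x, y) = sin(x) + cos(y)."""
    x, y = p
    return math.sin(x) + math.cos(y)


# grad f(x, y) = (cos x, -sin y), so |grad f| <= sqrt(2) on all of R^2.
M = math.sqrt(2)


def largest_difference_quotient(f, trials=100_000, box=10.0, seed=1):
    """Largest observed |f(b) - f(a)| / |b - a| over random pairs in [-box, box]^2."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        a = (rng.uniform(-box, box), rng.uniform(-box, box))
        b = (rng.uniform(-box, box), rng.uniform(-box, box))
        dist = math.dist(a, b)
        if dist == 0.0:
            continue  # the quotient is undefined when a = b
        worst = max(worst, abs(f(b) - f(a)) / dist)
    return worst


ratio = largest_difference_quotient(f)
print(ratio, M)
```

Sampling can only support the inequality, never prove it; the proof above is what guarantees the bound for every pair of points.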
Theorem 3. Assume that S is an open, convex subset of \R^n and that f:\R^n\to \R is a function that is differentiable in S. If \nabla f(\bfx )={\bf 0} for every \bfx\in S, then f is constant on S.
This is the multi-variable version of a familiar theorem from first-year calculus: if f'=0 everywhere on an interval, then f is constant on that interval. (Recall, the proof of that theorem uses the 1d version of the Mean Value Theorem.)
Proof. Fix some \bfa \in S and let c = f(\bfa). Apply Theorem 2 with M=0 to find that for every \bfb\in S, |f(\bfb) - c| = |f(\bfb)-f(\bfa)| \le 0\cdot |\bfb-\bfa| = 0. Thus f(\bfb) = c for every \bfb\in S. \quad \Box
In fact, the hypothesis of convexity is much stronger than necessary, and it can be replaced by a much weaker geometric condition.
Theorem 4. Assume that S is an open, path-connected subset of \R^n and that f:\R^n\to \R is a function that is differentiable in S. If \nabla f(\bfx )={\bf 0} for every \bfx\in S, then f is constant on S.
The proof is not very difficult, but it is slightly sneaky.
Proof. We need to show that if \bfa, \bfb are any two points in S, then f(\bfa) = f(\bfb). So, fix any \bfa,\bfb\in S. By the hypothesis of path-connectedness, there exists a continuous function \gamma:[0,1]\to S such that \gamma(0)=\bfa and \gamma(1)=\bfb.
Define \phi(s) = f(\gamma(s)). We will show that \phi'(s)=0 for every s\in (0,1). Note that we cannot use the Chain Rule, since we only know that \gamma is continuous, not differentiable.
To do this, fix s\in (0,1). Since S is open, there exists \ep>0 such that B(\ep,\gamma(s))\subset S. Since \gamma is continuous, there exists \delta>0 such that if |h|<\delta, then s+h\in (0,1) and |\gamma(s+h)-\gamma(s)|<\ep. In other words, |h|<\delta \quad\Rightarrow\quad \gamma(s+h)\in B(\ep,\gamma(s)). However, B(\ep,\gamma(s)) is an open set that is convex (by Example 1) and on which \nabla f = {\bf 0} everywhere, so Theorem 3 implies that f(\bfx) = f(\gamma(s)) for every \bfx\in B(\ep, \gamma(s)). In particular, it follows that |h|<\delta \quad \Rightarrow \quad \phi(s+h) - \phi(s) = f(\gamma(s+h)) - f(\gamma(s)) = 0. Thus the difference quotient \frac{\phi(s+h)-\phi(s)}{h} equals 0 whenever 0<|h|<\delta, so \phi'(s)=0. Since s was arbitrary, we conclude that \phi'=0 everywhere in (0,1). Finally, \phi is continuous on [0,1], since f is differentiable (hence continuous) on S and \gamma is continuous. So the 1d Mean Value Theorem implies that f(\bfb) - f(\bfa) = \phi(1)-\phi(0) = \phi'(\sigma)(1-0) = 0 for some \sigma\in(0,1). \quad \Box
There are not really any Basic Skills connected to the material in this section.
(This question was discussed in Tutorial 5.) Assume that S is an open subset of \R^2, and that f:S\to \R is a differentiable function such that \partial_1 f = 0 everywhere in S.
If S is convex, is it true that f depends only on the y variable, in other words, that f(x,y)= f(x', y) whenever (x,y) and (x',y) belong to S?
Same question if S is not convex. For concreteness, assume that S = \{ (x,y)\in \R^2 : 2x^2< y <1+x^2 \}.
Assume that f:\R^n\to \R is a C^1 function and that there exists a vector {\bf v}\in \R^n such that {\bf v}\cdot \nabla f(\bfx) = 0\qquad\mbox{ for all }\bfx\in \R^n. Prove that for every \bfx \in \R^n and every t\in \R, f(\bfx + t{\bf v}) = f(\bfx).
Prove that every convex set is path-connected. (This should be easy.)
Draw a picture of each of the following sets and determine whether it is convex:
S = \{ (x,y)\in \R^2 : (x/2)^2+ (y/3)^2 \le 1\}.
S = \{ (x,y)\in \R^2 : (x/2)^2- (y/3)^2 \le 1\}.
S = \{ (x,y)\in \R^2 : y \ge e^{x} \}.
S = \{ (x,y)\in \R^2 : x < e^{-y^2} \}.
S = \{ (x,y)\in \R^2 : xy <1 \}.
S = \{ (x,y)\in \R^2 : y> k - x/k^2 \mbox{ for all }k\in \mathbb N \}.
(In this question, the issue of convexity is really just an excuse to ask you to draw pictures of various subsets of \R^2, something you should be able to do without the assistance of an electronic device. If you find it difficult, put away your phone and practice.)
Assume that S is a convex subset of \R^n and that f:\R^n\to \R^m is a function of the form f(\bfx) = A \bfx + \bfb for some m\times n matrix A and some \bfb\in \R^m. Prove that f(S) := \{ f(\bfx) : \bfx \in S\} is convex.
Prove that if S_1, S_2, \ldots are convex sets, then their intersection \bigcap_{k} S_k is convex.
Prove that a set S of the form S = \{ \bfx \in \R^n : (x_1/a_1)^2+ \cdots + (x_n/a_n)^2 \le 1\}, where a_1,\ldots, a_n are nonzero constants, is convex. Hint: a relatively easy way to do this is to combine one of the exercises above with the fact that the unit ball in \R^n is convex.
Let g:\R^n\to [0,\infty) be a function that is homogeneous of degree 1 and such that g(\bfx+\bfy) \le g(\bfx) + g(\bfy) for all \bfx, \bfy\in \R^n. Prove that \{ \bfx\in \R^n : g(\bfx) < 1 \} is convex.
Recall, homogeneous of degree 1 means that g(\lambda \bfx) = \lambda g(\bfx) for all \bfx\in \R^n and \lambda>0.