3.1: The Implicit Function Theorem I

$\newcommand{\R}{\mathbb R }$ $\newcommand{\N}{\mathbb N }$ $\newcommand{\Z}{\mathbb Z }$ $\newcommand{\bfa}{\mathbf a}$ $\newcommand{\bfb}{\mathbf b}$ $\newcommand{\bfc}{\mathbf c}$ $\newcommand{\bff}{\mathbf f}$ $\newcommand{\bfF}{\mathbf F}$ $\newcommand{\bfk}{\mathbf k}$ $\newcommand{\bfg}{\mathbf g}$ $\newcommand{\bfG}{\mathbf G}$ $\newcommand{\bfh}{\mathbf h}$ $\newcommand{\bfu}{\mathbf u}$ $\newcommand{\bfv}{\mathbf v}$ $\newcommand{\bfx}{\mathbf x}$ $\newcommand{\bfp}{\mathbf p}$ $\newcommand{\bfy}{\mathbf y}$ $\newcommand{\ep}{\varepsilon}$


  1. Statement of the Implicit Function Theorem
  2. Derivatives of implicitly defined functions
  3. Why is the theorem true?
  4. Problems

Statement of the Implicit Function Theorem

The Implicit Function Theorem addresses a question that has two versions: one analytic and one geometric.

The connection between them is that certain sets mentioned in the geometric formulation can be seen as sets of solutions of systems of nonlinear equations.

Notation. The theorem concerns a function $\bfF:S\to \R^k$ of class $C^1$, where $S$ is an open subset of $\R^{n+k}$.

In this situation, we will write points in $\R^{n+k}$ in the form $(\bfx,\bfy)$, where $\bfx = (x_1, \ldots, x_n)\in \R^n$ and $\bfy = (y_1,\ldots, y_k)\in \R^k$.

It is also convenient to write \begin{align} D_\bfx \bfF &:=\mbox{ matrix of partial derivatives with respect to }x_1, \ldots, x_n\nonumber \\ &= \left( \begin{array}{ccc} \frac{\partial F_1}{\partial x_1} &\cdots &\frac{\partial F_1}{\partial x_n} \\ \vdots&\ddots&\vdots \\ \frac{\partial F_k}{\partial x_1} &\cdots &\frac{\partial F_k}{\partial x_n}
\end{array} \right)\nonumber \end{align} and \begin{align} D_\bfy\bfF &:=\mbox{ matrix of partial derivatives with respect to }y_1, \ldots, y_k \nonumber \\ &=\left( \begin{array}{ccc} \frac{\partial F_1}{\partial y_1} &\cdots &\frac{\partial F_1}{\partial y_k} \\ \vdots&\ddots&\vdots \\ \frac{\partial F_k}{\partial y_1} &\cdots &\frac{\partial F_k}{\partial y_k} \nonumber \end{array} \right) \end{align}

Implicit Function Theorem. Assume that $S$ is an open subset of $\R^{n+k}$ and that $\bfF:S\to \R^k$ is a function of class $C^1$. Assume also that $(\bfa, \bfb)$ is a point in $S$ such that $$ \bfF(\bfa, \bfb) = {\bf 0} \qquad\mbox{ and } \qquad \det D_\bfy \bfF(\bfa, \bfb) \ne 0. $$

i. Then there exist $r_0,r_1>0$ such that for every $\bfx\in \R^n$ with $|\bfx-\bfa|< r_0$, there exists a unique $\bfy\in \R^k$ with $|\bfy - \bfb|< r_1$ such that \begin{equation}\label{ImFT.eq1} \bfF(\bfx, \bfy) = \bf0. \end{equation} In other words, equation \eqref{ImFT.eq1} implicitly defines a function $\bfy = \bff(\bfx)$ for $\bfx\in \R^n$ near $\bfa$, with $\bfy = \bff(\bfx)$ close to $\bfb$. Note in particular that $\bff(\bfa) = \bfb$.

ii. Moreover, the function $\bff:B(r_0, \bfa)\to B(r_1,\bfb)\subset \R^k$ from part (i) above is of class $C^1$, and its derivatives may be determined by differentiating the identity $$ \bfF(\bfx, \bff(\bfx)) = \bf0 $$ (a consequence of the definition of $\bff$) and solving to find the partial derivatives of $\bff$.

Part (ii) of the theorem is discussed in greater detail below.

A general formula for $D\bff$ is given in \eqref{gFinv} below, but it may not be comprehensible without first looking at concrete examples that precede it.

The analytic content of the theorem is this: suppose we want to solve the equation $\bfF(\bfx, \bfy)= {\bf 0}$ for $\bfy$ as a function of $\bfx$, say $\bfy = \bff(\bfx)$. If we have one solution $\bfb = \bff(\bfa)$, then it is in principle possible to solve for $\bfy$ as a function of $\bfx$, for $\bfx$ near $\bfa$, provided the crucial hypothesis $\det D_\bfy \bfF(\bfa,\bfb)\ne0$ holds. Thus it is a theorem about the possibility of solving systems of (in general nonlinear) equations.

The geometric content of the theorem is discussed in more detail in Section 3.2. (See also Sections 3.2 and 3.3 of Folland's Advanced Calculus.)

Why the Implicit Function Theorem is a great theorem

In general, if someone gives you a system of $k$ nonlinear equations in $k$ unknowns, it is not just impossible to solve; it is (in practice) impossible to determine whether it has any solutions.

This is in stark contrast to a system of $k$ linear equations in $k$ unknowns, for which we completely understand when the system is solvable and how to find a solution when one exists.

The Implicit Function Theorem allows us to (partly) reduce impossible questions about systems of nonlinear equations to straightforward questions about systems of linear equations. This is great!

The theorem is great, but it is not miraculous, so it has some limitations. These include

  1. In order to get information about the equation $$ \bfF(\bfx, \bfy)=\bf0, $$ (which we can think of as a system of $k$ equations for $\bfy= (y_1,\ldots, y_k)$, with coefficients that depend on $\bfx$) we have to start by knowing one solution $\bfy = \bfb$ for some specific $\bfx = \bfa$.

  2. The theorem only tells us about solvability of the system for values of $\bfx$ close to $\bfa$, with $\bfy$ close to $\bfb$. That is, it only gives local information near the point $(\bfa, \bfb)$. Even worse, it does not tell us how close we need to be for the conclusions of the theorem to hold.

  3. The theorem does not give a formula for $\bfy = \bff(\bfx)$ solving $\bfF(\bfx, \bfy)= 0$. It does however, tell us how to compute $D\bff(\bfa)$, and this allows us to approximate the solution $$ \bff(\bfa+\bfh) \approx \bff(\bfa) + D\bff(\bfa) \bfh =\bfb + D\bff(\bfa) \bfh $$ for $\bfh$ small. (But, again, the theorem does not tell us how small $\bfh$ has to be for this approximation to be a good one.)
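To see concretely how the linear approximation in item 3 behaves, here is a small Python sketch. The function $F(x,y) = x^2 + y^2 - 2$ and the point $(a,b) = (1,1)$ are our own toy example, not from these notes; it is chosen because the implicitly defined function is known exactly, $f(x) = \sqrt{2-x^2}$, so the approximation $\bff(\bfa+\bfh)\approx \bfb + D\bff(\bfa)\bfh$ can be compared with the truth. Implicit differentiation gives $f'(a) = -\partial_x F/\partial_y F = -1$ at $(1,1)$.

```python
import math

# Toy example (ours, not from the notes): F(x, y) = x^2 + y^2 - 2,
# with known solution (a, b) = (1, 1).  Implicit differentiation gives
# f'(a) = -F_x/F_y = -(2a)/(2b) = -1, so f(a + h) ≈ b - h.
# Here f is known exactly, f(x) = sqrt(2 - x^2), so we can measure
# how good the linear approximation is for small h.
a, b = 1.0, 1.0
df = -1.0  # f'(a), from implicit differentiation

for h in (0.1, 0.01):
    exact = math.sqrt(2 - (a + h) ** 2)   # the true implicit function
    approx = b + df * h                   # linear approximation
    print(h, exact, approx, abs(exact - approx))
```

Consistent with the caveat above, the error shrinks rapidly as $h$ does, but the theorem itself gives no a priori bound on how small $h$ must be.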

Some special cases of the Implicit Function Theorem

Below are several specific instances of the Implicit Function Theorem. For simplicity we will focus on part (i) of the theorem and omit part (ii). In every case, however, part (ii) implies that the implicitly-defined function is of class $C^1$, and that its derivatives may be computed by implicit differentiation.

As you can see, we have just cut and pasted the same block of text, with small changes from one iteration to the next. There is nothing new going on here. But one reason to do this is to emphasize that in practice, we are rarely given a function written, as above, in terms of variables $\bfx\in \R^n$ and $\bfy \in \R^k$. Instead, we are normally given a function of variables $(x,y)\in \R^2$, or $(x,y,z)\in \R^3$, or $(x_1,\ldots, x_m)\in \R^m$, and some mental translation is needed to apply the theorem. This typically involves deciding which variables play the role of $\bfx$ and which play the role of $\bfy$ in the theorem as stated above.

$n=k=1$

Assume that $F$ is a scalar function of class $C^1$ defined for all $(x,y)$ in an open set $U\subset \R^2$.

$n=2, k=1$.

Assume that $F$ is a scalar function of class $C^1$ defined for all $(x,y,z)$ in an open set $U\subset \R^3$.

$n=1, k=2$.

Assume that $\bfF = (F_1, F_2)$ is a function $U\to \R^2$ of class $C^1$, defined for all $(x,y,z)$ in an open set $U\subset \R^3$.

Derivatives of implicitly defined functions

Whenever the conditions of the Implicit Function Theorem are satisfied, and the theorem guarantees the existence of a function $\bff:B(r_0, \bfa)\to B(r_1,\bfb)\subset \R^k$ such that \begin{equation}\label{ift.repeat} \bfF(\bfx, \bff(\bfx)) = \bf0 \end{equation} (among other properties), the theorem also tells us how to compute the derivatives of $\bff$: they may be determined by differentiating the identity \eqref{ift.repeat} and solving for the partial derivatives of $\bff$.

An example

To understand what this means, we first consider a concrete example. Consider the equation $$ F(x,y,z) := xy+ xz \ln(yz) =1. $$ Note that $(1,1,1)$ is a solution. We will answer the questions: Does the equation implicitly determine $z$ as a function $f(x,y)$ for $(x,y)$ near $(1,1)$, with $f(1,1) = 1$? If so, find a formula for $\partial_x f(x,y)$, and evaluate it at $(x,y) = (1,1)$. (For simplicity, we will omit the computation of $\partial _y f$, but the procedure is exactly the same as for the $x$ derivative.)

Solution. First, $$ \partial_z F = x\ln(yz)+x $$ and at $(x,y,z) = (1,1,1)$, this equals $1$. So the Implicit Function Theorem guarantees that there is a function $f(x,y)$, defined for $(x,y)$ near $(1,1)$, such that $$ F(x,y,z)= 1\mbox{ when }z = f(x,y). $$

Next we will find $\partial_x f$. We start by recopying the equation that defines $z$ as a function of $(x,y)$: $$ xy+ x z \ln(yz) = 1 \qquad\mbox{when } z = f(x,y). $$ Now we differentiate both sides with respect to $x$. Clearly the derivative of the right-hand side is $0$. Since $z$ is a function of $(x,y)$, we have to use the chain rule for the left-hand side. It looks nicer to write $\frac{\partial z}{\partial x}$ instead of $\partial_x f$, so that is what we will do. We get $$ y + z\ln(yz) + x \frac{\partial z}{\partial x}\ln(yz) + x z \frac{y\ \partial z/\partial x}{yz} = 0, $$ so after rearranging, $$ y+ z\ln(yz) +\left[ x\ln(yz) + x \right] \frac{\partial z}{\partial x} = 0. $$ Evaluating at $(x,y,z) = (1,1,1)$ and solving for $\frac{\partial z}{\partial x}$, we get $$ \partial_x f(1,1) = \frac {\partial z}{\partial x}(1,1) = -1. $$

In principle we can also solve for $\partial z/\partial x$ at points other than $(x,y,z) = (1,1,1)$, to get the formula $$ \frac{\partial z}{\partial x} = -\frac{y+z\ln(yz)}{x+x\ln (yz)}. $$ But this is of limited use, since we only know the value of $z(x,y)$ at the point $(x,y) = (1,1)$ where we started.

It is possible to find $\partial_y f = \frac{\partial z}{\partial y}$ at $(x,y)=(1,1)$ by similar computations.
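As a numerical sanity check of the computation above (not part of the notes' argument), one can solve $F(x,y,z)=1$ for $z$ by Newton's method in $z$ and then estimate $\partial z/\partial x$ at $(1,1)$ by a centered difference. A Python sketch, using only the formula for $\partial_z F$ derived above:

```python
import math

# Numerical check: solve F(x, y, z) = x*y + x*z*log(y*z) - 1 = 0 for z
# near z = 1 by Newton's method in z, then estimate dz/dx at (1, 1)
# by a centered difference.  The implicit differentiation above
# predicts dz/dx(1, 1) = -1.

def solve_z(x, y, z0=1.0):
    """Newton iteration in z for F(x, y, z) = 0, starting from z0."""
    z = z0
    for _ in range(50):
        F = x * y + x * z * math.log(y * z) - 1
        dFdz = x * math.log(y * z) + x      # this is partial_z F, computed above
        z -= F / dFdz
    return z

eps = 1e-6
dzdx = (solve_z(1 + eps, 1) - solve_z(1 - eps, 1)) / (2 * eps)
print(dzdx)  # close to -1
```

Note the division by `dFdz` in the Newton step: the iteration is only well defined because $\partial_z F \ne 0$ near the starting point, which is exactly the hypothesis of the theorem.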

A more general computation

Now let's consider a general $F:\R^3\to \R$ of class $C^1$. In what follows it will be convenient to use the notation $\partial_1$ etc, instead of $\partial_x$ or $\frac{\partial}{\partial x}$ etc. (Similarly $\partial_2$ instead of $\partial_y$ etc...).

Here we are solving exactly the same problem as above, except that we are considering a general function $F$ of $3$ variables rather than a specific example.

Assume that at a point $(a,b,c)\in \R^3$, $$ F(a,b,c) = 0, \qquad \partial_3 F(a,b,c)\ne 0. $$ Then the Implicit Function Theorem guarantees that there exist an open set $T\subset \R^2$ containing $(a,b)$ and a function $f:T\to \R$, such that $$ f(a,b)=c, \qquad F(x,y,z)=0\mbox{ for }(x,y)\in T\mbox{ and }z = f(x,y). $$ Now let us compute $\partial_1 f$. We differentiate both sides of the identity $F(x,y,f(x,y))=0$ with respect to $x$, to obtain $$ \partial_1 F(x,y,f(x,y)) + \partial_3 F(x,y,f(x,y)) \partial_1 f(x,y) = 0. $$ Substituting $(x,y) = (a,b)$ and $f(a,b) = c$ and rearranging, we get $$ \partial_1f(a,b) = - \frac{\partial_1 F(a,b,c)}{ \partial_3 F(a,b,c)}. $$ Note that this makes sense, because in order to guarantee the existence of $f(x,y)$, we already had to know that $\partial_3 F(a,b,c)\ne 0$.

If we substitute in $F(x,y,z) = xy+xz\ln(yz)-1$ and $(a,b,c) = (1,1,1)$, this would yield exactly the formula for $\frac{\partial z}{\partial x} = \partial_1 f$ that we found before.

Remark 1. Note that in order to solve for $\partial_x f$ or $\partial_y f$, we needed $\partial_zF\ne0$. This is exactly the hypothesis of the Implicit Function Theorem, i.e. the main condition that, according to the theorem, guarantees that the equation $F(x,y,z)=0$ implicitly determines $z$ as a function of $(x,y)$. As we will see below, this is true in general.

A harder example

Consider the system of equations \begin{align} F_1(x,y,u,v) := xye^u + \sin(v-u) &= 0\\ F_2(x,y,u,v) :=(x+1)(y+2)(u+3)(v+4) - 24 &=0 \end{align} Note that $(0,0,0,0)$ is a solution.

We will answer the following questions:

Does the system of equations implicitly determine $(u,v)$ as a function of $(x,y)$, i.e. $(u,v) = \bff(x,y)$ for $(x,y)$ near $(0,0)$, with $\bff(0,0) = (0,0)$?

If so, find a formula for $\partial_x \bff(x,y)$ at $(x,y) = (0,0)$.

Solution. First, let $\bfF = \binom{F_1}{F_2}$. Then we consider the matrix $$ \left(\begin{array}{ll} \partial_u F_1&\partial_v F_1\\ \partial_u F_2&\partial_v F_2 \end{array} \right) \ = \ \left(\begin{array}{cc} xye^u - \cos(v-u)&\cos(v-u)\\ (x+1)(y+2)(v+4)&(x+1)(y+2)(u+3) \end{array} \right) $$ At $(x,y,u,v) = (0,0,0,0)$ this becomes \begin{equation}\label{matrix} \left(\begin{array}{rr} -1&1\\8&6 \end{array} \right). \end{equation} This matrix is invertible, so the theorem guarantees that the equations implicitly determine $(u,v)$ as a function of $(x,y)$.

Next we find $\partial_x \bff = \binom{\partial_x f_1}{\partial_x f_2}$, where $\binom uv = \bff(x,y) = \binom{f_1(x,y)}{f_2(x,y)}$ is the implicitly defined function.

We start with the equations \begin{align} xye^u + \sin(v-u) &= 0\\ (x+1)(y+2)(u+3)(v+4) - 24 &=0. \end{align} We next implicitly differentiate everything with respect to $x$, taking care to remember that we are considering $x,y$ as independent variables and $u,v$ as dependent variables, so $\frac{\partial y}{\partial x}= 0$. Then after gathering terms we get \begin{align} ye^u + \left( xy e^u - \cos(v-u)\right) \frac{\partial u}{\partial x} + \cos(v-u)\frac{\partial v}{\partial x} &= 0\\ (y+2)(u+3)(v+4) +(x+1)(y+2)(v+4)\frac{\partial u}{\partial x} +(x+1)(y+2)(u+3)\frac{\partial v}{\partial x} &=0. \end{align} At $(x,y,u,v) = (0,0,0,0)$ this reduces to \begin{align} \left(\begin{array}{rr} -1&1\\ 8&6 \end{array} \right)\binom {\frac{\partial u}{\partial x}} {\frac{\partial v}{\partial x}} = \binom 0{-24}. \label{concrete}\end{align} This can be solved to find that $$ \binom{\frac{\partial u}{\partial x}} {\frac{\partial v}{\partial x}} = \frac{1}{-14} \left( \begin{array}{rr} 6&-1 \\ -8&-1 \end{array} \right)\binom0 {-24} = -\binom{12/7}{12/7}. $$ In solving this, we have used the formula for the inverse of a $2\times 2$ matrix: $$ \left( \begin{array}{cc} a&b \\ c&d \end{array} \right)^{-1} \ = \ \frac {1}{ad-bc} \left(\begin{array}{rr} d&-b \\ -c&a \end{array}\right) , $$ which we have committed to memory.
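The small linear solve in \eqref{concrete} is easy to verify in exact arithmetic. The following Python snippet (ours, for checking only) applies the memorized $2\times 2$ inverse formula to the system above using rational numbers, so there is no rounding:

```python
from fractions import Fraction as Fr

# Exact check of the linear algebra above: solve the 2x2 system
#   [[-1, 1], [8, 6]] * [du_dx, dv_dx]^T = [0, -24]^T
# via the memorized inverse formula for a 2x2 matrix.
a, b, c, d = Fr(-1), Fr(1), Fr(8), Fr(6)
rhs = (Fr(0), Fr(-24))

det = a * d - b * c                       # here: (-1)(6) - (1)(8) = -14
du_dx = (d * rhs[0] - b * rhs[1]) / det   # first row of the inverse times rhs
dv_dx = (-c * rhs[0] + a * rhs[1]) / det  # second row of the inverse times rhs

print(du_dx, dv_dx)  # both equal -12/7
```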

A more general computation

Now let's consider $\bfF:\R^4\to \R^2$ of class $C^1$. Assume that $\bfa, \bfb\in \R^2$ are points such that $\bfF(\bfa,\bfb) = \bf0$ and $D_\bfy \bfF(\bfa,\bfb)$ is invertible. Then the Implicit Function Theorem guarantees that the equation $\bfF(\bfx, \bfy) = \bf0$ implicitly defines a function $\bfy = \bff(\bfx)$ for $\bfx$ near $\bfa$, such that $\bfF(\bfx, \bff(\bfx)) = \bf0$.

We will now compute derivatives of $\bff$. The procedure is exactly the same as above, except that we are now considering a general $\bfF:\R^4\to \R^2$ rather than a specific example.

We will write the components of $\bfF$ as $F_1$ and $F_2$. Then the equation $\bfF(\bfx, \bff(\bfx)) = \bf0$ that characterizes $\bff$ can be written as $$ \begin{array}{rl} F_1(\bfx, \bff(\bfx)) &=0 \\ F_2(\bfx, \bff(\bfx)) &=0, \end{array} $$ where $\bfx= (x_1,x_2)$ and $\bff(\bfx) = (f_1(\bfx), f_2(\bfx))$.
If we differentiate this with respect to $x_1$ and use the chain rule, we can write the result as \begin{align} \partial_1 F_1 +\partial_3 F_1 \partial_1 f_1 + \partial_4 F_1 \partial_1f_2&=0\\ \partial_1 F_2 +\partial_3 F_2 \partial_1 f_1 + \partial_4 F_2 \partial_1f_2&=0 \end{align} (where all derivatives of $F$ are evaluated at $(\bfx, \bff(\bfx))$ ) or equivalently, $$ \left. \left( \begin{array} {cc} \partial_3 F_1 &\partial_4 F_1\\ \partial_3 F_2 &\partial_4 F_2 \end{array} \right) \right|_{(\bfx, \bff(\bfx))} \binom{\partial_1 f_1}{\partial_1 f_2} \ = \ -\left.\binom{\partial_1 F_1}{\partial_1 F_2}
\right|_{(\bfx, \bff(\bfx))} $$ where the notation means (first differentiate then) evaluate at $(\bfx, \bff(\bfx))$. We will use this notation below.

We can write this in more condensed form as $$ \left. D_\bfy \bfF\right|_{(\bfx, \bff(\bfx))} \, \partial_1 \bff(\bfx) = - \left.\partial_1 \bfF\right|_{(\bfx, \bff(\bfx))} $$ Wherever $D_\bfy\bfF$ is invertible, it follows that $$ \partial_1 \bff(\bfx) = - (D_\bfy \bfF )^{-1} \partial_1 \bfF \big|_{(\bfx, \bff(\bfx))}. $$ In particular, this holds when $(\bfx, \bff(\bfx))= (\bfa, \bfb)$, since $D_\bfy\bfF$ is invertible there by assumption.

If we wish, we could do the same computation, but differentiating with respect to $x_2$ instead of $x_1$. The result of this computation and the one above can be assembled into a single matrix equation, which can be written $$ \left.\left( \begin{array} {cc} \partial_3 F_1 &\partial_4 F_1\\ \partial_3 F_2 &\partial_4 F_2 \end{array} \right)\right|_{(\bfx, \bff(\bfx))} \left( \begin{array} {cc} \partial_1 f_1 &\partial_2 f_1\\ \partial_1 f_2 &\partial_2 f_2 \end{array} \right) \ = \ -\left.\left( \begin{array} {cc} \partial_1 F_1 &\partial_2 F_1\\ \partial_1 F_2 &\partial_2 F_2 \end{array} \right)\right|_{(\bfx, \bff(\bfx))} $$ or more concisely, $$ \left.D_\bfy \bfF\right|_{(\bfx, \bff(\bfx))} \ D\bff(\bfx) = - \left.D_\bfx\bfF\right|_{(\bfx, \bff(\bfx))}. $$ As above, we note that the condition that allows us to solve for $D\bff(\bfa)$, that is, the invertibility of $D_\bfy\bfF(\bfa, \bfb)$, is exactly the main hypothesis of the Implicit Function Theorem.

The general case

For a general $\bfF:U\to \R^k$, where $U$ is an open subset of $\R^{n+k}$ exactly the same considerations as above show that under the conditions of the Implicit Function Theorem, $$ \left.D_\bfy \bfF\right|_{(\bfx, \bff(\bfx))} \ D\bff(\bfx) = - \left.D_\bfx\bfF\right|_{(\bfx, \bff(\bfx))}. $$ Substituting $(\bfx, \bff(\bfx)) = (\bfa,\bfb) $ we get \begin{equation}\label{gFinv} D_\bfy \bfF (\bfa, \bfb) \ D\bff(\bfa) = - D_\bfx \bfF(\bfa, \bfb). \end{equation} The main hypothesis of the theorem --- the invertibility of $D_\bfy \bfF(\bfa, \bfb)$ --- is exactly what we need to solve for $D\bff(\bfa)$ in terms of derivatives of $\bfF$ evaluated at $(\bfa,\bfb)$. This leads to $$ D\bff(\bfa) = - [D_\bfy \bfF (\bfa, \bfb)]^{-1}D_\bfx \bfF(\bfa, \bfb). $$ (A similar formula holds if we evaluate at $(\bfx, \bff(\bfx))$ instead of the particular point $(\bfa, \bfb)$, as long as $D_\bfy\bfF$ remains invertible, but it is of limited use, since we generally don't know what $\bff(\bfx)$ is.)
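The formula \eqref{gFinv} can be checked numerically on the harder example from earlier. In the Python sketch below (ours, not from the notes), every partial derivative of $\bfF$ is estimated by a centered difference at $(\bfa,\bfb) = (0,0,0,0)$, and then $D\bff(\bfa) = -[D_\bfy\bfF]^{-1} D_\bfx\bfF$ is computed column by column; no formula for $\bff$ itself is ever needed, exactly as the theorem promises.

```python
import math

# Numerical check of Df(a) = -[D_y F]^{-1} D_x F for the harder example:
#   F1 = x*y*e^u + sin(v - u),  F2 = (x+1)(y+2)(u+3)(v+4) - 24,
# at (x, y, u, v) = (0, 0, 0, 0), with partials estimated by
# centered differences.
def F(p):
    x, y, u, v = p
    return (x * y * math.exp(u) + math.sin(v - u),
            (x + 1) * (y + 2) * (u + 3) * (v + 4) - 24)

def jac_column(i, p=(0.0, 0.0, 0.0, 0.0), eps=1e-6):
    """Centered-difference derivative of F with respect to variable i."""
    hi = list(p); lo = list(p)
    hi[i] += eps; lo[i] -= eps
    Fhi, Flo = F(tuple(hi)), F(tuple(lo))
    return [(Fhi[k] - Flo[k]) / (2 * eps) for k in range(2)]

# D_x F has the x and y columns; D_y F has the u and v columns.
c_x, c_y, c_u, c_v = (jac_column(i) for i in range(4))
a, b = c_u[0], c_v[0]
c, d = c_u[1], c_v[1]
det = a * d - b * c          # should be close to -14, as found before

# One column of Df = -(D_y F)^{-1} (column of D_x F), via the 2x2 inverse.
def solve(rhs):
    return [-(d * rhs[0] - b * rhs[1]) / det,
            -(-c * rhs[0] + a * rhs[1]) / det]

dudx, dvdx = solve(c_x)
print(dudx, dvdx)  # close to -12/7 for both, matching the earlier computation
```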

Why is the theorem true?

We will give several answers to the question: why is the theorem true?

Answer 1. In fact, you yourself have proved large parts of the theorem in Assignments 2 and 3. We will discuss this more and fill in some gaps in Section 3.3.


Answer 2. You can also find a different proof of the Implicit Function Theorem in Folland's Advanced Calculus. The proof presented there is probably clearer than ours for $k=1$ but more opaque for $k\ge 2$.

Answer 3. If we don't insist on a complete proof, we can say that the underlying idea of the theorem, as with so much of calculus, is to approximate nonlinear functions by linear functions. We can do this (at least locally) if the nonlinear functions are of class $C^1$.

So: in our situation, we have $\bfF:U\to \R^k$ of class $C^1$, where $U$ is an open subset of $\R^{n+k}$, and we know that $$ \bfF(\bfa, \bfb) = \bf0. $$ We hope to change $\bfa$ to some nearby point $\bfx$, then find $\bfy$ near $\bfb$ such that $\bfF(\bfx, \bfy)= \bf0$.

Let us write $\bfx = \bfa +\bfh$ and $\bfy = \bfb+\bfk$ for $\bfh\in \R^n$ and $\bfk\in \R^k$ small. Then we want to solve \begin{equation}\label{oa} \bfF(\bfa+\bfh, \bfb+\bfk) = \bf0 \end{equation} for $\bfk$ as a function of $\bfh$. In general this is an impossible problem. But it becomes possible if we replace $\bfF$ by its first-order Taylor approximation, which is \begin{align} \bfF(\bfa+\bfh,\bfb+\bfk) &\approx P_{(\bfa,\bfb), 1}(\bfh, \bfk) \nonumber \\ &= \left( \begin{array}{ccc} \frac{\partial F_1}{\partial x_1} &\cdots &\frac{\partial F_1}{\partial x_n} \\ \vdots&\ddots&\vdots \\ \frac{\partial F_k}{\partial x_1} &\cdots &\frac{\partial F_k}{\partial x_n} \end{array} \right) \left( \begin{array}{c} h_1\\ \vdots \\ h_n \end{array} \right) + \left( \begin{array}{ccc} \frac{\partial F_1}{\partial y_1} &\cdots &\frac{\partial F_1}{\partial y_k} \\ \vdots&\ddots&\vdots \\ \frac{\partial F_k}{\partial y_1} &\cdots &\frac{\partial F_k}{\partial y_k} \end{array} \right) \left( \begin{array}{c} k_1\\ \vdots \\ k_k \end{array} \right), \nonumber \end{align} where, on the right-hand side, the partial derivatives of $\bfF$ are evaluated at $(\bfa, \bfb)$. (Note that $\bfF(\bfa,\bfb)$ does not appear on the right-hand side because $\bfF(\bfa,\bfb)= \bf0$.)

Suppose that, instead of considering the equation $\bfF(\bfa+\bfh, \bfb+\bfk) = \bf0$, we try to solve the simpler equation $P_{(\bfa,\bfb), 1}(\bfh, \bfk) = \bf0$ for $\bfk$ as a function of $\bfh$. This is the system of linear equations \begin{equation}\label{la} \left( \begin{array}{ccc} \frac{\partial F_1}{\partial y_1} &\cdots &\frac{\partial F_1}{\partial y_k} \\ \vdots&\ddots&\vdots \\ \frac{\partial F_k}{\partial y_1} &\cdots &\frac{\partial F_k}{\partial y_k} \end{array} \right) \left( \begin{array}{c} k_1\\ \vdots \\ k_k \end{array} \right) = -\left( \begin{array}{ccc} \frac{\partial F_1}{\partial x_1} &\cdots &\frac{\partial F_1}{\partial x_n} \\ \vdots&\ddots&\vdots \\ \frac{\partial F_k}{\partial x_1} &\cdots &\frac{\partial F_k}{\partial x_n} \end{array} \right) \left( \begin{array}{c} h_1\\ \vdots \\ h_n \end{array} \right), \end{equation} where all the derivatives of $\bfF$ are evaluated at $(\bfa, \bfb)$. We know that this can be solved for $\bfk$ in terms of $\bfh$ if the matrix on the left-hand side is invertible, i.e. if $\det D_\bfy\bfF(\bfa, \bfb)\ne 0$. This is exactly the main hypothesis of the Implicit Function Theorem.

So we can see that what the theorem says is, roughly speaking: if the equation \eqref{la} arising from linear approximation of $\bfF$ is solvable for $\bfk$ as a function of $\bfh$, then the original equation \eqref{oa} is (in principle) solvable for $\bfk$ as a function of $\bfh$, as long as ${\bfh}$ is small enough.

The "$\bfh$ is small enough" condition makes sense, because it is needed for the linear approximation to be a good approximation.
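Although the notes do not pursue this, the linearization idea is also the basis of a practical algorithm, Newton's method: to solve $\bfF(\bfa+\bfh, \bfb+\bfk)=\bf0$ for a fixed small $\bfh$, one repeatedly solves the linearized system for a correction to $\bfk$. Here is a Python sketch reusing the harder example, with $(x,y)$ fixed at $(0.1, 0.05)$ (our arbitrary choice of a nearby point):

```python
import math

# Newton's method for the harder example: fix (x, y) near (0, 0) and
# solve F(x, y, u, v) = 0 for (u, v) near (0, 0) by repeatedly solving
# the linearized system  D_y F * (correction) = -F.
def F(x, y, u, v):
    return (x * y * math.exp(u) + math.sin(v - u),
            (x + 1) * (y + 2) * (u + 3) * (v + 4) - 24)

def DyF(x, y, u, v):
    # the 2x2 matrix of u, v partial derivatives written out earlier
    return ((x * y * math.exp(u) - math.cos(v - u), math.cos(v - u)),
            ((x + 1) * (y + 2) * (v + 4), (x + 1) * (y + 2) * (u + 3)))

x, y = 0.1, 0.05   # a point near (0, 0); our arbitrary choice
u, v = 0.0, 0.0    # start from the known solution at the origin
for _ in range(20):
    F1, F2 = F(x, y, u, v)
    (a, b), (c, d) = DyF(x, y, u, v)
    det = a * d - b * c   # nonzero near the origin: this is invertibility of D_y F
    # solve the 2x2 linearized system for the Newton correction
    du = (d * (-F1) - b * (-F2)) / det
    dv = (-c * (-F1) + a * (-F2)) / det
    u, v = u + du, v + dv

print(u, v, F(x, y, u, v))  # the residual F is now essentially 0
```

Each Newton step is exactly a system of the form \eqref{la}, and the invertibility of $D_\bfy\bfF$ is what keeps the steps well defined; this is one way to see why the theorem's main hypothesis is the natural one.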

Problems

Basic skills

Some of the problems below involve a fair amount of computation. It is not very important to be able to do this sort of routine computation extremely quickly or accurately. It is however important to know what you have to do to solve such questions.

  1. Let $$F(x,y,z) := x^3+3xyz+z^2y -4y -5z +4,$$ and note that $F(1,1,1)=0$.

  2. Consider the equation $(x^2+y^2+z^4)^{1/2} - \cos y - \cos z = 0$.

  3. Consider the system of equations $$ \begin{aligned} x^2 y + 3 y + z^3-z&= 8 \\ 2x +2y+\cos(xz) &= 7 \end{aligned} $$ Determine, for each pair of variables $(x,y), (x,z)$, or $(y,z)$, whether it is in principle possible to solve this system for those two variables as $C^1$ functions of the third variable, near the point $(x,y,z) = (1,2,0)$.

    If $(y,z)$ can be written as functions of $x$, say $\binom y z = \bff(x)$ for $x$ close to $1$, then determine $\bff'(1)$. (Equivalently, determine $d z/d x$ and $d y/d x$ at $x=1$.)

  4. Consider the system of equations $$ \begin{aligned} xy^2 + xv - uvz &= 1 \\ ux+yvz + uz &= 3 . \end{aligned} $$ Determine whether it is in principle possible to solve this system for $(u,v)$ as $C^1$ functions of $(x,y,z)$ near $(x,y,z,u,v) = (1,1,1,1,1)$, say $\binom u v = \bff(x,y,z)$ for $(x,y,z)$ near $(1,1,1)$.

    If the answer is yes, then write down the equations you would need to solve in order to determine $\partial_1 \bff$ at $\bfx = (1,1,1)$. (Equivalently, to determine $\partial u/\partial x$ and $\partial v/\partial x$ at $(x,y,z) = (1,1,1)$.) In fact, solve them if you like -- it's not very hard.

  5. Consider the system of equations $$ \begin{aligned} xy^2 - xz + uy&= 1 \\ yu^2 - z &= 0 \\ ux+yz + u &= 3 . \end{aligned} $$ Determine whether it is in principle possible to solve this system for $(x,y,z)$ as $C^1$ functions of $u$ near $(x,y,z,u) = (1,1,1,1)$, say $\left( \begin{array}{l}x\\y\\z\end{array}\right) = \bff(u)$ for $u$ near $1$.

    If the answer is yes, then write down the equation you would need to solve in order to determine $\bff'(1)$.

Other questions

Some ungrateful students were unhappy that the Implicit Function Theorem does not say anything about the size of $r_0, r_1$. The next questions are meant to address this complaint, first by looking at a simple example, then by a rigorous proof.

  1. For nonzero real numbers $a,b,c$, define $F:\R^2\to \R$ by $F(x,y) = \frac a2 y^2 +by+cx$.

  2. (very optional! too hard! not recommended!) Assume that $F:\R^2\to \R$ is a function such that, for some numbers $a,b,c$, $$ |\partial_x F(0,0)| = c, \qquad |\partial_y F(0,0)| = b>0, $$ and $$ \|HF(\bfx) \| \le a \mbox{ for all }\bfx\in \R^2, $$ where $HF$ denotes the Hessian matrix of $F$, and $\| HF\|$ denotes its matrix norm, familiar from some earlier homework assignments.
    (An example is the function from the above problem, if $b$ is positive.)

    Here you will prove that under the above assumptions, the conclusions of the Implicit Function Theorem hold with $$ r_0 = \frac{b^2}{4a(c+2b)}. $$ Thus, we can estimate $r_0$ if we know not only $\nabla F({\bf 0})$, but also if we have a bound on how large the second derivatives of $F$ can be. A crude summary is that in the setting of this problem, $$ \left. \begin{array}{r} \mbox{larger second derivatives} \\ \mbox{smaller }|\partial_y F(0,0)| \\ \mbox{larger }|\partial_x F(0,0)| \end{array} \right \} \Rightarrow \mbox{potentially smaller }r_0. $$ Note that this is compatible with the previous exercise.
    We will assume that $\partial_y F(0,0)= b>0$. (The case $\partial_y F(0,0)= -b<0$ is very similar.)
