3.1 The Implicit Function Theorem

\(\newcommand{\R}{\mathbb R }\)


  1. The Implicit Function Theorem
  2. Derivatives of implicitly defined functions
  3. Why is the theorem true?
  4. Problems


The Implicit Function Theorem

The Implicit Function Theorem addresses a question that has two versions, one analytic and one geometric; both are discussed below.

The theorem considers a \(C^1\) function \(\mathbf F:S\to \R^k\), where \(S\) is an open subset of \(\R^{n+k}\).

In this situation, we will write points in \(\R^{n+k}\) in the form \((\mathbf x,\mathbf y)\), where \(\mathbf x = (x_1, \ldots, x_n)\in \R^n\) and \(\mathbf y = (y_1,\ldots, y_k)\in \R^k\).

It is also convenient to write \[\begin{align} D_\mathbf x \mathbf F &=\text{ matrix of partial derivatives with respect to }x_1, \ldots, x_n\nonumber \\\ &= \left( \begin{array}{ccc} \frac{\partial F_1}{\partial x_1} &\cdots &\frac{\partial F_1}{\partial x_n} \\\ \vdots&\ddots&\vdots \\\ \frac{\partial F_k}{\partial x_1} &\cdots &\frac{\partial F_k}{\partial x_n} \end{array} \right)\nonumber \end{align}\] and \[\begin{align} D_\mathbf y\mathbf F &=\text{ matrix of partial derivatives with respect to }y_1, \ldots, y_k \nonumber \\\ &=\left( \begin{array}{ccc} \frac{\partial F_1}{\partial y_1} &\cdots &\frac{\partial F_1}{\partial y_k} \\\ \vdots&\ddots&\vdots \\\ \frac{\partial F_k}{\partial y_1} &\cdots &\frac{\partial F_k}{\partial y_k} \nonumber \end{array} \right) \end{align}\]

Suppose that \(S\) is an open subset of \(\R^{n+k}\) and that \(\mathbf F:S\to \R^k\) is a function of class \(C^1\). Suppose also that \((\mathbf a, \mathbf b)\) is a point in \(S\) such that \(\mathbf F(\mathbf a, \mathbf b) = {\bf 0}\) and \(\det D_\mathbf y \mathbf F(\mathbf a, \mathbf b) \ne 0.\) Then

(i) there exist \(r_0, r_1>0\) and a function \(\mathbf f:B(\mathbf a; r_0)\to B(\mathbf b; r_1)\subseteq \R^k\) such that, for every \(\mathbf x\in B(\mathbf a; r_0)\), the point \(\mathbf y = \mathbf f(\mathbf x)\) is the unique \(\mathbf y\in B(\mathbf b; r_1)\) satisfying \[\begin{equation}\label{ImFT.eq1} \mathbf F(\mathbf x, \mathbf y) = {\bf 0}. \end{equation}\]

(ii) the function \(\mathbf f\) is of class \(C^1\), and its partial derivatives may be computed by implicitly differentiating the identity \(\mathbf F(\mathbf x, \mathbf f(\mathbf x)) = \bf 0\).

In other words, the equation \(\eqref{ImFT.eq1}\) implicitly defines a function \(\mathbf y = \mathbf f(\mathbf x)\) for \(\mathbf x\in \R^n\) near \(\mathbf a\), with \(\mathbf y = \mathbf f(\mathbf x)\) close to \(\mathbf b\). Note in particular that \(\mathbf f(\mathbf a) = \mathbf b\).

The second part of the theorem is discussed in greater detail below.

A general formula for \(D\mathbf f\) is given in \(\eqref{gFinv}\) below, but it may not be comprehensible without first looking at the concrete examples that precede it.

The analytic content of the theorem is this: suppose we want to solve the equation \(\mathbf F(\mathbf x, \mathbf y)= {\bf 0}\) for \(\mathbf y\) as a function of \(\mathbf x\), say \(\mathbf y = \mathbf f(\mathbf x)\). If we know one solution \(\mathbf b = \mathbf f(\mathbf a)\), then in principle it is possible to solve for \(\mathbf y\) for every \(\mathbf x\) near \(\mathbf a\), provided the crucial hypothesis \(\det D_\mathbf y \mathbf F(\mathbf a,\mathbf b)\ne0\) holds. Thus it is a theorem about the possibility of solving a system of nonlinear equations.

The geometric content of the theorem is discussed in more detail in Section 3.2.

Why the Implicit Function Theorem is a great theorem

In general, if someone gives you a system of \(k\) nonlinear equations in \(k\) unknowns, it is typically impossible to solve explicitly; often it is impossible even to determine whether it has any solutions.

This is in stark contrast to a system of \(k\) linear equations in \(k\) unknowns, for which we completely understand when the system is solvable and how to find a solution, when one exists.

The Implicit Function Theorem allows us to (partly) reduce impossible questions about systems of nonlinear equations to straightforward questions about systems of linear equations. This is great!

The theorem is great, but it is not miraculous, so it has some limitations. These include

  1. In order to get information about the equation \[ \mathbf F(\mathbf x, \mathbf y)=\bf0, \] (which we can think of as a system of \(k\) equations for \(\mathbf y= (y_1,\ldots, y_k)\), with coefficients that depend on \(\mathbf x\)) we have to start by knowing one solution \(\mathbf y = \mathbf b\) for some specific \(\mathbf x = \mathbf a\).

  2. The theorem only tells us about solvability of the system for values of \(\mathbf x\) close to \(\mathbf a\), with \(\mathbf y\) close to \(\mathbf b\). That is, it only gives local information near the point \((\mathbf a, \mathbf b)\). Even worse, it does not tell us how close we need to be for the conclusions of the theorem to hold.

  3. The theorem does not give a formula for \(\mathbf y = \mathbf f(\mathbf x)\) solving \(\mathbf F(\mathbf x, \mathbf y)= 0\). It does, however, tell us how to compute \(D\mathbf f(\mathbf a)\), and this allows us to approximate the solution \[ \mathbf f(\mathbf a+\mathbf h) \approx \mathbf f(\mathbf a) + D\mathbf f(\mathbf a) \mathbf h =\mathbf b + D\mathbf f(\mathbf a) \mathbf h \] for \(\mathbf h\) small. But, again, the theorem does not tell us how small \(\mathbf h\) has to be for this approximation to be a good one.
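The approximation in point 3 is easy to see in action. The following Python sketch is our own illustration (not an example from the text): it takes \(F(x,y) = x^2+y^2-2\), which satisfies \(F(1,1)=0\), so implicit differentiation gives \(f'(x) = -x/y\) and \(f'(1) = -1\). We compare the linear approximation \(b + f'(a)h\) with the exact implicit function, which here we happen to know explicitly.

```python
import math

# Illustration of point 3 (our own example, not one from the text):
# F(x, y) = x^2 + y^2 - 2 satisfies F(1, 1) = 0, and implicit
# differentiation gives f'(x) = -x/y, so f'(a) = -1 at (a, b) = (1, 1).
a, b = 1.0, 1.0
df_a = -a / b                      # f'(a) from implicit differentiation

def f_exact(x):
    # In this example we happen to know the implicit function explicitly.
    return math.sqrt(2.0 - x**2)

for h in (0.01, 0.1, 0.4):
    approx = b + df_a * h          # f(a + h) ~ f(a) + f'(a) h
    exact = f_exact(a + h)
    print(h, exact, approx, abs(exact - approx))
```

For small \(h\) the error behaves like \(h^2\), but already at \(h = 0.4\) the linear approximation is badly off, illustrating that the theorem does not say how small \(h\) must be.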

Some special cases of the implicit function theorem

Below are several specific instances of the Implicit Function Theorem. For simplicity we will focus on part (i) of the theorem and omit part (ii). In every case, however, part (ii) implies that the implicitly-defined function is of class \(C^1\), and that its derivatives may be computed by implicit differentiation.

\(n=k=1\) Suppose that \(F\) is a real-valued \(C^1\) function defined for all \((x,y)\) in an open set \(U\subseteq \R^2\). If \(F(a,b)=0\) and \(\partial_y F(a,b)\ne 0\) at some point \((a,b)\in U\), then the equation \(F(x,y)=0\) implicitly defines \(y\) as a function \(y = f(x)\) for \(x\) near \(a\), with \(f(a)=b\).

\(n=2, k=1\) Suppose that \(F\) is a scalar function of class \(C^1\) defined for all \((x,y,z)\) in an open set \(U\subseteq \R^3\). If \(F(a,b,c)=0\) and \(\partial_z F(a,b,c)\ne 0\) at some point \((a,b,c)\in U\), then the equation \(F(x,y,z)=0\) implicitly defines \(z\) as a function \(z = f(x,y)\) for \((x,y)\) near \((a,b)\), with \(f(a,b)=c\).

\(n=1, k=2\) Suppose that \(\mathbf F = (F_1, F_2)\) is a function \(U\to \R^2\) of class \(C^1\), defined for all \((x,y,z)\) in an open set \(U\subseteq \R^3\). If \(\mathbf F(a,b,c)=\bf0\) at some point \((a,b,c)\in U\), and the matrix \(\left( \begin{array}{cc} \partial_y F_1 &\partial_z F_1\\\ \partial_y F_2 &\partial_z F_2 \end{array} \right)\) is invertible at \((a,b,c)\), then the equations \(F_1 = F_2 = 0\) implicitly define \((y,z)\) as functions of \(x\), say \((y,z) = \mathbf f(x)\) for \(x\) near \(a\), with \(\mathbf f(a) = (b,c)\).

Derivatives of implicitly defined functions

Whenever the conditions of the Implicit Function Theorem are satisfied, and the theorem guarantees the existence of a function \(\mathbf f:B(\mathbf a;r_0)\to B(\mathbf b;r_1)\subseteq \R^k\) such that \[\begin{equation}\label{ift.repeat} \mathbf F(\mathbf x, \mathbf f(\mathbf x)) = \bf0, \end{equation}\] (among other properties), the Theorem also tells us how to compute derivatives of \(\mathbf f\). The partial derivatives of \(\mathbf f\) may be determined by differentiating the identity \(\eqref{ift.repeat}\) and solving to find the partial derivatives of \(\mathbf f\). This is identical to “implicit differentiation” of single variable calculus in the case \(n=k=1\).

Example 1

To understand what this means, we first consider a concrete example. Consider the equation \[ F(x,y,z) = xy+ xz \ln(yz) =1. \] Note that \((1,1,1)\) is a solution. We will answer the questions: Does the equation implicitly determine \(z\) as a function \(f(x,y)\) for \((x,y)\) near \((1,1)\), with \(f(1,1) = 1\)? If so, find a formula for \(\partial_x f(x,y)\), and evaluate it at \((x,y) = (1,1)\). For simplicity, we will omit the computation of \(\partial _y f\), but the procedure is exactly the same as for the \(x\) derivative.

Solution. First, \[ \partial_z F = x\ln(yz)+x \] and at \((x,y,z) = (1,1,1)\), this equals \(1\). So the Implicit Function Theorem guarantees that there is a function \(f(x,y)\), defined for \((x,y)\) near \((1,1)\), such that \[ F(x,y,z)= 1\text{ when }z = f(x,y). \]

Next we will find \(\partial_x f\). We start by recopying the equation that defines \(z\) as a function of \((x,y)\): \[ xy+ x z \ln(yz) = 1 \] when \(z=f(x,y)\). Now we differentiate both sides with respect to \(x\). Clearly the derivative of the right-hand side is \(0\). Since \(z\) is a function of \((x,y)\), we have to use the chain rule for the left-hand side. It looks nicer to write \(\frac{\partial z}{\partial x}\) instead of \(\partial_x f\), so that is what we will do. We get \[ y + z\ln(yz) + x \frac{\partial z}{\partial x}\ln(yz) + x z \frac{y\ \partial z/\partial x}{yz} = 0, \] so after rearranging, \[ y+ z\ln(yz) +\left[ x\ln(yz) + x \right] \frac{\partial z}{\partial x} = 0. \] Evaluating at \((x,y,z) = (1,1,1)\) and solving for \(\frac{\partial z}{\partial x}\), we get \[ \partial_x f(1,1) = \frac {\partial z}{\partial x}(1,1) = -1. \]

In principle we can also solve for \(\partial z/\partial x\) at points other than \((x,y,z) = (1,1,1)\) to get the formula \[ \frac{\partial z}{\partial x} = -\frac{y+z\ln(yz)}{x+x\ln (yz)}. \] But this is of limited use, since we only know the value of \(z(x,y)\) at the point \((x,y,z)= (1,1,1)\) where we started.

It is possible to find \(\partial_y f = \frac{\partial z}{\partial y}\) at \((x,y)=(1,1)\) by similar computations.
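The computation in Example 1 can be checked numerically. The following Python sketch (our own; the function names are hypothetical) solves \(F(x,1,z)=0\) for \(z\) by Newton's method, using the formula for \(\partial_z F\) found above, and then estimates \(\partial_x f(1,1)\) by a central difference.

```python
import math

# Numerical check of Example 1 (a sketch; function names are ours).
# F(x, y, z) = xy + xz*log(yz) - 1 vanishes along z = f(x, y) near (1, 1, 1).
def F(x, y, z):
    return x * y + x * z * math.log(y * z) - 1.0

def Fz(x, y, z):
    # partial_z F = x*log(yz) + x, as computed in the example
    return x * math.log(y * z) + x

def f(x, y):
    # Solve F(x, y, z) = 0 for z by Newton's method, starting at z = 1.
    z = 1.0
    for _ in range(50):
        z -= F(x, y, z) / Fz(x, y, z)
    return z

h = 1e-5
dzdx = (f(1 + h, 1) - f(1 - h, 1)) / (2 * h)   # central difference
print(dzdx)   # close to partial_x f(1,1) = -1
```

The finite-difference value agrees with the answer \(-1\) from implicit differentiation.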

A more general computation

Now let’s consider a general \(F:\R^3\to \R\) of class \(C^1\). In what follows it will be convenient to use the notation \(\partial_1\) etc, instead of \(\partial_x\) or \(\frac{\partial}{\partial x}\) etc. (Similarly \(\partial_2\) instead of \(\partial_y\) etc…).

Here we are solving exactly the same problem as above, except that we are considering a general function \(F\) of \(3\) variables rather than a specific example.

Suppose that at a point \((a,b,c)\in \R^3\), \[ F(a,b,c) = 0, \qquad \partial_3 F(a,b,c)\ne 0. \] Then the Implicit Function Theorem guarantees that there exists an open set \(T\subseteq \R^2\) containing \((a,b)\), and a function \(f:T\to \R\), such that \[ f(a,b)=c, \qquad F(x,y,z)=0\text{ for }(x,y)\in T\text{ and }z = f(x,y). \] Now let us compute \(\partial_1 f\). We differentiate both sides of the identity \(F(x,y,f(x,y))=0\) with respect to \(x\), to obtain \[ \partial_1 F(x,y,f(x,y)) + \partial_3 F(x,y,f(x,y)) \partial_1 f(x,y) = 0. \] Substituting \((x,y) = (a,b)\) and \(f(a,b) = c\) and rearranging, we get \[ \partial_1f(a,b) = - \frac{\partial_1 F(a,b,c)}{ \partial_3 F(a,b,c)}. \] Note that this makes sense, because in order to guarantee the existence of \(f(x,y)\), we already had to know that \(\partial_3 F(a,b,c)\ne 0\).

If we substitute in \(F(x,y,z) = xy+xz\ln(yz)-1\) and \((a,b,c) = (1,1,1)\), this would yield exactly the formula for \(\frac{\partial z}{\partial x} = \partial_1 f\) that we found before.

Remark 1. Note that in order to solve for \(\partial_x f\) or \(\partial_y f\), we needed \(\partial_zF\ne0\). This is exactly the hypothesis of the Implicit Function Theorem, i.e. the main condition that, according to the theorem, guarantees that the equation \(F(x,y,z)=0\) implicitly determines \(z\) as a function of \((x,y)\). As we will see below, this is true in general.

Example 2

Consider the system of equations \[\begin{align} F_1(x,y,u,v) = xye^u + \sin(v-u) &= 0\\\ F_2(x,y,u,v) =(x+1)(y+2)(u+3)(v+4) - 24 &=0 \end{align}\] Note that \((0,0,0,0)\) is a solution.

Does the system of equations implicitly determine \((u,v)\) as a function of \((x,y)\), i.e. \((u,v) = \mathbf f(x,y)\) for \((x,y)\) near \((0,0)\)? If so, find a formula for \(\partial_x \mathbf f(x,y)\) at \((x,y) = (0,0)\).

First, let \(\mathbf F = \binom{F_1}{F_2}\). Then we consider the matrix \[ \left(\begin{array}{ll} \partial_u F_1&\partial_v F_1\\\ \partial_u F_2&\partial_v F_2 \end{array} \right) \ = \ \left(\begin{array}{cc} xye^u - \cos(v-u)&\cos(v-u)\\\ (x+1)(y+2)(v+4)&(x+1)(y+2)(u+3) \end{array} \right) \] At \((x,y,u,v) = (0,0,0,0)\) this becomes \[\begin{equation}\label{matrix} \left(\begin{array}{rr} -1&1\\ 8&6 \end{array} \right). \end{equation}\] This matrix is invertible, so the theorem guarantees that the equations implicitly determine \((u,v)\) as a function of \((x,y)\).

Next we find \(\partial_x \mathbf f = \binom{\partial_x f_1}{\partial_x f_2}\), where \(\binom uv = \mathbf f(x,y) = \binom{f_1(x,y)}{f_2(x,y)}\) is the implicitly defined function.

We start with the equations \[\begin{align} xye^u + \sin(v-u) &= 0\\\ (x+1)(y+2)(u+3)(v+4) - 24 &=0. \end{align}\] We next implicitly differentiate everything with respect to \(x\), taking care to remember that we are considering \(x,y\) as independent variables and \(u,v\) as dependent variables, so \(\frac{\partial y}{\partial x}= 0\). Then after gathering terms we get \[\begin{align} ye^u + \left( xy e^u - \cos(v-u)\right) \frac{\partial u}{\partial x} + \cos(v-u)\frac{\partial v}{\partial x} &= 0\\\ (y+2)(u+3)(v+4) +(x+1)(y+2)(v+4)\frac{\partial u}{\partial x} +(x+1)(y+2)(u+3)\frac{\partial v}{\partial x} &=0. \end{align}\] At \((x,y,u,v) = (0,0,0,0)\) this reduces to \[\begin{align} \left(\begin{array}{rr} -1&1\\\ 8&6 \end{array} \right)\binom {\frac{\partial u}{\partial x}} {\frac{\partial v}{\partial x}} = \binom 0{-24}. \label{concrete}\end{align}\] This can be solved to find that \[ \binom{\frac{\partial u}{\partial x}} {\frac{\partial v}{\partial x}} = \frac{1}{-14} \left( \begin{array}{rr} 6&-1 \\\ -8&-1 \end{array} \right)\binom0 {-24} = -\binom{12/7}{12/7}. \] In solving this, we have used the formula for the inverse of a \(2\times 2\) matrix: \[ \left( \begin{array}{cc} a&b \\\ c&d \end{array} \right)^{-1} \ = \ \frac {1}{ad-bc} \left(\begin{array}{rr} d&-b \\\ -c&a \end{array}\right). \]
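As a check on the arithmetic, the final \(2\times 2\) solve can be carried out in a few lines of Python, using the same inverse formula (a sketch; the helper function name is ours):

```python
# A check of the final 2x2 solve in Example 2 (a sketch; the helper is ours).
def solve2x2(a, b, c, d, p, q):
    # Solve [[a, b], [c, d]] (s, t)^T = (p, q)^T via the 2x2 inverse formula.
    det = a * d - b * c
    return ((d * p - b * q) / det, (-c * p + a * q) / det)

# matrix and right-hand side from the computation above
du_dx, dv_dx = solve2x2(-1.0, 1.0, 8.0, 6.0, 0.0, -24.0)
print(du_dx, dv_dx)   # both equal -12/7
```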

A more general computation

Now let’s consider \(\mathbf F:\R^4\to \R^2\) of class \(C^1\). Suppose that \(\mathbf a, \mathbf b\in \R^2\) are points such that \(\mathbf F(\mathbf a,\mathbf b) = \bf0\) and \(D_\mathbf y \mathbf F(\mathbf a,\mathbf b)\) is invertible. Then the Implicit Function Theorem guarantees that the equation \(\mathbf F(\mathbf x, \mathbf y) = \bf0\) implicitly defines a function \(\mathbf y = \mathbf f(\mathbf x)\) for \(\mathbf x\) near \(\mathbf a\), such that \(\mathbf F(\mathbf x, \mathbf f(\mathbf x)) = \bf 0\).

We will now compute derivatives of \(\mathbf f\). The procedure is exactly the same as above, except that we are now considering a general \(\mathbf F:\R^4\to \R^2\) rather than a specific example.

We will write the components of \(\mathbf F\) as \(F_1\) and \(F_2\). Then the equation \(\mathbf F(\mathbf x, \mathbf f(\mathbf x)) = \bf 0\) that characterizes \(\mathbf f\) can be written as \[ \begin{array}{cc} F_1(\mathbf x, \mathbf f(\mathbf x)) &=0\ \, \\\ F_2(\mathbf x, \mathbf f(\mathbf x)) &=0 \ . \end{array} \] where \(\mathbf x= (x_1,x_2)\) and \(\mathbf f(\mathbf x) = (f_1(\mathbf x), f_2(\mathbf x))\).
If we differentiate this with respect to \(x_1\) and use the chain rule, we can write the result as \[\begin{align} \partial_1 F_1 +\partial_3 F_1 \partial_1 f_1 + \partial_4 F_1 \partial_1f_2&=0\\\ \partial_1 F_2 +\partial_3 F_2 \partial_1 f_1 + \partial_4 F_2 \partial_1f_2&=0 \end{align}\] (where all derivatives of \(F\) are evaluated at \((\mathbf x, \mathbf f(\mathbf x))\) ) or equivalently, \[ \left. \left( \begin{array} {cc} \partial_3 F_1 &\partial_4 F_1\\\ \partial_3 F_2 &\partial_4 F_2 \end{array} \right) \right|_{(\mathbf x, \mathbf f(\mathbf x))} \binom{\partial_1 f_1}{\partial_1 f_2} \ = \ -\left.\binom{\partial_1 F_1}{\partial_1 F_2} \right|_{(\mathbf x, \mathbf f(\mathbf x))} \] where the notation means “(first differentiate then) evaluate at \((\mathbf x, \mathbf f(\mathbf x))\)”. We will use this notation below.

We can write this in more condensed form as \[ \left. D_\mathbf y \mathbf F\right|_{(\mathbf x, \mathbf f(\mathbf x))} \, \partial_1 \mathbf f(\mathbf x) = - \left.\partial_1 \mathbf F\right|_{(\mathbf x, \mathbf f(\mathbf x))} \] Wherever \(D_\mathbf y\mathbf F\) is invertible, it follows that \[ \partial_1 \mathbf f(\mathbf x) = - (D_\mathbf y \mathbf F )^{-1} \partial_1 \mathbf F \big|_{(\mathbf x, \mathbf f(\mathbf x))}. \] In particular, this holds when \((\mathbf x, \mathbf f(\mathbf x))= (\mathbf a, \mathbf b)\), since \(D_\mathbf y\mathbf F\) is invertible there by assumption.

If we wish, we could do the same computation, but differentiating with respect to \(x_2\) instead of \(x_1\). The result of this computation and the one above can be assembled into a single matrix equation, which can be written \[ \left.\left( \begin{array} {cc} \partial_3 F_1 &\partial_4 F_1\\\ \partial_3 F_2 &\partial_4 F_2 \end{array} \right)\right|_{(\mathbf x, \mathbf f(\mathbf x))} \left( \begin{array} {cc} \partial_1 f_1 &\partial_2 f_1\\\ \partial_1 f_2 &\partial_2 f_2 \end{array} \right) \ = \ -\left.\left( \begin{array} {cc} \partial_1 F_1 &\partial_2 F_1\\\ \partial_1 F_2 &\partial_2 F_2 \end{array} \right)\right|_{(\mathbf x, \mathbf f(\mathbf x))} \] or more concisely, \[ \left.D_\mathbf y \mathbf F\right|_{(\mathbf x, \mathbf f(\mathbf x))} \ D\mathbf f(\mathbf x) = - \left.D_\mathbf x\mathbf F\right|_{(\mathbf x, \mathbf f(\mathbf x))}. \] As above, we note that the condition that allows us to solve for \(D\mathbf f(\mathbf a)\), that is, the invertibility of \(D_\mathbf y\mathbf F(\mathbf a, \mathbf b)\), is exactly the main hypothesis of the Implicit Function Theorem.

The general case

For a general \(\mathbf F:U\to \R^k\), where \(U\) is an open subset of \(\R^{n+k}\), exactly the same considerations as above show that under the conditions of the Implicit Function Theorem,

\[ \left.D_\mathbf y \mathbf F\right|_{(\mathbf x, \mathbf f(\mathbf x))} \ D\mathbf f(\mathbf x) = - \left.D_\mathbf x\mathbf F\right|_{(\mathbf x, \mathbf f(\mathbf x))}. \] Substituting \((\mathbf x, \mathbf f(\mathbf x)) = (\mathbf a,\mathbf b)\) we get

\[\begin{equation}\label{gFinv} D_\mathbf y \mathbf F (\mathbf a, \mathbf b) \ D\mathbf f(\mathbf a) = - D_\mathbf x \mathbf F(\mathbf a, \mathbf b). \end{equation}\] The main hypothesis of the theorem — that \(D_\mathbf y \mathbf F(\mathbf a, \mathbf b)\) is invertible — is exactly what we need to solve for \(D\mathbf f(\mathbf a)\) in terms of derivatives of \(\mathbf F\) evaluated at \((\mathbf a,\mathbf b)\). This leads to \[ D\mathbf f(\mathbf a) = - [D_\mathbf y \mathbf F (\mathbf a, \mathbf b)]^{-1}D_\mathbf x \mathbf F(\mathbf a, \mathbf b). \] A similar formula holds if we evaluate at \((\mathbf x, \mathbf f(\mathbf x))\) instead of the particular point \((\mathbf a, \mathbf b)\), as long as \(D_\mathbf y\mathbf F\) remains invertible, but it is of limited use, since we generally don’t know what \(\mathbf f(\mathbf x)\) is.
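To see this formula in use, here is a short Python sketch (using NumPy; our own illustration) that applies \(D\mathbf f(\mathbf a) = - [D_\mathbf y \mathbf F (\mathbf a, \mathbf b)]^{-1}D_\mathbf x \mathbf F(\mathbf a, \mathbf b)\) to Example 2 at the origin. The matrix \(D_\mathbf y\mathbf F\) was computed in Example 2; the entries of \(D_\mathbf x\mathbf F\) are obtained directly from \(F_1, F_2\).

```python
import numpy as np

# Applying Df(a) = -[D_y F(a,b)]^{-1} D_x F(a,b) to Example 2 at the origin
# (a sketch).  D_y F(0,0,0,0) was computed in Example 2; the entries of
# D_x F(0,0,0,0) follow by differentiating F_1, F_2 in x and y.
DyF = np.array([[-1.0, 1.0],     # [d_u F1, d_v F1]
                [ 8.0, 6.0]])    # [d_u F2, d_v F2]
DxF = np.array([[ 0.0,  0.0],    # [d_x F1, d_y F1] = [y e^u, x e^u]
                [24.0, 12.0]])   # [d_x F2, d_y F2]

Df = -np.linalg.solve(DyF, DxF)  # solves DyF @ Df = -DxF without forming an inverse
print(Df)                        # first column is (du/dx, dv/dx) = (-12/7, -12/7)
```

The first column of \(D\mathbf f\) reproduces the values of \(\partial u/\partial x\) and \(\partial v/\partial x\) found in Example 2.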

Why is the theorem true?

If we don’t insist on a complete proof, we can say that the underlying idea of the theorem, as with so much of calculus, is to approximate nonlinear functions by linear functions. We can do this locally if the nonlinear functions are of class \(C^1\).

In our situation, we have \(\mathbf F:U\to \R^k\) of class \(C^1\), where \(U\) is an open subset of \(\R^{n+k}\), and we know that \[ \mathbf F(\mathbf a, \mathbf b) = \bf0. \] We hope to change \(\mathbf a\) to some nearby point \(\mathbf x\), then find \(\mathbf y\) near \(\mathbf b\) such that \(\mathbf F(\mathbf x, \mathbf y)= \bf0\).

Let us write \(\mathbf x = \mathbf a +\mathbf h\) and \(\mathbf y = \mathbf b+\mathbf k\) for \(\mathbf h\in \R^n\) and \(\mathbf k\in \R^k\) small. Then we want to solve \[\begin{equation}\label{oa} \mathbf F(\mathbf a+\mathbf h, \mathbf b+\mathbf k) = \bf0 \end{equation}\] for \(\mathbf k\) as a function of \(\mathbf h\). In general this is an impossible problem. But it becomes possible if we replace \(\mathbf F\) by its first-order Taylor approximation, which is \[\begin{align} \mathbf F(\mathbf a+\mathbf h,\mathbf b+\mathbf k) &\approx P_{(\mathbf a,\mathbf b), 1}(\mathbf h, \mathbf k) \nonumber \\\ &= \left( \begin{array}{ccc} \frac{\partial F_1}{\partial x_1} &\cdots &\frac{\partial F_1}{\partial x_n} \\\ \vdots&\ddots&\vdots \\\ \frac{\partial F_k}{\partial x_1} &\cdots &\frac{\partial F_k}{\partial x_n} \end{array} \right) \left( \begin{array}{c} h_1\\\ \vdots \\\ h_n \end{array} \right) + \left( \begin{array}{ccc} \frac{\partial F_1}{\partial y_1} &\cdots &\frac{\partial F_1}{\partial y_k} \\\ \vdots&\ddots&\vdots \\\ \frac{\partial F_k}{\partial y_1} &\cdots &\frac{\partial F_k}{\partial y_k} \end{array} \right) \left( \begin{array}{c} k_1\\\ \vdots \\\ k_k \end{array} \right) \nonumber \end{align}\] where, on the right-hand side, partial derivatives of \(\mathbf F\) are evaluated at \((\mathbf a, \mathbf b)\). Note that \(\mathbf F(\mathbf a,\mathbf b)\) does not appear on the right-hand side because \(\mathbf F(\mathbf a,\mathbf b)= \bf0\).

Instead of considering the equation \(\mathbf F(\mathbf a+\mathbf h, \mathbf b+\mathbf k) = \bf0\), we try to solve the simpler equation \(P_{(\mathbf a,\mathbf b), 1}(\mathbf h, \mathbf k) = \bf0\) for \(\mathbf k\) as a function of \(\mathbf h\). This is a system of linear equations \[\begin{equation}\label{la} \left( \begin{array}{ccc} \frac{\partial F_1}{\partial y_1} &\cdots &\frac{\partial F_1}{\partial y_k} \\\ \vdots&\ddots&\vdots \\\ \frac{\partial F_k}{\partial y_1} &\cdots &\frac{\partial F_k}{\partial y_k} \end{array} \right) \left( \begin{array}{c} k_1\\\ \vdots \\\ k_k \end{array} \right) = -\left( \begin{array}{ccc} \frac{\partial F_1}{\partial x_1} &\cdots &\frac{\partial F_1}{\partial x_n} \\\ \vdots&\ddots&\vdots \\\ \frac{\partial F_k}{\partial x_1} &\cdots &\frac{\partial F_k}{\partial x_n} \end{array} \right) \left( \begin{array}{c} h_1\\\ \vdots \\\ h_n \end{array} \right), \end{equation}\] where \(\mathbf F\) and all its derivatives are evaluated at \((\mathbf a, \mathbf b)\). We know that this can be solved for \(\mathbf k\) in terms of \(\mathbf h\) if the matrix on the left-hand side is invertible, i.e. if \(\det D_\mathbf y\mathbf F(\mathbf a, \mathbf b)\ne 0\). This is why we have the main hypothesis of the Implicit Function Theorem.

So we can see that what the theorem says is, roughly speaking: if we can solve the linear approximation for \(\mathbf k\) as a function of \(\mathbf h\), then we can solve the original equation \(\eqref{oa}\) for \(\mathbf k\) as a function of \(\mathbf h\), as long as \({\mathbf h}\) is small enough.

The “\(\mathbf h\) is small enough” condition is needed for the linear approximation to be a good approximation.
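The linearization idea can be pushed further: repeatedly re-linearizing and solving the linear system, which is Newton's method in \(\mathbf k\), actually produces the solution, and iterations of this type underlie one standard proof strategy. The Python sketch below (our own illustration) does this for the \(\mathbf F\) of Example 2; the point \((x,y)=(0.1,-0.05)\) is an arbitrary choice of ours, made only to stay near \((0,0)\).

```python
import math
import numpy as np

# Sketch of the linearization idea in action: repeatedly solving the
# linearized system (Newton's method in (u, v)) solves F = 0 from Example 2
# for (x, y) near (0, 0).  The point (x, y) = (0.1, -0.05) is our choice,
# purely for illustration.
def F(x, y, u, v):
    return np.array([x * y * math.exp(u) + math.sin(v - u),
                     (x + 1) * (y + 2) * (u + 3) * (v + 4) - 24.0])

def DyF(x, y, u, v):
    # matrix of partial derivatives with respect to the dependent variables (u, v)
    return np.array([[x * y * math.exp(u) - math.cos(v - u), math.cos(v - u)],
                     [(x + 1) * (y + 2) * (v + 4), (x + 1) * (y + 2) * (u + 3)]])

x, y = 0.1, -0.05
uv = np.zeros(2)                 # start from the known solution (u, v) = (0, 0)
for _ in range(20):
    uv -= np.linalg.solve(DyF(x, y, *uv), F(x, y, *uv))

print(uv, np.linalg.norm(F(x, y, *uv)))   # residual is essentially zero
```

The iteration converges rapidly because \(D_\mathbf y\mathbf F\) is invertible near \((\bf0,\bf0)\); for \((x,y)\) far from \((0,0)\) there is no such guarantee.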

Problems

Basic

Some of the problems below involve a large amount of computation. It is not very important to be able to do these computations quickly or accurately. What is important is knowing what you have to do to solve such questions; that way, you can tell a computer to do it.

  1. Let \[F(x,y,z) = x^3+3xyz+z^2y -4y -5z +4,\] and note that \(F(1,1,1)=0\).

  2. Consider the equation \((x^2+y^2+z^4)^{1/2} - \cos y - \cos z = 0\).

  3. Consider the system of equations \[ \begin{aligned} x^2 y + 3 y + z^3-z&= 8 \\\ 2x +2y+\cos(xz) &= 7 \end{aligned} \] Determine, for any pair of variables \((x,y), (x,z)\), or \((y,z)\), whether it is in principle possible to solve this system for those two variables as \(C^1\) functions of the third variable, near the point \((x,y,z) = (1,2,0)\).

    If \((y,z)\) can be written as functions of \(x\), say \(\binom y z = \mathbf f(x)\) for \(x\) close to \(1\), then determine \(\mathbf f'(1)\). (Equivalently, determine \(d z/d x\) and \(d y/d x\) at \(x=1\).)

  4. Consider the system of equations \[ \begin{aligned} xy^2 + xv - uvz &= 1 \\\ ux+yvz + uz &= 3 . \end{aligned} \] Determine whether it is in principle possible to solve this system for \((u,v)\) as \(C^1\) functions of \((x,y,z)\) near \((x,y,z,u,v) = (1,1,1,1,1)\), say \(\binom u v = \mathbf f(x,y,z)\) for \((x,y,z)\) near \((1,1,1)\).

    If the answer is “yes”, then write down the equation you would need to solve in order to determine \(\partial_1 \mathbf f\) at \(\mathbf x = (1,1,1)\). (Equivalently, to determine \(\partial u/\partial x\) and \(\partial v/\partial x\) at \((x,y,z) = (1,1,1)\).) In fact, solve it if you like – it’s not very hard.

  5. Consider the system of equations \[ \begin{aligned} xy^2 - xz + uy&= 1 \\\ yu^2 - z &= 0 \\\ ux+yz + u &= 3 . \end{aligned} \] Determine whether it is in principle possible to solve this system for \((x,y,z)\) as \(C^1\) functions of \(u\) near \((x,y,z,u) = (1,1,1,1)\), say \(\left( \begin{array}{l}x\\y\\z\end{array}\right) = \mathbf f(u)\) for \(u\) near \(1\).

    If the answer is “yes”, then write down the equation you would need to solve in order to determine \(\mathbf f'(1)\).

Advanced

What could the Implicit Function Theorem say about the size of \(r_0, r_1\)? The next questions are meant to address this, first by looking at an example, then by a rigorous proof. This is related to Problem Set 4, but now we are trying to improve the choice of Lipschitz constant.

  1. For nonzero real numbers \(a,b,c\), define \(F:\R^2\to \R\) by \(F(x,y) = \frac a2 y^2 +by+cx\).

  2. Suppose that \(F:\R^2\to \R\) is a \(C^2\) function such that, for some numbers \(a,b,c\), \[ |\partial_x F(0,0)| = c, \qquad |\partial_y F(0,0)| = b>0, \] and \[ \|HF(\mathbf x) \| \le a \text{ for all }\mathbf x\in \R^2, \] where \(HF\) denotes the Hessian matrix of \(F\), and \(\| HF\|\) denotes its matrix norm, familiar from some earlier homework assignments.
    (An example is the function from the above problem, if \(b\) is positive.)

    Here you will prove that under the above assumptions, the conclusions of the Implicit Function Theorem hold with \[ r_0 = \frac{b^2}{4a(c+2b)}. \] Thus, we can estimate \(r_0\) if we know not only \(\nabla F({\bf 0})\) but also a bound on how large the second derivatives of \(F\) can be.

    A crude summary is that in the setting of this problem, \[ \left. \begin{array}{r} \text{larger second derivatives} \\\ \text{smaller }|\partial_y F(0,0)| \\\ \text{larger }|\partial_x F(0,0)| \end{array} \right \} \Rightarrow \text{potentially smaller }r_0. \] Note that this is compatible with the previous exercise.
    We will suppose that \(\partial_y F(0,0)= b>0\). (The case \(\partial_y F(0,0)= -b<0\) is very similar.)

    • Prove that if \(|(x,y)|\le R\), then \[ |\partial_x F(x,y)| \le c+ aR, \qquad |\partial_y F(x,y)| \ge b- aR. \]

    • Deduce that if \(|(x,y)| < b/2a\), then \[\begin{equation}\label{quant1} |\partial_x F(x,y)| < \frac{2c+b}b \, \partial_y F(x,y). \end{equation}\]

    • Define \(r_1 = b/4a\) and \(r_0 = \frac{b^2}{4a(c+2b)}\). Note that \[ \text{ if }(x,y) \in (-r_0,r_0)\times (-r_1,r_1), \quad \text{ then }|(x,y)|< \frac b {2a}, \] and hence \(\eqref{quant1}\) holds in \((-r_0,r_0)\times (-r_1,r_1)\).

    • Suppose that \((x,y)\) and \((x+h, y+k)\) both belong to \((-r_0,r_0)\times (-r_1,r_1)\). Show that \[ \text{ if }k > \frac{2c+b}b|h|, \text{ then }F(x+h, y+k) > F(x,y). \]

    • From this one can deduce that for every \(x\in (-r_0,r_0)\) there exists exactly one \(y\in (-r_1,r_1)\) such that \(F(x,y)= 0\), or in other words, that there exists a function \(f:(-r_0,r_0)\to (-r_1,r_1)\) such that \[ \text{ for }(x,y)\in (-r_0,r_0)\times (-r_1,r_1), \quad F(x,y)=0 \text{ if and only if }y=f(x). \]

    • Then the Implicit Function Theorem implies that \(f\) is \(C^1\) in \((-r_0,r_0)\), etc.


Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Canada License.