2.1 Differentiation of Real-valued Functions


  1. Differentiability in $\mathbb{R}$
  2. Differentiability in $\mathbb{R}^n$ and the gradient
  3. Partial derivatives
  4. Differentiability vs. partial differentiability
  5. Directional derivatives and the meaning of the gradient
  6. Problems

    

Differentiability in $\mathbb{R}$

Suppose that $S$ is an open subset of $\mathbb{R}^n$, and let $f$ be a function $f: S \to \mathbb{R}$. Our first goal is to define what it means for $f$ to be differentiable. Since our definition should generalize what we know from functions of a single variable, let's first recall how that goes.

For $f:(a,b)\to\mathbb{R}$, if $x\in(a,b)$ and $$\lim_{h\to 0}\frac{f(x+h)-f(x)}{h} \tag{1}$$ exists, then we say that $f$ is differentiable at $x$, and we say that the above limit is the derivative of $f$ at $x$. We write $f'(x)$ to denote the derivative of $f$ at $x$.

Note that our domain is an open set. To generalize this to $f: S \to \mathbb{R}$ with $S\subseteq\mathbb{R}^n$, we will need to replace $h$ with a vector, and use the vector length in the denominator. But this won't be enough. Consider $f:\mathbb{R}^2\to\mathbb{R}$ defined by $f(x,y)=x$. This function should be differentiable. Writing $\mathbf{h}=(h,k)$, we have $f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})=h$ and $|\mathbf{h}|=\sqrt{h^2+k^2}$, but $\lim_{\mathbf{h}\to\mathbf{0}}\frac{h}{\sqrt{h^2+k^2}}$ does not exist, as we have seen by looking at the values along $k=mh$. Just like completeness, we will need to look at alternative definitions of differentiable functions to find one that generalizes to $\mathbb{R}^n$ in a way that makes sense.

If the derivative at $x$ is $m$, then $\lim_{h\to 0}\frac{f(x+h)-f(x)-mh}{h}=0$, and conversely, if this limit is $0$ for a real number $m$, then the derivative at $x$ is $m$. Thus, an equivalent definition is:
If there exists a number $m$ such that $$\lim_{h\to 0}\frac{f(x+h)-f(x)-mh}{h}=0, \tag{2}$$ then we say that $f$ is differentiable at $x$, and that the number $m$ is the derivative of $f$ at $x$, denoted $f'(x)$.

The equivalent definition (2) can be understood as follows: temporarily fix $x$, treat $h$ as a variable, and view $f(x+h)-f(x)$ as a function of $h$. Then $f$ is differentiable at $x$ if $f(x+h)-f(x)$ is approximately $mh$, with an error that is smaller than linear as $h\to 0$. When this holds, $f'(x)=m$.

It’s rare that $f(x+h)-f(x)=mh$ exactly. Instead, it will be $mh+E(h)$ for some error term $E(h)$, with $\lim_{h\to 0}\frac{E(h)}{h}=0$. Thus, a third way of defining the derivative is:
If there exists a number $m$ and a function $E(h)$ such that $$f(x+h)=f(x)+mh+E(h), \quad\text{and}\quad \lim_{h\to 0}\frac{E(h)}{h}=0, \tag{3}$$ then we say that $f$ is differentiable at $x$, and that the number $m$ is the derivative of $f$ at $x$, denoted $f'(x)$.

To see that these definitions are all the same, note that $$\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}=m \iff \lim_{h\to 0}\frac{f(x+h)-f(x)-mh}{h}=0 \iff E(h):=f(x+h)-f(x)-mh \text{ satisfies } \lim_{h\to 0}\frac{E(h)}{h}=0 \iff \text{(3) holds}.$$
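As a quick numerical illustration of definition (2) (a sketch only, not part of the development above; the sample function $f(x)=x^2$ and the point $x=1.5$ are arbitrary choices), the quotient in (2) shrinks to $0$ as $h\to 0$ when $m$ is the usual derivative $2x$:

```python
# Numerical sanity check of definition (2) for the sample function f(x) = x**2,
# whose derivative at x is m = 2x.  (Illustrative sketch only.)
def f(x):
    return x ** 2

x = 1.5
m = 2 * x  # candidate derivative at x

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    quotient = (f(x + h) - f(x) - m * h) / h  # the expression inside the limit in (2)
    print(f"h = {h:g}: quotient = {quotient:.6f}")
# Here the quotient equals h exactly, so it visibly tends to 0.
```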

Differentiability in $\mathbb{R}^n$ and the gradient

Suppose that $S$ is an open subset of $\mathbb{R}^n$ and consider a function $f: S \to \mathbb{R}$.

We will define differentiability in a way that generalizes definition (2). The idea is that $f$ is differentiable at a point $\mathbf{x}\in S$ if $f$ can be approximated near $\mathbf{x}$ by a linear map $\mathbb{R}^n\to\mathbb{R}$, with errors that are smaller than linear near $\mathbf{x}$.

To make this precise, we will suppose that $\mathbf{x}\in S$ is fixed, and we will consider the function $\mathbf{h}\mapsto f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})$ of a variable $\mathbf{h}\in\mathbb{R}^n$.

Remark. The notation $a\mapsto b$ denotes the function that sends $a$ to $b$; that is, the function $f$ defined by $f(a)=b$.

We want: $f$ is differentiable at $\mathbf{x}$ $\iff$ $f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})$ is approximately a linear function of $\mathbf{h}$, for $\mathbf{h}$ near $\mathbf{0}$.

Recall from linear algebra:
A function $\ell:\mathbb{R}^n\to\mathbb{R}^m$ is called linear if it has the form $$\ell(\mathbf{x})=M\mathbf{x}, \tag{4}$$ where $M$ is an $m\times n$ matrix.

Another way of saying this is: a function $\ell:\mathbb{R}^n\to\mathbb{R}^m$ is linear if $\ell(a\mathbf{x}+b\mathbf{y})=a\,\ell(\mathbf{x})+b\,\ell(\mathbf{y})$ for all $a,b\in\mathbb{R}$ and $\mathbf{x},\mathbf{y}\in\mathbb{R}^n$. If a function has the form $f(\mathbf{x})=M\mathbf{x}+\mathbf{b}$, we will say that it is affine. We may also sometimes call it a first-order polynomial or a polynomial of degree 1.

In general, we know from linear algebra that if $\ell:\mathbb{R}^n\to\mathbb{R}$ is a linear function, then $\ell$ can be written in the form $\ell(\mathbf{h})=\mathbf{m}\cdot\mathbf{h}$ for some vector $\mathbf{m}\in\mathbb{R}^n$.

So, combining all these, we are led to the following basic definition:

Suppose that $f$ is a function $S\to\mathbb{R}$, where $S$ is an open subset of $\mathbb{R}^n$. We say that $f$ is differentiable at a point $\mathbf{x}\in S$ if there exists a vector $\mathbf{m}\in\mathbb{R}^n$ such that $$\lim_{\mathbf{h}\to\mathbf{0}}\frac{f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})-\mathbf{m}\cdot\mathbf{h}}{|\mathbf{h}|}=0. \tag{5}$$ When this holds, we say that the vector $\mathbf{m}$ is the gradient of $f$ at $\mathbf{x}$, denoted $\nabla f(\mathbf{x})$.

Note that $\nabla f(\mathbf{x})$ is uniquely determined by condition (5). Thus, the gradient $\nabla f$ (when it exists) is characterized by the property that $$\lim_{\mathbf{h}\to\mathbf{0}}\frac{f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})-\nabla f(\mathbf{x})\cdot\mathbf{h}}{|\mathbf{h}|}=0.$$

We can also write a definition in the style of (3):

The function $f:S\to\mathbb{R}$ is differentiable at $\mathbf{x}$ if there exists a vector $\mathbf{m}$ such that $$f(\mathbf{x}+\mathbf{h})=f(\mathbf{x})+\mathbf{m}\cdot\mathbf{h}+E(\mathbf{h}), \quad\text{where}\quad \lim_{\mathbf{h}\to\mathbf{0}}\frac{E(\mathbf{h})}{|\mathbf{h}|}=0. \tag{6}$$ When this holds, we define $\nabla f(\mathbf{x})=\mathbf{m}$.

These definitions can be understood as follows: temporarily fix $\mathbf{x}$, view $\mathbf{h}$ as a variable, and view $f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})$ as a function of $\mathbf{h}$. Then $f$ is differentiable at $\mathbf{x}$ if the linear function $\mathbf{m}\cdot\mathbf{h}$ approximates $f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})$, with errors that are smaller than linear as $\mathbf{h}\to\mathbf{0}$.

We will soon see that there is a simple computational way of finding out what the gradient of f must be, if it exists. First, let’s do one example the hard way. This will help us to appreciate the theorems that will be proved shortly.

Example 1.

Consider $f:\mathbb{R}^2\to\mathbb{R}$ defined by $f(x,y)=(x-1)^3(y+2)$. Is $f$ differentiable at the origin?

Solution. To check, let's write a vector $\mathbf{h}$ in the form $\mathbf{h}=(h,k)$. Then $$f((0,0)+(h,k))=f(h,k)=-2+(6h-k)+(h^3k-3h^2k+3hk+2h^3-6h^2)=f(0,0)+(6,-1)\cdot(h,k)+E(\mathbf{h}),$$ where $E(\mathbf{h})=h^3k-3h^2k+3hk+2h^3-6h^2$. We can check that $\lim_{\mathbf{h}\to\mathbf{0}}\frac{E(\mathbf{h})}{\sqrt{h^2+k^2}}=0$. So $f$ is differentiable at the origin and $\nabla f(0,0)=(6,-1)$.
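If you want to see numerically what the limit in Example 1 looks like, the following sketch (in Python with numpy; the sampling direction and radii are arbitrary choices, and this is an illustration, not a proof) evaluates $E(\mathbf{h})/|\mathbf{h}|$ along a shrinking sequence of vectors $\mathbf{h}$:

```python
import numpy as np

def f(x, y):
    return (x - 1) ** 3 * (y + 2)      # the function from Example 1

grad = np.array([6.0, -1.0])           # candidate gradient at the origin
rng = np.random.default_rng(0)
u = rng.normal(size=2)
u /= np.linalg.norm(u)                 # one arbitrary unit direction

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * u
    E = f(h[0], h[1]) - f(0.0, 0.0) - grad @ h
    print(f"|h| = {t:g}: E(h)/|h| = {E / t:.2e}")
# The printed ratios shrink with |h|, consistent with differentiability at the origin.
```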

The next important property of differentiability generalizes a familiar fact about functions of a single variable.

Theorem 1. Assume that $f:S\to\mathbb{R}$, where $S$ is an open subset of $\mathbb{R}^n$, and that $\mathbf{x}\in S$. If $f$ is differentiable at $\mathbf{x}$, then $f$ is continuous at $\mathbf{x}$.
Proof. Let $f$ be differentiable at $\mathbf{x}$. Then according to (6), $f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})=\nabla f(\mathbf{x})\cdot\mathbf{h}+E(\mathbf{h})$, where $\frac{E(\mathbf{h})}{|\mathbf{h}|}\to 0$ as $\mathbf{h}\to\mathbf{0}$. The limit law for multiplication implies that $\lim_{\mathbf{h}\to\mathbf{0}}E(\mathbf{h})=\lim_{\mathbf{h}\to\mathbf{0}}|\mathbf{h}|\,\frac{E(\mathbf{h})}{|\mathbf{h}|}=0$, and it is clear that $\lim_{\mathbf{h}\to\mathbf{0}}\nabla f(\mathbf{x})\cdot\mathbf{h}=0$. So the limit law for addition implies that $\lim_{\mathbf{h}\to\mathbf{0}}\big(f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})\big)=0$, which says that $f$ is continuous at $\mathbf{x}$.

Partial derivatives

Recall that we have defined $\mathbf{e}_j$ to be the unit vector in $\mathbb{R}^n$ in the $j$th coordinate direction. Thus $$\mathbf{e}_1=(1,0,\ldots,0),\quad \mathbf{e}_2=(0,1,\ldots,0),\quad \ldots,\quad \mathbf{e}_n=(0,0,\ldots,1).$$

If $f$ is a function defined on an open subset $S\subseteq\mathbb{R}^n$, then at a point $\mathbf{x}\in S$, we define $$\frac{\partial f}{\partial x_j}(\mathbf{x})=\lim_{h\to 0}\frac{f(\mathbf{x}+h\mathbf{e}_j)-f(\mathbf{x})}{h}.$$ This is called the $j$th partial derivative of $f$, the partial derivative of $f$ in the $x_j$ direction, or the partial derivative of $f$ with respect to $x_j$.

To see what it means, let's consider a function $f$ of 2 variables. In this case we usually write $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, instead of $\frac{\partial f}{\partial x_j}$ for $j=1$ or $2$. The definition says that $$\frac{\partial f}{\partial x}(x,y)=\lim_{h\to 0}\frac{f(x+h,y)-f(x,y)}{h},\qquad \frac{\partial f}{\partial y}(x,y)=\lim_{h\to 0}\frac{f(x,y+h)-f(x,y)}{h}.$$ So for example, $\frac{\partial f}{\partial x}$ is computed by “freezing $y$” – that is, considering it to be a constant – and differentiating with respect to the variable $x$.

In other words, if we want to compute $\frac{\partial f}{\partial x}$ at a point $(x,y)$, we can temporarily define a function $g(x)=f(x,y)$, that is, $f$ with the $y$ variable “frozen.” Then $\frac{\partial f}{\partial x}(x,y)=g'(x)$. Both sides of the above equality are limits, so it means that the limit on the left-hand side exists if and only if the limit on the right-hand side exists, and when they exist, they are equal. Similarly, $\frac{\partial f}{\partial y}(x,y)=g'(y)$ for $g(y)=f(x,y)$ with $x$ “frozen”.

Example 2.

Consider $f:\mathbb{R}^2\to\mathbb{R}$ defined by $f(x,y)=(x-1)^3(y+2)$, the same function as in Example 1. What are the partial derivatives of $f(x,y)$?

Solution. According to what we have said, to compute $\frac{\partial f}{\partial x}$, we consider $y$ to be constant, and differentiate with respect to $x$. Thus $\frac{\partial f}{\partial x}=3(x-1)^2(y+2)$. Similarly, $\frac{\partial f}{\partial y}=(x-1)^3$. If we are interested in the partial derivatives at a particular point, say $(x,y)=(0,0)$, we just substitute to find that $\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)(0,0)=(6,-1)$.
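The “freezing” recipe can also be mimicked numerically with one-variable difference quotients. The sketch below (the step size and the use of central differences are arbitrary implementation choices) recovers the values $6$ and $-1$ found above:

```python
def f(x, y):
    return (x - 1) ** 3 * (y + 2)      # the function from Example 2

def partial_x(f, x, y, h=1e-6):
    return (f(x + h, y) - f(x - h, y)) / (2 * h)   # y frozen, difference in x

def partial_y(f, x, y, h=1e-6):
    return (f(x, y + h) - f(x, y - h)) / (2 * h)   # x frozen, difference in y

print(partial_x(f, 0.0, 0.0), partial_y(f, 0.0, 0.0))   # approximately 6 and -1
```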

For functions of 3 or more variables, the principles are exactly the same.

Note that partial derivatives at a point x only tell us about the behaviour of f on lines through x parallel to the coordinate axes. For example,

Example 3.

Let $f_1:\mathbb{R}^3\to\mathbb{R}$ be the function defined by $$f_1(x,y,z)=\begin{cases}0 & \text{if } xyz=0\\ 1 & \text{otherwise}.\end{cases}$$ Since $xyz=0$ if and only if at least one component $x,y,z$ equals zero, we can see that $f_1=0$ on the union of the $xy$-plane, the $yz$-plane, and the $xz$-plane. It is straightforward to check that all partial derivatives exist at the origin, and in fact that $\frac{\partial f_1}{\partial x}=\frac{\partial f_1}{\partial y}=\frac{\partial f_1}{\partial z}=0$ at $(0,0,0)$. Similarly, let's define $$f_2(x,y,z)=\begin{cases}0 & \text{if } xyz=0\\ \cos(x^2yz)\,e^{xz} & \text{otherwise}.\end{cases}$$ Although the behaviour of $f_2$ is complicated if all components $x,y,z$ are nonzero, the partial derivatives of $f_2$ at $(0,0,0)$ do not “see” this behaviour, and again, all partial derivatives of $f_2$ at $(0,0,0)$ exist and equal $0$.
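A short sketch makes the point of Example 3 concrete: sampling $f_1$ along a coordinate axis produces difference quotients that are identically zero, even though $f_1$ takes the value $1$ at points arbitrarily close to the origin. (Illustration only; the sample radii are arbitrary choices.)

```python
def f1(x, y, z):
    return 0.0 if x * y * z == 0 else 1.0   # the function f1 from Example 3

for h in [1e-1, 1e-2, 1e-3]:
    on_axis = (f1(h, 0.0, 0.0) - f1(0.0, 0.0, 0.0)) / h   # always 0: partials only see the axes
    off_axis = f1(h, h, h)                                 # always 1: f1 is not continuous at 0
    print(f"h = {h:g}: difference quotient = {on_axis}, f1(h,h,h) = {off_axis}")
```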

Notation. It is important to know that partial derivatives are often written in many ways. For example, $\partial_j f$, $\partial_{x_j}f$, and $f_{x_j}$ are all alternate ways of writing $\frac{\partial f}{\partial x_j}$ so that they fit on one line. Similarly, $\partial_x f$ and $f_x$ are alternate ways of writing $\frac{\partial f}{\partial x}$, with corresponding notation for $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial z}$.

Differentiability vs. partial differentiability

We will now investigate the relationship between differentiability and partial differentiability.

Theorem 2. Let $f$ be a function $S\to\mathbb{R}$, where $S$ is an open subset of $\mathbb{R}^n$. If $f$ is differentiable at a point $\mathbf{x}\in S$, then $\frac{\partial f}{\partial x_j}$ exists at $\mathbf{x}$ for all $j=1,\ldots,n$, and in addition, $$\nabla f(\mathbf{x})=\left(\frac{\partial f}{\partial x_1},\ldots,\frac{\partial f}{\partial x_n}\right)(\mathbf{x}).$$

That is, the partial derivatives are the components of the gradient vector.

The converse is not true! It can happen that all partial derivatives $\frac{\partial f}{\partial x_j}$ exist at a point $\mathbf{a}$ but that $f$ is not differentiable at $\mathbf{a}$. This is the case for the functions $f_1$ and $f_2$ in Example 3. These functions cannot be differentiable at the origin, since differentiability implies continuity (by Theorem 1) and these functions are not continuous at the origin. But as we have noted, all partial derivatives exist at $(0,0,0)$.

Details for Theorem 2

Assume that $f$ is differentiable at $\mathbf{x}\in S$. Consider any $j\in\{1,\ldots,n\}$, and define $$g(\mathbf{h})=\frac{f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})-\nabla f(\mathbf{x})\cdot\mathbf{h}}{|\mathbf{h}|}$$ whenever $\mathbf{h}\ne\mathbf{0}$ and $\mathbf{x}+\mathbf{h}\in S$. Then $$\lim_{\mathbf{h}\to\mathbf{0}}g(\mathbf{h})=0 \tag{7}$$ by the definition of differentiability. It follows, see Problem 9, that for any $j\in\{1,\ldots,n\}$, $$\lim_{h\to 0}g(h\mathbf{e}_j)=0. \tag{8}$$

This says that $$\lim_{h\to 0}\frac{f(\mathbf{x}+h\mathbf{e}_j)-f(\mathbf{x})-\nabla f(\mathbf{x})\cdot(h\mathbf{e}_j)}{|h|}=0. \tag{9}$$ It follows that $$0=\lim_{h\to 0}\frac{f(\mathbf{x}+h\mathbf{e}_j)-f(\mathbf{x})-\nabla f(\mathbf{x})\cdot(h\mathbf{e}_j)}{h}=\lim_{h\to 0}\left(\frac{f(\mathbf{x}+h\mathbf{e}_j)-f(\mathbf{x})}{h}-\nabla f(\mathbf{x})\cdot\mathbf{e}_j\right). \tag{10}$$

This says that $\frac{\partial f}{\partial x_j}(\mathbf{x})$ exists and equals $\nabla f(\mathbf{x})\cdot\mathbf{e}_j$, which is what we had to prove. You should try to fill in the details of the steps (7)$\Rightarrow$(8) and (9)$\Rightarrow$(10) in the last two problems of this section.

If you need to determine whether a function f is differentiable at a point x, then Theorem 2 can simplify your life. It tells you

  • if any partial derivative $\frac{\partial f}{\partial x_j}$ does not exist at $\mathbf{x}$, then $f$ is not differentiable there, and

  • if all partial derivatives exist, then the vector $\mathbf{m}=\left(\partial f/\partial x_1,\ldots,\partial f/\partial x_n\right)$ is the only possible vector that can work in the definition (5) of differentiability.

Example 4.

Consider the function $f:\mathbb{R}^2\to\mathbb{R}$ defined by $$f(x,y)=\begin{cases}\dfrac{y^3-x^8y}{x^6+y^2} & \text{if } (x,y)\ne(0,0)\\ 0 & \text{if } (x,y)=(0,0).\end{cases}$$ Is $f$ differentiable at $(0,0)$?

Solution. Since $f(x,0)=0$ for all $x$ and $f(0,y)=y$ for all $y$, the partial derivatives at $(0,0)$ exist and are $\frac{\partial f}{\partial x}(0,0)=0$, $\frac{\partial f}{\partial y}(0,0)=1$. So if $\nabla f(0,0)$ exists, it must equal $(0,1)$. To check differentiability, we must check whether $$0=\lim_{(x,y)\to(0,0)}\frac{f(x,y)-f(0,0)-(0,1)\cdot(x,y)}{\sqrt{x^2+y^2}}=\lim_{(x,y)\to(0,0)}\frac{f(x,y)-y}{\sqrt{x^2+y^2}}.$$ This is Problem 13 in Section 1.2. It is not an easy limit, but it would be much harder to determine differentiability if we did not know that the only possible candidate for $\nabla f(0,0)$ is the vector $\mathbf{m}=(0,1)$.
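For a rough numerical feel for this limit (an illustration only, not a substitute for the argument asked for in Problem 13 of Section 1.2; the sampled directions and radii are arbitrary choices), one can evaluate the quotient with the candidate $\mathbf{m}=(0,1)$ at many points near the origin:

```python
import numpy as np

def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return (y ** 3 - x ** 8 * y) / (x ** 6 + y ** 2)   # the function from Example 4

m = np.array([0.0, 1.0])                   # the only possible candidate for grad f(0,0)
rng = np.random.default_rng(1)
dirs = rng.normal(size=(500, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)    # 500 arbitrary unit directions

for t in [1e-1, 1e-2, 1e-3]:
    worst = max(abs(f(t * u[0], t * u[1]) - m @ (t * u)) / t for u in dirs)
    print(f"radius {t:g}: largest sampled quotient = {worst:.2e}")
# The sampled quotients shrink with the radius, consistent with the limit being 0.
```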

Theorem 2 can be used to simplify some complicated problems. The next theorem is even better in this respect — it makes it simple to check differentiability of many functions of several variables, by using single variable derivatives and properties of continuous functions.

Theorem 3. Suppose $f$ is a function $S\to\mathbb{R}$ for some open $S\subseteq\mathbb{R}^n$. If all partial derivatives of $f$ exist and are continuous at every point of $S$, then $f$ is differentiable at every point of $S$.

This theorem motivates the following definition.

A function $f:S\to\mathbb{R}$ is said to be continuously differentiable or of class $C^1$ (or simply $C^1$ for short) if all partial derivatives of $f$ exist and are continuous at every point of $S$.

Thus by Theorem 3, any $C^1$ function is differentiable everywhere in $S$.

Details for Theorem 3

For notational simplicity, we will present the proof when n=2. The idea is exactly the same in the general case.

Let $\mathbf{x}$ be any point in $S$. Since $S$ is open, there exists $r>0$ such that $\mathbf{x}+\mathbf{h}\in S$ whenever $|\mathbf{h}|<r$. Below, we will always assume that $|\mathbf{h}|<r$.

Consider a vector $\mathbf{h}=(h,k)$. We start by writing $$f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})=f(x+h,y+k)-f(x,y)=[f(x+h,y+k)-f(x+h,y)]+[f(x+h,y)-f(x,y)]. \tag{11}$$ Let's temporarily write $g(x)=f(x,y)$. Then $$f(x+h,y)-f(x,y)=g(x+h)-g(x)=h\,g'(x+\theta_1 h)$$ for some $\theta_1\in(0,1)$, by the single variable Mean Value Theorem from MAT 137. Rewriting this in terms of partial derivatives of $f$, it says that $$f(x+h,y)-f(x,y)=h\,\frac{\partial f}{\partial x}(x+\theta_1 h,y)$$ for some $\theta_1\in(0,1)$. A very similar argument shows that $$f(x+h,y+k)-f(x+h,y)=k\,\frac{\partial f}{\partial y}(x+h,y+\theta_2 k)$$ for some $\theta_2\in(0,1)$. Combining these with (11), we see that $$f(\mathbf{x}+\mathbf{h})=f(\mathbf{x})+(h,k)\cdot\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)(x,y)+E(h,k),$$ where $$E(h,k)=h\left(\frac{\partial f}{\partial x}(x+\theta_1 h,y)-\frac{\partial f}{\partial x}(x,y)\right)+k\left(\frac{\partial f}{\partial y}(x+h,y+\theta_2 k)-\frac{\partial f}{\partial y}(x,y)\right).$$ Finally, since $|h|\le|\mathbf{h}|=\sqrt{h^2+k^2}$ and $|k|\le|\mathbf{h}|$, it follows that

$$\lim_{\mathbf{h}\to\mathbf{0}}\frac{|E(\mathbf{h})|}{|\mathbf{h}|}\le\lim_{\mathbf{h}\to\mathbf{0}}\left|\frac{\partial f}{\partial x}(x+\theta_1 h,y)-\frac{\partial f}{\partial x}(x,y)\right|+\lim_{\mathbf{h}\to\mathbf{0}}\left|\frac{\partial f}{\partial y}(x+h,y+\theta_2 k)-\frac{\partial f}{\partial y}(x,y)\right|.$$ The right-hand side equals zero by an $(\varepsilon,\delta)$ argument, using our assumption that the partial derivatives are continuous. Hence $\lim_{\mathbf{h}\to\mathbf{0}}\frac{E(\mathbf{h})}{|\mathbf{h}|}=0$, which proves the differentiability of $f$ at $\mathbf{x}$. Since $\mathbf{x}$ was an arbitrary point of $S$, this completes the proof.

Example 5.

Let $f(x,y)=(x-1)^3(y+2)$. At which points of $\mathbb{R}^2$ is $f$ differentiable?

Solution. In Example 1, we proved that $f$ is differentiable at $(0,0)$, by using the definition of differentiability. That was a moderate amount of work, and it only told us about the point $(0,0)$. Now let's use Theorem 3 instead. We have already computed $$\frac{\partial f}{\partial x}=3(x-1)^2(y+2),\qquad \frac{\partial f}{\partial y}=(x-1)^3.$$ These are both continuous everywhere on $\mathbb{R}^2$, so Theorem 3 implies that $f$ is differentiable everywhere in $\mathbb{R}^2$, and that $\nabla f(x,y)=\left(3(x-1)^2(y+2),\,(x-1)^3\right)$.
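The Theorem 3 route lends itself to symbolic computation as well. A minimal sketch follows, assuming the sympy library is available; it simply automates the differentiation done by hand above:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = (x - 1) ** 3 * (y + 2)

fx = sp.diff(f, x)        # 3*(x - 1)**2*(y + 2)
fy = sp.diff(f, y)        # (x - 1)**3
print(fx, fy)

# Both partial derivatives are polynomials, hence continuous on all of R^2,
# so Theorem 3 applies at every point; at (0, 0) they evaluate to 6 and -1.
print(fx.subs({x: 0, y: 0}), fy.subs({x: 0, y: 0}))
```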

Example 6.

Let $f:\mathbb{R}^n\to\mathbb{R}$ be a polynomial of total degree $d$. At which points of $\mathbb{R}^n$ is $f$ differentiable?

Solution. To answer this, note that if we “freeze” all variables except $x_j$, then what is left is a polynomial function of $x_j$ (whose coefficients are polynomials involving all the other variables). When we differentiate this, we get a polynomial function of $x_j$ of lower degree. When we remember that the coefficients of this polynomial are themselves polynomials involving the other variables, we see that $\frac{\partial f}{\partial x_j}$ exists, and is a polynomial of degree at most $d-1$. To see how this works in practice, consider a concrete example, such as Example 5 above. Since this is true for every $j$, and since polynomials are continuous in all of $\mathbb{R}^n$, Theorem 3 implies that polynomials are differentiable everywhere in $\mathbb{R}^n$.

Contrast this with the example using the naive, incorrect definition of differentiability at the start of this section. The correct definition of differentiable functions eventually shows that polynomials are differentiable, and leads us towards other concepts that we might find useful, like $C^1$. The incorrect naive definition leads to $f(x,y)=x$ not being differentiable. Although it looks more complicated, the correct version does two important things that we look for from mathematical definitions: it includes the functions that we intuitively believe it should, and it leads us to new interesting properties.

Directional derivatives and the meaning of the gradient

A direction in $\mathbb{R}^n$ is naturally represented by a unit vector. In general a vector has a direction and a magnitude; if we are only interested in directions, we can just consider vectors with magnitude equal to $1$, i.e. unit vectors.

Thus, given a unit vector $\mathbf{u}$ and a point $\mathbf{x}\in\mathbb{R}^n$, the point $\mathbf{x}+h\mathbf{u}$ is the point reached by starting at $\mathbf{x}$ and traveling a distance $h$ in the direction $\mathbf{u}$. So $f(\mathbf{x}+h\mathbf{u})-f(\mathbf{x})$ represents the change in $f$ if we start at $\mathbf{x}$ and move a distance $h$ in the direction $\mathbf{u}$. This motivates the following definition:

If $\mathbf{u}\in\mathbb{R}^n$ is a unit vector, then we define the directional derivative of $f$ at $\mathbf{x}$ in the direction $\mathbf{u}$ to be $$\partial_{\mathbf{u}}f(\mathbf{x})=\lim_{h\to 0}\frac{f(\mathbf{x}+h\mathbf{u})-f(\mathbf{x})}{h}, \tag{12}$$ whenever the limit exists.

Based on our knowledge of first-year calculus, we can see that $\partial_{\mathbf{u}}f(\mathbf{x})$ represents the instantaneous rate of change of $f$ as we move in the direction $\mathbf{u}$ through the point $\mathbf{x}$.

By comparing the definitions of directional derivative and partial derivative, we see that for any $j\in\{1,\ldots,n\}$, $$\frac{\partial f}{\partial x_j}=\partial_{\mathbf{e}_j}f,$$ where $\mathbf{e}_j$ denotes the unit vector in the $j$th coordinate direction.

Theorem 4. If $f$ is differentiable at a point $\mathbf{x}$, then $\partial_{\mathbf{u}}f(\mathbf{x})$ exists for every unit vector $\mathbf{u}$, and moreover $$\partial_{\mathbf{u}}f(\mathbf{x})=\mathbf{u}\cdot\nabla f(\mathbf{x}).$$

The proof of this is almost exactly like the proof of Theorem 2. This is not surprising, since partial derivatives are just a special case of directional derivatives.

The proof does not require $\mathbf{u}$ to be a unit vector; this is only needed for the interpretation of $\partial_{\mathbf{u}}f$ as a directional derivative. For any vector $\mathbf{v}$, whether or not it is a unit vector, it is generally true that if $f$ is differentiable at a point $\mathbf{x}$, then $$\lim_{h\to 0}\frac{f(\mathbf{x}+h\mathbf{v})-f(\mathbf{x})}{h}=\mathbf{v}\cdot\nabla f(\mathbf{x}). \tag{13}$$ The proof again is essentially the same as that of Theorem 2. It is left as an exercise.
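For a smooth function, (13) is easy to observe numerically. The sketch below (the function $f(x,y)=\sin x\, e^{y}$, the base point, and the vector $\mathbf{v}$ are all arbitrary illustrative choices) compares the one-variable difference quotient along $\mathbf{v}$ with $\mathbf{v}\cdot\nabla f(\mathbf{x})$:

```python
import numpy as np

def f(p):
    x, y = p
    return np.sin(x) * np.exp(y)

def grad_f(p):
    x, y = p
    return np.array([np.cos(x) * np.exp(y), np.sin(x) * np.exp(y)])

x0 = np.array([0.7, -0.3])
v = np.array([2.0, 1.0])               # need not be a unit vector

for h in [1e-1, 1e-3, 1e-5]:
    quotient = (f(x0 + h * v) - f(x0)) / h
    print(f"h = {h:g}: quotient = {quotient:.6f}, v . grad f(x0) = {v @ grad_f(x0):.6f}")
```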

The converse of Theorem 4 is not true. One can find functions $f$ such that at some point $\mathbf{x}$, the directional derivative $\partial_{\mathbf{u}}f(\mathbf{x})$ exists for every unit vector $\mathbf{u}$, but $f$ is not differentiable at $\mathbf{x}$. See the exercises below for some examples.

Example 7.

Assume that $S$ is an open subset of $\mathbb{R}^n$ and that $f:S\to\mathbb{R}$ is differentiable. At a point $\mathbf{x}\in S$, determine the direction $\mathbf{u}$ in which $f$ is increasing most rapidly, in the sense that $$\partial_{\mathbf{u}}f(\mathbf{x})=\max\{\partial_{\mathbf{v}}f(\mathbf{x}):\mathbf{v}\text{ a unit vector}\}.$$

Solution. By Theorem 4 and basic properties of the dot product, we know that for any unit vector $\mathbf{u}$, $$\partial_{\mathbf{u}}f(\mathbf{x})=\mathbf{u}\cdot\nabla f(\mathbf{x})=|\mathbf{u}|\,|\nabla f(\mathbf{x})|\cos\theta=|\nabla f(\mathbf{x})|\cos\theta, \tag{14}$$ where $\theta$ is the angle between $\nabla f(\mathbf{x})$ and $\mathbf{u}$.

We consider two cases:

Case 1. If $\nabla f(\mathbf{x})=\mathbf{0}$, then $\partial_{\mathbf{u}}f(\mathbf{x})=0$ for all $\mathbf{u}$, so every unit vector maximizes (and minimizes) $\partial_{\mathbf{u}}f(\mathbf{x})$.

Case 2. If $\nabla f(\mathbf{x})\ne\mathbf{0}$, then according to (14) we have to choose $\mathbf{u}$ so that $\cos\theta$ is as large as possible. This happens when $\cos\theta=1$, that is, when $\mathbf{u}$ is the unit vector pointing in the same direction as $\nabla f(\mathbf{x})$. That is, the directional derivative is maximized in the direction $$\mathbf{u}=\frac{\nabla f(\mathbf{x})}{|\nabla f(\mathbf{x})|}. \tag{15}$$

This example tells us what the gradient of a real-valued function means. If you remember only one thing from MAT 237, it should be this:

If it is not zero, $\nabla f(\mathbf{x})$ points in the direction where $f$ has the greatest increase.

This is the principle that allows machine learning by gradient descent, determines seam carving for image resizing, and will play an important role in the rest of this course.
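Here is a minimal gradient descent sketch built directly on this principle: step repeatedly in the direction $-\nabla f$, the direction of greatest decrease. The objective function, starting point, and step size are illustrative choices only, not anything specified in these notes.

```python
import numpy as np

def f(p):
    x, y = p
    return (x - 1) ** 2 + 2 * (y + 3) ** 2     # a simple bowl-shaped objective

def grad_f(p):
    x, y = p
    return np.array([2 * (x - 1), 4 * (y + 3)])

p = np.array([5.0, 5.0])                        # arbitrary starting point
step = 0.1                                      # arbitrary fixed step size
for _ in range(50):
    p = p - step * grad_f(p)                    # move opposite to the gradient

print(p, f(p))   # p is now close to (1, -3), the minimizer of f
```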

Example 8.

Let $f(x,y,z)=\frac{1}{2}x^2+\frac{1}{4}y^4+\frac{1}{6}z^6$. Find the direction in which $f$ has the greatest increase at the point $(1,1,1)$.

Solution. To answer this, we use formula (15). We compute $$\nabla f(x,y,z)=(x,\,y^3,\,z^5),\qquad \nabla f(1,1,1)=(1,1,1),$$ so $\frac{1}{\sqrt{3}}(1,1,1)$ is the direction of greatest increase.

Example 9.

Let $f(x,y,z)=xyz^2$. Find the direction in which $f$ is decreasing most rapidly at the point $(1,1,1)$.

Solution. If $f$ has the greatest decrease in the direction $\mathbf{u}$, then $-f$ is increasing most rapidly in the direction $\mathbf{u}$. By linearity of the dot product and gradient, $\mathbf{u}\cdot\nabla(-f)=-\mathbf{u}\cdot\nabla f$, so $f$ is decreasing most rapidly in the direction of $-\nabla f$.

Thus for $f(x,y,z)=xyz^2$, we have $$\nabla f(x,y,z)=(yz^2,\,xz^2,\,2xyz),\qquad \nabla f(1,1,1)=(1,1,2),$$ so $-\frac{1}{\sqrt{6}}(1,1,2)$ is the direction of fastest decrease.

Problems

Basic

  1. Determine all points where a function $f$ is differentiable, and determine $\nabla f$ at those points.

Problems like this are normally solved by using Theorem 3 and properties of continuous functions which allow us to recognize partial derivatives as continuous. Examples might be

  • $f(x,y)=xy\cos(y^2-2x)$ for $(x,y)\in\mathbb{R}^2$.
  • $f(x,y,z)=x^2e^{y/z}$ for $(x,y,z)\in\mathbb{R}^3$ such that $z\ne 0$.
  • $f(x,y,z)=|(x,y,z)|=\sqrt{x^2+y^2+z^2}$ for $(x,y,z)\in\mathbb{R}^3$.
  • $f(x,y)=(\sin^2x+\sin^2y)^{1/2}$ for $(x,y)\in\mathbb{R}^2$.
  2. Given a function $f$, find the direction in which $f$ is increasing/decreasing most rapidly at the point $\mathbf{x}$. For example, consider any of the functions in the above question, at the point $(1,1)$ or $(1,1,1)$, depending on the dimension.

  3. If $f$ is a complicated function like Example 4, determining whether $f$ is differentiable at a given point using the definition rather than using Theorem 3 is difficult. But determining the directional derivatives at a point using their definition is not. For example

    • Let $$f(x,y)=\begin{cases}\dfrac{x^2y}{x^2+y^2} & \text{if }(x,y)\ne(0,0)\\ 0 & \text{if }(x,y)=(0,0).\end{cases}$$ For the unit vector $\mathbf{u}=\frac{1}{\sqrt{2}}(1,1)$, determine $\partial_{\mathbf{u}}f(\mathbf{0})$.
    • Let $$f(x,y,z)=\begin{cases}\dfrac{xyz}{x^2+y^2+z^2} & \text{if }(x,y,z)\ne(0,0,0)\\ 0 & \text{if }(x,y,z)=(0,0,0).\end{cases}$$ For the unit vector $\mathbf{u}=\frac{1}{3}(1,2,2)$, determine $\partial_{\mathbf{u}}f(\mathbf{0})$.

Advanced

  4. Let $$f(x,y)=\begin{cases}\dfrac{x^2y}{x^2+y^2} & \text{if }(x,y)\ne(0,0)\\ 0 & \text{if }(x,y)=(0,0).\end{cases}$$
    • For any unit vector $\mathbf{u}=(u_1,u_2)$, determine $\partial_{\mathbf{u}}f(\mathbf{0})$.

    • Prove that f is not differentiable at the origin.

      Hint: If $f$ were differentiable at the origin, then necessarily $\partial_{\mathbf{u}}f(\mathbf{0})=u_1\,\partial_1 f(\mathbf{0})+u_2\,\partial_2 f(\mathbf{0})$.

  5. Define $f:\mathbb{R}^2\to\mathbb{R}$ by $$f(x,y)=\begin{cases}\dfrac{x^3y}{x^4+y^2} & \text{if }(x,y)\ne(0,0)\\ 0 & \text{if }(x,y)=(0,0).\end{cases}$$
  • Is f continuous at (0,0)? Determine the answer using material from Section 1.2.

  • Show that for every unit vector $\mathbf{u}=(u_1,u_2)$, the directional derivative $\partial_{\mathbf{u}}f(0,0)$ exists and equals zero.

  • Prove that f is not differentiable at (0,0).

    Hint: Consider $\frac{f(t,t^2)-f(0,0)}{|(t,t^2)|}$ for small values of $t$. Also, note that if $|t|<1$, then $|t|\le|(t,t^2)|\le\sqrt{2}\,|t|$.

  6. Consider the function $f:\mathbb{R}^2\to\mathbb{R}$ defined by $$f(x,y)=\begin{cases}\dfrac{y^3-x^8y}{x^6+y^2} & \text{if }(x,y)\ne(0,0)\\ 0 & \text{if }(x,y)=(0,0).\end{cases}$$ Prove that
    • all partial derivatives of $f$ exist everywhere in $\mathbb{R}^2$,
    • at least one partial derivative of f is not continuous at (0,0),
    • f is differentiable at (0,0).
      Hint: See Example 4.
  7. A function $f:\mathbb{R}^n\to\mathbb{R}$ is said to be homogeneous of degree $d$ if $f(\lambda\mathbf{x})=\lambda^d f(\mathbf{x})$ for every nonzero $\mathbf{x}\in\mathbb{R}^n$ and every $\lambda>0$. The same definition holds if the domain of $f$ is $\mathbb{R}^n\setminus\{\mathbf{0}\}$.

Prove that if $f$ is differentiable away from the origin and homogeneous of degree $d$, then for every unit vector $\mathbf{u}$, the directional derivative $\partial_{\mathbf{u}}f$ is homogeneous of degree $d-1$.

In particular this implies that all partial derivatives are homogeneous of degree $d-1$, since partial derivatives are a special case of directional derivatives.

Hint: To get started, note that for any $\mathbf{x}\ne\mathbf{0}$ and $\lambda>0$, $$\partial_{\mathbf{u}}f(\lambda\mathbf{x})=\lim_{h\to 0}\frac{f(\lambda\mathbf{x}+h\mathbf{u})-f(\lambda\mathbf{x})}{h}=\lim_{h\to 0}\frac{f\!\left(\lambda\left(\mathbf{x}+\tfrac{h}{\lambda}\mathbf{u}\right)\right)-f(\lambda\mathbf{x})}{h}.$$

  8. Give a detailed proof of (13), and hence of Theorem 4, by making small modifications to the proof of Theorem 2.

  9. Suppose that $S$ is an open subset of $\mathbb{R}^n$ that contains the origin, and that $g$ is a function $S\to\mathbb{R}$. Prove that if $\lim_{\mathbf{x}\to\mathbf{0}}g(\mathbf{x})=L$, then $\lim_{h\to 0}g(h\mathbf{u})=L$ for every unit vector $\mathbf{u}$.

This was used in the proof of Theorem 2. It is good practice to write out the proof. You should be able to supply all relevant definitions from memory. This would be considered an easy proof question for a test.

  10. Suppose that $a<0<b$, and that $g:(a,b)\to\mathbb{R}$. Prove that $$\lim_{h\to 0}\frac{g(h)}{|h|}=0 \iff \lim_{h\to 0}\frac{g(h)}{h}=0.$$ This was also used in the proof of Theorem 2.

    

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Canada License.