Suppose that $S$ is an open subset of $\mathbb{R}^n$, and consider a vector-valued function $\mathbf{f} : S \to \mathbb{R}^m$.
The idea of differentiability is essentially the same as in the case of real-valued functions: $\mathbf{f}$ is differentiable at a point $\mathbf{a}$ if $\mathbf{f}$ can be approximated by a linear map near $\mathbf{a}$, with errors that are "smaller than linear".
In order to write this down, we recall that it is natural to represent a linear map $\mathbb{R}^n \to \mathbb{R}^m$ by an $m\times n$ matrix, which acts on (column) vectors by matrix multiplication. Explicitly, an $m\times n$ matrix $M$ represents the linear map $\mathbb{R}^n \to \mathbb{R}^m$ defined by
$$\mathbf{x} \mapsto M\mathbf{x}.$$
We also remember from linear algebra that every linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be represented by a matrix-vector multiplication like this.
Since it is natural to represent linear maps in this way, we will write the derivative at a point of a vector-valued function as a matrix.
With this viewpoint in mind, the definition is very much like the one we saw in the last section.
Assume that $S$ is an open subset of $\mathbb{R}^n$. Given a function $\mathbf{f} : S \to \mathbb{R}^m$, we say that $\mathbf{f}$ is differentiable at a point $\mathbf{a}\in S$ if there exists an $m\times n$ matrix $M$ such that
$$\lim_{\mathbf{h}\to\mathbf{0}} \frac{|\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - M\mathbf{h}|}{|\mathbf{h}|} = 0.$$
When this holds, we say that $M$ is the derivative of $\mathbf{f}$ at $\mathbf{a}$, and we write $D\mathbf{f}(\mathbf{a}) = M$.
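To make the limit in the definition concrete, here is a small numerical check with a function, base point, and candidate matrix $M$ of our own choosing (not from the text): the ratio $|\mathbf{f}(\mathbf{a}+\mathbf{h}) - \mathbf{f}(\mathbf{a}) - M\mathbf{h}|/|\mathbf{h}|$ should shrink as $\mathbf{h}\to\mathbf{0}$.

```python
import numpy as np

# Hypothetical example: f(x, y) = (x^2 y, sin x + y), the base point
# a = (1, 2), and a candidate derivative matrix M computed by hand.
def f(v):
    x, y = v
    return np.array([x**2 * y, np.sin(x) + y])

a = np.array([1.0, 2.0])
M = np.array([[2 * a[0] * a[1], a[0]**2],   # partials of x^2 y at a
              [np.cos(a[0]),    1.0   ]])   # partials of sin x + y at a

# |f(a+h) - f(a) - Mh| / |h| should tend to 0 as |h| -> 0.
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * np.array([1.0, -1.0])           # shrink h along a fixed direction
    err = np.linalg.norm(f(a + h) - f(a) - M @ h) / np.linalg.norm(h)
    print(f"|h| = {np.linalg.norm(h):.1e}, error/|h| = {err:.2e}")
```

The printed ratios decrease roughly in proportion to $|\mathbf{h}|$, which is exactly the "smaller than linear" behavior the definition requires.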
The matrix $D\mathbf{f}(\mathbf{a})$ is called the Jacobian matrix of $\mathbf{f}$, named for Carl Jacobi (Prussia/Germany, 1804–1851). He is known for many advances in analysis and number theory, and gave his perspective on solving problems as "man muss immer umkehren" ("one must always invert"). We will see later that if $D\mathbf{f}(\mathbf{a})$ is a square matrix, then it is invertible if and only if $\mathbf{f}$ has a differentiable inverse in a ball around $\mathbf{a}$. He also popularized the partial derivative notation $\partial f/\partial x$.
Recall also that for a vector-valued function $\mathbf{f}$ of a variable $\mathbf{x}$, limits may be computed component by component, as we know from Theorem 3 in Section 1.2.
The definition can also be written: there exists an $m\times n$ matrix $M$ such that
$$\mathbf{f}(\mathbf{a}+\mathbf{h}) = \mathbf{f}(\mathbf{a}) + M\mathbf{h} + \mathbf{E}(\mathbf{h}), \qquad \text{where } \lim_{\mathbf{h}\to\mathbf{0}}\frac{|\mathbf{E}(\mathbf{h})|}{|\mathbf{h}|} = 0.$$
We will see soon that at places where is differentiable, there are simple expressions for the components of in terms of partial derivatives of the components of .
The derivative vs. partial derivatives
Recall that our default rule is that every vector is a column vector unless explicitly stated otherwise, even if we write it in a way that makes it look like a row vector, such as $\mathbf{x} = (x_1, \dots, x_n)$.
Applying this rule, if we write out all the components of $\mathbf{f}(\mathbf{a}+\mathbf{h}) = \mathbf{f}(\mathbf{a}) + M\mathbf{h} + \mathbf{E}(\mathbf{h})$, it becomes
$$f_i(\mathbf{a}+\mathbf{h}) = f_i(\mathbf{a}) + \sum_{j=1}^n M_{ij}h_j + E_i(\mathbf{h}), \qquad i = 1,\dots,m,$$
where all the components of $\mathbf{E}$ are smaller than linear as $\mathbf{h}\to\mathbf{0}$, by which we mean that $\lim_{\mathbf{h}\to\mathbf{0}} E_i(\mathbf{h})/|\mathbf{h}| = 0$ for every $i$.
By writing out the $i$th row of this identity, we can see that $\mathbf{f}$ is differentiable at $\mathbf{a}$ if and only if, for every $i$,
$$f_i(\mathbf{a}+\mathbf{h}) = f_i(\mathbf{a}) + \mathbf{m}_i \cdot \mathbf{h} + E_i(\mathbf{h}), \qquad \lim_{\mathbf{h}\to\mathbf{0}}\frac{E_i(\mathbf{h})}{|\mathbf{h}|} = 0,$$
where $\mathbf{m}_i$ denotes the $i$th row of $M$. Comparing to the definition of differentiability of a real-valued function, we see that this is equivalent to the assertion that the $i$th component of $\mathbf{f}$ is differentiable at $\mathbf{a}$, for every $i$. Moreover, from what we already know about differentiation of real-valued functions, we know that if this holds, then it must be the case that
$$\mathbf{m}_i = \nabla f_i(\mathbf{a}), \qquad\text{that is,}\qquad M_{ij} = \frac{\partial f_i}{\partial x_j}(\mathbf{a}).$$
By combining this with things we already know about differentiation of scalar functions, we arrive at the following conclusions.
Suppose that $S$ is an open subset of $\mathbb{R}^n$. Then a function $\mathbf{f} = (f_1,\dots,f_m) : S \to \mathbb{R}^m$ is differentiable at a point $\mathbf{a}\in S$ if and only if the component functions $f_1,\dots,f_m$ are differentiable at $\mathbf{a}$. Moreover,
$$D\mathbf{f}(\mathbf{a}) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{pmatrix},$$
where all the partial derivatives are evaluated at $\mathbf{a}$. Furthermore, if all partial derivatives $\partial f_i/\partial x_j$ (for $i = 1,\dots,m$ and $j = 1,\dots,n$) exist and are continuous in $S$, then $\mathbf{f}$ is differentiable in $S$.
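In code, the theorem suggests a simple way to approximate the Jacobian matrix: estimate each partial derivative $\partial f_i/\partial x_j$ by a difference quotient. A minimal sketch (the helper name `jacobian` and the test function below are our own, not from the text):

```python
import numpy as np

# Numerical Jacobian: entry (i, j) approximates the partial derivative
# of component f_i with respect to x_j, via a central difference.
def jacobian(f, a, eps=1e-6):
    a = np.asarray(a, dtype=float)
    m, n = np.asarray(f(a)).size, a.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (np.asarray(f(a + e)) - np.asarray(f(a - e))) / (2 * eps)
    return J

# Hypothetical test function f(x, y) = (x*y, x + y^2); by hand, the exact
# Jacobian at (1, 2) is [[2, 1], [1, 4]].
f = lambda v: np.array([v[0] * v[1], v[0] + v[1]**2])
print(jacobian(f, [1.0, 2.0]))
```

This is only a sanity-check tool: the theorem tells us the matrix of partials *is* the derivative wherever the partials are continuous, and the code merely approximates those partials.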
As in the case of real-valued functions, this theorem often provides the easiest way to check differentiability of a vector-valued function: compute all partial derivatives of all components and determine where they exist and where they are continuous. In many cases, the answer to both questions is "everywhere."
As with real-valued functions, we say that $\mathbf{f}$ is of class $C^1$ in $S$ (or sometimes just: $\mathbf{f}$ is $C^1$) if all partial derivatives of all components of $\mathbf{f}$ exist and are continuous everywhere in $S$.
Example 1.
Consider the function defined by Note that is not differentiable when , and is not differentiable when . Taking these into account, the matrix of partial derivatives at a point is given by The entries of this matrix are all continuous everywhere in which is an open set, so we conclude that is differentiable everywhere in this set, and that the derivative is given by the above matrix.
Important special cases
The special case $m = 1$
We can view the case of real-valued functions as the special case of $\mathbb{R}^1$-valued functions. Then, according to what we have said, for $f : S\subseteq\mathbb{R}^n \to \mathbb{R}$, we should view $Df(\mathbf{a})$ as the $1\times n$ matrix with entries
$$Df(\mathbf{a}) = \begin{pmatrix} \dfrac{\partial f}{\partial x_1}(\mathbf{a}) & \cdots & \dfrac{\partial f}{\partial x_n}(\mathbf{a}) \end{pmatrix}.$$
We can recognize that this is basically the same as the gradient, although now we are insisting that $Df(\mathbf{a})$ is a row vector, which we did not do before with $\nabla f(\mathbf{a})$. The reason this occurs is that $Df(\mathbf{a})$ is not an ordinary element of $\mathbb{R}^n$, but instead a (representation of a) linear map $\mathbb{R}^n \to \mathbb{R}$. From linear algebra, we know this is a $1\times n$ matrix, or a row vector.
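As a concrete illustration of the $m = 1$ case (a hypothetical function of our own choosing, not one from the text), take $f(x,y) = x^2 y$:

```latex
f(x,y) = x^2 y, \qquad
\nabla f(x,y) = \begin{pmatrix} 2xy \\ x^2 \end{pmatrix}, \qquad
Df(x,y) = \begin{pmatrix} 2xy & x^2 \end{pmatrix}.
```

The entries are the same; $\nabla f$ is a column vector in $\mathbb{R}^2$, while $Df$ is the $1\times 2$ matrix representing the linear map $\mathbf{h} \mapsto 2xy\,h_1 + x^2\,h_2$.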
The special case $n = 1$.
Another special case that arises often is the case $n = 1$, when the domain is $1$-dimensional. For $n = 1$, a function $\mathbf{f} : (a,b) \to \mathbb{R}^m$ has the form
$$\mathbf{f}(t) = \begin{pmatrix} f_1(t) \\ \vdots \\ f_m(t) \end{pmatrix}.$$
Then if $\mathbf{f}$ is differentiable at a point $t$, its derivative is an $m\times 1$ matrix, of the form
$$D\mathbf{f}(t) = \mathbf{f}'(t) = \begin{pmatrix} f_1'(t) \\ \vdots \\ f_m'(t) \end{pmatrix}.$$
We call the image of a function $\mathbf{f} : (a,b) \to \mathbb{R}^m$ a parametrized curve. Here are some geometric interpretations.
If $\mathbf{f}$ is differentiable and $\mathbf{f}'(t) \ne \mathbf{0}$, then $\mathbf{f}'(t)$ is a vector that is tangent to the parametrized curve at $\mathbf{f}(t)$.
Any (nonzero) multiple of $\mathbf{f}'(t)$ is also tangent to the parametrized curve at $\mathbf{f}(t)$. In particular, $\mathbf{f}'(t)/|\mathbf{f}'(t)|$ is a unit tangent vector, assuming that $\mathbf{f}'(t)\ne\mathbf{0}$.
If we fix $t$ and let $s$ vary, then the straight line parametrized by $s \mapsto \mathbf{f}(t) + s\,\mathbf{f}'(t)$ is the tangent line to the curve at $\mathbf{f}(t)$.
If $\mathbf{f}(t)$ represents the position of a particle at time $t$, then $|\mathbf{f}'(t)|$ is the speed of the particle at time $t$, and $\mathbf{f}'(t)$ is the velocity vector of the particle at time $t$. The unit tangent $\mathbf{f}'(t)/|\mathbf{f}'(t)|$ is the direction of motion at time $t$.
These are illustrated by the following example.
Example 2.
Define by We know that for every , is a point whose distance from the origin is , i.e., and that the line from the origin to forms an angle of radians with the positive -axis. The image of this curve in the plane is shown below, for
The components of $\mathbf{f}$ are differentiable everywhere. Thus $\mathbf{f}$ is differentiable everywhere, and $\mathbf{f}'$ is found by differentiating each component. From the definition of the derivative, if we fix $t_0$ and consider $\mathbf{f}(t)$ as a function of $t$, then
$$\mathbf{f}(t) \approx \mathbf{f}(t_0) + (t - t_0)\,\mathbf{f}'(t_0)$$
in the sense that the error is smaller than linear. (That is, $|\mathbf{f}(t) - \mathbf{f}(t_0) - (t-t_0)\mathbf{f}'(t_0)|/|t-t_0| \to 0$ as $t\to t_0$.)
This can be seen in the picture below, which shows (in red) the sets of points $\mathbf{f}(t_0) + (t - t_0)\,\mathbf{f}'(t_0)$ for several choices of $t_0$, together with the same curve as above.
Above, in red, the set for and , along with the curve defined above. From these one can see/understand the following:
Each red segment is tangent to the curve at the point $\mathbf{f}(t_0)$ for the relevant value of $t_0$. This reflects the fact that the error in the approximation $\mathbf{f}(t) \approx \mathbf{f}(t_0) + (t-t_0)\,\mathbf{f}'(t_0)$ is smaller than linear as $t\to t_0$.
The vector $\mathbf{f}'(t_0)$ is parallel to the red segment, hence tangent to the curve at $\mathbf{f}(t_0)$.
The Euclidean norm $|\mathbf{f}'(t_0)|$ of the derivative tells us the speed at which the point $\mathbf{f}(t)$ moves along the black curve as $t$ passes through $t_0$. This is why the red segments are longer where $|\mathbf{f}'|$ is larger.
The unit vector $\mathbf{f}'(t_0)/|\mathbf{f}'(t_0)|$ is sometimes called the unit tangent to the curve at $\mathbf{f}(t_0)$. For this to make sense, we must have $\mathbf{f}'(t_0)\ne\mathbf{0}$.
Example 3.
The picture below shows a portion of the curve parametrized by $\mathbf{f}$, together with a segment of a tangent line for a particular choice of $t$. This is similar to Example 2.
The Differential
Given a differentiable function $f : S \to \mathbb{R}$, where $S$ is an open subset of $\mathbb{R}^n$, at a point $\mathbf{a}\in S$ we define the differential of $f$ evaluated at $\mathbf{a}$, written $df_{\mathbf{a}}$, to be the linear map $\mathbb{R}^n \to \mathbb{R}$ given by
$$df_{\mathbf{a}}(\mathbf{h}) = \nabla f(\mathbf{a})\cdot\mathbf{h}.$$
This is also sometimes written $df(\mathbf{a})$. We may also neglect to mention the point $\mathbf{a}$ and simply write $df$.
You have already used differentials when applying integration by parts: the function chosen to be $u$ gives us the differential $du = u'(x)\,dx$, and it became helpful to think of $u$ and $du$ as variables which can be manipulated algebraically.
It is common to write $df = \sum_{j=1}^n \frac{\partial f}{\partial x_j}\,dx_j$; for example, in $2$ dimensions,
$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$
You can think of this as conventional notation. It was introduced by Augustin-Louis Cauchy (see Problem Set 2), along with the formal definition of the derivative as a limit. It is very valuable for solving differential equations and for interpreting the physical implications of dynamical systems. For us, the important reason to introduce this notation is that it allows us to state the Chain Rule in a concise and memorable way, without a lot of messy notation.
Rationale for the notation. The formula $df = \sum_j \frac{\partial f}{\partial x_j}\,dx_j$ makes sense if we declare that $dx_j$ is the linear function defined by
$$dx_j(\mathbf{h}) = h_j.$$
Why is this reasonable? Well, let $f(\mathbf{x}) = x_j$. Then one can check from the definition that $df_{\mathbf{a}}(\mathbf{h}) = h_j$. But since $f(\mathbf{x}) = x_j$, we can write this as $dx_j(\mathbf{h}) = h_j$.
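To see the differential as an honest linear map in code, here is a sketch for a hypothetical scalar function $f(x,y) = x^2 y$ (our own choice, not one from the text): $df_{\mathbf{a}}$ is just the function $\mathbf{h} \mapsto \nabla f(\mathbf{a})\cdot\mathbf{h}$.

```python
import numpy as np

# Hypothetical scalar function f(x, y) = x^2 * y and its gradient
# (computed by hand): grad f = (2xy, x^2).
def f(v):
    x, y = v
    return x**2 * y

def grad_f(v):
    x, y = v
    return np.array([2 * x * y, x**2])

a = np.array([1.0, 2.0])
df_a = lambda h: grad_f(a) @ h           # the linear map df_a: R^2 -> R

# Sanity check: df_a(h) should approximate f(a + h) - f(a) for small h.
h = np.array([1e-3, -2e-3])
print(df_a(h), f(a + h) - f(a))
```

The two printed numbers agree up to an error that is smaller than linear in $|\mathbf{h}|$, which is exactly the content of the definition of differentiability.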
Linear Approximations
The definition of the Jacobian matrix implies that if $\mathbf{f}$ is differentiable at $\mathbf{a}$, then
$$\mathbf{f}(\mathbf{x}) \approx \mathbf{f}(\mathbf{a}) + D\mathbf{f}(\mathbf{a})(\mathbf{x} - \mathbf{a}) \qquad \text{for } \mathbf{x} \text{ near } \mathbf{a}.$$
Hence, the Jacobian matrix generalizes the idea that the derivative gives the best linear approximation to a function. This can be used to compute approximate values of functions.
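As a sketch of how this approximation might be used numerically (the function $\mathbf{f}(x,y) = (\sqrt{xy},\, x/y)$, base point, and nearby point below are our own illustrative choices, not from the examples in the text):

```python
import numpy as np

# Linear approximation f(x) ≈ f(a) + Df(a)(x - a) for the hypothetical
# function f(x, y) = (sqrt(x*y), x/y) near a = (1, 1).
def f(v):
    x, y = v
    return np.array([np.sqrt(x * y), x / y])

a = np.array([1.0, 1.0])
fa = f(a)                        # f(a) = (1, 1)
Df_a = np.array([[0.5, 0.5],     # partials of sqrt(x*y) at (1, 1), by hand
                 [1.0, -1.0]])   # partials of x/y at (1, 1), by hand

x = np.array([1.02, 0.99])       # a nearby point
approx = fa + Df_a @ (x - a)
print(approx, f(x))
```

The approximation and the exact value agree to about three decimal places, as expected for a point this close to $\mathbf{a}$.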
Example 4.
Let . Estimate the numerical value of .
Solution. Let's write for Here we chose $\mathbf{a}$ so that $f(\mathbf{a})$ and $Df(\mathbf{a})$ are easy to compute. Indeed, and , so For comparison, using a calculator we find that , so the approximation was off by about 1%.
Example 5.
Let . Estimate the numerical value of .
Solution. Write for and . Since and ,
Using a calculator we find so our approximation is reasonable.
In these two examples, the linear approximation is not strictly necessary, because we have an explicit algebraic description of our function. In many applications, you may know the value of a function at a point and how the function changes, but have no closed form. Most interesting partial differential equations fall into this category, and this idea is the basis for the finite element method using linear approximations.
Problems
Basic
Understanding and using the derivative (i.e. the Jacobian matrix) will be crucial to later concepts, like multivariable integration and change of variables. It also appears, along with its determinant, in STA 247/257 (Probability) and APM 346 (Partial Differential Equations).
Determine all points where a function $\mathbf{f}$ is differentiable, and determine $D\mathbf{f}$ at all such points.
Let be defined by , and consider the curve parametrized by .
Determine the unit tangent vector to the curve at the point for .
Give the equation for the tangent line to the curve at the point for .
If denotes the position of a particle at time , then determine the velocity and the speed of the particle at time .
Use the Jacobian matrix to compute the approximate value of the function at the point . For example
For , use the Jacobian matrix to compute (without a calculator) an approximate value of
For , use the Jacobian matrix to compute (without a calculator) an approximate value of . Use a calculator to see how good your approximation is.
For , use the Jacobian matrix to compute (without a calculator) an approximate value of . Use a calculator to see how good your approximation is.
Advanced
(Product Rule) Suppose that $S$ is an open subset of $\mathbb{R}^n$, and that $\mathbf{f}, \mathbf{g} : S \to \mathbb{R}^m$ are functions of class $C^1$ in $S$. Prove that $\mathbf{f}\cdot\mathbf{g}$ is of class $C^1$, and that
$$\nabla(\mathbf{f}\cdot\mathbf{g}) = (D\mathbf{f})^T\mathbf{g} + (D\mathbf{g})^T\mathbf{f},$$
where $^T$ denotes the transpose.
Note: In the formula above, $\nabla(\mathbf{f}\cdot\mathbf{g})$, $\mathbf{f}$, and $\mathbf{g}$ are all column vectors. What is the corresponding formula for the row vector $D(\mathbf{f}\cdot\mathbf{g})$ (still assuming, as usual, that $\mathbf{f}$ and $\mathbf{g}$ are column vectors)?
Suppose that $S$ is an open subset of $\mathbb{R}^n$, and that $\mathbf{f}, \mathbf{g} : S \to \mathbb{R}^m$ are functions in $S$ that are differentiable at a point $\mathbf{a}$. Prove that $\mathbf{f}\cdot\mathbf{g}$ is differentiable at $\mathbf{a}$, and that the product rule formula holds at $\mathbf{a}$.
The difference between these two questions is that Theorem 1 is relevant for the first one, whereas for the second, since our assumptions only give us information about differentiability at the single point $\mathbf{a}$, all we can use is the definition of differentiability and of the Jacobian matrix.