Understanding Integration, Integrals, Antiderivatives, and their Relationship to Derivatives: the Fundamental Theorem of Calculus

So…what are integrals, and how are they related to derivatives? Think of dx, the symbol for an infinitesimal change in the x coordinate (it differs from plain delta x in that dx is the limit as delta x approaches zero), here in the context of derivatives:

The idea…is to compute the rate of change as the limiting value of the ratio of the differences Δy / Δx as Δx becomes infinitely small. In Leibniz’s notation, such an infinitesimal change in x is denoted by dx, and the derivative of y with respect to x is written dy/dx. (Wikipedia)

Apparently many definitions of what exactly dy/dx means are lacking (there’s a whole article on JSTOR entitled What Exactly is dy/dx?). There are manipulations you will need to know with dy/dx. So, say you have y=f(x), and the derivative with respect to x is dy/dx=f'(x). You could then write dy=f'(x)dx. Then if you take the integral of both sides of this, you get y=f(x) (plus a constant of integration)! So if you had y=ln(x), and wanted to find the integral of this (the antiderivative of natural log), you would write integral of y = integral of ln(x)dx, and you could use integration by parts to solve this–setting u = ln(x), then du = (1/x) dx, dv = dx, and v = x.
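Here’s a quick numeric sanity check in Python (standard library only) that the antiderivative you get from that integration by parts, x*ln(x) - x, really does give the area under ln(x). The interval [1, e] is just an example I picked:

```python
import math

# Integration by parts gives the antiderivative of ln(x) as x*ln(x) - x.
# Sanity check: compare F(b) - F(a) against a midpoint Riemann sum of
# ln(x) over [a, b].

def F(x):
    return x * math.log(x) - x  # antiderivative found by parts

a, b, n = 1.0, math.e, 100_000
dx = (b - a) / n
riemann = sum(math.log(a + (i + 0.5) * dx) * dx for i in range(n))

print(F(b) - F(a))  # exact value is 1.0 for this interval
print(riemann)      # should be very close to 1.0
```

Both numbers agree, which is exactly what the integration-by-parts result predicts.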

With derivatives, the derivative is basically a slope, a rate of change: the change in the y values of a function as the x value, the input of the function, is changed. The derivative is an instantaneous rate: we look at a change in x value so small, infinitesimally small, that you get the slope, the rate of change, of the function at a single x input. The instantaneous rate of change for an x input, the derivative, is the slope of the tangent line to the function at that point. Think about the rate of change in an intuitive way: if between two different x values there is no change in the y values, the rate of change is 0, no change in y based on change in x. If there is a positive change in the y values based on change in x inputs, there’s a positive derivative; if there’s a negative change, there’s a negative derivative. For functions that are not linear, the derivative can be zero, positive, and negative across different x values as the curve changes. See this post on critical points, points of inflection, and concavity for more information.
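To see the “infinitesimally small change” idea concretely, here’s a small Python sketch (the function x^2 and the point x=3 are just examples I picked) that shrinks delta x and watches the difference quotient settle toward the instantaneous rate:

```python
# The difference quotient for f(x) = x**2 at x = 3 approaches the
# instantaneous rate f'(3) = 2*3 = 6 as delta_x shrinks toward 0.
def f(x):
    return x ** 2

x = 3.0
for delta_x in (1.0, 0.1, 0.001, 1e-6):
    slope = (f(x + delta_x) - f(x)) / delta_x
    print(delta_x, slope)  # slope heads toward 6 as delta_x shrinks
```

The printed slopes (7, 6.1, 6.001, …) close in on 6, the slope of the tangent line at x=3.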

Infinitesimal change in the x input, dx, plays an important role in integrals as well. Whereas with derivatives you are finding the rate of change of a function at a single point, the instantaneous rate, with integrals you are finding the area under a curve between two points. The notation is like \int_a^b f(x)dx. Here, we’re finding the area under the curve between x=a and x=b, where the curve is made by the function f(x). See this article on Riemann Sums for more information and this section on “Areas and Integrals” in Mathematics for Economists (Google Book Search free digitized version).

It’s easy to think about integrals and the area under the curve when you think of the area of a rectangle on a graph. Think about the area under a curve from x=0 to x=5 where the y value is 4 at each x value; think of the area of the rectangle under the segment between the ordered pairs (0, 4) and (5, 4). Well, you know the area of a rectangle is computed by base * height. So, it’s easy to see here that the area under the curve is simply x * y=5*4=20. Well, in this case the height is uniformly 4, so that’s easy. But what about when the height varies per x, as in a curve? That’s where Riemann Sums and dx come in.

Basically, think of using tons of very skinny rectangles to approximate the area under a curve; the very skinny rectangles would do a good job of approximating the area under the curve when you added them all together. Well, that’s what happens with integrals. You’re multiplying the y value (the height) at each new x by the change between each of the x’s, where the change is the infinitesimally small dx. So you have very small changes in x, dx’s, multiplied by the y value at each of those x’s, and you add them all up, and that’s the area under a curve.
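Here’s a rough Python sketch of that skinny-rectangle idea (a left-endpoint Riemann sum; the function x^2 and the interval [0, 1] are just examples I chose, with exact area 1/3):

```python
# Left-endpoint Riemann sum: lots of skinny rectangles, each of width dx
# and height f(x), summed to approximate the area under f(x) = x**2
# between x = 0 and x = 1 (exact area: 1/3).
def f(x):
    return x ** 2

a, b = 0.0, 1.0
for n in (10, 100, 10_000):
    dx = (b - a) / n
    area = sum(f(a + i * dx) * dx for i in range(n))
    print(n, area)  # approaches 1/3 as the rectangles get skinnier
```

With 10 rectangles the estimate is noticeably off; with 10,000 it’s very close to 1/3, which is the whole point of letting dx shrink.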

Now, how does this relate integrals to derivatives? There’s the Fundamental Theorem of Calculus. Basically, it casts integrals in terms of antiderivatives. Say you have a function f(x); an integral of this function, F(x), is called an antiderivative. When you take the derivative of an antiderivative, you end up with the original function, f(x). That’s the (first?) Fundamental Theorem of Calculus: taking the derivative of an antiderivative reverses the antidifferentiation and you end up with the original function f(x). It may be specific to indefinite integrals in this part, I’ll look into it more:

The first part of the theorem, sometimes called the first fundamental theorem of calculus, shows that an indefinite integration[1] can be reversed by a differentiation.

The second part, sometimes called the second fundamental theorem of calculus, allows one to compute the definite integral of a function by using any one of its infinitely many antiderivatives. This part of the theorem has invaluable practical applications, because it markedly simplifies the computation of definite integrals. (Wikipedia)

Think of \int_a^b f(x)dx. That’s the area under the curve from a to b. Now think of \int_a^c f(x)dx. That’s the area under the curve from a to c. Now, if you want to find the area under the curve from b to c, you can subtract: \int_a^c f(x)dx - \int_a^b f(x)dx. Well, this is a key part of the Fundamental Theorem of Calculus. Say y=f(x). Say the antiderivative of that is z=F(x). The Fundamental Theorem of Calculus says that dz/dx = d/dx F(x) = y = f(x). Another way of writing this is lim h->0 (F(x+h) - F(x))/h = f(x), as shown on p. 375 of the first edition of Mathematics for Economists (Google Book Search digitized version).

Now, y at an x input value is the height of the curve at that x value. Think of the Riemann Sums explanation of integration; if you made the rectangle skinny enough in the x dimension, if the difference in the x values is infinitesimally small, dx, then you basically get one height, one y value. This is how integration and derivatives fit together.

Take a look at the graph at the top of the post again.

Basically, if you have \int_a^b f(x)dx and \int_a^c f(x)dx, the difference between the areas will be \int_a^c f(x)dx - \int_a^b f(x)dx. That leaves you with the area under the curve between b and c. Say c-b=h, and make h infinitesimally small, like dx; h is then dx, your infinitesimally small change in x. Think of the definition of derivatives via difference quotients. Here the area between b and c is z=Area=A(x)=F(c)-F(b)=F(b+h)-F(b), where F depends upon y=f(x); for each difference in x, for each little rectangle, the area is y * the difference in x, and to get the area z, all of these little rectangles are summed up; look again at the section on Riemann Sums and the definition of integrals.
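A quick Python sketch of this subtraction idea (the function, its antiderivative, and the endpoints are examples I picked, not from any particular source): the area between b and c computed as F(c)-F(b) matches a Riemann-sum approximation of the same area:

```python
# F(c) - F(b) gives the area under the curve between b and c.
# With f(x) = x**2 and antiderivative F(x) = x**3 / 3, the area from
# b = 1 to c = 2 is F(2) - F(1) = 8/3 - 1/3 = 7/3; compare against a
# midpoint Riemann sum over [1, 2].
def f(x):
    return x ** 2

def F(x):
    return x ** 3 / 3

b, c, n = 1.0, 2.0, 100_000
dx = (c - b) / n
riemann = sum(f(b + (i + 0.5) * dx) * dx for i in range(n))

print(F(c) - F(b))  # 7/3 = 2.3333...
print(riemann)      # very close to 7/3
```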

Now, if you took the derivative of this area function, that would be the change in the dependent variable, z, over the change in the independent variable, x, where the change in x is h. So that would be dz/dx = (F(b+h)-F(b))/dx = (F(b+h)-F(b))/h.

As h is infinitesimally small, the Area function with output z here looks at one rectangle for F(b+h)-F(b), with one y value and one dx value, which is h. Looking at the definition of an integral, the one y value here is y=f(x), and there is one integral definition for this one rectangle, \int_b^{b+h} f(x)dx, where b+h=c, so we could also write it as \int_b^c f(x)dx.

So here dz/dx, the derivative of the area function, is (F(b+h)-F(b))/h.

Now, if the change in x, which here is h, is infinitesimally small, you basically get one y value, and the area is y*dx; where the integral is \int_b^{b+h} f(x)dx, the area is basically f(x)dx, here f(b)dx, which equals y*dx. As dx=h, that is y*h. So the derivative (F(b+h)-F(b))/h would be equivalent to (y*dx)/dx=(y*h)/h=y. So tada, that’s why if F(x) is the antiderivative of f(x), when you take the derivative of F(x) you are left with y, where y=f(x). Yay! This could be slightly off/inaccurate, so check the Wikipedia post Fundamental Theorem of Calculus and pp. 375-377 of the first edition of Mathematics for Economists for more info (unfortunately p. 375 is blocked out on the Google Book Search digitized version).
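Here’s a numeric illustration in Python of that conclusion: build the area function out of Riemann sums, take its difference quotient with a small h, and you get back (approximately) the original f. The function and sample point are my own examples:

```python
# FTC part 1, numerically: define the area function A(x) as a Riemann-sum
# approximation of the integral of f from a to x, then take the
# difference quotient (A(x + h) - A(x)) / h with a small h. The result
# should be close to f(x) itself.
def f(x):
    return x ** 2

def A(x, a=0.0, n=100_000):
    dx = (x - a) / n
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))

x, h = 1.5, 1e-3
derivative_of_area = (A(x + h) - A(x)) / h
print(derivative_of_area)  # close to f(1.5) = 2.25
```

Differentiating the area function recovers the height of the curve, which is the whole relationship between integrals and derivatives described above.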

Calculus for Dummies has some good graphs illustrating some of these principles such as on pp. 242 and 247.

Here’s the text on the illustration above, in case it’s too small for you to read:

As dx is infinitesimally small, f(b)~f(b+h)=f(c). The area under the curve
between b and b+h is F(b+h)-F(b) (where F(b+h) is the area under the curve
from a to c=b+h, and F(b) is the area from a to b). Since h is so small, the
area is basically f(b)*h. Now, the derivative of this integral area function is
the change in output, area, which is f(b)*h, over change in independent
variable, h. So that equals f(b), which shows the Fundamental Theorem of Calculus.

Once you start doing integrals, there are lots of techniques, like partial fractions, that will make integrals easier to solve.

Internal tag: math

Some latex instructions, including how to make integral signs, here.


Functions, Mappings, and Counting, Surjections, Injections, and Bijections

See more in Google Books’ scan of Norman Biggs’ Discrete Mathematics:

A few quick points to be expanded upon later; correct me in the comments if I make any misstatements:

  • functions map from one set to another, from a domain to a codomain; the set of all output values is the range. The sets can be finite sets or infinite sets. Common language is such as “the function f maps from the set X to the set Y.” Often X and Y are infinite sets like N (the natural numbers) and Z (all integers). X would be the domain and Y would be the codomain.
  • The input numbers from the domain are called arguments, and the input is usually placed into a placeholder called an independent variable–the independent variables inputted may in fact be dependent variables of another function as in the composition of functions.
  • The output numbers from the range are called values, and are represented by a placeholder called the dependent variable.
  • Function definitions specify the codomain, not the range; thus when function definitions say something like “a function from X to Y” or “ƒ is a function on N into R,” the Y and the R specify the codomain, not the range:

Formal description of a function typically involves the function’s name, its domain, its codomain, and a rule of correspondence.

  • The range is a consequence of the function, the range consists of all the values mapped to/outputted by the function, but the range isn’t stated in the function definition, the codomain is.
  • Each number in the domain can map to at most one value. The function may be undefined for certain input numbers. However, each value in the range can be mapped to by more than one argument in the domain. The codomain can have values which aren’t mapped to at all; the range is the set of all values that are mapped to.
  • When each value is mapped to by at least one input number, that’s called a surjection. In a surjection, each value must be mapped to by at least one input number, and each value may be mapped to by more than one input number.
  • When no two input numbers map to the same value, that’s called an injection. In an injection, there can be values in the codomain which are not mapped to at all, but no value can be mapped to by more than one input number.
  • When there’s an injection, and some of the values aren’t mapped to, then that value set must be a codomain, and not a range of the function. Because according to Wikipedia “the range of a function is the set of all “output” values produced by that function”–if the range is the set of all output values, and in an injection some of the values might not be mapped to by the function, then that isn’t a range, but a codomain.
  • “If f is a surjection then its range is equal to its codomain.” This is because in a surjection, each value in the target set is mapped to by a input value, there are no unmapped-to values as there could be in an injection. Thus I figure in an injection, where there are values that are not mapped-to, the codomain is larger than the range, but in a surjection, the range is equal to the codomain.
  • When a function is both a surjection and an injection, i.e. each value is mapped to by one and only one input number, that’s called a bijection.  Think of bijections as one-to-one correspondences between the domain and the codomain.
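For finite sets, these definitions are easy to check directly. Here’s a minimal Python sketch (the helper names like is_surjective are my own, not from any library) that tests a mapping, stored as a dict, for each property:

```python
# Checking surjective / injective / bijective for a finite map,
# represented as a dict from domain elements to codomain elements.
def is_surjective(mapping, codomain):
    return set(mapping.values()) == set(codomain)   # every value is hit

def is_injective(mapping):
    values = list(mapping.values())
    return len(values) == len(set(values))          # no value hit twice

def is_bijective(mapping, codomain):
    return is_surjective(mapping, codomain) and is_injective(mapping)

square = {-2: 4, -1: 1, 0: 0, 1: 1, 2: 4}  # not injective: -1 and 1 both map to 1
double = {0: 0, 1: 2, 2: 4}                # injective, but misses 1 and 3 in {0..4}

print(is_injective(square))             # False
print(is_surjective(double, range(5)))  # False
print(is_bijective(double, {0, 2, 4}))  # True once the codomain equals the range
```

Note how double becomes a bijection only when the codomain is shrunk to exactly its range, which is the surjection point from the bullets above.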

Here’s some interesting information on ranges and codomains from Wikipedia:

Mathematical functions are denoted frequently by letters, and the standard notation for the output of a function ƒ with the input x is ƒ(x). A function may be defined only for certain inputs, and the collection of all acceptable inputs of the function is called its domain. The set of all resulting outputs is called the range of the function. However, in many fields, it is also important to specify the codomain of a function, which contains the range, but need not be equal to it. The distinction between range and codomain lets us ask whether the two happen to be equal, which in particular cases may be a question of some mathematical interest.

For example, the expression ƒ(x) = x^2 describes a function ƒ of a variable x, which, depending on the context, may be an integer, a real or complex number or even an element of a group. Let us specify that x is an integer; then this function relates each input, x, with a single output, x^2, obtained from x by squaring. Thus, the input of 3 is related to the output of 9, the input of 1 to the output of 1, and the input of −2 to the output of 4, and we write ƒ(3) = 9, ƒ(1)=1, ƒ(−2)=4. Since every integer can be squared, the domain of this function consists of all integers, while its range is the set of perfect squares. If we choose integers as the codomain as well, we find that many numbers, such as 2, 3, and 6, are in the codomain but not the range.


In mathematics, the codomain, or target set, of a function, described symbolically as f : X → Y, is the set Y into which all of the output of the function is constrained to fall.

All the output that the function can possibly produce from its given domain, X, is the image. The function’s image will not necessarily fill the entire codomain Y, even though the output must all land inside of the codomain: there can be points in the codomain that are “not used.”

The codomain (or target) is part of the definition of a function. The image (or range) is a consequence of the definition of a function: the image is a subset of the codomain and depends upon (i.e. is a consequence of) how the definition of the function prescribes the domain, codomain, and map or formula.

(The domain of f is the set X.)

Internal tag: math

A Closer Look at the Basics of Functions and Derivatives

When looking at a function name, the input variable(s)/independent variable(s) are the letters in parentheses next to the function name such as the x in f(x). (Disclaimer: I’m no math pro, so take this all with a grain of salt). Basically, anything you put in the f(x) in place of the x you will substitute in for the x in the right hand side of an equation. You probably already know this of course. So, say you have f(x)=x^2. Then if you put in 5 as x, like f(5), that of course is 5^2. Another example is if you have f(x)=f(a) + (x-a). If you insert a+h into the function, like f(a+h), then you substitute a + h in the right hand side instead of x, so the right hand side becomes f(a) + (a + h -a)=f(a) + h.

See the post When to Use the Composite Function/Chain Rule for Derivatives for what to do when taking derivatives when the variable inside the parentheses is modified in any way, such as multiplied by some factor or raised to the power of or added to or subtracted some other terms; in that case the function is a function of another function, is a composite function, in which case you may have to use the composite function/chain rule to find the derivative.

Now, it’s important to figure out what variable you’re differentiating with respect to, and what exactly that means. Basically, “differentiating with respect to” a variable means that you’re figuring out how much the output of the function changes, how much the dependent variable changes, based on a change on the variable you’re differentiating with respect to. So if you have y=f(x), and you differentiate the function with respect to x, so it’s f'(x) or also written as dy/dx, that means you’re figuring out the ratio at each x of how much y changes based on how much x changes; it’s a rate. (See What dx Actually Means for more info on the dx notation, dy/dx is basically the limit of \Delta y / \Delta x as \Delta x \rightarrow 0, meaning the change in y based on change in x as the change in x gets very, very small, i.e. the “instantaneous rate of change,” instantaneous as in the x hardly changes at all; the change in the output y of the function is based not over a range of x but practically right at that x since the change is infinitely small.)

Now the interesting thing is that if your function doesn’t change based on the variable you’re differentiating with respect to, the derivative is zero. That’s pretty important, and makes sense. If your function is y=f(x) and f(x) isn’t a function of z, i.e. it doesn’t change based on changes in z, differentiating f(x) with respect to z will yield no change, i.e. the derivative will be zero.

This becomes important especially when dealing with constants and doing partial derivatives, where you treat some variables as constants. The common sense thing to remember is this: if you have a constant, if a variable is being treated as a constant, or if a function isn’t a function of some variable, then differentiating the function with respect to that constant or variable will yield a derivative of zero. That makes sense; the function’s output won’t change at all based on that constant, or variable treated as a constant, or changes in a variable which doesn’t affect the function.

When we are given “constants” such as c, without specifically specifying the value of the constant (such as specifying that c=5), isn’t c really a variable that we just treat as a constant when we differentiate the function? If we don’t specifically specify the value, that means c could equal 1, 5, -100, or anything else, like a variable…it’s only treated like a constant when we differentiate the function.

Some other interesting information about derivatives and constants:

As far as I know any constant can be written as a function of a variable. For example, if you have the constant 5, say y=5, you could write that as a function of x: c=y=f(x)=5 * x^0=5*1=5. 5 can thus be written as a function of x to the power of zero, which equals one. The derivative of c=y=f(x)=5 * x^0=5*1=5 with respect to x is zero, since any change in the x results in no change in the output variable y. No matter what change in x, the output y still equals 5. You can also prove this using the power rule: applying the power rule to y=f(x)=5 * x^0 gives dy/dx = 0 * 5 * x^-1, which equals 0.

This is all useful to know once you start doing partial derivatives. You could have a function of three variables, say z=f(s, x, y). However, if the s variable is raised to the power of zero wherever it appears in the function, its value is actually 1, as s^0=1. Therefore, f(s, x, y) is equivalent to f(1, x, y), since everywhere an s appears in the function equation you can substitute a 1. What this means effectively is that the function f really only depends on the x and the y; s acts as a constant since it’s raised to the power of zero, so there can be no change in the output z based on any change in s. Where s is raised to the zero power, its derivative will be zero by the power rule: d/ds(s^0) = 0 * s^-1 = 0. For example, if you had z=f(s, x, y)= 5s^0x^3y^2, that would be 5*1*x^3*y^2, which equals 5x^3y^2, so the output z of function f only depends on changes in the x and y variables. If in the equation you are adding s^0 as a term in a polynomial, no matter what s changes to, the value of s^0 will always equal 1; you would always be adding a 1 to the rest of the terms in the polynomial, so once again the value of the function would not depend on s; the value would not change based on any changes in s. The appendix to chapter 14 of the book Mathematics for Economists discusses this topic.
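A small Python sketch of this point, using the 5s^0x^3y^2 example from above (the finite-difference approach and step size are my own choices): estimate each partial derivative numerically, and the one with respect to s comes out as zero:

```python
# Numeric partial derivatives of f(s, x, y) = 5 * s**0 * x**3 * y**2.
# Since s**0 == 1 for nonzero s, the output never changes with s, so the
# partial derivative with respect to s is zero.
def f(s, x, y):
    return 5 * s ** 0 * x ** 3 * y ** 2

h = 1e-6
s, x, y = 2.0, 1.0, 3.0
df_ds = (f(s + h, x, y) - f(s, x, y)) / h
df_dx = (f(s, x + h, y) - f(s, x, y)) / h  # analytically 15 * x**2 * y**2 = 135

print(df_ds)  # 0.0: changing s does nothing
print(df_dx)  # close to 135: changing x does matter
```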

An interesting way to think of the derivative of a function is that it is a graph of the slope of the tangent line at each x value of the original function.  That is, on the original function, at any x you can find a tangent line to the function at that x; the slope, the change in y over change in x at that point, the instantaneous rate, is the y value of the derivative at that x.  So when you graph the derivative of a function, you graph a collection of slope values of the tangent lines at each x of the original graph.
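Here’s a small Python sketch of that idea (the cubic and the sample points are examples I chose): estimate the tangent-line slope at several x values with a symmetric difference quotient, and you get the values the derivative’s graph would pass through:

```python
# Tabulating the derivative as "slope of the tangent at each x": for
# f(x) = x**3 - 3*x, estimate the tangent slope at several x values and
# compare with the known derivative f'(x) = 3*x**2 - 3.
def f(x):
    return x ** 3 - 3 * x

def slope_at(x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)  # symmetric difference quotient

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(slope_at(x), 4))  # 9, 0, -3, 0, 9
```

The slopes are positive, then zero at the critical points, then negative in between, matching the zero/positive/negative behavior discussed earlier.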

Also see:
Functions, Mappings, and Counting, Surjections, Injections, and Bijections

When to Use the Composite Function/Chain Rule for Derivatives

Ah, the composite function rule/chain rule. (Wikipedia) (Mathematics for Economists).

When would you want to use the composite function/chain rule? (Note: I’m no math expert, so take this all with a grain of salt). Well, if you have a function that’s a function of another function, i.e. a composite function, sometimes the easiest way to find the derivative of the composite function is to use the composite function rule/chain rule.

From Wikipedia:

In mathematics, a composite function, formed by the composition of one function on another, represents the application of the former to the result of the application of the latter to the argument of the composite. The functions f: X → Y and g: Y → Z can be composed by first applying f to an argument x and then applying g to the result. Thus one obtains a function g ∘ f: X → Z defined by (g ∘ f)(x) = g(f(x)) for all x in X. The notation g ∘ f is read as “g circle f“, or “g composed with f“, “g following f“, or just “g of f“.

It may be helpful to think about what a function is. Functions are generally formulas that you apply to some input, and which “map” the input to some output. The “value” of the output, the dependent variable, is usually named some variable like y, and the name of the function is usually something like f or g; the input, the independent variable, is usually named some variable like x, inside parentheses next to the name of the function, like f(x). From the http://www.math.csusb.edu/ website:

A function (or map) is a rule or correspondence that associates each element of a set X called the domain with a unique element of another set Y called the codomain. We typically give the rule a name such as a letter like f or g (or any letter of your choice) or a name agreed upon by convention like sine or log or square root.

Now, functions can be very simple, such as y=f(x)=x, in which case the function basically doesn’t do anything but map x back to itself. You can have more complicated functions such as y=f(x)=x^3 + 2x + 5, a polynomial, which does quite a few things to the input x before outputting the output value y.

Functions are interesting because basically anything in a mathematical expression can be called a function. Take y=x^3 + 2x + 5 for example. You could say x^3 is a function which maps x to some variable z, and you could name the function g(x). You could say 2x is a function which maps x to some variable u, and you could name the function h(x). You could even say 5 is a function which maps x to the constant 5 each time, name the function i(x), and name the constant c. You can write 5 as a function of x here if you want to: c=5 * x^0=5*1=5. So pretty much anything in a math expression can be called a function, even constants.

So what about composition of functions? This is another area where I think you can basically find a function to be a composition of functions whenever you want–but there are only certain circumstances in which it matters enough for you to think about using the composite function rule.

One example of a situation in which you have a noticeable composite function is when instead of a lone x or some other independent variable within the parentheses of the function notation, you have other things going on, such as y=f(5x) instead of just y=f(x). In this case, the 5x within the parentheses is a whole other function, you could name the function g for example, and name the output of the function g(x) a dependent variable such as u, and then you would have u=g(x)=5x. Then since y=f(5x), and 5x=u, y is a function of u, a function of the function g(x), and also a function of x, since u is a function of x. In this case we have the composite function y=f(u)=f(g(x))=f(5x).

Now, here’s how to use the composite function rule/chain rule (see Wikipedia and Mathematics for Economists). To find dy/dx, you can first find dy/du, then multiply that by du/dx. What if y=f(u)=u^2 and u=g(x)=5x? Then y=f(u)=f(g(x)). By the power rule, dy/du would be 2u. Then, where u=g(x)=5x, du/dx would equal 5. By the composite function rule, the derivative dy/dx = dy/du * du/dx = 2u * 5 = 2*(5x)*5 = 50x. This is why I said that there are some cases in which you want to use the composite function rule and other cases in which you won’t need to think about it: in this case it might have been simpler to distribute the power of 2 in the beginning, so if we had y=(5x)^2, then y=25x^2, and using the power rule then dy/dx=50x, which is what we got by using the composite function rule above. So sometimes you can simplify first or figure the problem out without explicitly using the composite function/chain rule, and other times it’s easier to start out by using the composite function/chain rule.
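That chain-rule computation can be checked numerically; here’s a Python sketch (a hand-coded version of the rule for this one example, not any library’s chain rule):

```python
# Chain rule on y = (5x)**2: with u = g(x) = 5x and y = f(u) = u**2,
# dy/dx = dy/du * du/dx = 2u * 5 = 50x. Compare against a numeric
# difference quotient at a sample point.
def g(x):
    return 5 * x

def f(u):
    return u ** 2

def dy_dx_chain(x):
    u = g(x)
    dy_du = 2 * u   # power rule on f(u) = u**2
    du_dx = 5       # derivative of g(x) = 5x
    return dy_du * du_dx

x, h = 2.0, 1e-6
numeric = (f(g(x + h)) - f(g(x))) / h

print(dy_dx_chain(x))  # 50 * 2 = 100
print(numeric)         # approximately 100
```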

When you are multiplying or dividing terms containing the variable you are differentiating with respect to, i.e. when you are multiplying different functions (see the above about how just about anything can be called a function), you can use the product and quotient rules to differentiate the resulting function, a function which is a product or quotient of two other functions. Once again, you only need to use these rules when it would be easier than multiplying out or dividing out the functions, or when the functions can’t be simplified any further. For example, if you had y=5x^2 * 3x^3 you might as well just multiply this out and then take the derivative of the result: 5x^2 * 3x^3=15x^5, then use the power rule to get dy/dx=75x^4. Or you could use the product rule to get the same result, but it would take more effort. You could even use the product rule on y=f(x)=5x, since 5=5*x^0, and here y=5*x^0 * x, in case you were wondering; there are many functions where there’s no point in using the product rule. But if the functions you start with are complicated enough, it can be simpler and easier to use the product rule to begin with instead of multiplying out the functions and then taking the derivative of the product. (See product and quotient rules, Wikipedia and Mathematics for Economists.) And to sum up, when the output of one function is the input into another function, you use the composite function rule/chain rule to find the derivative. (See composite function rule/chain rule, Wikipedia and Mathematics for Economists.)
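And here’s the product-rule example from above as a quick Python check (hand-coded, just to confirm that the product-rule route and the simplify-first route agree):

```python
# Product rule check on y = (5 * x**2) * (3 * x**3): with u = 5x**2 and
# v = 3x**3, dy/dx = u'v + uv' = 10x * 3x**3 + 5x**2 * 9x**2 = 75x**4,
# the same answer as simplifying to 15x**5 first and using the power rule.
def product_rule(x):
    u, du = 5 * x ** 2, 10 * x
    v, dv = 3 * x ** 3, 9 * x ** 2
    return du * v + u * dv

def power_rule(x):
    return 75 * x ** 4  # derivative of the simplified 15 * x**5

for x in (0.5, 1.0, 2.0):
    print(x, product_rule(x), power_rule(x))  # the two columns agree
```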

Also see: A Closer Look at the Basics of Functions and Derivatives