Printer Friendly Version of this page

Maths Direct

The area under a curve as a sum

Definite Integrals

Integration: calculating areas

The Area Under A Curve

How To Calculate It Using A Spreadsheet

As with many things that I do with a spreadsheet, the following discussion was sparked off by an e-mail message flying in from the ether. The question that was asked follows and my solution then unfolds!

I sometimes come across situations at work where I need to calculate the area under a plot of a data series but I can't think of a way to do it in Excel (I still use Excel 97). Anybody any ideas?

I will use the Trapezoid Rule in this page and have to confess that my previous version of this page contained an error that Sameer Bhide spotted for me: thanks Sameer. The mathematics involved in this method is not especially advanced although at first sight there are more calculations than we might like! Then Rick Andrews wrote to me in December 2008 to point out an error with the integral calculus section of the page: excellent work Rick, thank you.

For matherically minded visitors, I have updated this page to include a simple treatment of the Integral Calculus way of working through such calculations. This additional section is at the end of the page.

Imagine we are presented with data relating to the cooling of a dish of water from boiling point for 10 minutes. The table below records the times (in minutes) and the temperature (C = degrees Celsius).

Table 1
Time
(mins)
C
0
100
1
71
2
50
3
35
4
25
6
12
8
6.2
10
3.1

Plotted on a graph, we find:

cooling curve of water
Figure 1

Our task, then, is to estimate the area under the curve extending from 0 minutes and 100 degrees to 10 minutes and 3.1 degrees.

The Method

What we are going to do is really to draw a series of boxes, trapezoids, over the curve on the graph and then add together the area of the first trapezoid to the area of the second trapezoid to the area of the third and subsequent trapezoids. The method is explained below: do follow it through step by step.

cooling curve of water with trapezoids
Figure 2

We find the results of using the method in the following table:

Table 2
x
y
area
0
100
85.5
1
71
60.5
2
50
42.5
3
35
30.0
4
25
18.5
6
12
9.1
8
6.2
9.3
10
3.1
 
area
283.0

I have used the Trapezoid rule here although it's not obvious yet.

In fact the area in the '0' row in fact shows the area in the range x = 0 to 1; similarly, the area in the '1' row shows the area in the range x = 1 to 2 and so on. Figure 3 is a simpler, although different, version of figure 2 and it demonstrates the trapezoid points.

When we take the area of the range covered by x = 0 to 1, the shape we would draw would be a trapezoid: the green shape on the bottom left of figure 3. To calculate the area of a trapezoid, we have to split it into two: we make a rectangle out of it and then a triangle. This is explained on the right hand side of figure 3. Leaving us with a formula of

area of a trapezoid and area under a curve
Figure 3

Either the area of a trapezoid is ba + b(c-a)/2
Or the area of a trapezoid is b(c+a)/2

For the range x = 0 to 1 and then for the range x = 1 to 2 we have

Table 3
 
0 to 1
1 to 2
a
100
71
b
1
1
c
71
50
c - a
-29
-21
area
85.5
60.5
cumulative area
85.5
146.0

Do this for each range, increasing in steps of 1 (0 - 1, 1 - 2, 2 - 3 etc) and we arrive at table 2.

There is a variation on this method that improves the result: it involves using logarithms. The following table shows that when we use Natural Logarithms we get a slightly refined result: for this example anyway. For other examples, the logarithm method could improve the situation even more.

The logarithm method is really only applicable when we are dealing with non linear data sets and it works this way.

Table 4
x y
ln(y)
area
0 100
4.60517
85.5
1 71
4.26268
60.5
2 50
3.912023
42.5
3 35
3.555348
30.0
4 25
3.218876
18.5
6 12
2.484907
9.1
8 6.2
1.824549
8.8
10 3.1
1.131402
   
area
282.5

We need to add a column headed ln(y) meaning that it shows the value of the Natural Logarithm of the 'y' values. The area values in this case are exactly the same as before EXCEPT for the result for the range x= 8 - 10.

To evaluate the area for the range x = 8 - 10, we need to regress x on ln(y) to find the beta value of X coefficient as Excel calls it. We use Excel's in built Tools/Data Analysis function to carry out this calculation. The value of beta is -0.34831 and remember it is still in the form of a logarithm. To use this value, we need to find the antilog which is found by Excel to be =EXP(-0.34831) = 0.705881. We then enter the following formula in the cell to give us an estimate of the area for the range x = 8 - 10 = y/0.705881 = 6.2/0.705881 = 8.7834.

Finally

This method is simplified and it gives an approximate solution. If we split our data into more and more ranges, we need more and more trapezoids and that will make our results more accurate. If we take it a stage further we can use calculus and obtain exact answers.

NOTE: I used Microsoft Excel 2000 when developing this sheet but the method will work (with whatever minor amendments necessary) on virtually any spreadsheet I have ever come across.

BONUS: Using Integral Calculus to Find the Area Under a Curve

I have to confess that as I wrote and then rewrote this page I felt myself more and more drawn to including Integral Calculus here. Not that I'm a calculus freak or anything like it. Just that it gives us a much more direct answer that is highly accurate!

Let's take a really simple example of using integral calculus to find the area under a very simple curve: the graph of y= x over the range 0 to 5, or as mathematicians say:

This is what the graph of that looks like:


Figure 4

and we want to calculate the area of the blue section from x = 0 to 5 and y = 0 to 5

Since we're dealing here with a triangle, we learned at school that

the area of a triangle = * base * height

Let's do that: * 5 * 5 = 12.5 square units

Integrative calculus notation shows this as:


user friendly or what?

Translating this into a more user friendly style we get

Hang on ... where did the come from? Well, that's what we need to do to integrate.
INTEGRATION RULE: To integrate x we raise the power by 1 to give x squared and we divide by the value of the new index, two in this case.

To evaluate this expression, we then find the value of y when x = 5 and subtract the value of y when x = 0 from it. In this case, we have:

NEW QUESTION: find the area under the curve for

... this gives
Area under the curve = formula Notice that the has become and -2 has become -2x: again this is because of the rules of integration.

Area = formula

Back to the example that we started with now: you should be able to set out the function for the curve in figure 3: ... the answer is 165 square units whereas the answer shown in figure 3 says the area is 167.5. Well, that's not bad considering the Trapezoid Method is only an approximation.

FOR INTEGRAL CALCULUS PURISTS: please note I have omitted the constant 'k' that you are all familiar with. Forgive me but it doesn't affect the answers for these kinds of calculations, does it?

I have to be perfectly honest and say that whilst I have worked through this discussion of integral calculus and developed all of the graphics and explanations, I have had some help. The links on the left hand side of this page take you to some experts! I am indebted to them.

Printer Friendly Version of this page

Duncan Williamson
August 2000 revised October 2001, February 2004, October 2005 and December 2008

Write to me at any time


Webmaster Duncan Williamson 2001, 2004 & 2005