Using the Microsoft Excel AVERAGE, STDEV, DAVERAGE and DSTDEV Functions

CMA

Microsoft Excel (XL) has a whole host of built in functions: routines that are preprogrammed to do things that we might use over and over again. Alternatively, we might work with things that are complicated that would be tedious to keep reprogramming. Additionally, to minimise the possibility of programming badly and not realizing it; we can rely on preprogrammed functions. We will look at the following four functions:

  1. §AVERAGE (average or arithmetic mean) function
  2. §STDEV (standard deviation) function
  3. §DAVERAGE (database average) function
  4. §DSTDEV function: the database standard deviation function.

The AVERAGE and STDEV functions are probably familiar to you; but they are included here to provide a convenient starting point for the more powerful and flexible DATABASE functions, DAVERAGE and DSTDEV.

The purpose of the latter two functions is to allow us to build or load a table into XL and analyse the averages (arithmetic means) and standard deviations of one, two, three or more variables, spread over one, two, three or more columns. Of course, the AVERAGE and STDEV can analyse data spread over several columns; but that's not the point as we will see now.

Children's Teeth

Let's take the example of children's teeth! Let's imagine that we wish to analyse the dental records of children from the age of 5 to 16 years; and look at the damaged, missing and filled teeth of these children. For the purposes of this demonstration, we will not be distinguishing between boys and girls, although we could build that in too. The table that follows contains the frequency of each of the categories of problem we are looking at; and they do this by the age of the child, increasing in years:

age decayed missing filled
5 30 10 10
6 20 10 10
7 40 0 20
8 40 10 20
9 10 10 10
10 30 20 10
11 40 20 20
12 10 10 20
13 40 10 10
14 10 10 10
15 30 20 20
16 20 20 20

For example, the table shows that in our survey of 'n' children, we found that there were 30 decayed teeth in children aged 5 years, there were 10 missing teeth in children aged 5 years and there were 10 filled teeth in the mouths of children aged 5 years; and so on for each age group.

Basic Statistics

We can analyse this table in a few ways without the need to know much about mathematics, statistics and XL. Just look at the next table and see the work we can do:

age decayed missing filled
5 30 10 10
6 20 10 10
7 40 0 20
8 40 10 20
9 10 10 10
10 30 20 10
11 40 20 20
12 10 10 20
13 40 10 10
14 10 10 10
15 30 20 20
16 20 20 20
       
Total 320.00 150.00 180.00
Average 26.67 12.50 15.00
Standard Deviation 12.31 6.22 5.22
Minimum 10.00 0.00 10.00
Maximum 40.00 20.00 20.00
Range 0.00 0.00 0.00

So not only can we see the total number of decayed, missing and filled teeth for all children in the survey, but we also know the averages of all of these problems, as well as the standard deviations, the minima, the maxima and the ranges. There are more statistics we could derive from these data; but let's concentrate on the matter in hand.

To complete the early part of this discussion, the next table shows the detail of the functions for the totals, averages and standard deviations: we won't refer to these functions again in any detail!

  A B C D
1 age decayed missing filled
2 5 30 10 10
3 6 20 10 10
4 7 40 0 20
5 8 40 10 20
6 9 10 10 10
7 10 30 20 10
8 11 40 20 20
9 12 10 10 20
10 13 40 10 10
11 14 10 10 10
12 15 30 20 20
13 16 20 20 20
14        
15 Total =SUM(B2:B13) =SUM(C2:C13) =SUM(D2:D13)
16 Average =AVERAGE(B2:B13) =AVERAGE(C2:C13) =AVERAGE(D2:D13)
17 Standard Deviation =STDEV(B2:B13) =STDEV(C2:C13) =STDEV(D2:D13)

The DAVERAGE Function

For all ages of all children, we know the average number of decayed teeth now, the average number of missing teeth and the average number of filled teeth. That's interesting; but suppose we are testing a new kind of toothpaste, a new brushing technique or the fluoridation of water. How will the total arithmetic mean that we have just calculated help us? Well, of course it might, but let's assume that the new toothpaste was introduced, say, 2 years ago; and there has been a control group of children who are now in the age range 7 – 10 years who have been using that toothpaste, and no other, ever since. The new brushing technique might have been introduced just two years ago; and the fluoridation programme might have started only a year ago.

To test the efficacy of the new products and methods, we need to be selective about the statistics we compile and analyse. The database functions in XL help us to be selective. Keeping the work as simple as possible, and taking the toothpaste case as our demonstration case, what we would like now is just to concentrate on the decayed teeth of the control group. The DAVERAGE function lets us do this as follows:

=DAVERAGE(database,field,criteria)

where         database is the database, or range, where our data is stored in the spreadsheet
        field is the variable we want to test: in this case it is “decayed”
        criteria is one or more constraints that helps us home in on our quest!

The table that follows shows us all we need to know and then we'll explain what we've done.

  A B C D
7 age decayed missing filled
8 7      
9 8 >10    
10 9 >20    
11 10 >20    
12        
13 Age decayed missing filled
14 5 30 10 10
15 6 20 10 10
16 7 40 0 20
17 8 40 10 20
18 9 10 10 10
19 10 30 20 10
20 11 40 20 20
21 12 10 10 20
22 13 40 10 10
23 14 10 10 10
24 15 30 20 20
25 16 20 20 20
26        
27 DAVERAGE: 7 – 10 30.000 =DAVERAGE(A13:D25,"decayed",A7:A11)
28 DAVERAGE: 7 – 10 constrained 36.667 =DAVERAGE(A13:D25,"decayed",A7:B11)
29        
30 STDEV all data 10.370 =STDEV(B14:D25)
31        
32 DSTDEV decayed: 7 - 10 14.142 =DSTDEV(A13:D25,"decayed",A7:A11)
33 DSTDEVP decayed: 7 – 10 12.247 =DSTDEVP(A13:D25,"decayed",A7:A11)

Notice, firstly, the introduction of a new section at the top of the table (rows 7 – 11): this is the criteria section: it doesn't have to go at the top; but if you put it at the bottom and then wanted to add more data, you might have problems if you accidentally over wrote this section.

Look at row 27, the DAVERAGE: 7 - 10 row. Here we find the DAVERAGE is 30.000 and the function we have programmed is

=DAVERAGE(A13:D25,"decayed",A7:A11).

Note: we could use a range name here instead of the range A13:D25

This means that we have asked XL to look at the whole table of data: A13:D25 (both titles and values), to concentrate on the decayed variable (notice that the name of the variable we need to look at is enclosed in “” even in the function) and then to look at the criteria, in the range A7:A11 in this case. Why does this give us a DAVERAGE of 30? Well, what it has found is that when we look at the criteria range of A7:A11, this tells us to look at the decayed data for children aged 7, 8, 9 and 10 ONLY and calculate the average number of decayed teeth they have:

= (40 + 40 + 10 + 30) ÷ 4 = 120 ÷ 4 = 30.000

Look at row 28 now. Now we have asked XL to look at children aged 7, 8, 9 and 10 BUT only to calculate the average number of decayed teeth under the following constraints:

=DAVERAGE(A13:D25,"decayed",A7:B11)

Column A        Column B
Age 7   all decayed teeth
Age 8   only if there are more than 10 decayed teeth in this age group
Age 9   only if there are more than 20 decayed teeth in this age group
Age 10   only if there are more than 20 decayed teeth in this age group

So, the average number of decayed teeth under these constraints is:

= (40 + 40 + 30) ÷ 4 = 110 ÷ 4 = 36.667

since there are <20 decayed teeth in the 9 year age group, this group is ignored by DAVERAGE.

Play around with this function: what result do think you will get if you change the function to

=DAVERAGE(A13:D25,"decayed",A7:C11)?

or

=DAVERAGE(A13:D25,"decayed",A7:D11)?

and how do you explain your results?

Suppose now that you changed the function back to

=DAVERAGE(A13:D25,"decayed",A7:C11)

and then in cell C11 you entered the new constraint >30: how does this change the result you had when you calculated the DAVERAGE without this constraint?

Powerful and flexible, I think you will agree. You can set up all sorts of hypotheses and test them using DAVERAGE.

The DSTDEV Function

As a starting point to this part of the discussion, we have entered the STDEV function in row 30 of the table above where we have a standard deviation result for ALL damaged teeth of 10.370. However, look at row 32 and determine what is happening there: what have we done, and what does the result mean?

The format of the DSTDEV function is

= DSTDEV(database,field,criteria)

It's remarkably similar to the DAVERAGE function, isn't it?

Here's row 32 again:

32 DSTDEV decayed 7 – 10 14.142 =DSTDEV(A13:D25,"decayed",A7:A11)

So, we can use the same data, the same field and the same criteria for this function as we did with DAVERAGE. Here, we have calculated the standard deviation of the decayed teeth subject to the constraints in the range A7:A11. Check through this and make sure it's clear!

Work through this example yourself and change the criteria from A7:A11 to A7:B11 and see what happens: why does it happen and what does it mean?

Work through this example again and change the criteria from A7:B11 to A7:C11 and see what happens: why does it happen and what does it mean?

The DSTDEVP Function

Finally, we have included the DSTDEVP function for the sake completeness! This function is related to the DSTDEV function and the difference between the two is:

DSTDEV assumes our data are based on SAMPLE data; and
DSTDEVP assumes our data are based on POPULATION data.

This may be important if you are an advanced statistician or you need to distinguish between sample and population data and situations. XL's Help screen and a statistics text will give you more information on this point.

As a matter of interest, the STDEV function is SAMPLE based; and it has its own STDEVP brother (or sister!).

Conclusion

This page has demonstrated in brief the average (AVERAGE) and standard deviation (STDEV) functions of Microsoft Excel spreadsheets. Moreover, it has introduced the idea of the database versions of these functions, DAVERAGE and DSTDEV: these versions allow us to interrogate any basic data we have by setting constraints that allow us to tease out a variety of relationships and test hypotheses. We demonstrated that the database average and standard deviation functions are both powerful and flexible; and add a dimension to basic statistical analysis that may not be available otherwise.






Duncan Williamson
August 2000 revised August 2001

CMA