Box and Whisker Diagrams with MS Excel: an upgrade

Building a Trial Balance using Excel

Permutations and Combinations

Shortcuts REPT and Formatting

DATA TABLES: one variable and two variable

Excel's SOLVER Function

Box and Whisker Diagrams with MS Excel

Using SUMIF to generate your Profit & Loss Account

Import External Data with Excel XP/2002

Solve simultaneous equations with MINVERSE and MMULT

Means, medians and modes abound in this section of four pages: start here

Candy striped XY graphs: not bad!

Rule of 72: compound interest

Lorenz Curves 1: what they are

Lorenz Curves 2: how to prepare them using Excel

Splitting text using Excel's in built functions

Charts that communicate

Using Gantt Charts to Provide Information

Power Point Presentation 1 Designing spreadsheets

Power Point Presentation 2 Introduction to Pivot Tables

The Excel function NPER

The Excel function DSTDEV and several others

How to derive and use the area under a curve

Box and Whisker Diagrams: an upgrade

Nikolai Graae kindly wrote to me from Denmark, having read my page on BoxPlots and he gave me some information that I was neither ready for nor aware of. The upshot is that my BoxPlots page is, unbeknownst to me, inadequate. When he told me what he had to say and I verified it, I was shocked to find that BoxPlots aren’t as simple as they pretend to be.

Enough rambling: what Nikolai told me is that what I had prepared is fine for secondary or high schools but not for statisticians. Moreover, he warned me that whilst Excel is smart, it will take some nifty footwork to get it to cope with what he told me. What he told me is this:

Hi Duncan,

Excellent method for making box and whisker diagrams (boxplots) in Excel.

Regarding the missing cross, it’s no big deal really since the dash is preferable to the cross as a median symbol. Anyway, in order to see the cross and other ‘missing’ symbols, background colour has to be set to ‘no colour’ and voilà, all missing markers are now available.

The diagrams will look more professional if the markers for Q1 and Q3 are omitted, that is set marker to none.

And to make the box plot statistically correct the whiskers should not extend to the minimum and maximum values but to the smallest and largest observations within 1,5*IQR (interquartile range, Q3-Q1).

Observations between 1,5*IQR and 3*IQR are termed mild outliers and are marked with a circle (for example), whereas observations that fall outside of 3*IQR are termed extreme outliers and are marked with a cross.

Creating a box plot that reflects true IQR, mild and extreme outliers will demand a bit more manual work as min and max observations will have to be compared to <1,5*IQR and then set as the range of the whiskers, and if there are observations >1,5 and 3*IQR, they will have to be included in the data table used for the box plots.

Regards,

Nikolai Graae

Firstly, I’m ashamed that I didn’t spot how to get the cross to show: I should have spotted that. As for the stats, I checked my sources and found that I hadn’t made a mistake as far as they were concerned. I then checked my more advanced sources and found that we are now talking about Tukey’s method of preparing a BoxPlot.

The major change in the method, as Nikolai has told us lies in the length of the whiskers. I also found out about showing the outliers … all will be revealed in a moment.

I have found it tricky to program a spreadsheet to cope with all of this and my solution is a bit ham fisted. Someone with Visual Basic knowledge who wants to pile in is more than welcome here NOW! Nevertheless, it is possible to get Excel to sort out what you want and I have done it: it’s not elegant but it has allowed me to make a fascinating discovery!

Nikolai is correct in his correction, there is no doubt about that. However, when I came to construct boxplots on the basis of the Tukey method I found that a good proportion of them look no different from the boxplots that I would have drawn using the method in my original demonstration. Let me guess that this is why the method was revised and has become the norm for secondary schools and less advanced courses.

The key issues in the construction of boxplots are

  1. The size of the box and
  2. The length of the whiskers

The nature of the data will determine whether there is any difference between the method I originally demonstrated and Tukey’s method. I recommend anyone with the energy and patience to verify for themselves that what I have found is true and if anyone would like to discuss this matter further, just email me and perhaps we could swap Excel files.

My spreadsheet file is available for anyone to unravel but I have to confess that for various reasons it has gone largely undocumented. It does contain some comparisons between the two approaches we have discussed here, however.

Thanks Nikolai for your insights and I am sure that visitors to this site will be grateful to you, especially anyone who is operating at an intermediate level rather than the basic level at which my initial page was first written.

© Duncan Williamson
20 July 2003

Write to me at any time

 
 
© Webmaster Duncan Williamson 2003