The mean and standard deviation of grouped data

 


Means, Standard Deviations and Coefficients of Variation of Grouped Data

 

The page on the standard deviation is fine as far as it goes: it deals with simple means and standard deviations of data sets. This brief additional page provides a discussion of the mean and standard deviations of grouped data.

 

Since the amount of dispersion can also be an important issue, we introduce the Coefficient of Variation here, too.

 

These ideas are all introduced by working with the heights and shoe sizes of a very large sample of British schoolchildren.

 

Grouped Data

 

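By grouped data we mean the situation where we have, for example, sampled many people or situations and it is convenient to classify our data into groups and subgroups: all will become clear with the examples we use here.

For grouped data the mean and standard deviation are calculated as:

\[ \bar{x} = \frac{\sum fx}{\sum f} \]

\[ s = \sqrt{\frac{\sum fx^2}{\sum f} - \left( \frac{\sum fx}{\sum f} \right)^2} \]

where

x is the x variable

f is the frequency of responses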

 

The best way to appreciate these formulae is to use them: let’s do that now.

 

Example

 

Here are the heights of almost 27,000 British schoolchildren aged 7 to 16 years.

i      Work out the mean and standard deviation of the data for

a         all children
b         children aged 7 years
c         children aged 10 years
d         children aged 16 years

ii     Comment on your results

 

Heights and ages of UK schoolchildren aged 7 - 16 years

Height (cm)   Mid point x     Total   7 years   10 years   16 years
Total                        26,821     1,702      3,634        476
95-100                 95        22         5          1          0
100-110               105        74        14          5          1
110-120               115       447       222         21          0
120-130               125     2,874       980        189          0
130-140               135     5,274       445      1,201          2
140-150               145     6,218        29      1,700          3
150-160               155     5,174         4        474         24
160-170               165     3,620         1         34        102
170-180               175     2,336         1          3        219
180-190               185       695         1          3        109
190-200               195        81         0          3         14
>200                  210         6         0          0          2

Note: you can go to http://censusatschool.ntu.ac.uk/tableheight.asp and download these and additional data for the heights of UK schoolchildren.

 

The original class intervals are those given in the left hand column: we have used the mid point of each class interval as our ‘x’ value. Notice that the final class of >200 is open ended and we have chosen to represent it by an x value of 210 cm … argue if you disagree and you can always check your version against this one.
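If you do want to argue about the open ended class, a quick check shows how much the choice matters. The sketch below is only an illustration: the alternative value of 205 cm and the helper name grouped_mean are assumptions made here, not part of the original data.

```python
# Mid points and "Total" frequencies from the heights table.
midpoints = [95, 105, 115, 125, 135, 145, 155, 165, 175, 185, 195, 210]
totals = [22, 74, 447, 2874, 5274, 6218, 5174, 3620, 2336, 695, 81, 6]


def grouped_mean(xs, fs):
    """Mean of grouped data: sum of f*x divided by sum of f."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


# The page's choice: represent the >200 class by 210 cm.
mean_210 = grouped_mean(midpoints, totals)

# Hypothetical alternative: represent the >200 class by 205 cm instead.
alternative = midpoints[:-1] + [205]
mean_205 = grouped_mean(alternative, totals)

print(f"x = 210 for the last class: mean = {mean_210:.3f}")
print(f"x = 205 for the last class: mean = {mean_205:.3f}")
print(f"difference: {mean_210 - mean_205:.4f} cm")
```

With only 6 children in the final class, the choice moves the overall mean by about a thousandth of a centimetre, so either value is defensible here.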

 

Answers

 

i We’ll give the answer to part ‘a’ of the question in full and then to the other parts in outline only.

 

a All years

Height (cm)

x            f         fx       x^2         fx^2
95          22       2090      9025       198550
105         74       7770     11025       815850
115        447      51405     13225      5911575
125       2874     359250     15625     44906250
135       5274     711990     18225     96118650
145       6218     901610     21025    130733450
155       5174     801970     24025    124305350
165       3620     597300     27225     98554500
175       2336     408800     30625     71540000
185        695     128575     34225     23786375
195         81      15795     38025      3080025
210          6       1260     44100       264600
Totals   26821    3987815    286375    600215175

Mean                                     148.683
Standard deviation                        16.494
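The working for part a can be checked with a few lines of Python. This is a minimal sketch that assumes the standard deviation is the square root of (sum of fx^2 / sum of f) minus the mean squared, which is the version that reproduces the totals in the table; the variable names are chosen here for illustration.

```python
import math

# x (mid points) and f (frequencies) for all children, from the table.
x_values = [95, 105, 115, 125, 135, 145, 155, 165, 175, 185, 195, 210]
frequencies = [22, 74, 447, 2874, 5274, 6218, 5174, 3620, 2336, 695, 81, 6]

# Build the working columns: x, f, fx, x^2, fx^2.
rows = [(x, f, f * x, x * x, f * x * x) for x, f in zip(x_values, frequencies)]

sum_f = sum(row[1] for row in rows)
sum_fx = sum(row[2] for row in rows)
sum_fx2 = sum(row[4] for row in rows)

mean = sum_fx / sum_f
sd = math.sqrt(sum_fx2 / sum_f - mean ** 2)

for row in rows:
    print("{:>5} {:>6} {:>9} {:>7} {:>11}".format(*row))
print(f"Totals: f = {sum_f}, fx = {sum_fx}, fx^2 = {sum_fx2}")
print(f"Mean = {mean:.3f}, standard deviation = {sd:.3f}")
```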

 

b, c and d answers are all contained in a summary table, along with the results for part a, repeated for convenience:

 

Summary                   All   7 years   10 years   16 years
Mean                  148.683   126.557    142.003    174.370
Standard deviation     16.494     7.494      8.595     10.183
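Parts b, c and d repeat the same calculation on a different frequency column, so a small function saves effort. The following is a sketch only: the function name grouped_mean_sd and the dictionary layout are assumptions made for this example, and the frequencies are copied from the heights table.

```python
import math


def grouped_mean_sd(xs, fs):
    """Return (mean, standard deviation) of grouped data.

    xs are the class mid points and fs the matching frequencies.
    """
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    mean_of_squares = sum(f * x * x for x, f in zip(xs, fs)) / n
    return mean, math.sqrt(mean_of_squares - mean ** 2)


midpoints = [95, 105, 115, 125, 135, 145, 155, 165, 175, 185, 195, 210]
groups = {
    "All": [22, 74, 447, 2874, 5274, 6218, 5174, 3620, 2336, 695, 81, 6],
    "7 years": [5, 14, 222, 980, 445, 29, 4, 1, 1, 1, 0, 0],
    "10 years": [1, 5, 21, 189, 1201, 1700, 474, 34, 3, 3, 3, 0],
    "16 years": [0, 1, 0, 0, 2, 3, 24, 102, 219, 109, 14, 2],
}

summary = {name: grouped_mean_sd(midpoints, fs) for name, fs in groups.items()}

for name, (mean, sd) in summary.items():
    print(f"{name:>8}: mean = {mean:.3f}, standard deviation = {sd:.3f}")
```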

 

ii The results we have can be interpreted as follows:

 

Since we are working with the heights of children, we must expect the average heights to increase with age, and they do: the average height of 7 year old children (126.6 cm) is less than that of 10 year olds (142.0 cm), which in turn is less than that of 16 year olds (174.4 cm).

As for the standard deviations, they increase in line with the average heights, from 7.5 cm at age 7 to 10.2 cm at age 16. One explanation could be that children grow at different rates: whilst the age group as a whole is getting taller, a few children are growing at, say, 0.5 cm a year whilst others are growing at 1 cm, 2 cm and so on. These differences accumulate, so the heights within an age group become more and more spread out as age increases. We should see this effect on the following graph:

 

 

Coefficient of Variation

 

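The coefficient of variation shows us the extent of the dispersion, or variation, in a data set by expressing the standard deviation as a percentage of the mean:

\[ \text{Coefficient of Variation} = \frac{\text{standard deviation}}{\text{mean}} \times 100\% \]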

We said in part ii of the worked example that the heights of the 16 year age group were more dispersed than those of the other age groups … the coefficient of variation will extend this part of the discussion for us. Here are the coefficients of variation for all of the data sets we have:

 

Summary                            All   7 years   10 years   16 years
Coefficient of Variation (%)    11.093     5.921      6.053      5.840
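These percentages follow directly from the means and standard deviations in the earlier summary table. A short sketch, assuming the coefficient of variation is 100 times the standard deviation divided by the mean (the dictionary name summary is chosen here for illustration):

```python
# (mean, standard deviation) pairs taken from the summary table.
summary = {
    "All": (148.683, 16.494),
    "7 years": (126.557, 7.494),
    "10 years": (142.003, 8.595),
    "16 years": (174.370, 10.183),
}

# Coefficient of variation as a percentage of the mean.
cv = {name: 100 * sd / mean for name, (mean, sd) in summary.items()}

for name, value in cv.items():
    print(f"{name:>8}: coefficient of variation = {value:.3f}%")
```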

 

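So, we can see that even though the averages and standard deviations both increase with age, as we have already seen, the relative dispersion, or variability, of each age group hardly changes: it stays at about 6% of the mean, and the 16 year olds, who have the largest standard deviation, in fact have the smallest coefficient of variation. The much larger figure for all children (11.093%) arises because that data set mixes every age from 7 to 16.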

 

Your Turn

 

From http://censusatschool.ntu.ac.uk/tablefootsize.asp we have found the corresponding heights and shoe sizes of almost 60,000 British schoolchildren, as per the table below.

 

Choosing any shoe size range you like, calculate the

a         mean
b         standard deviation
c         coefficient of variation

 

 

Foot sizes

Height (cm)   Total        > 12    12 to 17  > 17 to 19  > 19 to 21  > 21 to 23  > 23 to 25  > 25 to 27  > 27 to 29  > 29 to 35
Total        57,509       1,668       1,509       5,139      12,747      15,838      12,441       5,336       2,038         793
95               38           5          13           5           9           3           3           0           0           0
105             168          11          28          35          44          24          15           7           3           1
115           1,112          39         239         512         214          66          34           3           2           3
125           6,118         161         568       2,242       2,453         468         148          48          11          19
135          10,793         262         359       1,552       4,966       2,858         590         136          33          37
145          12,668         362         147         458       2,989       5,563       2,479         480         115          75
155          13,025         439          93         216       1,475       4,533       4,572       1,260         326         111
165           9,225         259          47          96         490       2,040       3,631       1,958         541         163
175           3,459          98          12          17          89         251         885       1,202         694         211
185             797          23           3           6          14          27          75         228         282         139
195             100           9           0           0           4           5           9          13          29          31
210               6           0           0           0           0           0           0           1           2           3
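If you would like to check your own working, the same approach can be applied to any one column of the table. The sketch below uses the "> 21 to 23" column purely as an example. It assumes that the entry for the 165 cm row in that column is 2,040, since that is the value that makes both the row and the column add up to their stated totals. The names used are illustrative only, and no results are quoted so that you can compare them with your own.

```python
import math

# Height mid points (cm) and frequencies for the "> 21 to 23" foot size column.
# Assumption: the 165 cm entry is 2,040, which makes the column total 15,838.
midpoints = [95, 105, 115, 125, 135, 145, 155, 165, 175, 185, 195, 210]
frequencies = [3, 24, 66, 468, 2858, 5563, 4533, 2040, 251, 27, 5, 0]

n = sum(frequencies)
mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
mean_of_squares = sum(f * x * x for x, f in zip(midpoints, frequencies)) / n
sd = math.sqrt(mean_of_squares - mean ** 2)
cv = 100 * sd / mean

print(f"n = {n}")
print(f"mean = {mean:.3f} cm")
print(f"standard deviation = {sd:.3f} cm")
print(f"coefficient of variation = {cv:.3f}%")
```

Swapping in the frequencies from another column is all that is needed to repeat the exercise for a different shoe size range.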

 

© where appropriate Duncan Williamson

16 October 2002

© Webmaster Duncan Williamson 2002