
Central Tendency

What's the average? We have all heard the question many times: travelers ask about on-time averages, university administrators ask about retention rates (the percentage of students who return after their first year), investors ask about average rates of return, and sports fans ask about batting averages. But how do we go about answering these questions? How do we compute the averages? The first thing to realize is that when people talk about the average, they are talking about a measure of central tendency. In this section we will discuss three measures of central tendency: the mean, the median, and the mode. To better understand the differences among the three measures, let's return to our example of grades. For an on-line discussion of measures of central tendency, you should check out the UCLA On-line Statistics Course, Mathematics 220DX Statistics at the New Hampshire College, DAU the Stat refresher, and Hyperstat Online by David Lane at Rice University.

To demonstrate the measures of central tendency we will use our two grading examples.  The frequency distribution of the exam scores for ECN202 is repeated in the first diagram below and the histogram for the grades in ECN201 follows it.

Mean: This is what people generally mean when they say average. The mean is the arithmetic mean of all the data, defined as the sum of all observed values divided by the number of observations. In the table below, the grade data for the class have been divided into four groups. For each group we added up the grades to get a total and then divided the total by 13, the number of students in each group. If we then add the four group averages and divide this sum by the number of groups, we get the average of 84.454. This is the same number that you would have obtained if you had simply added up all 52 grades and divided by 52. [Note: this procedure works here only because the groups all have the same number of students. If the groups were of different sizes, you would need to weight each group average by the group's relative size.]

In terms of the frequency distribution above, you see that there are not any actual scores of 84 or 85, but those scores are in the middle of the distribution of grades.  If there were more grades in the higher range, then this would drag up the mean score. 

Grades on ECN202 Exam 1

Student Grade Student Grade Student Grade Student Grade
1 91.983 14 97.531 27 96.8659 40 89.0001
2 91.7597 15 98.2979 28 76.2859 41 76.5876
3 87.9158 16 70.4242 29 99.6804 42 93.3512
4 77.0586 17 72.6251 30 87.6299 43 88.82
5 98.7479 18 86.9584 31 89.6395 44 77.4919
6 79.8029 19 95.2241 32 85.6969 45 94.7336
7 80.5968 20 91.9544 33 95.2098 46 75.4139
8 77.8953 21 80.2882 34 71.9719 47 86.5368
9 96.1051 22 77.2291 35 92.3448 48 93.7865
10 74.1581 23 93.1482 36 74.4269 49 73.8672
11 83.152 24 75.1727 37 82.7137 50 75.7028
12 91.7678 25 87.5282 38 77.8714 51 73.415
13 78.1368 26 80.501 39 71.828 52 74.7755
Total 1109.08   1106.88   1102.16   1073.48
Group Average 85.3138   85.1448   84.7819   82.5756
Average   84.454          
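The group-average calculation described above can be sketched in a few lines of Python. This is only an illustration: the grades are copied from the table, while the helper names (mean, weighted_mean) and the uneven regrouping into 10, 20, and 22 students are assumptions introduced here to show why the weighting in the note matters.

```python
# Grades for the four groups of 13 students, copied from the table above.
groups = [
    [91.983, 91.7597, 87.9158, 77.0586, 98.7479, 79.8029, 80.5968,
     77.8953, 96.1051, 74.1581, 83.152, 91.7678, 78.1368],
    [97.531, 98.2979, 70.4242, 72.6251, 86.9584, 95.2241, 91.9544,
     80.2882, 77.2291, 93.1482, 75.1727, 87.5282, 80.501],
    [96.8659, 76.2859, 99.6804, 87.6299, 89.6395, 85.6969, 95.2098,
     71.9719, 92.3448, 74.4269, 82.7137, 77.8714, 71.828],
    [89.0001, 76.5876, 93.3512, 88.82, 77.4919, 94.7336, 75.4139,
     86.5368, 93.7865, 73.8672, 75.7028, 73.415, 74.7755],
]


def mean(values):
    """Arithmetic mean: sum of the observed values over their count."""
    return sum(values) / len(values)


def weighted_mean(groups):
    """Combine group means, weighting each by its share of all observations."""
    total_n = sum(len(g) for g in groups)
    return sum(mean(g) * len(g) / total_n for g in groups)


# Mean over all 52 grades, and the plain average of the four group means.
all_grades = [grade for group in groups for grade in group]
class_mean = mean(all_grades)
mean_of_group_means = mean([mean(g) for g in groups])

# Hypothetical regrouping into unequal sizes (10, 20, 22 students),
# used only to show that the unweighted shortcut no longer works.
uneven = [all_grades[:10], all_grades[10:30], all_grades[30:]]
naive = mean([mean(g) for g in uneven])
weighted = weighted_mean(uneven)

print("class mean:            ", round(class_mean, 3))
print("mean of group means:   ", round(mean_of_group_means, 3))
print("uneven groups, naive:  ", round(naive, 3))
print("uneven groups, weighted:", round(weighted, 3))
```

With equal-sized groups the first two numbers agree; with the uneven groups only the weighted version recovers the class mean.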

One feature worth noting in this table is the difference between the average for the class and the averages for the groups. As we will see in the inferential statistics section, the group averages could be viewed as averages based on samples of the entire student population. What you see is that the group averages tend to be close to the class average, but they are not exactly the same.

In the ECN201 example, to make the computations a bit easier, we will examine the descriptive statistics for a sample made up of every fifth student. The sample appears in the table below.

Sample of ECN201 Grades

Student Grade Student Grade
1 13 61 22
6 25 66 20
11 19 71 22
16 14 76 24
21 18 81 17
26 27 86 19
31 19 91 26
36 22 96 21
41 25 101 22
46 19 106 22
51 12 111 22
56 25 116 15

The abbreviated descriptive statistics for the sample of ECN201 grades are presented in the table below (the complete set of statistics can be found on the ECN201grades site). The "average" grade in the sample is 20.4, fairly close to the mean of 21.7 for the entire population of grades. In the following sections we will examine the differences between the mean and the other two measures of central tendency.

Descriptive Statistics: ECN201 Grades

Statistic Sample Population
Mean 20.4 21.7
Median 21.5 22
Mode 22 21
Sum 490 2580
Count 24 119
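The sample column of these statistics can be reproduced with Python's standard statistics module. This is a minimal sketch: the 24 grades are the every-fifth-student sample listed earlier, and the variable names are illustrative only. The population column cannot be recomputed here because the full set of 119 grades is not listed in this section.

```python
import statistics

# Grades of the 24 sampled students (students 1, 6, 11, ..., 116).
sample = [13, 25, 19, 14, 18, 27, 19, 22, 25, 19, 12, 25,
          22, 20, 22, 24, 17, 19, 26, 21, 22, 22, 22, 15]

sample_sum = sum(sample)                    # total of the grades
sample_count = len(sample)                  # number of observations
sample_mean = statistics.mean(sample)       # sum divided by count
sample_median = statistics.median(sample)   # midpoint of the sorted grades
sample_mode = statistics.mode(sample)       # most frequently observed grade

print("Sum:   ", sample_sum)
print("Count: ", sample_count)
print("Mean:  ", round(sample_mean, 1))
print("Median:", sample_median)
print("Mode:  ", sample_mode)
```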

Median: The median is the midpoint in the distribution of grades. At the median score there are as many people above the score as there are below it. The easiest way to find it is to sort the data by score and pick the midpoint (if there is an even number of observations, as there is in this example, you simply average the two middle values). In this ECN202 class, the median grade was 84.42, only slightly below the mean of 84.45.

Grades on ECN202 Exam 1

Grade Student Grade Student Grade Student Grade Student
70.4 1 76.6 14 85.7 27 92.3 40
71.8 2 77.1 15 86.5 28 93.1 41
72.0 3 77.2 16 87.0 29 93.4 42
72.6 4 77.5 17 87.5 30 93.8 43
73.4 5 77.9 18 87.6 31 94.7 44
73.9 6 77.9 19 87.9 32 95.2 45
74.2 7 78.1 20 88.8 33 95.2 46
74.4 8 79.8 21 89.0 34 96.1 47
74.8 9 80.3 22 89.6 35 96.9 48
75.2 10 80.5 23 91.8 36 97.5 49
75.4 11 80.6 24 91.8 37 98.3 50
75.7 12 82.7 25 92.0 38 98.7 51
76.3 13 83.2 26 92.0 39 99.7 52
Median = (83.152 + 85.6969)/2 = 84.4244
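The sort-and-pick-the-midpoint procedure can be written out by hand, as in the sketch below. The function name median is an illustrative choice (Python's statistics.median does the same job); the grades are the 52 unrounded ECN202 scores from the first table.

```python
def median(values):
    """Sort the values and return the midpoint.

    With an odd count this is the middle value; with an even count
    it is the average of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# The 52 ECN202 exam grades, in student order.
grades = [
    91.983, 91.7597, 87.9158, 77.0586, 98.7479, 79.8029, 80.5968,
    77.8953, 96.1051, 74.1581, 83.152, 91.7678, 78.1368,
    97.531, 98.2979, 70.4242, 72.6251, 86.9584, 95.2241, 91.9544,
    80.2882, 77.2291, 93.1482, 75.1727, 87.5282, 80.501,
    96.8659, 76.2859, 99.6804, 87.6299, 89.6395, 85.6969, 95.2098,
    71.9719, 92.3448, 74.4269, 82.7137, 77.8714, 71.828,
    89.0001, 76.5876, 93.3512, 88.82, 77.4919, 94.7336, 75.4139,
    86.5368, 93.7865, 73.8672, 75.7028, 73.415, 74.7755,
]

ordered = sorted(grades)
lower_middle, upper_middle = ordered[25], ordered[26]  # 26th and 27th scores
class_median = median(grades)

print("two middle scores:", lower_middle, upper_middle)
print("median:", round(class_median, 4))
```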

The median for the population of ECN201 grades was 22 while the median for the sample was 21.5.  

Mode: The mode refers to the most frequently observed number. If we round the scores to whole numbers and sort them, we find that the number 77 appears four times and the number 92 appears five times. Strictly speaking, 92 is the mode because it occurs most often, but 77 marks a second peak, so the score distribution can be described as having two modes (bimodal).

Grades Grades Grades Grades
70 77 86 92
72 77 87 93
72 77 87 93
73 77 88 94
73 78 88 95
74 78 88 95
74 78 89 95
74 80 89 96
75 80 90 97
75 81 92 98
75 81 92 98
76 83 92 99
76 83 92 100
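Counting how often each rounded score occurs is easy to check by machine. The sketch below uses the whole-number scores from the table above with collections.Counter and statistics.multimode from the Python standard library (multimode requires Python 3.8 or later); the variable names are illustrative only.

```python
from collections import Counter
from statistics import multimode

# Rounded ECN202 scores, read down the four columns of the table.
rounded = [
    70, 72, 72, 73, 73, 74, 74, 74, 75, 75, 75, 76, 76,
    77, 77, 77, 77, 78, 78, 78, 80, 80, 81, 81, 83, 83,
    86, 87, 87, 88, 88, 88, 89, 89, 90, 92, 92, 92, 92,
    92, 93, 93, 94, 95, 95, 95, 96, 97, 98, 98, 99, 100,
]

counts = Counter(rounded)          # score -> number of occurrences
top_two = counts.most_common(2)    # the two most frequent scores
modes = multimode(rounded)         # every score tied for the highest count

print("two most frequent scores:", top_two)
print("mode(s):", modes)
```

The two most frequent scores are the two peaks discussed in the text, while multimode reports only the score or scores with the single highest count.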

The mode for the population of ECN201 grades was 21 while the mode for the sample was 22.

In this example we have looked at three measures of central tendency, each of which gives us a bit of information on the class's performance on the exam. The fact that they are all different provides the sophisticated observer with additional information. For example, if the mean, median, and mode are the same, you are most likely looking at a symmetric distribution. If the mean is above the median, then you probably have one or more outliers to the right, which produce a long upper tail and a skew to the right in the distribution. This is what we would be likely to see in income statistics if there were a few very wealthy people in the sample. If the mean is below the median, then there is a long lower tail, which is what you would expect to see in a distribution of grades when someone has a very low grade. Now it is time to move on to a discussion of variability.
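Before moving on, the mean-versus-median comparison described above can be tried on small data sets. The sketch below is an illustration under stated assumptions: both data sets are made up for this example (they are not taken from the ECN201 or ECN202 grades), and tail_hint is a hypothetical helper that applies only the rough rule of thumb from the text, not a formal test of skewness.

```python
from statistics import mean, median

# Made-up incomes in thousands of dollars, with one very wealthy person.
incomes = [30, 35, 40, 45, 50, 1000]

# Made-up exam grades, with one very low grade.
low_outlier_grades = [20, 85, 88, 90, 92]


def tail_hint(values):
    """Compare the mean with the median and report the likely tail."""
    m, med = mean(values), median(values)
    if m > med:
        return "long upper tail (skew to the right)"
    if m < med:
        return "long lower tail (skew to the left)"
    return "roughly symmetric"


for label, data in [("incomes", incomes), ("grades", low_outlier_grades)]:
    print(label, "mean:", mean(data), "median:", median(data),
          "->", tail_hint(data))
```

In the income data the single large value pulls the mean far above the median, while in the grade data the single low score pulls the mean below it.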