Introduction to Data Analysis

In this unit we will examine the importance of data analysis, including the ability to move from graphs to words and back again. Successful professionals need to move effortlessly between words and graphs, and so do economics students, so we will spend some time here on the basics of some common graphs.

An Example of Data Analysis 

The best way to know what to expect in the course is to look at an example of the type of work you could be expected to do when you emerge from it. So, if you are up to it, why not work your way through an evaluation of the president of a leading university - an exercise that uses a number of the skills you will develop in the course.

Slippery Slope University hired their president six years ago to help improve the revenue side of the University's finances. The president's track record is presented below:

Year    Revenue ($ millions)
1991    100
1992    90
1993    92
1994    95
1995    98
1996    101

Would you vote for an extension of the president's contract? Has this person succeeded?

This is a surprisingly difficult question given the little bit of data that we have, but let's give it a shot. On the surface, it would appear that he is vulnerable to the critics who point out that in the six years since he took office, revenues have increased a paltry 1 percent (to 101 from 100). The president points out, however, that he inherited the problems of the previous president and has been able to turn things around: since 1992, his first full year, revenue has actually increased 12 percent (to 101 from 90).

[Graph: Slippery Slope revenue, 1991-1996]

As you can see, one's interpretation of the president's performance is dependent upon the time-frame one uses.
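For those who want to check the arithmetic, here is a minimal Python sketch of the two percent-change calculations. The revenue figures come from the table above; the function and variable names are my own, chosen only for illustration.

```python
# Revenue in $ millions, copied from the table above.
revenue = {1991: 100, 1992: 90, 1993: 92, 1994: 95, 1995: 98, 1996: 101}


def percent_change(series, start_year, end_year):
    """Percent change in a series between two years."""
    start, end = series[start_year], series[end_year]
    return (end - start) / start * 100


# The critics start the clock in 1991; the president starts it in 1992.
critics_view = percent_change(revenue, 1991, 1996)
president_view = percent_change(revenue, 1992, 1996)

print(f"1991 to 1996: {critics_view:.1f} percent")
print(f"1992 to 1996: {president_view:.1f} percent")
```

The only thing that differs between the two calls is the starting year, which is exactly the point: the same data support both stories.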


But we are not done yet. Let's assume that the president gets his way and that we decide to use his time frame. Has he done a good job? Are you ready to extend his contract? Before you get out your pen, consider the fact that during this period prices were rising in the economy (yes, there was a little inflation - but more about that later). Does this matter? Would the 12 percent revenue increase look different to you if you knew that prices also rose about 12 percent during this time? It should, and let's see how.

To incorporate the effect of price inflation into your analysis, you need information on the price level. In the third column of the table below I have included the Consumer Price Index (CPI), the measure of the price level that you hear people talk about every month.

Year    Revenue    Price Index
1991    100        136.0
1992    90         140.3
1993    92         144.5
1994    95         148.2
1995    98         152.4
1996    101        156.95

With these data, and using the appropriate formula, we can compute "inflation-adjusted", or real, revenues. The "real" data, in 1996 prices, are included in the following table and graph.

Year    Revenue    Price Index    Real Revenue
1991    100        136.0          115.4
1992    90         140.3          100.7
1993    92         144.5          99.9
1994    95         148.2          100.6
1995    98         152.4          100.9
1996    101        156.95         101.0

What we see here is that real revenue in 1996 was virtually unchanged from what it had been in 1992, and substantially lower than in 1991. In 1996 the university finds itself in about the same position it was in 1992, since revenues increased at about the same rate as prices over this period. The president does not look too good now, but let's look a little further into this.
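The text does not spell out the formula behind the real revenue column, so here is a sketch under one assumption: that it is the standard deflation formula, real revenue = nominal revenue x (price index in the base year / price index in that year), with 1996 as the base year. That assumption does reproduce the Real Revenue column to one decimal place. The names in the code are mine.

```python
# Nominal revenue ($ millions) and the price index, copied from the table.
revenue = {1991: 100, 1992: 90, 1993: 92, 1994: 95, 1995: 98, 1996: 101}
cpi = {1991: 136.0, 1992: 140.3, 1993: 144.5,
       1994: 148.2, 1995: 152.4, 1996: 156.95}

BASE_YEAR = 1996  # express every year's revenue in 1996 prices


def real_value(nominal, price_index, base_index):
    """Restate a nominal amount in base-year prices (assumed formula)."""
    return nominal * base_index / price_index


real_revenue = {
    year: real_value(revenue[year], cpi[year], cpi[BASE_YEAR])
    for year in revenue
}

for year, value in real_revenue.items():
    print(f"{year}  {value:6.1f}")
```

Changing BASE_YEAR rescales every number in the column but leaves the year-to-year pattern untouched, which is why the choice of base year does not affect the conclusion.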

[Graph: real revenue in 1996 prices, 1991-1996]


Yes, it is true that inflation adjusted revenue has been flat, but maybe there were extenuating circumstances that we should consider. Should we consider the revenue situation at other comparable universities during this time period?

Let's do it. The table below has the revenue figures for Slippery Slope and the entire sample of comparable universities.

 

Year    Revenue: Slippery Slope    Revenue: Total
1991    100                        1390
1992    90                         1300
1993    92                         1310
1994    95                         1320
1995    98                         1330
1996    101                        1340

With this new information, what can we say about the president's relative performance? There are two techniques we might use to answer this question. One is to look at the ratio of Slippery Slope's revenue to the revenue for the entire group of universities. If we adopt the president's view that we should begin our analysis in 1992, he has been successful in increasing Slippery Slope's share of total revenue from 6.9 to 7.5 percent, because revenue has increased faster at Slippery Slope (12 percent) than for the entire group (3 percent).

 

Year    Revenue: Slippery Slope    Revenue: Total    Slippery Slope Share
1991    100                        1390              7.2%
1992    90                         1300              6.9%
1993    92                         1310              7.0%
1994    95                         1320              7.2%
1995    98                         1330              7.4%
1996    101                        1340              7.5%
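The share column is simply Slippery Slope's revenue divided by the group total, expressed as a percent. A short sketch, using the figures from the table and variable names of my own choosing:

```python
# Revenue figures copied from the table.
slippery_slope = {1991: 100, 1992: 90, 1993: 92, 1994: 95, 1995: 98, 1996: 101}
total = {1991: 1390, 1992: 1300, 1993: 1310, 1994: 1320, 1995: 1330, 1996: 1340}

# Share of the group total, in percent, for each year.
share = {year: slippery_slope[year] / total[year] * 100 for year in total}

for year, pct in share.items():
    print(f"{year}  {pct:.1f}%")
```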

For those who like visuals, here is the graph of Slippery Slope's relative performance. Because we are graphing a ratio (SS/Total), a falling line means that Slippery Slope's revenue is growing more slowly than the total, and a rising line means that Slippery Slope's revenue is growing faster than the total.

[Graph: Slippery Slope share of total revenue, 1991-1996]

 

Another way to look at it would be with the use of an index number. Below you will see a table containing the original plus the new indexes for Slippery Slope and the Total. The indexes were constructed by dividing each of the original columns by the corresponding 1992 figure.

 

Year    Revenue: Slippery Slope    Revenue: Total    Index: Slippery Slope    Index: Total
1992    90                         1300              1.00                     1.00
1993    92                         1310              1.02                     1.01
1994    95                         1320              1.06                     1.02
1995    98                         1330              1.09                     1.02
1996    101                        1340              1.12                     1.03

These data tell us that from 1992 to 1996, revenue at Slippery Slope increased 12 percent [we take away 1 from the index 1.12 and have 0.12, which equals 12 percent] while revenue for the entire group of universities increased 3 percent [we take away 1 from the index 1.03 and have 0.03, which equals 3 percent]. The graphical representation of these two indexes is presented below.
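The index construction described above, dividing each figure by its 1992 value and then subtracting 1 to read off the growth, can be sketched as follows. The data are from the table; the function and variable names are illustrative.

```python
# Revenue figures copied from the table (1992 is the base year).
slippery_slope = {1992: 90, 1993: 92, 1994: 95, 1995: 98, 1996: 101}
total = {1992: 1300, 1993: 1310, 1994: 1320, 1995: 1330, 1996: 1340}


def index_series(series, base_year):
    """Divide every value by the base-year value, so the base year equals 1.00."""
    base = series[base_year]
    return {year: value / base for year, value in series.items()}


ss_index = index_series(slippery_slope, 1992)
total_index = index_series(total, 1992)

# Growth since the base year: take 1 away from the index, then convert to percent.
ss_growth = (ss_index[1996] - 1) * 100
total_growth = (total_index[1996] - 1) * 100

for year in ss_index:
    print(f"{year}  {ss_index[year]:.2f}  {total_index[year]:.2f}")
print(f"Slippery Slope: {ss_growth:.0f} percent, Total: {total_growth:.0f} percent")
```

Because both series start at 1.00, they can share one graph even though the raw revenue figures differ by a factor of more than ten.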

[Graph: revenue indexes for Slippery Slope and the Total, 1992 = 1.00]

What's the verdict? As I indicated at the outset, this was not going to be easy. The bad news for the prez is that revenues have gone up, but not after we account for inflation: the university is about where it was in 1992 in terms of the buying power of its revenue. The good news is that the prez's university seems to have done better than the other universities. The review is mixed, and the decision will be a difficult one, but I hope that you can see how a little data analysis can go a long way toward helping us make a more informed decision.

An Introduction to Graphs 

You have certainly heard the expression that a picture is worth a thousand words. While this may be an overstatement, it is often very useful to describe a relationship between phenomena in visual form. In this course we will be using a good many graphs and tables, so let's be sure we have the basics set at the beginning. In this unit we will begin with a little history of the evolution of graphs, and then we will look into the question: when do we use tables and when do we use graphs? We will also examine briefly a few of the peculiarities of certain graphs - the Pie, Bar, Scatter, Line, and Time-series.

A Brief History of Graphs

For those interested in a broader treatment of the issue of presenting information visually, I would suggest three books by Edward Tufte: The Visual Display of Quantitative Information, Envisioning Information, and Visual Explanations. Tufte's message is clear: text, tables, and graphics should be viewed as alternatives - each with its strengths and weaknesses which is why they continue to coexist. The pages of his books are filled with a number of fascinating examples of success and failure in presenting information visually. Three of my favorites, which will be discussed in class, are the tabular representation of information presented in the John Gotti trial, the graphical representation of data on the London cholera epidemic of 1854 developed by John Snow, and the presentation of the data on O-ring failures that provides some insight into the space shuttle Challenger disaster of 1986.

One of the most popular techniques for visually displaying quantitative information is the graph, but you know that if you have opened an introductory economics textbook or read the financial sections of newspapers or magazines. Unfortunately, experience has clearly indicated that for most people there is a considerable amount of information lost in the translation between words and graphs. Stated somewhat differently, it has become painfully clear that graphs are often misunderstood by students. On the input side, graphs often get in the way of students' learning of economics rather than aiding in their learning, and on the output side, graphs seldom add to the quality of student presentations / writing.

And this is not peculiar to Economics. Richard Bowen, a Psychology researcher who has seen much the same thing in his discipline, has written an interesting little book, Graph It! How to Make, Read, and Interpret Graphs. In his book he talks of Graphicacy, graph literacy, as the goal for his readers. He recognizes, however, that to achieve this goal he will need to help many overcome Graphobia, the fear of graphs, and motivate others to make the investment in developing the skills necessary to create and interpret graphs. Fortunately, these skills will pay off well beyond any one college course, since more and more data is being presented in quantitative form, and very often the data is presented graphically.

The difficulty many people have with graphs is not surprising once you realize how recently we 'discovered' graphs. Until the 1800s graphical design was dependent upon a direct analogy to the physical world. The first graphics were maps. When you look at a sheet of paper, it is fairly easy to make the transition from movements right-left and up-down to physical movements east-west and north-south, where the grid lines substitute for latitude and longitude. We tend to order things spatially, and thus it was natural to develop maps that mimicked this order. For example, look at the map of the URI campus at Kingston. Once you get yourself oriented to the north, if you turn to the right you will be able to look at the map and see what you will be looking at. If you are looking North while standing in Chafee and then turn to the right you will see Woodward and Tyler Hall, which is just what you see in the map. You will also see that Tyler is about twice as far from Chafee as Woodward, so not only is direction retained, so is the concept of distance (URI map).

Similarly, if you have a map of the Kingston area surrounding URI (the star), you will find Wickford is to the northeast of campus, and Newport is approximately twice as far in the easterly direction. If you are looking for some mapping programs, you may want to try the US Census and Mapquest.

When did we begin to see high quality maps? One of the earliest maps was produced in China in 1137 - nearly 400 years before comparable maps were produced in Western Europe. It was not until the late 1600s, however, that we began to see the emergence of data maps - maps to which data are added. The reason it took so long may be that it required a combination of cartographic and statistical tools. An example of a data map that is available electronically today appears below. What is missing is the legend, but we can see there is 'something' that sets apart the Northeast, the area extending from Virginia to New Hampshire, and California, as well as the lower Mississippi delta region (Arkansas, Mississippi, and Louisiana).

Median Family Income: by State

Eventually we were able to extend the reach of our visual representation of information beyond space to time, and by 1786 we saw our first time-series graph in The Commercial and Political Atlas, by William Playfair. Now we could visualize the passage of time with a graph, just as we had been able to envision movement through space. Just as we could say the movement from Kingston to Newport was twice the move from Kingston to Wickford, now we can say the movement from 1970 to 1980 was half the distance of the move from 1970 to 1990. Our ability to order time allowed us to translate time-series relationships fairly easily into time-series graphs, such as the one below, which allows us to 'see' the relationship between interest rates and time. The reader can easily see that interest rates peaked in 1980 and have been following a cyclical pattern downward since then.

[Graph: interest rates over time]

Occasionally, we can combine space and time in one graph. One very impressive example would be Charles Minard's graphical depiction of Napoleon's march into Russia.

The final advance in the graphical display of information was to move beyond space and time to relational graphics. Again it was Playfair who "broke free of the analogies to the physical world and drew graphics as designs-in-themselves." The implication was that "any variable quantity could be placed in relationship to any other variable quantity, measured for the same units of observation."

The examples of these relational graphs from your economics books are numerous, although as Tufte pointed out, these graphs are not too frequent in the popular press. When he examined 15 news publications for the years 1974-1980, it was in Japan and Germany that he found the highest use of relational graphics, but even there the share of statistical graphics based on more than one variable ranged between 5 and 10 percent. One of my favorites, The Economist, had only 2 percent, while for the New York Times and Time the shares of graphs that were relational were 0.5 percent and 0 percent.

In a similar review of college and high school textbooks, Tufte found relational graphics to be significantly more common in a number of disciplines, with shares as high as 77 percent in the high school text Chemical Principles by William Masterton and Emil Slowinski and 82 percent in the college text Statistics: A Guide to the Unknown, edited by Judith Tanur. Among the three economics texts reviewed, relational graphics accounted for 16 percent of the statistical graphics in the classic college text by Samuelson, about midway between what he found in two high school texts.

It's now time to try your hand at data graphics, which Tufte describes as visually displaying "quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading, and color." As it turns out, it is a good thing the creator of a graph has so many dimensions to use, since in the majority of cases what we are interested in representing is a multidimensional relationship. Reality is complex and it is a challenge to represent multivariate relationships on sheets of paper - what Tufte calls the flatlands.

When you see a graph, you should think that behind this graph is a table of numbers and the creator of the graph was attempting to make it easier for the reader to see the story / pattern that existed in the data. As Bowen indicates, "Graphs are intended to make it easy to read, understand, and remember a relationship found in a set of data." To get there, however, we need to look a little more closely at the mechanics of the various graphs. You should review the various types of graphs you might create or be asked to interpret, and be sure to pay special attention to the Scatter and Line Graphs.

Graphs vs. Tables

When do we use graphs and how do we create them? The short answers are sparingly and with care, but let's look at these questions a bit more carefully. On the when issue, I suspect your experiences have been similar to mine - you have seen some impressive graphs that immediately evoke an image, and you have seen some losers that convey virtually no information. My experience has convinced me of the importance of graphs, and of the ease with which they can be abused. As for the how, you now have at your disposal a number of software packages that will allow you to produce some impressive visuals.

As a starter, we can look at text, tables, and graphs as alternative means of presenting information, and you must decide which to use. This is not something we can easily capture in a checklist, but there are some guidelines. First, a graph often allows us to present more information than a table, so the amount of information we want to convey will matter. Consider the stock market and the price of stocks. When the stock market is booming as in 1996, everyone seems to be watching stock prices. If you are interested in following your stock, you can access the following information on American Power Conversion, a successful local company whose stock is traded on NASDAQ.

First let's look at a common tabular presentation that conveys information on the volume of shares traded that day (801,600), the price of the stock at the Close of trading (23 11/16), the High and Low prices reached during the day (23 7/8 and 22 7/8), and the Net Change from the beginning to end of day (+11/16). You also know that over the past year the price has ranged from a High of 31.5 to a Low of 8.5. Taken together we have 8 pieces of information that are conveyed in the table.

52 Weeks    52 Weeks              Vol
Hi          Low         Symbol    100s    High      Low       Close       Net Change
31.5        8.5         APCC      8016    23 7/8    22 7/8    23 11/16    +11/16
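If fractional quotes are unfamiliar, here is a small sketch that converts the figures in the table to decimals. It assumes the quotes read as a whole number of dollars plus a fraction (so the Close is 23 and 11/16) and that volume is reported in hundreds of shares, as the column heading says. The helper name parse_quote is my own.

```python
from fractions import Fraction


def parse_quote(text):
    """Convert a quote such as '23 11/16' or '+11/16' to an exact Fraction."""
    # Each space-separated piece is a whole number or a fraction; add them up.
    return sum((Fraction(part) for part in text.split()), Fraction(0))


close = parse_quote("23 11/16")
high = parse_quote("23 7/8")
low = parse_quote("22 7/8")
net_change = parse_quote("+11/16")

shares_traded = 8016 * 100   # volume is quoted in hundreds of shares
day_range = high - low       # spread between the day's high and low

print(f"Close:      {float(close):.4f}")
print(f"High/Low:   {float(high):.3f} / {float(low):.3f}")
print(f"Net change: {float(net_change):+.4f}")
print(f"Shares:     {shares_traded:,}")
```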

Now turn your attention to the following time-series graphs of APC's stock price and the volume of shares traded for the past 200 days. In these graphs there are 400 pieces of information presented, far more than you would be able to present in a table.  When we want to convey a large volume of data, it is most likely that we will want to use a graph.

If you look at the graph you will note there is something important missing - the actual stock price. While the graph allows us to look quickly for a pattern in the past prices of APC's stock, it does not allow us to quickly determine the price yesterday.

This is another of the important differences between graphs and tables - when you need a few precise numbers, a table will probably serve the purpose, but when we want to demonstrate a relationship between two variables, we would want to use a graph. In the graphs above we are able to see quickly the relationship between stock prices and time, volume and time, and with a little effort, the relationship between volume and price. It takes only seconds to realize APC's price has been falling for most of 1997 after more than doubling during the last four months of 1996. As for volume, there were three episodes of unusually high volume, and it would look as though sharp price changes accompanied the increased activity levels.

As a second example, consider the following table and graphs based on an example in Tufte's book. I would suggest you give yourself 30 seconds to review the table and the graphs and compare the information that you extracted from the two. I suspect that you will find that the graphs help you get a better "feel" for the relationships that are embedded in the table.

The Table

X     Y        X     Y       X     Y       X     Y
4     8.05     4     5       4     4.32    8     3
5     7.93     5     5.5     5     4.38    19    4
6     7.38     6     6       6     4.43    8     5
7     7.86     7     6.5     7     4.48    8     6
8     10.63    8     7       8     4.52    8     7
9     8.74     9     7.5     9     4.55    8     8
10    11.15    10    8       10    4.58    8     9
11    11.52    11    8.5     11    4.62    8     10
12    11.64    12    9       12    4.64    8     11
13    11.09    13    9.5     13    4.67    8     12
14    10.17    14    10      14    4.70    8     13
15    12.13    15    10.5    15    4.72    8     14

 

Now let's move on to a discussion of individual graph types. Once you have decided to use a graph, you must then decide what type of graph to use. It should not surprise you that technology will allow you to produce almost any type of graph, so you must decide which one does the best job of conveying the information that you would like the viewer to know. In this course the two most important graph types are the line graph and the scatter diagram, so you should be sure to review these two sections carefully.