Probability
Introduction
What are the odds of winning at the blackjack table at Foxwoods? What are the odds of getting an A if someone grades on the bell curve? What is the expected payoff of playing the lottery? What are the odds of dying in a plane crash? Of being struck by lightning? Of getting heads when you flip a coin? Of getting a six when you roll a die? To answer any of these questions you need to understand some of the basics of probability theory, which allow us to make some sense of a world of uncertainty.
This is why almost all statistics courses begin with a treatment of probability theory. Here we are going to briefly discuss a few of the basic probability concepts. If you are interested in a fuller treatment of the subject, you should certainly refer to one of the many statistics texts or sign up for a statistics course.
For our purposes here, you can think of probability as being associated with some outcome: it provides a measure of the likelihood of that outcome. If we played the lottery a large number of times, what is the probability of winning? If I rolled a die many times, what is the probability of getting a six? In both cases the probability of the outcome is the long-run relative frequency of the outcome.
Fortunately, there are some insights into probability that will be possible without excessive investment or mathematical sophistication. We will, however, need to settle on some terminology and map out the probability terrain.
The outline of much of our work appears in the diagram below. We will begin with a set of rules that must be obeyed in probability, much as the rules of algebra define acceptable mathematical relationships. For visually oriented readers, the tree diagrams should prove quite useful.
Sometimes, what we are examining falls into a framework that can be captured by a particular probability distribution. These distributions exist for both discrete and continuous phenomena and we will be interested in the distributions' mean and variance. We will spend our time with one discrete distribution, the binomial, and one continuous distribution, the normal.

To help us navigate the theory of probability we will look at two examples that you can easily duplicate. The first is estimating the probability of getting a twelve when rolling two dice. The event is getting a 12. The sample space is presented below and contains 36 possible outcomes. Across the top we have the possible outcomes on the first die and down the side we have the outcomes on the second die. The sum of the two dice gives us a number, and the only way of getting a twelve is the bottom right cell.
Sample Space: Roll Two Dice
| Second die \ First die | 1 | 2 | 3 | 4 | 5 | 6 |
| 1 | 1,1 | 2,1 | 3,1 | 4,1 | 5,1 | 6,1 |
| 2 | 1,2 | 2,2 | 3,2 | 4,2 | 5,2 | 6,2 |
| 3 | 1,3 | 2,3 | 3,3 | 4,3 | 5,3 | 6,3 |
| 4 | 1,4 | 2,4 | 3,4 | 4,4 | 5,4 | 6,4 |
| 5 | 1,5 | 2,5 | 3,5 | 4,5 | 5,5 | 6,5 |
| 6 | 1,6 | 2,6 | 3,6 | 4,6 | 5,6 | 6,6 |
The second is the probability of getting an ace when you pick a card from a deck. The event would be picking an ace and this would happen if one of the four aces were chosen. The sample space would be the 52 possibilities that appear in the table below.
Sample Space: Pick a Card
| Hearts | Diamonds | Clubs | Spades |
| K | K | K | K |
| Q | Q | Q | Q |
| J | J | J | J |
| 10 | 10 | 10 | 10 |
| 9 | 9 | 9 | 9 |
| 8 | 8 | 8 | 8 |
| 7 | 7 | 7 | 7 |
| 6 | 6 | 6 | 6 |
| 5 | 5 | 5 | 5 |
| 4 | 4 | 4 | 4 |
| 3 | 3 | 3 | 3 |
| 2 | 2 | 2 | 2 |
| A | A | A | A |
How does one arrive at the probabilities of a twelve or an ace? One method would be empirically oriented and involve examining real data. You would use this to estimate the extent of support for a political candidate or the share of the workforce working in manufacturing. In terms of our two examples, you could repeat many times the experiments of rolling the dice and picking a card from a full deck. If the deck of cards and the dice were not rigged, then you would expect that, as the number of experiments increased, the share of successful experiments would converge to 1/36 for the twelve and 4/52 for the ace.
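The empirical approach can be imitated on a computer. The sketch below is only an illustration: it assumes fair dice and a full deck from which one card is drawn each time, and the function names and the seed are arbitrary choices made here, not part of the original discussion.

```python
import random

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]
DECK = [(rank, suit) for suit in SUITS for rank in RANKS]  # 52 cards


def share_of_twelves(n_rolls, rng):
    """Roll two fair dice n_rolls times; return the share of rolls that sum to 12."""
    hits = 0
    for _ in range(n_rolls):
        if rng.randint(1, 6) + rng.randint(1, 6) == 12:
            hits += 1
    return hits / n_rolls


def share_of_aces(n_draws, rng):
    """Draw one card from a full deck n_draws times; return the share that are aces."""
    hits = 0
    for _ in range(n_draws):
        rank, _suit = rng.choice(DECK)
        if rank == "A":
            hits += 1
    return hits / n_draws


rng = random.Random(2024)  # arbitrary seed, used only so that runs are repeatable
for n in (100, 10_000, 100_000):
    print(n, round(share_of_twelves(n, rng), 4), round(share_of_aces(n, rng), 4))
print("targets:", round(1 / 36, 4), round(4 / 52, 4))
```

With only 100 trials the shares can be far from the targets; with more trials they should settle close to 1/36 and 4/52.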
A second method would be theoretically oriented and would be based on prior knowledge of the process that generated the outcomes. Flipping a coin, playing the lottery, and picking a card would be examples of where a priori probabilities would be utilized. In the dice example, there are 36 possible outcomes and a 12 is possible only by rolling two 6s. You can determine that by looking at the sample space: only one of the thirty-six possibilities will produce a 12. If you were interested in the probability of a ten, there would be several ways of getting the 10. These possibilities are the three cells 4,6 and 5,5 and 6,4 in the table, so you can predict that the probability of a 10 is 3/36 = 1/12.
Similarly, if you wanted to predict the probability of an ace, you would simply count up the possible successes (the four aces in the bottom row of the table) and divide them by the total number of possible outcomes in the sample space (52). The resulting probability would be 4/52 = 1/13, which is what you would expect to see emerge from an empirically centered approach.
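The a priori approach amounts to counting cells in the sample space. A minimal sketch, assuming every outcome is equally likely; the helper name `probability` is introduced here purely for illustration:

```python
from fractions import Fraction
from itertools import product

# Sample spaces from the two examples
dice_space = list(product(range(1, 7), repeat=2))  # 36 (first, second) pairs
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
card_space = list(product(ranks, suits))  # 52 (rank, suit) pairs


def probability(event, space):
    """Successes divided by total outcomes, assuming equally likely outcomes."""
    successes = sum(1 for outcome in space if event(outcome))
    return Fraction(successes, len(space))


p_twelve = probability(lambda roll: sum(roll) == 12, dice_space)
p_ten = probability(lambda roll: sum(roll) == 10, dice_space)
p_ace = probability(lambda card: card[0] == "A", card_space)
print(p_twelve, p_ten, p_ace)  # 1/36 1/12 1/13
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand counts.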
There is also a third approach, subjective probability, where the probabilities are assigned by an individual. In this case you would have two individuals looking at the same event and emerging with very different probabilities. At this time we will not explore the area of subjective probability.
Rules and properties of probability
Now that we have introduced the concept of probability, we can discuss some of the properties of probability and some of the rules. For notation we will denote the probability of an event A as P(A).
Given these properties, we can now move on to the rules of probability. In each instance we will use an example from the card and dice problems to highlight the rule.
Addition rule:
Cards: What is the probability of a heart or an ace? In this case there is some overlap. The probability of an ace would be 4/52 and the probability of a heart would be 13/52. There is also a probability of an ace and a heart (the ace of hearts), which would be 1/52. Because the ace of hearts is counted in both of the first two probabilities, it must be subtracted once: the probability of an ace or a heart would be 4/52 + 13/52 - 1/52 = 16/52 = 4/13.
Dice: What is the probability of a 2 or a 12? The sample space consists of 36 outcomes and P(2) = 1/36, the same as P(12). The number cannot be a 2 and a 12 simultaneously, so there is no overlap to subtract and the probability is 1/36 + 1/36 = 2/36 = 1/18.
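Both examples are instances of the general addition rule, P(A or B) = P(A) + P(B) - P(A and B). The sketch below checks the rule against a direct count; the helper names are illustrative assumptions, and outcomes are taken to be equally likely.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(RANKS, SUITS))            # 52 cards
rolls = list(product(range(1, 7), repeat=2))  # 36 rolls of two dice


def prob(event, space):
    """Share of equally likely outcomes in space for which event is true."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


def is_ace(card):
    return card[0] == "A"


def is_heart(card):
    return card[1] == "Hearts"


# Cards: the events overlap, so the ace of hearts must not be counted twice.
p_ace_and_heart = prob(lambda c: is_ace(c) and is_heart(c), deck)
p_ace_or_heart = prob(is_ace, deck) + prob(is_heart, deck) - p_ace_and_heart
# The rule agrees with counting the "ace or heart" cards directly.
assert p_ace_or_heart == prob(lambda c: is_ace(c) or is_heart(c), deck)

# Dice: a roll cannot be both a 2 and a 12, so the overlap term is zero.
p_two = prob(lambda r: sum(r) == 2, rolls)
p_twelve = prob(lambda r: sum(r) == 12, rolls)
p_two_and_twelve = prob(lambda r: sum(r) == 2 and sum(r) == 12, rolls)
p_two_or_twelve = p_two + p_twelve - p_two_and_twelve

print(p_ace_or_heart, p_two_or_twelve)  # 4/13 1/18
```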
Conditional probability rule:
Cards: What is the probability that the card is a jack given that it is a face card? We begin by plugging into the formula P(J|FC) = P(J and FC)/P(FC). Because there are four jacks, four queens, and four kings, the probability of a face card is 12/52. Of these 12 possibilities, 4 are jacks, so the probability of a jack and a face card is 4/52. Plugging into the formula, the probability of a jack given that it is a face card would be (4/52) / (12/52) = 4/12 = 1/3.
Dice: What is the probability that the number is ten given that it is a double digit number? The probability of a 10 is 3/36 since successes would be 5,5 and 6,4 and 4,6. The probability of a double digit number would be the sum of the probabilities of a 10, 11 and 12. These probabilities would be 3/36 + 2/36 (5,6 and 6,5) + 1/36 (6,6) = 6/36. The probability of a ten given that it is a double digit number would be (3/36) / (6/36) = 3/6 = 1/2.
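The conditional probability formula P(A|B) = P(A and B)/P(B) can be checked by counting. This is a sketch under the assumption of equally likely outcomes, with helper names chosen here for illustration. For the dice it gives (3/36)/(6/36) = 1/2.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(RANKS, SUITS))            # 52 cards
rolls = list(product(range(1, 7), repeat=2))  # 36 rolls of two dice


def prob(event, space):
    """Share of equally likely outcomes in space for which event is true."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


def conditional(event, given, space):
    """P(event | given) = P(event and given) / P(given); P(given) must be nonzero."""
    p_both = prob(lambda o: event(o) and given(o), space)
    return p_both / prob(given, space)


# Cards: jack given face card
p_jack_given_face = conditional(
    lambda c: c[0] == "J",
    lambda c: c[0] in {"J", "Q", "K"},
    deck,
)

# Dice: sum of ten given a double digit sum (10, 11 or 12)
p_ten_given_double_digit = conditional(
    lambda r: sum(r) == 10,
    lambda r: sum(r) >= 10,
    rolls,
)

print(p_jack_given_face, p_ten_given_double_digit)  # 1/3 1/2
```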
Multiplication rule:
Cards: What is the probability that two cards drawn without replacement are both face cards? The probability that the first card is a face card is 12/52. Given that there are now 51 cards remaining and 11 of them are face cards, the probability of a second face card given that the first is a face card would be 11/51. Plugging into the formula P(A and B) = P(A)*P(B|A), we get P(two face cards) = 12/52*11/51 = 132/2652, or roughly .23*.22 = .05.
Dice: What is the probability that you get two 6s when you throw the dice? The probability of the first six is 1/6. Given that you throw a six on the first die, what is the chance that you will throw a six on the second die? The probability is still 1/6, so we can plug into the formula and we get P(two 6s) = 1/6*1/6 = 1/36. In this problem the conditional probability of the second six is the same as its simple probability, so we would say that the events are independent.
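The multiplication rule, P(A and B) = P(A)*P(B|A), can be compared with a brute-force count of every ordered pair of outcomes. The sketch below is illustrative only; the variable names are assumptions introduced here.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(RANKS, SUITS))
FACE = {"J", "Q", "K"}

# Cards: every ordered way to draw two different cards (no replacement)
draws = list(permutations(deck, 2))  # 52 * 51 = 2652 ordered pairs
both_face = sum(1 for first, second in draws
                if first[0] in FACE and second[0] in FACE)
p_two_face_count = Fraction(both_face, len(draws))
p_two_face_rule = Fraction(12, 52) * Fraction(11, 51)
print(p_two_face_rule)  # 11/221

# Dice: the second die does not depend on the first
rolls = list(product(range(1, 7), repeat=2))
first_six = [r for r in rolls if r[0] == 6]
p_first_six = Fraction(len(first_six), len(rolls))
p_second_six = Fraction(sum(1 for r in rolls if r[1] == 6), len(rolls))
p_second_given_first = Fraction(sum(1 for r in first_six if r[1] == 6),
                                len(first_six))
p_two_sixes = p_first_six * p_second_given_first
independent = p_second_given_first == p_second_six
print(p_two_sixes, independent)  # 1/36 True
```

The exact card result, 11/221, is a little under .05, which matches the rounded hand calculation.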
Before we leave the discussion of probability, we will look at a graphic representation of probability. Probability tree diagrams depict the events as branches of a tree. To make life easy we will look at the situation of flipping a coin. What is the probability of flipping three heads? The first flip gives us a fifty percent (.5) chance of heads (H) and a fifty percent chance of tails (T); the branches all have a .5 probability. Regardless of the outcome, each second flip gives you the same possibilities, so there are now four possibilities. The chance of two heads would be P(H)*P(H) = .5*.5 = .25. The third flip of the coin produces eight possible events in the sample space, and to calculate the probability of any one event you simply work your way out along the probability tree. The probability of three heads would be the product of the three branches along the top of the tree: P(3 heads) = .5*.5*.5 = .125. There is a twelve and one half percent chance of three heads when you flip the coin three times.
Probability Tree Diagram
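The tree for three coin flips can also be written out in code: each leaf is one path through the tree, and its probability is the product of the branch probabilities along that path. This is a sketch assuming a fair coin; `P_BRANCH` and `path_probability` are names invented for the illustration.

```python
from fractions import Fraction
from itertools import product

P_BRANCH = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # fair coin (assumption)


def path_probability(path):
    """Multiply the branch probabilities along one path of the tree."""
    p = Fraction(1)
    for flip in path:
        p *= P_BRANCH[flip]
    return p


# Each of the 8 leaves of the tree is one sequence of three flips.
tree = {"".join(path): path_probability(path)
        for path in product("HT", repeat=3)}

for leaf, p in tree.items():
    print(leaf, p)

print(float(tree["HHH"]))  # 0.125
```

Changing the values in `P_BRANCH` (for example to 3/5 and 2/5) would give the leaves of a tree for a biased coin.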

Probability distributions and random variables
Often the events that we discussed above can be measured by numbers. How many aces will you get if you pick three cards from a deck? How many 10s will you roll when you roll a pair of dice? A variable is a random variable if its value is determined by a chance mechanism. An example would be the number that you get when you roll a pair of dice. The possible values are listed below along with the number of successes (the ways of getting each value) and the specific successful events.
Sample Set
| Number on dice | Number of Successes | Events |
| 2 | 1 | 1,1 |
| 3 | 2 | 1,2 2,1 |
| 4 | 3 | 1,3 2,2 3,1 |
| 5 | 4 | 1,4 2,3 3,2 4,1 |
| 6 | 5 | 1,5 2,4 3,3 4,2 5,1 |
| 7 | 6 | 1,6 2,5 3,4 4,3 5,2 6,1 |
| 8 | 5 | 2,6 3,5 4,4 5,3 6,2 |
| 9 | 4 | 3,6 4,5 5,4 6,3 |
| 10 | 3 | 4,6 5,5 6,4 |
| 11 | 2 | 5,6 6,5 |
| 12 | 1 | 6,6 |
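The table can be generated by tallying the sums over the 36 outcomes. A minimal sketch, assuming fair dice; the variable names are illustrative:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 (first, second) pairs

# Number of successes: how many outcomes produce each sum
counts = Counter(first + second for first, second in rolls)

# Probability distribution: successes divided by the 36 outcomes
distribution = {total: Fraction(counts[total], len(rolls))
                for total in sorted(counts)}

for total, p in distribution.items():
    # Fractions are printed in lowest terms, so 2/36 appears as 1/18.
    print(f"{total:>2}  {counts[total]}  {str(p):>5}  {float(p):.3f}")
```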
It is also possible to look at a graphical representation of the distribution of the numbers. In the top graph we simply list the events on the horizontal axis and measure the number of successes on the vertical axis. In the second diagram we have the probability of each event, and this would be called the probability distribution or the relative frequency distribution. We can read from the graphs the same information that appears in the table: six of the 36 possible outcomes would be a seven, so the probability of a seven is about 17% (6/36 = 1/6).


These diagrams should look familiar to those who arrive here from the section on descriptive statistics where we looked at the distribution of grades. As we did earlier, we will be interested in some statistics that give us a sense of the distribution: its central tendency and its variance. The expected value of the random variable is the equivalent of the mean. The formula is the sum, over all possible values of the random variable (r), of each value multiplied by its probability.
Expected value = $\mu = \sum_{r} r \cdot P(r)$
In the example of the dice the expected value would be 1/36*2 + 2/36*3 + 3/36*4 + 4/36*5 + 5/36*6 + 6/36*7 + 5/36*8 + 4/36*9 + 3/36*10 + 2/36*11 + 1/36*12 = 252/36 = 7.
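The expected value sum is easy to verify by machine. The sketch below rebuilds the distribution of the sum of two fair dice and applies the formula; the function names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Map each possible sum of two fair dice to its exact probability."""
    dist = {}
    for first, second in product(range(1, 7), repeat=2):
        total = first + second
        dist[total] = dist.get(total, Fraction(0)) + Fraction(1, 36)
    return dist


def expected_value(dist):
    """Sum of each value times its probability."""
    return sum(r * p for r, p in dist.items())


mu = expected_value(two_dice_distribution())
print(mu)  # 7
```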
The formulas for the variance and standard deviation of the random variable are given below. They are very similar to the formulas for the descriptive statistics, with the exception that now each squared difference is weighted by the probability of the event. The standard deviation is once again the square root of the variance.
Variance = $\sigma^2 = \sum_{r} (r-\mu)^2 \cdot P(r)$
Standard deviation = $\sigma = \sqrt{\sum_{r} (r-\mu)^2 \cdot P(r)}$
In the example of the dice the variance would be:
= 1/36*(2-7)^2 + 2/36*(3-7)^2 + 3/36*(4-7)^2 + 4/36*(5-7)^2 + 5/36*(6-7)^2 + 6/36*(7-7)^2 + 5/36*(8-7)^2 + 4/36*(9-7)^2 + 3/36*(10-7)^2 + 2/36*(11-7)^2 + 1/36*(12-7)^2
= 1/36*25 + 2/36*16 + 3/36*9 + 4/36*4 + 5/36*1 + 6/36*0 + 5/36*1 + 4/36*4 + 3/36*9 + 2/36*16 + 1/36*25 = 210/36 = 5.83.
The standard deviation is the square root of the variance and is 2.42.
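The variance and standard deviation can be checked the same way. In this sketch (names are illustrative, dice assumed fair) the exact variance of the sum of two dice comes out to 210/36 = 35/6, about 5.83, and the standard deviation to about 2.42.

```python
import math
from fractions import Fraction
from itertools import product

# Probability distribution of the sum of two fair dice
dist = {}
for first, second in product(range(1, 7), repeat=2):
    total = first + second
    dist[total] = dist.get(total, Fraction(0)) + Fraction(1, 36)

# Expected value: each value weighted by its probability
mu = sum(r * p for r, p in dist.items())

# Variance: each squared deviation from the mean weighted by its probability
variance = sum((r - mu) ** 2 * p for r, p in dist.items())

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(mu, variance, round(float(variance), 2), round(std_dev, 2))
# 7 35/6 5.83 2.42
```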
These formulas work for all probability distributions, and they allow us to visualize the distribution of probabilities for each of the possible outcomes. Fortunately, sometimes the random variable can be represented by a specific probability distribution, and we can use our knowledge of that distribution to assign probabilities to certain outcomes. Two of the more popular discrete probability distributions are the binomial and the Poisson distributions. When we are dealing with continuous variables, the most popular distribution is the normal distribution. We will look at the binomial and normal distributions in the next section. We will conclude our discussion of probability distributions with a discussion of sampling distributions, which will form the bridge to inferential statistics and a treatment of estimation, confidence intervals, and hypothesis testing.