Binomial, Normal and Sampling Distributions
Introduction
Not all frequency / probability distributions are created equal. There are some "special" distributions that we can use to help answer a wide array of probability questions. For example, you may recall that the probability tree used to calculate the probability that three children would be girls looked exactly like the one used to calculate the probability of three heads when you flip a coin three times. In this unit we will examine two "special" distributions. The first will be a discrete distribution (countable numbers) known as the binomial distribution and the second is a continuous distribution (any values) known as the normal distribution.
Binomial distribution
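One of the first distributions that you will run into in a statistics course is the binomial distribution, which helps with any of those coin flipping or family composition problems. The binomial distribution is valid only under fairly restrictive assumptions: a fixed number of independent trials, only two possible outcomes on each trial, and the same probability of success on every trial.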
There is a formula for the probability distribution, and if you are interested in exploring it, then you should check out one of the on-line sites or a text. Here we will use a simple example to demonstrate the use of the formula. The probability of r successes in n trials would be:

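$$P(r) = \binom{n}{r}\, p^{r} (1-p)^{n-r}$$
where p is the probability of success on a single trial. The only term that needs any explanation should be $\binom{n}{r}$. This term is defined as the ratio of factorials $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$. For example, if we wanted the probability of three heads (r) when we tossed a coin five times (n), then this term would be $\frac{5!}{3!\,2!} = 10$.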
Now let's use the formula to solve a simple problem. Assume that we are concerned with defects on an assembly line and that there is reason to believe that 10 percent of the items are defective. What is the probability that when we choose 5 items, exactly one will be defective? To obtain the answer we need to plug the values for p (.1), n (5), and r (1) into the equation. The solution is .328. The probability that exactly one item will be defective is 32.8 percent.
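The formula can also be evaluated directly. The following is a minimal Python sketch, not part of the course materials; the function name `binomial_pmf` is an illustrative assumption, and `math.comb` supplies the ratio-of-factorials term.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Number of ways to get 3 heads in 5 tosses: 5! / (3! * 2!)
ways_three_heads = comb(5, 3)

# Assembly-line example: exactly 1 defective item out of 5 when p = 0.1
p_one_defect = binomial_pmf(1, 5, 0.1)

print(ways_three_heads, round(p_one_defect, 3))
```

Changing the three arguments to `binomial_pmf` lets you rework any of the coin flipping or family composition problems the same way.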
There are other ways to calculate the probability in this problem. In fact there are three. One would be to use the on-line calculators that may be found at the UCLA site. You would want to use the PDF (Probability Distribution Function) for the binomial distribution, which would give you the same number as the formula. A second would be to use Excel, which has the binomial distribution preprogrammed in its list of functions. The function is BINOMDIST.
A third would be to use the binomial table that you will find at the end of most statistics texts. These tables are based on the formula, except they are designed to answer a somewhat different problem. Here we asked, what was the probability of r successes in n tries? In the tables you will find the answers to the question: what is the probability of at most r successes in n tries? To calculate the probability of exactly r successes, you would need to subtract two table entries from each other - the probability of at most (r-1) successes from the probability of at most r successes.
To see how the table works, let's return to our example above. According to the table, the probability of at most one success when n = 5 and p = .1 is .9185 while the probability of no successes is .5905. The difference between the two is .9185 - .5905 = .328 = 32.8 percent, exactly the number that was obtained using the formula. If you use the BINOMDIST function you will obtain the same result. For an example you can check out the Probability Distributions example. On sheets 6a and 6b you will find the answers to these questions from the statistics assignment for Part 3. You should be able to look at the formulas in the cells D3, D6, and D9 and determine how the function works. The one thing to be careful about is the last argument in the function. A 0 (FALSE) means the formula gives you the probability of exactly r successes, while a 1 (TRUE) gives you the cumulative distribution, the probability of at most r successes. In cell D11 you have the cumulative distribution - which gives you the probability of getting any number of girls. Cell D12 gives you the odds of getting at most 1 boy.
To convince yourself that you have it, let's look at the odds of a family of five children with no boys. If we look at a success as a girl, then no boys means five girls, and we use the table with n = 5 and p = .5. The table entry for r = 4 is .9688 and the entry for r = 5 would be 1.00 since this would include all of the possibilities. If we subtract the probability of four or fewer girls from the total probability we have the probability of five girls, and this equals .0312. This number could also be obtained by noting that when r = 0, the table entry equals .0313 (the two numbers differ only because of rounding). Since the probability of success is .5, the probability of five successes is the same as the probability of zero successes.
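The table lookups can be reproduced with a short Python sketch. It assumes the table entries are cumulative, that is, each entry is the probability of r or fewer successes; the helper name `binomial_cdf` is hypothetical.

```python
from math import comb


def binomial_cdf(r: int, n: int, p: float) -> float:
    """Probability of r or fewer successes in n trials (a cumulative table entry)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r + 1))


# Defect example: P(exactly 1) = P(1 or fewer) - P(0 or fewer)
at_most_one = binomial_cdf(1, 5, 0.1)
at_most_zero = binomial_cdf(0, 5, 0.1)
exactly_one = at_most_one - at_most_zero

# Family example: P(five girls) = 1 - P(four or fewer girls)
five_girls = 1 - binomial_cdf(4, 5, 0.5)

print(round(at_most_one, 4), round(at_most_zero, 4), round(exactly_one, 3))
print(five_girls)
```

Working from unrounded values avoids the small rounding differences that appear when four-digit table entries are subtracted by hand.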
As for the features of the distribution, the expected value, variance, and standard deviation are given by the following formulas.
Mean, Variance, and Standard Deviation of Binomial Distribution
Mean | E(r) = p*n |
Variance | Var(r) = n*p*(1-p) |
Standard Deviation | SD(r) = Sqrt(n*p*(1-p)) |
To convince yourself that you have mastered the binomial distribution, let's look at one last problem. Assume that every shot a basketball player takes has a 50% chance of being made. What is the probability that a player will make at least 10 out of 15 shots? What are the expected value and the standard deviation of the distribution of made shots?
Because this is a problem stated in terms of "at least," the solution will best be found in the binomial table. We will use n = 15, r = 9, and p = .5, and the entry in the table is .8491. This is the probability that at most 9 shots will be made. The probability that 10 or more shots are made is simply 1 - .8491 = .1509. The probability of at least 10 made shots is about 15 percent. To calculate the mean and standard deviation, we simply need to plug the numbers into the formulas.
Mean = 15*.5 = 7.5
Variance = 15*.5*.5 = 3.75
Standard deviation = Sqrt(3.75) = 1.94
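As a check on the basketball problem, here is a small Python sketch that sums the binomial probabilities directly instead of reading a table. The variable names are illustrative only.

```python
from math import comb, sqrt

n, p = 15, 0.5

# P(9 or fewer made shots): the cumulative entry that a table would report
at_most_9 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10))

# P(10 or more made shots) is the complement
at_least_10 = 1 - at_most_9

# Mean, variance, and standard deviation of the number of made shots
mean = n * p
variance = n * p * (1 - p)
std_dev = sqrt(variance)

print(round(at_most_9, 4), round(at_least_10, 4))
print(mean, variance, round(std_dev, 3))
```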
We are now ready to move on to the normal distribution, which will prove to be central to all of our discussion of inferential statistics.
Normal distribution
How many of you have been graded on the curve? It is likely that most of you have been graded on that bell-shaped curve at least somewhere in your academic career. It turns out that the basis of your grade is simply a very popular probability distribution for continuous variables. The reason that we would use a continuous probability distribution is that the grades can take on any value within a certain range. The distribution that we are talking about is the normal distribution, what you may have heard referred to as the bell-shaped curve.
An example of the normal distribution appears below where the histogram of the grades for 1,000 students on a test with 130 possible points has been approximated by the normal distribution curve. The blue histogram is the actual distribution of the grades while the red line is a normal distribution - the "bell-shaped" curve on which you have probably been graded. Although the fit is not perfect, there is a definite similarity between the two and the normal curve can be used as an approximation for the actual curve.

Because the normal distribution is a continuous function, it can be represented by an equation. The relative frequency / probability density equation appears below. This equation gives you the height of the curve (the probability density) at a specific value (X) when the distribution has a mean of µ and a standard deviation of s.
$$f(X) = \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^{2}}{2s^{2}}}$$
The good news is that you need to know nothing about the underlying mathematical structure of the equation at this point. In fact I can assure you that you will never use it unless you find yourself in a mathematically oriented statistics course. The better news is that the entire distribution can be captured by only two parameters - the mean (µ) and the standard deviation (s). If you look at the equation above, these are the only two unknowns. Once you decide on the value for x, you can solve for the relative frequency of x. What this means is that once you have the mean and standard deviation of the distribution, you can plot out the entire curve.
For example, if you looked at the distribution of actual grades, there is simply too much variability in the grades to make it possible to reproduce all of the grades with information on just the mean and standard deviation. If, however, the distribution were normal, then we could reproduce the entire distribution with just two numbers.
The even better news is that there are now some on-line sites that will allow you to harness the power of the computer to save you considerable time and effort doing computations. One of the on-line sites is at the UCLA site. You can also find a calculator at the end of the Introductory Statistics: Concepts, Models, and Applications site by David W. Stockburger. Before you can use these calculators, however, we need to look a bit more closely at the normal distribution and at what we can do with it.
Properties
As indicated above, the normal distribution can be completely described by just two statistics - the mean and standard deviation. The significance of this can be best seen with a simple grade example. Let's assume that grades in introductory economics courses were distributed normally with a mean of 75 and a standard deviation of 10. With this information you could answer three very important questions: What is the chance of receiving at least a grade of 95 in introductory economics? How many students will receive grades between 65 and 85? and What grade do you need to get so that only 5 percent of the students do better than you?
The picture of the situation appears in the graph below of a normal distribution where the mean (µ) would be 75 and the standard deviation (s) would be 10. To the right and left of the mean there are vertical lines drawn ten grade points (one standard deviation) away from the mean.
Normal Distribution of Grades

If that were the end of the story, it would be a short story. But it is not the end, rather it is only the beginning. When we are dealing with a continuous probability distribution, the area under a frequency / density curve represents relative frequency so that any question about relative frequency can be answered by calculating areas under the curve. What we know about the normal distribution is that approximately 68% of the observations are within one standard deviation of the mean, 95% are within two standard deviations, and 99.7% within 3 standard deviations.
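The 68 / 95 / 99.7 percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); it is an illustration rather than anything the course materials rely on.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1

# Area under the curve within k standard deviations of the mean
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```

Because these are areas measured in standard deviation units, the same shares hold for any normal distribution, whatever its mean and standard deviation.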
To see this in action, let's return to our example of the grades. If we can assume that the scores are normally distributed with a mean of 75 and a standard deviation of 10, then we can confidently answer the above questions with the help of a simple on-line program. You could also determine the answers using the Excel function NORMDIST. For an example you can check out the Probability Distributions example. On sheet 7 you will find the answers to question 7 from the statistics assignment for Part 3. All you need to do is specify the mean, standard deviation, and the value that you are looking at. For example, on cell B* you select Function on the Insert Menu. From the functions you choose Statistical and select NORMDIST. The dialogue box will ask you to input the information. The formula in this cell is =NORMDIST(105,150,30,TRUE). This tells it that we want to calculate the probability of getting less than a 105 in a distribution with a mean of 150 and a standard deviation of 30.
On the sheet norm grades, NORMDIST has been used to generate a normal distribution of grades. In cells B2 to B27 I have input the possible grades on my test. The formula in cell C2 is =NORMDIST(B2,75,10,FALSE). The FALSE argument means this will give the height of the distribution at a grade of 50 (B2) if the grades are distributed with a mean of 75 and a standard deviation of 10. The formula in cell D2 is =NORMDIST(B2,75,6,FALSE). This repeats the same task except here the standard deviation is 6. The formula in cell E2 is =NORMDIST(B2,65,6,FALSE). This repeats the same task except here the standard deviation is 6 and the mean is 65. You will also find on that sheet the graphs of the distributions.
At the heart of the on-line program is the realization that there are four unknowns / variables in all of the problems below (the mean, the standard deviation, the value, and the probability) and that once you supply values for three of the unknowns, then the fourth can be solved for. [Note: this on-line program for the normal distribution is part of a larger package that includes a number of other distributions]. You could also go to the back of a statistics book and find the standardized normal distribution, but we will look at that in the next section.
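The two modes of NORMDIST described for the spreadsheet can be mirrored in Python. This is a rough sketch: the grade grid of 50 to 100 in steps of 2 is an assumption chosen only because it yields 26 values like cells B2 to B27, and the actual spreadsheet contents may differ.

```python
from statistics import NormalDist

# Cumulative mode, like NORMDIST(105, 150, 30, TRUE): area to the left of 105
p_below_105 = NormalDist(mu=150, sigma=30).cdf(105)

# Density mode, like NORMDIST(B2, 75, 10, FALSE): height of the curve at a grade
grades = list(range(50, 101, 2))  # assumed grid of possible grades (26 values)
curves = {
    (mu, sigma): [NormalDist(mu, sigma).pdf(g) for g in grades]
    for mu, sigma in [(75, 10), (75, 6), (65, 6)]
}

print(round(p_below_105, 4))
print(curves[(75, 10)][0])  # height at a grade of 50 when mean = 75, sd = 10
```

Plotting each list in `curves` against `grades` would give the three bell curves, with the smaller standard deviation producing the taller, narrower shape.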
What is the chance of receiving at least a 95 in introductory economics? If you plug these data into the on-line calculator you will find that the probability is 0.977250, which means that 97.7 percent of the students will receive less than a 95. The chance of receiving at least a 95 is therefore 1 - 0.977250 = 0.022750, or about 2.3 percent.
How many students will receive grades between 65 and 85? To answer this we will actually do the computation twice - once for the number of students who get below a 65 (0.158655) and a second time for those who get below an 85 (0.84134). Those that get grades between 65 and 85 would be the difference between these two numbers (approximately .68). Just about two-thirds of the students would get grades in this range.
What grade do you need to get so that only 5 percent of the students do better than you? Here the unknown is the value for the grade and the known is the probability. We want to find the grade such that 95 percent of the students score lower than that grade. The answer is a score of 91.45.
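The three grade questions can be worked in a few lines of Python, assuming grades are normal with a mean of 75 and a standard deviation of 10. This sketch uses the standard library's `statistics.NormalDist` as a stand-in for the on-line calculator.

```python
from statistics import NormalDist

grade_dist = NormalDist(mu=75, sigma=10)

# Question 1: chance of at least a 95 (one minus the area below 95)
p_at_least_95 = 1 - grade_dist.cdf(95)

# Question 2: share of students between 65 and 85
p_between_65_85 = grade_dist.cdf(85) - grade_dist.cdf(65)

# Question 3: grade that only 5 percent of students exceed (95th percentile)
grade_top_5_percent = grade_dist.inv_cdf(0.95)

print(round(p_at_least_95, 4))
print(round(p_between_65_85, 4))
print(round(grade_top_5_percent, 2))
```

The first two questions supply a value and solve for a probability, while the third supplies a probability and solves for the value, which is why it uses `inv_cdf` rather than `cdf`.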
But what if you do not have one of these spiffy calculators? All normal distributions have at their core the same structure, although they may have very different means and variances, which means that we can reduce them to that core. As for shape, all of the normal distributions share the same general shape that appears below.
Normal Distribution

Below we have some examples of comparisons of normal distributions where there are differences in the mean and the variance. Distributions A and B represent two distributions with different means and equal variability. It is equally likely that we will be one standard deviation above the mean in both distributions, but we can expect the numbers to be higher in the situation described by distribution B. Distributions C and B represent two distributions with different variability and equal means. In this case we are likely to get a greater spread of values for the distribution C with the larger standard deviation.
Normal Distributions: Differences in Mean and Variance

The Standard Normal
As if life were not easy yet, we can even make it easier if we recognize that all of the normal distributions have a common core and that we can reduce all normal distributions to that core. In order to obtain the core normal distribution, it is necessary to express any value of X in terms of the number of standard deviation units (s) it is away from the mean (µ). The standardized variable Z is defined as:
Z = (X - µ)/s.
For example, for a grade distribution with a mean of 70 and a standard deviation of 10, the standardized normal variable would be Z = (X-70)/10. If we plug in the value of 80, then the value of Z equals (80-70)/10 = 10/10 = 1. The interpretation of this number is that 80 is one standard deviation above the mean. To convince yourself that you have mastered this, you should prove that a 90 is two standard deviations above the mean so that Z = 2 and that 60 is one standard deviation below the mean so that Z = -1.
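The standardization step is simple enough to express as a one-line function. The following sketch uses a mean of 70 and a standard deviation of 10 for the example values; the function name `z_score` is illustrative.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z_80 = z_score(80, 70, 10)
z_90 = z_score(90, 70, 10)
z_60 = z_score(60, 70, 10)

print(z_80, z_90, z_60)
```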
Once we have the variable Z, the resulting core distribution for Z has a mean of 0 and a variance of 1 and is shown below. As we saw earlier, 68 percent of all observations fall within one standard deviation of the mean and 95 percent within two standard deviations. The exact numbers can be found on any normal distribution table that appears in the back of most statistics books.

You can also use Excel to solve your problem. To see how you would use it, let us look at the standardized normal curves below where Z represents the number of standard deviations away from the mean. You will find at least one of these at the top of the normal distribution table. In the left-hand diagram, you will find an answer to the question: what is the chance that we will find a number larger than Z standard deviations above the mean? The third diagram provides us with the chance that we will find a number less than Z standard deviations below the mean. In the middle we have a graphic describing the chance that we will find a number within Z standard deviations of the mean. The on-line program that we used in the previous section was based on the right-side diagram where once you put in the value for the variable, it gave you the percentage of the area below that number.

For the on-line calculator, you input a value x and the calculator then computes the probability mass to the left of x. This is equivalent to the right-side diagram. If you want to know how many grades are more than 1.2 standard deviations above the mean, you would simply type in 1.2 and then subtract the result from 1. If you wanted to know how many received grades more than 1.5 standard deviations below the mean, you would type in -1.5. If you wanted to know the chances of being within 1.6 standard deviations of the mean, you would type in 1.6 and then subtract from the result the solution for -1.6.
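Each of these three lookups is a small manipulation of the area to the left of Z. A minimal Python sketch, assuming a calculator that returns left-tail areas of the standard normal (here `statistics.NormalDist().cdf`), might look like this:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # area to the left of z under the standard normal curve

# More than 1.2 standard deviations above the mean: one minus the left area
above_1_2 = 1 - phi(1.2)

# More than 1.5 standard deviations below the mean: the left area itself
below_minus_1_5 = phi(-1.5)

# Within 1.6 standard deviations of the mean: difference of two left areas
within_1_6 = phi(1.6) - phi(-1.6)

print(round(above_1_2, 4), round(below_minus_1_5, 4), round(within_1_6, 4))
```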
If you were to use Excel, you could calculate the value of Z by choosing the STANDARDIZE function.
If you would like a little more background on the normal distribution, you can try the Mathematics 220DX Statistics at the New Hampshire College site and the DAU site.
To convince yourself that you have mastered the normal distribution, let's look at one more problem. Let's assume that the number of phone calls arriving at a hot line at any one time is normally distributed with a mean of 8 calls and a standard deviation of 2 calls. If you install twelve phone lines, what is the probability that someone will get a busy signal? What will happen to the probability of a busy signal if the number of phone lines is reduced to 9 as an economy move?
If you are going to use the normal distribution tables at the end of statistics books, the first place to start is the conversion of the problem into a standardized normal variable. In this example Z = (X - 8)/2. If we are interested in the chance that X>12, then we plug the 12 into the formula to get Z = (12-8)/2 = 4/2 = 2. The question about phone calls can thus be reduced to a question of how often you are likely to find yourself more than two standard deviations above the mean. Looking this up in the normal distribution table you will find that the answer is .0228. The odds are a little better than 2 in 100 that there will be a busy signal.
If the number of phones is reduced to 9, then the Z variable is Z = (9-8)/2 = 1/2 = .5. The problem is one of determining the probability of getting a number more than one-half of a standard deviation above the mean. The answer according to the normal distribution table would be .3085. The odds are nearly 31 in 100 that there will be a busy signal.
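The phone line problem can be verified without a table. The sketch below treats the number of calls as normal with a mean of 8 and a standard deviation of 2, as the problem states; the function name `busy_signal_probability` is hypothetical.

```python
from statistics import NormalDist

calls = NormalDist(mu=8, sigma=2)


def busy_signal_probability(lines: int) -> float:
    """Chance that the number of calls exceeds the number of phone lines."""
    return 1 - calls.cdf(lines)


p_busy_12_lines = busy_signal_probability(12)  # Z = 2
p_busy_9_lines = busy_signal_probability(9)    # Z = 0.5

print(round(p_busy_12_lines, 4), round(p_busy_9_lines, 4))
```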
Sampling distributions
How did URI students feel about the racism/first amendment issue in the Fall 98 semester? How did Americans feel about the impeachment of President Clinton? Are Rhode Islanders supportive of the development of Quonset Point? The answer to each of these questions is not going to be determined based on a survey of the 12,000 URI students, the one million RI residents or the 270+ million Americans. The answers to these questions will be based on the findings of a sample of the underlying populations. At the end of the overview we mentioned the relationship between the population and the sample and the need to take great care in selecting the sample.
One of the goals of the sampling procedure is to reveal certain properties of the population distribution that would be captured by some parameters. For example, we may want to use the sample to learn something about the mean or variance of the distribution. The procedure would be to select a sample and calculate a statistic that would be used to make some inferences about the population's mean or variance. In this section we will be discussing the probability distributions of these sample statistics to arrive at a distribution for the sample mean.
For a more concrete example, let's return to our rolls of the dice. You will recall that when you rolled the dice twelve times you each recorded the mean and variance and we compared it to the population mean of 3.5 that we happened to know. The numbers all tended to center around 3.5 which is what we would want if the sample was to provide insight into the population. In fact we could combine all of the means that the class generated and we would have a distribution of means and we could calculate the mean and variance of the sample of means. This is what sampling distributions are all about, although generally there is no information on the population mean or variance.
What can we say about the distribution of the sample mean? We will look at two properties of the sample mean - its mean and standard deviation. The formulas are technically appropriate when the population is infinite, but they are very close approximations when the population is large. The adjustment for the variance when the population is finite is simply the infinite-population variance times (N-n)/(N-1). When the sample size (n) is small relative to the population size (N), this number approaches 1 and the two variances are essentially the same. The expected value of the sample mean is the population mean (µ), while the variance of the sample mean equals the variance of the underlying variable divided by the sample size, s^2/n, so that the standard deviation of the sample mean is s/Sqrt(n).
Let's see how this works by returning to our dice rolling example. When we roll the two dice there is the following probability distribution with a mean of 7, a variance of 35/6 = 5.83, and a standard deviation of 2.42.

If we took a sample of 20, then the mean, variance, and standard deviation of the sample mean would be calculated using the formulas above. The mean value for the sample mean would be 7, the variance of the sample mean would be 5.83/20 = .292 and the standard deviation of the sample mean would be .540.
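The dice figures can be derived by listing all 36 equally likely outcomes. The following Python sketch is illustrative and treats those outcomes as the whole population, so the variance divides by 36 rather than 35; computed that way the variance of the total is 35/6, or about 5.83, which rounds to 6.

```python
from itertools import product
from math import sqrt

# All 36 equally likely totals from rolling two fair dice
totals = [a + b for a, b in product(range(1, 7), repeat=2)]

pop_mean = sum(totals) / len(totals)
pop_var = sum((t - pop_mean) ** 2 for t in totals) / len(totals)
pop_sd = sqrt(pop_var)

# Sampling distribution of the mean for a sample of n = 20 rolls
n = 20
var_of_mean = pop_var / n
sd_of_mean = sqrt(var_of_mean)

print(pop_mean, round(pop_var, 3), round(pop_sd, 3))
print(round(var_of_mean, 3), round(sd_of_mean, 3))
```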
What about the situation when the underlying population distribution is normal? In this case the sample mean is itself normally distributed, with a mean of µ and a standard deviation of s/Sqrt(n).
As an example, consider the performance of students in ECN201. There are 1000 students that take the course and the average grade is 70 with a standard deviation of 10. If you take a sample of 10 student grades, what is the probability that the average grade for the sample will be above 75?
As we did with the earlier discussion of the normal distribution, we will begin with the conversion of the problem to a standardized form. The standardized form is Z = (75-70)/(10/Sqrt(10)) = 5/(10/3.162) = 1.58. Using the normal distribution table, we find that the chance of a value >75 would be the same as the chance of being 1.58 standard deviations above the mean which would be approximately .0571. There would be a 5.7 percent chance of a number this large. If the sample size was increased to 16, then we would have Z = (75-70)/(10/Sqrt(16)) = 5/(10/4) = 2. The chance that we would get a mean above 75 would be 2.28 percent.
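The same calculation can be sketched in Python for any sample size. The function name `prob_sample_mean_above` is an assumption made for illustration; it takes the standard deviation of the sample mean to be the population standard deviation divided by the square root of the sample size.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_above(cutoff: float, mu: float, sd: float, n: int) -> float:
    """P(sample mean > cutoff) when the sample mean is normal with sd / sqrt(n)."""
    sampling_dist = NormalDist(mu=mu, sigma=sd / sqrt(n))
    return 1 - sampling_dist.cdf(cutoff)


p_n10 = prob_sample_mean_above(75, mu=70, sd=10, n=10)
p_n16 = prob_sample_mean_above(75, mu=70, sd=10, n=16)

print(round(p_n10, 4), round(p_n16, 4))
```

The unrounded Z for n = 10 is slightly above 1.58, so the computed probability comes out marginally lower than a table lookup at Z = 1.58.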
It turns out that these properties of the normal distribution have considerable value even in situations where the distribution is not normal. The key concept here is the Central Limit Theorem, which basically states that the distribution of the sample mean becomes approximately normal as the sample gets larger, almost regardless of the shape of the underlying distribution.
Central Limit Theorem: As the sample size increases, the sampling distribution of the mean can be approximated by the normal distribution.
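A small simulation makes the theorem concrete. The sketch below is only an illustration under assumed settings: it draws samples of 30 from an exponential population (strongly skewed, with mean 1 and standard deviation 1) and checks whether the sample means behave roughly like a normal distribution with standard deviation 1/Sqrt(30).

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30
REPLICATIONS = 10_000

# Each entry is the mean of one sample drawn from the skewed population
sample_means = [
    fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(REPLICATIONS)
]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)
theoretical_sd = 1 / SAMPLE_SIZE ** 0.5

# For a normal distribution about 68 percent lies within one standard deviation
share_within_one_sd = sum(
    abs(m - 1.0) <= theoretical_sd for m in sample_means
) / REPLICATIONS

print(mean_of_means, sd_of_means, theoretical_sd, share_within_one_sd)
```

Raising `SAMPLE_SIZE` should pull the share within one standard deviation closer to the normal value, while very small samples keep more of the population's skew.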
Let's try one last example with a little twist. A pollster will be making election predictions based on a sample of 100 voters. The true percentage of the population that will vote for the reform candidate is 52.5%. What is the probability that the winner will be announced early if an early announcement will happen only if a candidate receives at least 55 percent of the vote among the 100 voters sampled? How will the odds change if the true vote is 60%?
The first step is to realize that we are talking about a binomial distribution - a categorical yes / no variable. The formulas for the mean and variance are n*p and n*p*(1-p), which in this problem become 100*.525 = 52.5 and 100*.525*.475 = 24.9375. The standard deviation is the square root of the variance, Sqrt(24.9375) = 4.99. Plugging this information into the standardized variable formula you get Z = (55 - 52.5)/4.99 = approximately .5. The value for Z = .5 from the normal table tells us that a value of 55 or more will be obtained 30.85 percent of the time.
If the true vote is 60%, then the calculations are redone with a variance of 100*.6*.4 = 24 and a standard deviation of Sqrt(24) = 4.90, and you get Z = (55 - 60)/4.90 = approximately -1.02. The probability of a Z value more than 1.02 standard deviations below the mean is about 15.4%. In this case the winner would be announced early nearly 85 percent of the time.
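The pollster problem can be sketched in Python using the normal approximation to the binomial. This sketch assumes the standard deviation of the number of supporters in the sample is Sqrt(n*p*(1-p)) and applies no continuity correction; the function name `early_call_probability` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def early_call_probability(true_share: float, n: int = 100, threshold: int = 55) -> float:
    """Normal approximation to P(at least `threshold` of n sampled voters say yes)."""
    mean = n * true_share
    std_dev = sqrt(n * true_share * (1 - true_share))
    z = (threshold - mean) / std_dev
    return 1 - NormalDist().cdf(z)


p_close_race = early_call_probability(0.525)
p_comfortable_lead = early_call_probability(0.60)

print(round(p_close_race, 3), round(p_comfortable_lead, 3))
```

An exact binomial sum would give somewhat different figures, since the approximation treats a count of voters as a continuous quantity.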
Now we are ready to move on to inferential statistics: estimation, confidence intervals, and hypothesis testing.