
Statistics Assignment

For questions with multiple parts, use the following scheme to determine which part you need to do.

2 parts: a = 1, 2, 3, 4, 5; b = 6, 7, 8, 9, 0

3 parts: a = 1, 2, 3, 4; b = 5, 6, 7; c = 8, 9, 0

4 parts: a = 1, 2, 3; b = 4, 5, 6; c = 7, 8; d = 9, 0

Part 1 (Introduction)

1. Indicate whether descriptive or inferential statistics would be used to accomplish the following tasks.

  1. To compare the efficiency of female workers with that of male workers.
  2. To correlate the scores of students in mathematics with their scores in English.
  3. To predict the lifetime earnings of college graduates.
  4. To estimate the birthrate of illegitimate children ten years from now.
  5. To contrast the longevity of marriages with the longevity of deep relationships.

2. The data below show the lengths of all babies born in a particular hospital in May. The lengths were measured to the nearest inch.

19 21 20 21 20 23 18 21 21 24 19 21 20 19 22 21 17 22 20 20 19 23

For which of the following tasks would you use descriptive statistics, and for which inferential statistics?

  1. Establish size measurements for later comparisons.
  2. Correlate body length with body weight.
  3. Make comparisons with diets followed by mothers.
  4. Predict the length of babies born in the future at that hospital.
  5. Predict the length of babies born in other comparable hospitals that same month.
  6. Establish manufacturing sizes for cribs or diapers for the same year.

3. Each of the following is a misuse of statistics. Explain why.

  1. The Marine Corps asked a group of drill sergeants to make a study of recruit training procedures. The complimentary report was used for recruiting purposes.
  2. A study of the financial success of college graduates was to be made by polling every 25th student on the list of graduates. After the study was begun, the members of social clubs and organizations were polled instead.
  3. A major oil company released a report on the favorable effect of the automobile industry on the environment.
  4. Four samples of college students had low reading scores. The next sample had above-average reading scores, which were then published.
  5. A study by a tobacco company showed that smokers and non-smokers had the same incidence of lung disease. The study was made of people living in a heavily industrialized town.

4. A professor asked each member of his class to interview a representative sample of the University's student body to determine attitudes toward construction of a new sports arena. The sampling methods of four students are given below. In each case, state a reason why the sample may not be representative of the population:

  1. Student A took 50 questionnaires into the student coffee shop. Some students griped at first, but every student asked filled out the questionnaire.
  2. Student B sent the questionnaire by mail to the official address of every student registered at RIU College. Fifteen percent of the students replied.
  3. Student C asked 50 friends to complete the questionnaire and each one did.
  4. Student D got an alphabetical list of every tenth student registered at RIU College. Most of them attended a meeting called for the purpose of filling out the questionnaire and, although there was a lot of conversation about some of the questions, almost every student at the meeting completed a questionnaire.

5. Assume that a survey is taken of ECN201 students to obtain the following information.

Please identify the variables as categorical or numerical and for the numerical variables, whether they are discrete or continuous. 

Part 2 (Descriptive statistics)

1. Now for a little fun. Take one of those dice, roll it twelve times, and record the results (the numbers that you get). Once you have done this, calculate the mean, variance, and standard deviation, and construct a histogram. When you have completed that work, roll the die again until you have a total of eight samples of twelve rolls [you will end up rolling the die 96 times]. Pool these 96 observations together and calculate descriptive statistics on the sample of 96 rolls. You should also construct a histogram and compare the means and standard deviations of the small and large samples.
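As a cross-check on the hand calculations (not a replacement for rolling), the Python sketch below simulates the exercise: it draws eight samples of twelve rolls of a fair six-sided die with a pseudo-random generator (the seed 42 is an arbitrary assumption, used only so the run is reproducible), reports the mean, sample variance (n − 1 denominator), and standard deviation for the first twelve rolls and for all 96 pooled rolls, and prints a text histogram for each. If your course uses the population formulas, `statistics.pvariance` and `statistics.pstdev` are the drop-in alternatives.

```python
import random
import statistics
from collections import Counter

def roll_die(rng: random.Random, n: int) -> list[int]:
    """Return n simulated rolls of a fair six-sided die."""
    return [rng.randint(1, 6) for _ in range(n)]

def describe(rolls: list[int]) -> dict[str, float]:
    """Sample mean, variance (n - 1 denominator) and standard deviation."""
    return {
        "n": len(rolls),
        "mean": statistics.mean(rolls),
        "variance": statistics.variance(rolls),
        "sd": statistics.stdev(rolls),
    }

def text_histogram(rolls: list[int]) -> str:
    """One line per face 1..6, with one '*' per occurrence."""
    counts = Counter(rolls)
    return "\n".join(f"{face}: {'*' * counts[face]}" for face in range(1, 7))

rng = random.Random(42)                             # fixed seed for reproducibility
samples = [roll_die(rng, 12) for _ in range(8)]     # eight samples of twelve rolls
small = samples[0]                                  # the first 12-roll sample
pooled = [r for s in samples for r in s]            # all 96 rolls together

for label, data in (("first 12 rolls", small), ("all 96 rolls", pooled)):
    stats = describe(data)
    print(f"{label}: mean={stats['mean']:.3f} "
          f"var={stats['variance']:.3f} sd={stats['sd']:.3f}")
    print(text_histogram(data))
```

For a fair die the theoretical mean is 3.5 and the theoretical standard deviation is about 1.71, so the 96-roll figures will usually sit closer to those values than the 12-roll figures do.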

2. It is now time to generate some of your own statistics using data that you have already looked at. Create summary statistics for the monthly interest rate on 6-month government debt. I would suggest that you try the time-series page or FRED for the data.
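Once the series is downloaded, the summary statistics take only a few lines of Python. The sketch below assumes you saved the data as a CSV file with one header row followed by date,value rows (assumed to match FRED's single-series CSV download); the series ID `TB6MS` for the monthly 6-month Treasury bill rate and the file name are assumptions, so check them against the site. Rows holding FRED's `.` placeholder for a missing month are skipped.

```python
import csv
import statistics
from typing import Iterable

def parse_fred_rows(lines: Iterable[str]) -> list[float]:
    """Return the numeric value column of a date,value CSV.

    The first row is treated as a header. Rows whose value is '.'
    (FRED's placeholder for a missing observation) or blank are skipped.
    """
    reader = csv.reader(lines)
    next(reader, None)                      # skip the header row
    values: list[float] = []
    for row in reader:
        if len(row) < 2 or row[1].strip() in ("", "."):
            continue
        values.append(float(row[1]))
    return values

def load_fred_csv(path: str) -> list[float]:
    """Read a downloaded CSV file from disk (path is up to you)."""
    with open(path, newline="") as f:
        return parse_fred_rows(f)

def summarize(values: list[float]) -> dict[str, float]:
    """Basic descriptive statistics for a numeric series."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),     # sample standard deviation
        "min": min(values),
        "max": max(values),
    }
```

For example, `summarize(load_fred_csv("TB6MS.csv"))` (hypothetical file name) returns the count, mean, median, standard deviation, minimum, and maximum of the monthly rates.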

3. Construct a histogram of the babies' lengths from question 2 in Part 1.
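Because the lengths were recorded to the nearest inch, each whole inch can serve as its own class. The sketch below builds the frequency table for the 22 lengths listed in Part 1 and prints it as a sideways text histogram; one-inch classes are a choice made here, not something the question requires.

```python
from collections import Counter

# The 22 baby lengths (inches) from Part 1, question 2.
lengths = [19, 21, 20, 21, 20, 23, 18, 21, 21, 24, 19,
           21, 20, 19, 22, 21, 17, 22, 20, 20, 19, 23]

def frequency_table(data: list[int]) -> dict[int, int]:
    """Count of observations for every integer value from min to max."""
    counts = Counter(data)
    return {value: counts[value] for value in range(min(data), max(data) + 1)}

table = frequency_table(lengths)
for value, count in table.items():
    print(f"{value} in | {'#' * count} ({count})")
```

The counts come out as 17: 1, 18: 1, 19: 4, 20: 5, 21: 6, 22: 2, 23: 2, 24: 1, so the tallest bar is at 21 inches.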

4. The Continental Basketball League management council reported the following information on the 1977 salaries of its players. What was the expected value (average) salary for centers and guards combined?

Position | Number of players | Mean salary
Center | 12 | $350,000
Guards | 46 | $180,000
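The combined figure is a weighted mean: each position's mean salary is weighted by its number of players, so the answer is not simply the midpoint of $350,000 and $180,000. A short Python check:

```python
# (number of players, mean salary) for each position, from the table above
groups = {"Center": (12, 350_000), "Guards": (46, 180_000)}

def weighted_mean(groups: dict[str, tuple[int, float]]) -> float:
    """Total payroll divided by total number of players."""
    total_players = sum(n for n, _ in groups.values())
    total_salary = sum(n * mean for n, mean in groups.values())
    return total_salary / total_players

print(f"${weighted_mean(groups):,.2f}")   # $215,172.41
```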

Part 3 (Probability)

1a. It's time to go back to the dice and do some calculations. Calculate the following probabilities using the sample space of possible outcomes described in the probability section, and indicate which rule of probability you used. Assume that you are rolling two dice. What is the probability that you roll:

  1. a total of ten
  2. a five and a five
  3. a total of ten, given that the total is a double-digit number
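One way to check these answers is to enumerate the sample space directly. The sketch below lists all 36 equally likely ordered outcomes of two fair dice and counts favorable outcomes with exact fractions. It assumes that "a ten" means a total of ten and that the double-digit totals are 10, 11, and 12; the conditional probability uses the rule P(A | B) = P(A and B) / P(B), and a five and a five also follows from the multiplication rule, (1/6)(1/6).

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered outcomes of two fair dice.
SPACE = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(sum(1 for o in SPACE if event(o)), len(SPACE))

def cond_prob(event, given) -> Fraction:
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

p_ten = prob(lambda o: sum(o) == 10)                   # 3/36 = 1/12
p_five_five = prob(lambda o: o == (5, 5))              # 1/36
p_ten_given_2digit = cond_prob(lambda o: sum(o) == 10,
                               lambda o: sum(o) >= 10)  # 3/6 = 1/2
print(p_ten, p_five_five, p_ten_given_2digit)
```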

1b. Now for a card example. Calculate the following probabilities using the sample space of possible outcomes described in the probability section, and indicate which rule of probability you used. Assume that you are picking two cards from a deck. What is the probability that you pick:

  1. a ten
  2. a five and a five
  3. a heart and a club
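The same counting approach works for cards. The sketch below builds a standard 52-card deck and enumerates all 1,326 unordered two-card hands, assuming the two cards are drawn without replacement and that "a ten" means at least one ten among the two cards (other readings give other answers). The results can be cross-checked with the multiplication rule; for example, two fives is (4/52)(3/51) = 1/221.

```python
from fractions import Fraction
from itertools import combinations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]

# Two cards without replacement: 52 choose 2 = 1326 unordered hands.
HANDS = list(combinations(DECK, 2))

def prob(event) -> Fraction:
    """Share of equally likely two-card hands for which event(hand) is true."""
    return Fraction(sum(1 for h in HANDS if event(h)), len(HANDS))

p_a_ten = prob(lambda h: any(rank == "10" for rank, _ in h))      # at least one ten
p_two_fives = prob(lambda h: all(rank == "5" for rank, _ in h))    # both fives
# One heart and one club (two hearts would give the set {"hearts"}).
p_heart_club = prob(lambda h: {suit for _, suit in h} == {"hearts", "clubs"})
print(p_a_ten, p_two_fives, p_heart_club)
```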

2a.  It's now time to use a probability tree diagram to calculate the following probabilities.

  1. That three children are girls
  2. That there are three girls given that the first child is a girl
  3. That there are at least two girls

2b.  It's now time to use a probability tree diagram to calculate the following probabilities.

  1. That you get three heads when you flip a coin three times
  2. That there are three heads given that the first flip is a head
  3. That there are at least two heads
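A probability tree for three births (or three coin flips) has 2 × 2 × 2 = 8 leaves. The sketch below enumerates those root-to-leaf paths, assuming each branch has probability 1/2 (boys and girls equally likely, a fair coin), multiplies along each path, and adds the leaves that satisfy each event. For question 2b, read G as heads and B as tails; the answers are the same.

```python
from fractions import Fraction
from itertools import product

# Every root-to-leaf path of a three-level tree: G = girl, B = boy.
PATHS = ["".join(p) for p in product("GB", repeat=3)]
P_PATH = Fraction(1, 2) ** 3          # multiply the branch probabilities

def prob(event) -> Fraction:
    """Add up the probabilities of the leaves where event(path) holds."""
    return sum((P_PATH for path in PATHS if event(path)), Fraction(0))

p_three_girls = prob(lambda p: p == "GGG")                          # 1/8
# GGG already implies the first child is a girl, so P(A and B) = P(A).
p_three_given_first = p_three_girls / prob(lambda p: p[0] == "G")   # 1/4
p_at_least_two = prob(lambda p: p.count("G") >= 2)                  # 1/2
print(p_three_girls, p_three_given_first, p_at_least_two)
```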

3a. What is the probability of winning a lottery when the winning number has four digits and the four digits are drawn with replacement? What would be the expected number of winners in the state of RI, where there are 1 million people, if everyone played the lottery? What is the probability when the winning number has five digits? What is the probability if the four numbers drawn can be double-digit numbers?

3b. If telemarketers have no list and simply choose phone numbers at random, what is the chance that you will be picked to receive a call if the telemarketers are restricted to seven-digit numbers within a specific area code?
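Both 3a and 3b are counting-rule problems: with replacement, the number of possible numbers is (choices per draw) raised to the number of draws, and a single ticket or phone number is picked with probability 1 over that count. The sketch below assumes each player's number is chosen at random, reads "double-digit numbers" as 00 to 99 (100 values per draw), and, for the telemarketer question, treats all 10^7 seven-digit numbers as valid; other readings change the counts.

```python
from fractions import Fraction

def p_match(draws: int, values_per_draw: int = 10) -> Fraction:
    """Chance that one ticket matches a number made of `draws` independent
    draws with replacement, each from `values_per_draw` equally likely values."""
    return Fraction(1, values_per_draw ** draws)

PLAYERS = 1_000_000   # everyone in RI plays once, as the question assumes

cases = {
    "four digits (0-9 each)": p_match(4),
    "five digits (0-9 each)": p_match(5),
    "four draws of 00-99 (assumed)": p_match(4, 100),
}
for label, p in cases.items():
    # Expected winners = players * P(win) when every ticket is chosen at random.
    print(f"{label}: p = {p}, expected winners = {float(PLAYERS * p):g}")

# 3b: a random seven-digit number within one area code (all 10**7 assumed valid).
print(f"telemarketer call: p = {p_match(7)}")
```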

4. You are to return to the section on probability and calculate the mean and standard deviation of the probability distribution based on the Distribution of Events graph. 
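The Distribution of Events graph is not reproduced here, so the sketch below applies the formulas μ = Σ x·P(x) and σ = √(Σ (x − μ)²·P(x)) to a stand-in distribution, the total of two fair dice from question 1a. To answer the question, replace `dice_total` with the values and probabilities read off the graph.

```python
import math
from fractions import Fraction
from itertools import product

def mean_sd(dist: dict) -> tuple[float, float]:
    """Mean and standard deviation of a discrete distribution {value: probability}."""
    if not math.isclose(float(sum(dist.values())), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in dist.items())                 # E[X] = sum of x * P(x)
    var = sum((x - mu) ** 2 * p for x, p in dist.items())    # E[(X - mu)^2]
    return float(mu), math.sqrt(var)

# Stand-in distribution (not the graph's data): total of two fair dice.
dice_total: dict[int, Fraction] = {}
for a, b in product(range(1, 7), repeat=2):
    dice_total[a + b] = dice_total.get(a + b, Fraction(0)) + Fraction(1, 36)

mu, sd = mean_sd(dice_total)
print(f"mean = {mu:.3f}, sd = {sd:.3f}")   # mean = 7.000, sd = 2.415
```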

5. Now that you have gotten much better at graphing, draw the grade distributions that correspond to the following statements.

  1. There is a bimodal distribution of grades - there are a large number of students with high grades and a large number with low grades.
  2. Grades are symmetrically distributed with a mean score of 75.
  3. Grades in section 1 have a higher mean than those in section 2.

6a.  It's now time to use the binomial distribution to calculate the following probabilities. [The UCLA on-line site would be one option or you could use a table]

  1. That three children are girls
  2. That there are three girls given that the first child is a girl
  3. That there are at least two girls

6b. Assume that the probability of rain on any given day is 10 percent. Use the binomial distribution to calculate the probability:

  1. That over a ten-day period there is no rain.
  2. That over a ten-day period there is rain every day.
  3. That you get five days of rain followed by five days without rain.
  4. That you get at least five days without rain.
  5. That it will rain on Saturday.
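The binomial formula P(X = k) = C(n, k) p^k (1 − p)^(n − k) covers most of questions 6a and 6b. The sketch below uses `math.comb` (Python 3.8+), assuming P(girl) = 0.5 for 6a and that rainy days occur independently in 6b.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

# 6a: girls among three children, P(girl) assumed to be 0.5
print(binom_pmf(3, 3, 0.5))        # three girls: 0.125
print(binom_pmf(2, 2, 0.5))        # first is a girl, so the other two must be: 0.25
print(binom_at_least(2, 3, 0.5))   # at least two girls: 0.5

# 6b: ten days, P(rain) = 0.1 on each day
p_rain = 0.1
print(binom_pmf(0, 10, p_rain))            # no rain in ten days
print(binom_pmf(10, 10, p_rain))           # rain on all ten days
print(p_rain ** 5 * (1 - p_rain) ** 5)     # one specific order: 5 wet, then 5 dry
print(binom_at_least(5, 10, 1 - p_rain))   # at least five dry days
print(binom_pmf(1, 1, p_rain))             # a single day (Saturday)
```

Item 3 of 6b is not a binomial count: five wet days followed by five dry days is one particular ordering, so the C(10, 5) = 252 factor is deliberately left out. Item 5 is a single day, which is the binomial case with n = 1.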

7a. The RI Visitors' Bureau reports that tourists coming to RI spend an average of $150 per day. Assume the standard deviation is $30 and the distribution of expenditures is approximately normal. Answer the following:

  1. What percentage of visitors will have an average daily expenditure between $120 and $180?
  2. What percentage of visitors will have an average daily expenditure between $105 and $195?

7b. Assume that the test scores from a college admissions test are normally distributed, with a mean of 550 and a standard deviation of 100.

  1. What percentage of the people taking the test score between 500 and 600?
  2. Suppose that someone receives a score of 630. What percentage of the people taking the test score better? What percentage score worse?
  3. If a particular university will not admit anyone scoring below 480, what percentage of the persons taking the test would be acceptable to the university?
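Python's standard-library `statistics.NormalDist` (Python 3.8+) provides the normal cumulative distribution function, so `cdf` can stand in for a z-table lookup. The sketch below answers 7a and 7b under the normality assumptions stated in the questions; note that the 7a ranges are ±1 and ±1.5 standard deviations around the mean.

```python
from statistics import NormalDist

def pct_between(dist: NormalDist, lo: float, hi: float) -> float:
    """Percentage of a normal population falling between lo and hi."""
    return 100 * (dist.cdf(hi) - dist.cdf(lo))

spending = NormalDist(mu=150, sigma=30)                 # 7a
print(f"{pct_between(spending, 120, 180):.2f}%")        # within 1 SD: 68.27%
print(f"{pct_between(spending, 105, 195):.2f}%")        # within 1.5 SD: 86.64%

scores = NormalDist(mu=550, sigma=100)                  # 7b
print(f"{pct_between(scores, 500, 600):.2f}%")          # 38.29%
worse = 100 * scores.cdf(630)                           # share scoring below 630
print(f"score 630: {100 - worse:.2f}% better, {worse:.2f}% worse")
print(f"acceptable (480 or more): {100 * (1 - scores.cdf(480)):.2f}%")
```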

8a. The first ECN exam is always a tough one. Experience has been that grades are distributed normally and that the mean grade on the first exam is 65 with a standard deviation of 12. If you take a sample of 36 students who have taken the exam, what is the probability that the sample mean is below 62? What is the probability that it is above 75?

8b. A review of long-distance phone calls found that the mean and standard deviation of the calls' duration were 4.5 minutes and 1.5 minutes, respectively. If you took a sample of 36 phone calls, what is the probability that the sample mean would be above 5.0 minutes? Below 4.3 minutes?

8c. Is everything OK on the assembly line? The specs on the machine are that the mean size of the widget is 80 cm and the standard deviation is 10 cm. If you took a sample of 25 widgets from the assembly line and found the mean to be 77 cm, what is the probability of getting a mean this low if the mean is actually 80 cm?
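Questions 8a to 8c concern a sample mean, so the relevant spread is the standard error σ/√n rather than σ itself. The sketch below assumes the standard deviation given in each question is the known population value and that the sample mean is normally distributed (exactly so when the population is normal, approximately otherwise by the central limit theorem).

```python
import math
from statistics import NormalDist

def mean_prob_below(x: float, mu: float, sigma: float, n: int) -> float:
    """P(sample mean < x) when the sample mean ~ N(mu, sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return NormalDist(mu, standard_error).cdf(x)

def mean_prob_above(x: float, mu: float, sigma: float, n: int) -> float:
    """P(sample mean > x) under the same sampling distribution."""
    return 1 - mean_prob_below(x, mu, sigma, n)

# 8a: exam grades, mu = 65, sigma = 12, n = 36 (standard error 2)
print(mean_prob_below(62, 65, 12, 36), mean_prob_above(75, 65, 12, 36))
# 8b: call duration, mu = 4.5, sigma = 1.5, n = 36 (standard error 0.25)
print(mean_prob_above(5.0, 4.5, 1.5, 36), mean_prob_below(4.3, 4.5, 1.5, 36))
# 8c: widgets, mu = 80, sigma = 10, n = 25 (standard error 2)
print(mean_prob_below(77, 80, 10, 25))
```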

8d. Consider the problems facing the drug business. Their drugs work, but they must be very careful with the dosage: too much or too little can often have serious consequences. For example, 1000 milliliters of drug X contains on average 100 milliliters of compound TFL. Patients can have serious side-effects if the level of compound TFL rises above 120 milliliters. The company that makes the machinery indicates that the machine should produce 1000 milliliters of drug X with a mean of 100 milliliters of TFL and a standard deviation of 9 milliliters of TFL.


i. What is the probability that a 1000 milliliter container of drug X will reach the danger level of 120 milliliters of TFL?
ii. Assume you took a sample of 100 containers and found that the mean level of TFL was 104.  What is the chance that this would happen if the mean was in fact 100? 
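Question 8d contrasts a single container with the mean of 100 containers. The sketch below assumes the TFL content of individual containers is normally distributed (the question gives only the mean and standard deviation), so part i uses σ = 9 and part ii uses the standard error 9/√100 = 0.9.

```python
import math
from statistics import NormalDist

MU, SIGMA = 100, 9      # mL of TFL per 1000 mL container (machine specs)
DANGER = 120            # mL of TFL at which side-effects become serious

# i. A single container: use the distribution of individual values.
p_single = 1 - NormalDist(MU, SIGMA).cdf(DANGER)

# ii. Mean of 100 containers: standard error = 9 / sqrt(100) = 0.9.
se = SIGMA / math.sqrt(100)
p_mean = 1 - NormalDist(MU, se).cdf(104)

print(f"P(one container > {DANGER} mL) = {p_single:.4f}")
print(f"P(mean of 100 containers >= 104 mL) = {p_mean:.2e}")
```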

Part 4: Inferential Statistics

1. Assume that the Economics Department has decided to let the free market determine salaries. Salaries are to be set according to the level of enrollments in ECN 125 and ECN 126 courses in the upcoming semester. You are in charge of monitoring advertising to see that there is no faulty advertising with inflated claims. Below is a sample of advertising that I have seen elsewhere (Rhode Island is never the first with any new idea). You are to determine whether the following statements could be justified under some set of circumstances in light of the legitimate reply made by the opposition:

  1. Claim: I gave out more A's than anyone.  Reply: The chance of getting an unsatisfactory grade (below C) in his course is larger than in any other course.
  2. Claim: You have the greatest chance of an A in this course.  Reply: He gave out fewer A's than anyone else.
  3. Claim: More people passed in my course than in any other section.  Reply: The chance of failure is greater in his course than my course.

2a. Assume that a local mall was trying to determine the economic impact of URI students. A survey of 49 students was completed, and it was determined that URI students spent an average of $120 a month in the mall and that the standard deviation of their spending was $28. Find the 95 percent confidence interval for mean student spending, assuming that spending is normally distributed.

2b.  Assume that grades in ECN201 are distributed normally.  If you take a sample of 20 students and compute the mean and standard deviation to be 70 and 15, what is the 95% confidence interval for the mean score?
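Both intervals have the form mean ± (critical value) × sd/√n. The sketch below uses z = 1.96 for 2a, treating n = 49 as large (a t value with 48 degrees of freedom, about 2.01, would widen the interval slightly), and the t-table value 2.093 for 19 degrees of freedom in 2b, where the sample is small and the standard deviation is estimated from the sample.

```python
import math

def mean_ci(mean: float, sd: float, n: int, critical: float) -> tuple[float, float]:
    """Confidence interval: mean +/- critical * sd / sqrt(n)."""
    margin = critical * sd / math.sqrt(n)
    return mean - margin, mean + margin

# 2a: n = 49 treated as large, so z = 1.96 for 95% confidence.
ci_2a = mean_ci(120, 28, 49, 1.96)
print(f"2a: ${ci_2a[0]:.2f} to ${ci_2a[1]:.2f}")     # $112.16 to $127.84

# 2b: small sample with a sample SD, so Student's t with 19 degrees of freedom.
T_19 = 2.093    # two-sided 95% critical value, taken from a t table
ci_2b = mean_ci(70, 15, 20, T_19)
print(f"2b: {ci_2b[0]:.2f} to {ci_2b[1]:.2f}")
```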

3a. Consider the situation at Pear Computers, which is dealing with ONTEL, a supplier of macrochips. Pear has looked at the chips carefully and is a bit concerned. Pear had a deal with ONTEL that the mean error level would be 2 microdrams and that the standard deviation would be .6 microdrams. Pear has taken a sample of 100 chips and found that the sample mean is 2.2 microdrams. Pear feels that it is experiencing excessive error levels and is considering a change in suppliers or a suit. What do you think? Does Pear have a case?

3b.  The admissions officer at RIU has stated that the mean SAT score for incoming students is 1200 and the standard deviation is 100.  You have taken a sample of 100 students and observed that the sample mean SAT score is 1100.   What can you say about the claims of the admissions officer?
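Both questions are one-sample z tests of a claimed population mean with a known standard deviation. The sketch below reports the z statistic and a two-sided p-value at an assumed 5 percent significance level; because Pear is worried specifically about excessive errors, a one-sided test (whose p-value is half the two-sided one here) is also defensible for 3a.

```python
import math
from statistics import NormalDist

def z_test(sample_mean: float, mu0: float, sigma: float, n: int) -> tuple[float, float]:
    """z statistic and two-sided p-value for H0: population mean = mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided

# 3a: Pear's chips, agreed mean 2 microdrams, sigma 0.6, n = 100, sample mean 2.2
z_a, p_a = z_test(2.2, 2.0, 0.6, 100)
# 3b: SAT scores, claimed mean 1200, sigma 100, n = 100, sample mean 1100
z_b, p_b = z_test(1100, 1200, 100, 100)

for label, z, p in (("3a", z_a, p_a), ("3b", z_b, p_b)):
    verdict = "reject" if p < 0.05 else "do not reject"
    print(f"{label}: z = {z:.2f}, p = {p:.2g} -> {verdict} H0 at the 5% level")
```

For 3a the sample mean sits about 3.3 standard errors above the agreed mean; for 3b it sits 10 standard errors below the claimed mean, so the p-value is effectively zero.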

4a. Consider the outlines of a study to evaluate the "effectiveness" of information operators.  A study was conducted where 25 operators were monitored and it was determined that the mean time taken to get the number was 54 seconds with a standard deviation of 6 seconds.  What would you say about the chance of selecting a sample of 9 operators and finding the mean time per call being one minute (60 seconds)?

4b. A review of transactions at a small boutique has revealed that 75% of transactions are credit card transactions. How likely is it that, on a given day, a sample of 25 transactions will find that credit cards represent 50% of transactions?

4c. The tourist bureau in Rhode Island is attempting to evaluate the impact of a new advertising campaign designed to promote carpooling among commuters. Before the advertising campaign was undertaken, the percentage of cars entering Providence during rush hour with only one person was 70 percent. After the advertising was in place, the percentage dropped to 65%. Is there any evidence that the campaign worked at the 5 percent significance level (95 percent confidence)?
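Question 4a is another sampling-distribution-of-the-mean problem, and 4b uses the normal approximation to the sampling distribution of a proportion, with standard error √(p(1 − p)/n). The sketch below treats the 6-second standard deviation from the 25-operator study as the population value, which is an assumption. Question 4c does not state how many cars were observed, so `prop_z(0.65, 0.70, n)` can only be evaluated once a sample size is known.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# 4a: call times, mu = 54 s, sigma = 6 s (assumed known), sample of n = 9.
se_a = 6 / math.sqrt(9)                         # standard error = 2 s
p_4a = 1 - STD_NORMAL.cdf((60 - 54) / se_a)     # P(sample mean >= 60 s), z = 3

def prop_z(p_hat: float, p0: float, n: int) -> float:
    """z statistic for a sample proportion under H0: p = p0 (normal approximation)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# 4b: credit-card share, p = 0.75, sample of n = 25 transactions.
z_4b = prop_z(0.50, 0.75, 25)
p_4b = STD_NORMAL.cdf(z_4b)                     # P(sample share <= 50%)

print(f"4a: P = {p_4a:.4f}")                    # about 0.0013
print(f"4b: z = {z_4b:.2f}, P = {p_4b:.4f}")
```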