Show all work and/or explain.  Answers without work or explanation will not receive any credit.  Work and explanation are more important than the answer.  Work could include R code and output, or showing how you used calculator features.  You can use R, unless otherwise stated in the problem.  Include R code and output with each part of each problem when you use R.  Do NOT use an appendix for R code and output.  You should interpret any R output you provide (do not expect the instructor to interpret your output for you).  When asked to perform a hypothesis test, you must address and LABEL all 8 steps, unless the problem says otherwise.  If a problem does not tell you what assumptions to make, clearly state any assumptions you make in order to finish the problem.

  1. When we discussed one quantitative variable (OQV) and two independent quantitative populations (TIQP), we had confidence intervals (CIs) and hypothesis tests (HTs) that both involved the t distribution. For CIs, we used the notation t*, while for HTs we used the notation t.  Explain very briefly the difference between t* and t, in terms of how you acquire their values for a given problem.
  2. A biologist is studying the effect of soil type on growth rate (as measured by weight in grams) of a particular species of plant. She randomly selects 10 plants to serve as parents, takes two seeds from each of these plants, and puts one seed in soil type A and the other in soil type B.  Note that this means the two seeds from the same plant are paired, and you must take that into account.  The results are given in the following table.
Plant     1   2   3   4   5   6   7   8   9  10
Type A    5   8   9   4   5   5   6   7  10   9
Type B    6   8   7   6   9   8   8  10   9  11
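
One way to enter the table above into R before starting any analysis is sketched below. The object names (type_a, type_b, soil, diff_b_minus_a) are illustrative choices, not names required by the course, and taking differences as B minus A is an assumption; whichever direction you choose, state it and keep it consistent.

```r
# Paired soil data from the table above; position i in each vector is plant i.
type_a <- c(5, 8, 9, 4, 5, 5, 6, 7, 10, 9)
type_b <- c(6, 8, 7, 6, 9, 8, 8, 10, 9, 11)

# Keep the pairing explicit: both seeds from a parent plant share one row.
soil <- data.frame(plant = 1:10, type_a = type_a, type_b = type_b)

# Paired analyses work with one difference per parent plant.
# Assumption: differences are taken as B minus A (hypothetical column name).
soil$diff_b_minus_a <- soil$type_b - soil$type_a

print(soil)
```

Storing the data one row per parent plant makes it harder to accidentally treat the two soil types as independent samples.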

 
(a)  Does this sample provide evidence that Type B soil leads to faster growth (as measured by higher weight in grams), on average, in the population?  For 2a only, you may assume the population is normal, though you should use the sample to check, in the appropriate step, whether that assumption is reasonable.
(b)  Redo (and label) only the steps that change if the question in 2a said “different” instead of “faster”.
(c)  Ignore 2b for this problem.  Redo (and label) only the steps that change if the sample size were 40 instead of 10.  Assume the sample mean and sample standard deviation remain the same as from the original data.
(d)  Ignore 2b and 2c for this problem.  Compute and interpret a confidence interval for the population mean weight difference, and explain how it could be used to decide whether to reject Ho in 2a.
(e)  Ignore 2b, 2c and 2d for this problem.  Explain very briefly how your approach changes conceptually for 2a if you decide to assume the population is symmetric instead of normal.  You do not need to address specific steps here.
(f)  Ignore 2b, 2c, 2d and 2e for this problem.  Explain very briefly how your approach changes conceptually for 2a if you decide to assume the population is moderately to severely skewed.  You do not need to address specific steps here.
 

  3. In a study of factors thought to be responsible for the adverse effects of smoking on human reproduction, cadmium level determinations (nanograms per gram) were made on placenta tissue of a random sample of 14 mothers who were smokers and an independent random sample of 18 nonsmoking mothers. The results were as follows:

Nonsmokers:  10.0, 8.4, 12.8, 25.0, 11.8, 9.8, 12.5, 15.4, 23.5, 9.4, 25.1, 19.5, 25.5, 9.8, 7.5, 11.8, 12.2, 15.0
Smokers: 30.0, 30.1, 15.0, 24.1, 30.5, 17.8, 16.8, 14.8, 13.4, 28.5, 17.5, 14.4, 12.5, 20.4
Do these samples provide evidence that the level of cadmium is higher among smokers than nonsmokers, on average, in the populations?
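
The two samples above could be entered into R along the following lines. This is only a data-entry sketch; the vector names (nonsmokers, smokers) are illustrative assumptions, and the choice of analysis is left to you.

```r
# Cadmium levels (nanograms per gram) from the two independent samples above.
nonsmokers <- c(10.0, 8.4, 12.8, 25.0, 11.8, 9.8, 12.5, 15.4, 23.5,
                9.4, 25.1, 19.5, 25.5, 9.8, 7.5, 11.8, 12.2, 15.0)
smokers <- c(30.0, 30.1, 15.0, 24.1, 30.5, 17.8, 16.8, 14.8, 13.4,
             28.5, 17.5, 14.4, 12.5, 20.4)

# The samples are independent and of different sizes, so they are kept as
# separate vectors rather than as columns of one paired data frame.
# Check the sample sizes against the problem statement (18 and 14).
stopifnot(length(nonsmokers) == 18, length(smokers) == 14)
```

Checking the lengths right after typing the values is a cheap way to catch a dropped or duplicated observation.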
 

  4. In the context of a treatment group and a placebo group, explain very briefly how random assignment allows us to better determine whether there is a cause-and-effect relationship between the response and explanatory variables.

 

  5. Explain very briefly what statistical bias is.

 

  6. We will reuse the data from #2 for this question. The goal now is to analyze whether and how Type A and Type B are related.  You should assume Type B is the response variable.

(a)  Create a scatterplot and describe the scatterplot according to the four components discussed in class.  Remember that (very brief) explanations are more important than the answers.
(b)  Explain very briefly what a residual is in general.
(c)  Calculate and interpret an appropriate statistic to summarize the strength of relationship.
(d)  Explain very briefly why your choice of statistic in 6c is appropriate.
(e)  Determine the equation of the best fitting line.  Write the equation of the line.
(f)  Interpret the slope of the best fitting line in context.
(g)  Determine if there is evidence in the sample that Type A has a positive linear relationship with Type B in the population.