HW 06: Foundations for Inference
A quick recap
Overview
Purpose
To understand the conceptual and mathematical foundations for conducting statistical inference.
Submission instructions
- Download this [QMD] file and use it to write your answers in.
- Right click and ‘save as’, put this in your
scripts
folder.
- Right click and ‘save as’, put this in your
- Render to pdf and upload your final PDF to Canvas by the due date.
PART 1 - Study Design
1 Parameters and statistics.
Identify which value represents the sample mean and which value represents the claimed population mean.
a. American households spent an average of about $52 in 2007 on Halloween merchandise such as costumes, decorations, and candy. To see if this number had changed, researchers conducted a new survey in 2008 before industry numbers were reported. The survey included 1,500 households and found that average Halloween spending was $58 per household.
b. The average GPA of students in 2001 at a private university was 3.37. A survey on a sample of 203 students from this university yielded an average GPA of 3.59 a decade later.
2 Stealers, scope of inference.
In a study of the relationship between socio-economic class and unethical behavior, 129 University of California undergraduates at Berkeley were asked to identify themselves as having low or high social class by comparing themselves to others with the most (least) money, most (least) education, and most (least) respected jobs. They were also presented with a jar of individually wrapped candies and informed that the candies were for children in a nearby laboratory, but that they could take some if they wanted. After completing some unrelated tasks, participants reported the number of candies they had taken. It was found that those who were identified as upper-class took more candy than others. (Piff et al. 2012)
a. Identify the population of interest and the sample in this study.
b. Comment on whether the results of the study can be generalized to the population, and if the findings of the study can be used to establish causal relationships.
3 Course satisfaction across sections.
A large college class has 160 students. All 160 students attend the lectures together, but the students are divided into 4 groups, each of 40 students, for lab sections administered by different teaching assistants. The professor wants to conduct a survey about how satisfied the students are with the course, and he believes that the lab section a student is in might affect the student’s overall satisfaction with the course.
a. What type of study is this?
b. Suggest a sampling strategy for carrying out this study.
4 Internet use and life expectancy.
The scatterplot in the textbook was created as part of a study evaluating the relationship between estimated life expectancy at birth (as of 2014) and percentage of internet users (as of 2009) in 208 countries for which such data were available.
a. Describe the relationship between life expectancy and percentage of internet users.
b. What type of study is this?
c. State a possible confounding variable that might explain this relationship and describe its potential effect.
5 Haters are gonna hate, study confirms.
A study published in the Journal of Personality and Social Psychology asked a group of 200 randomly sampled participants recruited online using Amazon’s Mechanical Turk to evaluate how they felt about various subjects, such as camping, health care, architecture, taxidermy, crossword puzzles, and Japan in order to measure their attitude towards mostly independent stimuli. Then, they presented the participants with information about a new product: a microwave oven. This microwave oven does not exist, but the participants didn’t know this, and were given three positive and three negative fake reviews. People who reacted positively to the subjects on the dispositional attitude measurement also tended to react positively to the microwave oven, and those who reacted negatively tended to react negatively to it. Researchers concluded that “some people tend to like things, whereas others tend to dislike things, and a more thorough understanding of this tendency will lead to a more thorough understanding of the psychology of attitudes.” (Hepler and Albarracı́n 2013)
a. What are the cases?
b. What is (are) the response variable(s) and explanatory variable(s) in this study?
c. Does the study employ random sampling? Explain. How could they have obtained participants?
d. Can we establish a causal link between the explanatory and response variables?
e. Can the results of the study be generalized to the population at large?
6 Reading the paper
Below are excerpts from two articles published in the NY Times:
a. An excerpt from an article titled Risks: Smokers Found More Prone to Dementia is below. Based on this study, can we conclude that smoking causes dementia later in life? Explain your reasoning. (Rabin 2010)
b. An excerpt from an article titled The School Bully Is Sleepy is below. A friend of yours who read the article says, “The study shows that sleep disorders lead to bullying in school children.” Is this statement justified? If not, how best can you describe the conclusion that can be drawn from this study? (Parker-Pope 2011)
7. Flawed Reasoning
Identify the flaw(s) in reasoning in the following scenarios. Explain what the individuals in the study should have done differently if they wanted to make such strong conclusions.
a. Students at an elementary school are given a questionnaire that they are asked to return after their parents have completed it. One of the questions asked is, “Do you find that your work schedule makes it difficult for you to spend time with your kids after school?” Of the parents who replied, 85% said “no”. Based on these results, the school officials conclude that a great majority of the parents have no difficulty spending time with their kids after school.
b. A survey is conducted on a simple random sample of 1,000 women who recently gave birth, asking them about whether they smoked during pregnancy. A follow-up survey asking if the children have respiratory problems is conducted 3 years later. However, only 567 of these women are reached at the same address. The researcher reports that these 567 women are representative of all mothers.
c. An orthopedist administers a questionnaire to 30 of his patients who do not have any joint problems and finds that 20 of them regularly go running. He concludes that running decreases the risk of joint problems.
PART 2 - Sampling Distribution Theory (IMS Ch 13.1)
Describe what a sampling distribution is, what the central limit theorem states, and what are the technical conditions that must be met to use the normal model for inference.
You can use AI to aid your understanding, but your writing must be your own. Any sign of plagiarism or cheating among students will result in a 0 grade for the entire assignment
1. Sampling Distribution
2. Central Limit Theorem
3. Conditions for a normal approximation to provide valid inference
PART 3 - The Normal Model (IMS 13.2)
This section uses \(\LaTeX\) math type. You can copy/paste the symbols from within this document, but do not copy/paste symbols from the internet or PDF webpages or you may likely have trouble when rendering.
What are the two parameters that govern the shape and location of a Normal distribution? Write the symbol, and what it means in words
What does \(X \sim N(\mu, \sigma^{2})\) mean in English? What is this type of notation called? (You may have to Google this)
Bonita computed a z-score after receiving the results of her exam in order to compare her results with the rest of the class. Her z-score was 2.1. Interpret her Z score in a sentence.
Example: Head lengths of brushtail possums follow a nearly normal distribution with mean 92.6 mm and standard deviation 3.6 mm. Compute the Z scores for possums with head lengths of 95.4 mm and 85.8 mm.
Let \(X\) be the head length of a brushtail possum, where \(X \sim N(92.6, 3.6^{2})\) Use R to calculate the probability that a possum will have a head length lower than 85mm. That is, find \(P(X < 85)\).
Would you consider it to be unusual if you found a possum with a head smaller than 85mm? Explain your answer.
A study found that the mean amount of time cars spent in drive-through of a certain fast-food restaurant was 154.6 seconds. Assume drive-through times are normally distributed with a standard deviation of 27 seconds
Define a random variable X in this context, and write it’s distributional notation. Refer to problems 4 & 5 as examples.
What is the probability that someone spends more than 230 seconds in the drive through? Write this out as a probablity statement/equation before using R to solve
How long do 75% of the cars typically have to wait in the drive through? (That is, what is the 75th percentile?)
Estimate the average wait time for 40 cars in this drive-through.
What is the margin of error on this estimate?
Assuming that the wait time is normally distributed, what values bound the middle 95% of wait times for this average of 40 cars? Give a lower and an upper value. Show your work.