Statistical Inference with Randomization

Robin Donatello

2024-10-02

Statistical Inference

(A.K.A Hypothesis Testing)

https://xkcd.com/1478/

Follow along printed note available for this section

Statistical Inference

  • Primarily concerned with understanding and quantifying the uncertainty of parameter estimates.
  • Equations and details change depending on the setting, the foundations for inference are the same throughout all of statistics.
  • We start with a case studies designed to motivate the process of making decisions about research claims.
  • We formalize the process through the introduction of the hypothesis testing framework, which allows us to formally evaluate claims about the population.

Variability in data

  • Sometimes the dataset at hand represents the entire research question.
  • More often than not, the data have been collected to answer a research question about a larger group of which the data are a (hopefully) representative subset.
  • One dataset will not be identical to a second dataset even if they are both collected from the same population using the same methods.

Motivating Example: Case study in sex discrimination

We consider a study investigating sex discrimination in the 1970s, which is set in the context of personnel decisions within a bank. The research question we hope to answer is, “Are individuals who identify as female discriminated against in promotion decisions made by their managers who identify as male?” (Rosen and Jerdee 1974 Note: The study allowed only for a binary classification of sex. The identities being considered are not gender identities.

Methods

48 male-identifying bank supervisors were asked to assume the role of the personnel director and were given personnel files to judge whether the person should be promoted. The files were identical except half indicated that the candidate identified as male, the other half the candidate identified as female. The files were randomly assigned to the participants, 13 individuals were promoted and the results were recorded.

Setup

Is this an observational study or an experiment? Why?

How does the type of study impact what can be inferred from the results?

What is the hypothesis being tested? (in words with no statistical jargon).

Observed Results

The results from the study are displayed in the following contingency table. Use it to fill in the blanks below.

Show the code
library(openintro) # to get the sex_discrimination data
table(sex_discrimination$sex, sex_discrimination$decision) |> addmargins()
        
         promoted not promoted Sum
  male         21            3  24
  female       14           10  24
  Sum          35           13  48

In this study, a (bigger/smaller) proportion of female identifying applications were promoted than males (______________ versus ______________).

Calculate the difference (male - female) between these proportions.

Point Estimate

This observed difference is what we call a point estimate of the true difference in promotional rates between males and females. This difference suggest there might be discrimination against women in promotion decisions, is unclear whether there is convincing evidence that the observed difference represents discrimination or is just due to random chance when there is no discrimination occurring.

Competing Hypothesis

Chance can be thought of as the claim due to natural variability; discrimination can be thought of as the claim the researchers set out to demonstrate. We label these two competing claims in the following way:

\(H_{0}\): Null Hypothesis:




\(H_{A}\): Alternative Hypothesis:




Often times, the null hypothesis takes a stance of no difference or no effect. This hypothesis assumes that any differences seen are due to the variability inherent in the population and could have occurred by random chance.

Data or it didn’t happen

We will choose between the two competing claims by assessing if the data conflict so much with \(H_{0}\) that the null hypothesis cannot be deemed reasonable. If data and the null claim seem to be at odds with one another, and the data seem to support \(H_{A}\), then we will reject the notion of independence and conclude that the data provide evidence of discrimination.

Inference via Randomization

To decide whether variability in data is due to

    1. the causal mechanism (the randomized explanatory variable in the experiment)
    1. natural variability inherent to the data

we set up a sham randomized experiment as a comparison.

“Sham” randomized experiment

  • We assume that each observational unit (person) would have gotten the exact same response value (chance of being promoted) regardless of the treatment level (sex).
  • By reassigning the treatments many many times, we can compare the actual experiment to the sham experiment.

Decision

If the actual experiment has more extreme results than any of the sham experiments, we are led to believe that it is the explanatory variable which is causing the result and not just variability inherent to the data.

Variability of the statistic

Suppose the bankers’ decisions were independent of the sex of the candidate. Then, if we conducted the experiment again with a different random assignment of promotion to the files, differences in promotion rates would be based only on random fluctuation in promotion decisions. Let’s simulate this as a class so we can get lots of replications for this experiment.

You’re in power now

Simulation Study

It’s your turn to assume the role of the bank personnel director and make a promotional decisions. Your group has been handed a stack of 48 confidential applications. Keep them face down for now.

To promote, or not to promote.

  1. Identify someone to record your results. This person logs into our shared Google drive, opens the Randomization Experiment spreadsheet, and fills out the first two columns.
  2. While they are still face down, choose 13 of your applicant pool to be promoted. Put these files in one pile, those not promoted in another pile. Fill out the \# of promotions column in the spreadsheet.
  3. Flip your candidates over to reveal their sex. Record the number of males and females, and the # of each that have been promoted in each group.
  4. Calculate your point estimate for the male-female difference in promotional rate and write it in your notes.
  5. Shuffle thoroughly your application pool (face down) and repeat this experiment for simulation round 2.

Class Results

Show the code
sim.data <- googlesheets4::read_sheet(URL) %>% 
  select(mfdiff = `(Male - female) difference`) 

ggplot(sim.data, aes(x = mfdiff)) +
  geom_histogram(binwidth = .01) +
  theme_bw() +
  geom_vline(xintercept = .292, color = "blue", lwd=2) +
  xlab("point estimate") + ylab("Count")

Describe the distribution of this graph. What does it seem to be centered around?

In what percent of simulations did we observe a difference of at least 29.2% (0.292)?

Show the code
sum(sim.data$mfdiff >= .29) / NROW(sim.data)
[1] 0.03703704

Decision

In our analysis, we determined that there was only a \(\approx\) .037 chance of obtaining a sample where \(\geq\) 29.2% more male candidates than female candidates get promoted under the null hypothesis, so we conclude that the data provide some evidence of sex discrimination against female candidates by the male supervisors. In this case, we can reject the null hypothesis in favor of the alternative.

Closing

Statistical inference is the practice of making decisions and conclusions from data in the context of uncertainty.

Errors do occur, just like rare events, and the dataset at hand might lead us to the wrong conclusion. While a given dataset may not always lead us to a correct conclusion, statistical inference gives us tools to control and evaluate how often these errors occur.