Statistical Inference with Randomization

Robin Donatello

2023-10-02

Statistical Inference

(A.K.A Hypothesis Testing)

https://xkcd.com/1478/

Statistical Inference

  • Primarily concerned with understanding and quantifying the uncertainty of parameter estimates.
  • Equations and details change depending on the setting, the foundations for inference are the same throughout all of statistics.
  • We start with a case studies designed to motivate the process of making decisions about research claims.
  • We formalize the process through the introduction of the hypothesis testing framework, which allows us to formally evaluate claims about the population.

Variability in data

  • Sometimes the dataset at hand represents the entire research question.
  • More often than not, the data have been collected to answer a research question about a larger group of which the data are a (hopefully) representative subset.
  • One dataset will not be identical to a second dataset even if they are both collected from the same population using the same methods.

Motivating Example: Police Shootings since 2015

The Washington Post has been keeping a database on use of fatal force by police since 2015. This database tracks all incidents where someone was shot by police, and records various characteristics such as the seniority level of the officer, date, location, gender, age and ethnicity of the victim along with information on if the person was armed, fleeing or acting erratic.

(board work)

Randomization

To decide whether variability in data is due to

    1. the causal mechanism (the randomized explanatory variable in the experiment)
    1. natural variability inherent to the data

we set up a sham randomized experiment as a comparison.

“Sham” randomized experiment

  • We assume that each observational unit (person) would have gotten the exact same response value (shot by police) regardless of the treatment level (race).
  • By reassigning the treatments many many times, we can compare the actual experiment to the sham experiment.

Decision

If the actual experiment has more extreme results than any of the sham experiments, we are led to believe that it is the explanatory variable which is causing the result and not just variability inherent to the data.

Case Study: Sex discrimination

Turning to something equally important, but less potentially distressful, let’s do an experiment on sex discrimination in promotional decisions

(in class)

Case Study: Sex discrimination - collab notes

Open the collaborative notes and work with your neighbor to answer the following questions under the “Inference using Randomization” section.

https://hackmd.io/@norcalbiostat/06b-stat_inference

Statistical inference is the practice of making decisions and conclusions from data in the context of uncertainty.

Errors do occur, just like rare events, and the dataset at hand might lead us to the wrong conclusion. While a given dataset may not always lead us to a correct conclusion, statistical inference gives us tools to control and evaluate how often these errors occur.