Choosing Appropriate Analysis

Robin Donatello

2023-10-11

A unifying model framework

A good way to think about all statistical models is that the observed data comes from some true model with some random error.

DATA = MODEL + RESIDUAL

The MODEL is a mathematical formula (like \(y = f(x)\)).

The formulation of the MODEL will change depending on the number of, and data types of explanatory variables. One goal of inferential analysis is to explain the variation in our data, using information contained in other measures.

Which model to use?

Moving from:

“What descriptive measures should be used to examine the data”

to

“What statistical analyses should be performed?


REF: PMA6 Chapter 6

Why selection can be difficult

  • Classroom instruction tends to present methods in logical order from a learning perspective. Building from simple to more complex. Exposure to complex models limited.
  • Real life data often contain a mixture of data types, missing data and complex patterns. This can make the choice of analysis somewhat arbitrary at times.
  • Two trained statisticians presented with the same data will often approach the analysis from different perspectives, and may different analyses depending on what assumptions they are willing to make.

Not always clear

  • The primary assumption of most standard statistical procedures is that the records are independent of each other.
  • Often program evaluation relies on paired measurements before and after a certain exposure or treatment (pre-post).
    • In this case, the approach is to calculate a pairwise difference for each individual and compare the average difference to 0 (any change vs no change).
  • For the purposes of this class, we will only concern ourselves with independent groups.

Repeated measures is a topic typically taught in MATH 456 (but also covered in Chapter 18 of PMA6)

Choose your adventure

Table 6.2 in PMA6 shows which statistical analyses procedures are appropriate depending on the combination of explanatory and response variable.

Case study practice

Go to the HackMD collaborative notes on Choosing appropriate analysis and work in pairs to answer an assigned question.

Tweet from Indrajeet Patil saying 'Your periodic reminder that the most common statistical tests are nothing but special cases of linear models.  https://lindeloev.github.io/tests-as-linear/. I wish someone had pointed out this beautiful simplicity and parsimony when I was learning statistics'

Additional References

PMA6 does not go over T-test, ANOVA or \(\chi^{2}\) tests. To see more examples (with R code) and more mathematical detail see Chapter 5 in the Applied Stats Course Notes