Preparing Data for Analysis

Where an inordinate amount of time is spent


Learning Path

Where we’ve been

Learning how to read our data and ask questions that are potentially answerable with this data.

Where we’re at

  • Starting to write code.
  • Recognizing that this stage will take time and that you are learning something challenging.
  • Being willing to ask for help and to collaborate.



Know several questions to ask yourself when preparing data for analysis

Understand options for literate programming

Import your chosen data into your SPC

Create a reproducible workflow

Create a reproducible script file that imports the raw data, performs data management tasks, and exports an analysis-ready data set
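The last objective can be sketched as a single script with three stages: import, manage, export. The example below is only an illustration, assuming the readr and dplyr packages (both part of the tidyverse) are installed. The file names, variable names, and the -9 missing-value code are all hypothetical; a tiny stand-in raw file is written to a temporary folder so the script runs on its own.

```r
# Sketch of a data management script (hypothetical file and variable names).
library(readr)
library(dplyr)

# Stand-in for a raw data file so this example is self-contained.
raw_path <- file.path(tempdir(), "raw_example.csv")
write_csv(
  data.frame(id = 1:4, age = c(34, -9, 51, 27), sex = c("M", "F", "F", "M")),
  raw_path
)

# 1. Import the raw data. The raw file itself is never edited.
raw <- read_csv(raw_path, show_col_types = FALSE)

# 2. Data management: recode the (assumed) missing-value code -9 to NA
#    and turn sex into a labeled factor.
clean <- raw %>%
  mutate(
    age = na_if(age, -9),
    sex = factor(sex, levels = c("F", "M"), labels = c("Female", "Male"))
  )

# 3. Export an analysis-ready data set.
clean_path <- file.path(tempdir(), "example_clean.rds")
saveRDS(clean, clean_path)
```

In your own project you would likely replace the temporary paths with paths built from the project folder, for example with here::here(), so the same script runs unchanged on another computer.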


In Homework 00 you downloaded and installed R and RStudio, and in Homework 01 you used them to import a data file. A few more steps will set you up well for using R to analyze data.

Follow the instructions in the Appendix of the Applied Stats Course Notes to do the following:

  1. Set preferences for sanity (ASCN 19.5)
  2. Install the tidyverse and here packages (ASCN 19.6)
  3. Create an R project using the MATH615 folder that you created in step 1 (ASCN 19.7)
    • Shut down RStudio fully, navigate to your class folder, and open your project file (the cube) before you continue.
  4. Read through and follow the “Hello Quarto” tutorial (ASCN 19.8 intro)
  5. Install the tinytex program so you can create PDFs from your Quarto document. This step is required (ASCN 19.8)
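As a rough illustration of step 2, the lines below install the two packages only when they are missing, then check that R sees your project folder. This is a sketch, not a replacement for the course notes: it assumes a CRAN mirror is already configured (RStudio does this by default), and the variable names are made up for the example.

```r
# Packages named in step 2 of the setup list.
pkgs <- c("tidyverse", "here")

# Find which of them are not yet installed.
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only the missing ones (assumes a CRAN mirror is configured).
if (length(to_install) > 0) install.packages(to_install)

# After opening the project file, this should print your class folder.
here::here()
```

If here::here() prints a different folder, the most likely cause is that RStudio was not opened through the project file.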

If you have difficulty with any of this, visit Community Coding, my office hours, or post in Discord.

Learning Materials

Slides (Will open in full screen. Right click to open in a new tab)

We will also be using the Data Management chapter of the Applied Stats notebook for this topic.

📚 Reading


📝 Collaborative notes





This quiz will contain questions that reference topics in the PMA6. Don’t rely only on the slides for your answers.


Other references

Where you’ll start

Where you’ll end up