2024-09-11
These notes assume the following…
the tidyverse and here packages are installed.
See ASCN Ch 19.1-19.8 for details if you still need help.
Name your file dm_dataset.qmd, where dataset is YOUR dataset name. E.g. dm_addhealth.qmd.
Save it in your Math615/scripts folder.
Replace data.csv with YOUR data set name exactly as it shows in your files window (bottom right).
Okay, did it work? Do you see raw? Does it have an expected number of rows and columns?
Use functions like str() or glimpse() to see what data type R thinks your variables are for the whole data set.
Rows: 344
Columns: 17
$ studyName <chr> "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PAL0708", "PA…
$ `Sample Number` <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 4…
$ Species <chr> "Adelie Penguin (Pygoscelis adeliae)", "Adelie Penguin (Pygoscelis adeliae)", "Adelie Penguin (Pygoscelis adeliae)", "Adelie Penguin (Pygoscelis adeliae)", "Adelie Peng…
$ Region <chr> "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers", "Anvers"…
$ Island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen",…
$ Stage <chr> "Adult, 1 Egg Stage", "Adult, 1 Egg Stage", "Adult, 1 Egg Stage", "Adult, 1 Egg Stage", "Adult, 1 Egg Stage", "Adult, 1 Egg Stage", "Adult, 1 Egg Stage", "Adult, 1 Egg …
$ `Individual ID` <chr> "N1A1", "N1A2", "N2A1", "N2A2", "N3A1", "N3A2", "N4A1", "N4A2", "N5A1", "N5A2", "N6A1", "N6A2", "N7A1", "N7A2", "N8A1", "N8A2", "N9A1", "N9A2", "N10A1", "N10A2", "N11A1…
$ `Clutch Completion` <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Y…
$ `Date Egg` <date> 2007-11-11, 2007-11-11, 2007-11-16, 2007-11-16, 2007-11-16, 2007-11-16, 2007-11-15, 2007-11-15, 2007-11-09, 2007-11-09, 2007-11-09, 2007-11-09, 2007-11-15, 2007-11-15,…
$ `Culmen Length (mm)` <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, 38.6, 34.6, 36.6, 38.7, 42.5, 34.4, 46.0, 37.8, 37.7, 35.9, 38.2, 38.8, 35.3, 40.6, 40.5, 37…
$ `Culmen Depth (mm)` <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, 17.1, 17.3, 17.6, 21.2, 21.1, 17.8, 19.0, 20.7, 18.4, 21.5, 18.3, 18.7, 19.2, 18.1, 17.2, 18.9, 18.6, 17.9, 18…
$ `Flipper Length (mm)` <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186, 180, 182, 191, 198, 185, 195, 197, 184, 194, 174, 180, 189, 185, 180, 187, 183, 187, 172, 180, 178, 178, 188, 184,…
$ `Body Mass (g)` <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, 3300, 3700, 3200, 3800, 4400, 3700, 3450, 4500, 3325, 4200, 3400, 3600, 3800, 3950, 3800, 3800, 3550, 3200, 31…
$ Sex <chr> "MALE", "FEMALE", "FEMALE", NA, "FEMALE", "MALE", "FEMALE", "MALE", NA, NA, NA, NA, "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE", "MALE", "FEMALE", "M…
$ `Delta 15 N (o/oo)` <dbl> NA, 8.94956, 8.36821, NA, 8.76651, 8.66496, 9.18718, 9.46060, NA, 9.13362, 8.63243, NA, NA, NA, 8.55583, NA, 9.18528, 8.67538, 8.47827, 9.11616, 8.73762, 8.66271, 9.222…
$ `Delta 13 C (o/oo)` <dbl> NA, -24.69454, -25.33302, NA, -25.32426, -25.29805, -25.21799, -24.89958, NA, -25.09368, -25.21315, NA, NA, NA, -25.22588, NA, -25.06691, -25.13993, -25.23319, -24.7722…
$ Comments <chr> "Not enough blood for isotopes.", NA, NA, "Adult not sampled.", NA, NA, "Nest never observed with full clutch.", "Nest never observed with full clutch.", "No blood samp…
tibble [344 × 17] (S3: tbl_df/tbl/data.frame)
$ studyName : chr [1:344] "PAL0708" "PAL0708" "PAL0708" "PAL0708" ...
$ Sample Number : num [1:344] 1 2 3 4 5 6 7 8 9 10 ...
$ Species : chr [1:344] "Adelie Penguin (Pygoscelis adeliae)" "Adelie Penguin (Pygoscelis adeliae)" "Adelie Penguin (Pygoscelis adeliae)" "Adelie Penguin (Pygoscelis adeliae)" ...
$ Region : chr [1:344] "Anvers" "Anvers" "Anvers" "Anvers" ...
$ Island : chr [1:344] "Torgersen" "Torgersen" "Torgersen" "Torgersen" ...
$ Stage : chr [1:344] "Adult, 1 Egg Stage" "Adult, 1 Egg Stage" "Adult, 1 Egg Stage" "Adult, 1 Egg Stage" ...
$ Individual ID : chr [1:344] "N1A1" "N1A2" "N2A1" "N2A2" ...
$ Clutch Completion : chr [1:344] "Yes" "Yes" "Yes" "Yes" ...
$ Date Egg : Date[1:344], format: "2007-11-11" "2007-11-11" "2007-11-16" "2007-11-16" ...
$ Culmen Length (mm) : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
$ Culmen Depth (mm) : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
$ Flipper Length (mm): num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
$ Body Mass (g) : num [1:344] 3750 3800 3250 NA 3450 ...
$ Sex : chr [1:344] "MALE" "FEMALE" "FEMALE" NA ...
$ Delta 15 N (o/oo) : num [1:344] NA 8.95 8.37 NA 8.77 ...
$ Delta 13 C (o/oo) : num [1:344] NA -24.7 -25.3 NA -25.3 ...
$ Comments : chr [1:344] "Not enough blood for isotopes." NA NA "Adult not sampled." ...
- attr(*, "spec")=
.. cols(
.. studyName = col_character(),
.. `Sample Number` = col_double(),
.. Species = col_character(),
.. Region = col_character(),
.. Island = col_character(),
.. Stage = col_character(),
.. `Individual ID` = col_character(),
.. `Clutch Completion` = col_character(),
.. `Date Egg` = col_date(format = ""),
.. `Culmen Length (mm)` = col_double(),
.. `Culmen Depth (mm)` = col_double(),
.. `Flipper Length (mm)` = col_double(),
.. `Body Mass (g)` = col_double(),
.. Sex = col_character(),
.. `Delta 15 N (o/oo)` = col_double(),
.. `Delta 13 C (o/oo)` = col_double(),
.. Comments = col_character()
.. )
Both views show you the variable names, data types, and what the data in the first few rows looks like.
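A minimal sketch of these whole-data-set checks. It does not use your real raw object: raw_toy is a hypothetical stand-in built from the first five rows of three columns in the output above, so the code runs on its own in base R.

```r
# Toy stand-in for `raw` (hypothetical name). The real object shown above
# has 344 rows and 17 columns; this copies only the first five rows of
# three of its columns.
raw_toy <- data.frame(
  Island = rep("Torgersen", 5),
  `Culmen Length (mm)` = c(39.1, 39.5, 40.3, NA, 36.7),
  Sex = c("MALE", "FEMALE", "FEMALE", NA, "FEMALE"),
  check.names = FALSE,       # keep the original column names, spaces and all
  stringsAsFactors = FALSE   # keep text columns as character
)

# Does it have the expected number of rows and columns?
dim(raw_toy)

# What data type does R think each variable is?
str(raw_toy)

# If dplyr is installed, dplyr::glimpse(raw_toy) gives the one-row-per-column view.
```

On your own data, run the same calls on raw and compare the row and column counts to what the codebook says you should have.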
You can also look at the data type for a single variable at a time.
[1] "character"
[1] "character"
chr [1:344] "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" "Torgersen" ...
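Output of this shape could come from calls like the ones sketched below. Which functions the notes actually used is an assumption, and raw_toy is again a hypothetical stand-in for raw.

```r
# Toy stand-in for one column of `raw` (hypothetical name)
raw_toy <- data.frame(Island = rep("Torgersen", 5), stringsAsFactors = FALSE)

class(raw_toy$Island)   # the class R uses to decide how to treat the variable
typeof(raw_toy$Island)  # the underlying storage type
str(raw_toy$Island)     # type, length, and the first few values
```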
Confirm that these values match the expected values according to the codebook.
Use table() for categorical variables and summary() on numeric variables to see the range of values present.
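A small sketch of both checks, using the first five Sex and Body Mass (g) values from the output above as a toy stand-in (raw_toy is a hypothetical name, not an object from these notes). The useNA argument is an optional extra that makes missing values show up in the table.

```r
# Toy stand-in: first five values of two columns from the output above
raw_toy <- data.frame(
  Sex = c("MALE", "FEMALE", "FEMALE", NA, "FEMALE"),
  `Body Mass (g)` = c(3750, 3800, 3250, NA, 3450),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Categorical: count each value; useNA = "ifany" also counts missing values
sex_counts <- table(raw_toy$Sex, useNA = "ifany")
sex_counts

# Numeric: minimum, quartiles, mean, maximum, and the number of NAs
mass_summary <- summary(raw_toy[["Body Mass (g)"]])
mass_summary
```

Compare the categories and the minimum and maximum against the codebook: an unexpected category or an out-of-range value is something to fix before analysis.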
Questions to ask yourself (and the data) while reviewing the codebook to choose variables to be used in an analysis.
Use the janitor::clean_names() function to resolve this.
Some of these answers will come only after you look at your data. This can mean looking at the raw data itself, but also at tables and charts generated from the data.
Often when you try to create a plot or table, you will encounter an error or something odd-looking; that is your notification that something has to be adjusted.
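To give a feel for what cleaning variable names does, here is a rough base-R approximation. It assumes the problem being resolved is column names with spaces and symbols, such as Culmen Length (mm), which have to be wrapped in backticks. It is not janitor's actual implementation, and janitor may treat some edge cases differently.

```r
# Three of the column names from the glimpse() output above
old_names <- c("Sample Number", "Culmen Length (mm)", "Delta 15 N (o/oo)")

# Rough approximation only (NOT janitor's implementation):
clean <- tolower(old_names)               # lower-case everything
clean <- gsub("[^a-z0-9]+", "_", clean)   # runs of other characters become "_"
clean <- gsub("^_+|_+$", "", clean)       # drop leading/trailing underscores
clean

# With the janitor package installed, the one-liner named in the notes
# would be: raw <- janitor::clean_names(raw)
```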
R project or searching for package vignettes.
You will start to build up your repertoire of bookmarked sites you like to use.
Example prompt
Using the palmer penguins dataset in R write me R code to change the factor order of the islands variable to put Dream first
What did it give?
Sure, here’s an example of how you can change the factor order of the island variable in the Palmer Penguins dataset to put “Dream” first using R:
# Load necessary packages
library(palmerpenguins)

# Load the dataset
data("penguins")

# Check the current levels of the 'island' factor
levels(penguins$island)

# Reorder the levels of the 'island' factor to put "Dream" first
penguins$island <- factor(penguins$island, levels = c("Dream", "Torgersen", "Biscoe"))

# Check the new order of levels
levels(penguins$island)
In this code, we’re first loading the palmerpenguins package, which provides access to the dataset. Then we load the dataset itself using data("penguins"). We check the current levels of the island factor using levels(penguins$island).
Next, we use the factor() function to reorder the levels of the island variable. We specify the desired order of levels using the levels argument. In this case, we put “Dream” first, followed by the other islands. Finally, we check the new order of levels to make sure the change was successful.
Make sure you have the palmerpenguins package installed. You can install it using install.packages("palmerpenguins") if you haven’t done so already.
Not only did it write the code, it wrote literate code. It has code comments (lines 1, 4, 7, prefixed with a #), and a full text explanation.
This is GREAT for learning how to write code!
Plagiarism warning
Read the syllabus for what is expected of you if you use this tool to aid in your writing: https://math615.netlify.app/syllabus#use-of-ai
⚠️ Trust but verify! AI is not always correct! Also, this does not replace the necessity of you learning.
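One way to act on "trust but verify" is to check that generated code did what you asked and nothing else. The sketch below is only an illustration: island is a hypothetical five-value stand-in for penguins$island, so it runs without the palmerpenguins package.

```r
# Toy stand-in for penguins$island (hypothetical; the real column is longer)
island <- factor(c("Torgersen", "Biscoe", "Dream", "Dream", "Biscoe"))
before <- as.character(island)  # remember the values before changing anything
levels(island)                  # default level order is alphabetical

# The reordering step from the generated code
island <- factor(island, levels = c("Dream", "Torgersen", "Biscoe"))

# Verify: "Dream" is first, no value changed, and nothing became NA
levels(island)
stopifnot(
  levels(island)[1] == "Dream",
  identical(as.character(island), before),
  !anyNA(island)
)
```

The NA check matters because factor() does not raise an error when a name in levels is misspelled; the affected values quietly become NA.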
🔗 https://math615.netlify.app / Preparing your data for analysis