6  Preparing Yourself

6.1 What do I need to do to get started in this class?

To prepare for the course, here is what you are expected to do by the end of the first week (specific information is below this list):

  • Join the ZOO-5500 Slack (private link sent via email)
  • Download, Install, and set up RStudio Desktop
  • Gather your dataset files for your analysis

6.1.1 Join the ZOO-5500 Slack

Just click on the link and follow the instructions. We will use Slack to encourage ongoing, low-stakes discussion within the class. It is for sharing questions, ideas, and insights in real time so that learning can continue outside of scheduled class meetings. This allows everyone (even me) to get answers to questions that come to mind at odd times. It also allows problems to be solved faster: rather than waiting until our synchronous Zoom session to ask about an issue, we can group-think over the course of the week. Remember that Slack is set up to improve collaboration; it does not mean that anyone should be responding outside of work hours or on weekends. Below are the preferred channels for different kinds of communication:

  • #announcements > Read-only channel for important course information like deadlines, corrections, and post-class clarifications. Check this regularly.

  • #course-questions > For general course or assignment questions (due dates, unclear wording on websites) that others might also have. Expect peer responses and occasional instructor follow-ups.

  • #data-help > For data wrangling and technical data issues (imports, joins, NAs, plots). Not for model choice or interpretation.

  • #model-talk > For conceptual questions about models and assumptions. Discussion-focused; no code debugging.

  • #analysis-workflow > For questions about reproducibility, organization, reporting, and good analytical habits.

  • #ai-use-and-logs > For discussing allowed AI use and posting AI interaction logs when requested. No private communication or graded work.

  • #papers-good-bad-ugly > For highlighting examples of good and bad analyses in published work.

  • #random > For off-topic chat, random links, and fun stuff. This will help keep everything else focused.

6.1.2 Download, Install, and set up RStudio Desktop

Install on your local machine. All coding and file organization will be done in RStudio. Note that the following is a very basic way of creating a set of project folders. You will see boxes with R code. Feel free to copy and run these, if necessary. Here are the basic steps:

  • Download and install RStudio Desktop (R itself must also be installed, since RStudio runs on top of it). Then navigate to your folder (in this case, your student folder).
  • In the “Files” pane (lower right panel in RStudio), click “More/Set as Working Directory”. This sets your working directory, which is always a good idea.
  • Create a New R Project in your student folder: “File/New Project/Existing Directory”. Name this project appropriately.
  • Create your file structure: Refer to this RStudio tutorial. In this class, we will modify this set of directories to make things a bit easier and more transparent for you. Once you have created the folders described below, place your dataset in your “data/raw_data” folder: navigate to this folder on your computer and drag-and-drop or copy-paste your data file(s) into it.
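If you would like to confirm the working directory from the console rather than the Files pane, a minimal sketch using base R follows. The variable names (wd, rproj_files) are only illustrative.

```r
# Print the current working directory; it should end in your student folder.
wd <- getwd()
print(wd)

# List any RStudio project files (.Rproj) found in this folder.
# An empty result simply means no project has been created here yet.
rproj_files <- list.files(pattern = "\\.Rproj$")
print(rproj_files)
```

If the printed path is not your student folder, repeat the “Set as Working Directory” step before continuing.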

Create three folders (one suggested framework):

  • /data (for raw, processed, clean datasets)
  • /r_scripts (for R scripts, each containing a set of related functions)
  • /output (for models, graphs, tables)

We can do this by staying in our working directory (within RStudio) and then using the dir.create function.

dir.create("data")

You can see your new folder (“data”) appear in the Files pane in the lower right of RStudio. Now let’s create the other two folders.

dir.create("r_scripts")
dir.create("output")

To help with data processing (i.e., cleaning and organization), something that we will discuss in the coming days within the context of your analytical workflow, we will create some subfolders within your newly created “data” folder. Let’s do this now. Note how you create a folder within the “data” folder by specifying the path:

dir.create("data/raw_data")
dir.create("data/processed_data")
dir.create("data/metadata")
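If you ever need to rebuild this structure (for example, on a second computer), the same folders can be created in one pass. This is only a sketch of one way to do it, not a required step. The vector name project_dirs is illustrative, and the folder names are the ones suggested above.

```r
# Folder names taken from the steps above; adjust if you renamed anything.
project_dirs <- c(
  "data/raw_data",
  "data/processed_data",
  "data/metadata",
  "r_scripts",
  "output"
)

for (d in project_dirs) {
  # recursive = TRUE also creates the parent "data" folder if it is missing;
  # showWarnings = FALSE keeps R quiet when a folder already exists.
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Confirm that every folder now exists.
all(dir.exists(project_dirs))
```

Because existing folders are left untouched, this is safe to run more than once.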

And that is how you create a basic R Project in RStudio! What you have done is create a set of folders and subfolders that have names that can be easily understood by other users. Feel free to create other subfolders now. Or, should you not like the file nomenclature used above, change the names to whatever you wish. Just be sure that this structure is as simple as possible so that other users understand what you have done.

There is an added benefit to creating an R Project in this way. What you have done is create a minimally reproducible data structure. If you are not familiar with this idea, there are two primary ways to share it. First, a collaborator (or instructor or classmate) can simply navigate to your subfolder, set that as their working directory from within RStudio, and then open the .Rproj file. Alternatively, you can zip your root folder (the one labelled with your name, in this case) and then share your entire analysis. All the recipient would need to do is unzip the folder, open the .Rproj file, and then “Set as Working Directory.”
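Part of why this sharing works is that paths written relative to the project folder resolve the same way on any machine once the working directory is set. Here is a minimal sketch, assuming a hypothetical file name (my_data.csv) that is not part of the course materials:

```r
# Build a path relative to the project root.
# "my_data.csv" is a hypothetical file name used only for illustration.
raw_file <- file.path("data", "raw_data", "my_data.csv")
print(raw_file)

# Check whether the file is actually there before trying to read it.
file.exists(raw_file)
```

A script that refers to files this way avoids hard-coding a path such as a personal Documents folder, which would not exist on a collaborator's computer.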

6.1.3 Gather your dataset files for your analysis

Place your data in your “data/raw_data” subfolder and then confirm that your files are present by running the list.files function.

list.files("data/raw_data")
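
If nothing is printed, the folder is empty or the working directory is not what you expect. A small, optional sketch that makes this case explicit (the variable name raw_files is illustrative):

```r
# Collect the names of everything in the raw data folder.
raw_files <- list.files("data/raw_data")

if (length(raw_files) == 0) {
  # Nothing found: the folder is empty, missing, or the working directory is wrong.
  message("No files found in data/raw_data yet.")
} else {
  print(raw_files)
}
```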

Note

Having your data in working order means more than having a spreadsheet or two.
A dataset is only complete if it includes metadata. Metadata describe:

  • what the variables are
  • how the data were collected
  • under what conditions
  • with what assumptions and limitations

If you have questions about metadata, I have an old, dedicated lecture on metadata structures, and I strongly encourage you to reach out.
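
One simple way to start recording metadata is a data dictionary stored in your “data/metadata” folder. The sketch below is only an illustration: the variable names, descriptions, and the file name data_dictionary.csv are all hypothetical, so replace them with details from your own dataset.

```r
# Hypothetical data dictionary: one row per variable in your dataset.
data_dictionary <- data.frame(
  variable    = c("site_id", "body_mass_g"),
  description = c("Sampling site identifier", "Body mass of the individual"),
  units       = c(NA, "grams"),
  notes       = c("Assigned in the field", "Measured to the nearest 0.1 g"),
  stringsAsFactors = FALSE
)

# Make sure the metadata folder exists, then save the dictionary there.
dir.create("data/metadata", recursive = TRUE, showWarnings = FALSE)
write.csv(data_dictionary, "data/metadata/data_dictionary.csv", row.names = FALSE)
```

A table like this covers what the variables are. How the data were collected, under what conditions, and with what assumptions and limitations usually need a short written description alongside it.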

You are now ready to move to the next steps of loading necessary R packages and beginning your journey of data exploration!