Intro to R

Get Ready!

  • Before the first session, make sure you have installed or updated R, RStudio, and the tidyverse package. Instructions are here.
  • Download the data files and R scripts:
    • Download this zipped folder. It contains one data file (albemarle_homes_2020.csv) and two R scripts (intro_R.R, intro_R_answers.R).
    • Unzip it and put it somewhere you can locate it on your machine.

Goals for our Workshop

These are the goals for today’s workshop:

  • Understand the features of R
  • Know where to look for help and to learn more about R
  • Orient yourself to R and RStudio
  • Understand the basics of working with data: load, explore, and save data
  • Learn some best practices for working with R scripts, data, and projects
  • Understand the basics of objects, functions, and indexing

This workshop is intended for beginners with no previous experience using R.

We will focus on using R for data analysis throughout this series.

Features of R

  • R is free!
  • R is everywhere and has an active user base. People in many disciplines use R and write about it in blogs, forums, Stack Overflow, etc., so you can often find help online.
  • R is flexible! Since R is open source, the active R user base quickly implements new methods as R packages. Over 10,000 packages are available.
  • R is cool! It is highly regarded for its:
    • Graphical functionality. See ggplot2, ggplot extensions.
    • Interactive web functionality. See shiny.
    • Reproducible output, such as documents, presentations, and dashboards. See R Markdown.
    • Easy integration with other open-source or data science applications, such as Sublime Text, Jupyter Notebooks, GitHub, etc.

Orientation to R and RStudio

R is the underlying statistical computing environment. You can think of this like the engine of a car. That makes RStudio the dashboard.[1]

Image: R is the engine; RStudio is the dashboard

RStudio makes R easier to use by giving you one place to write, run, and organize your code. After you install R and RStudio, you only need to open RStudio; it runs R for you.

Panes

This is what RStudio looks like when you first open it:

Image of RStudio at full screen

RStudio shows four panes by default. The two most important are the Console (bottom left) and the Script Editor (top left).

  • Console: The console pane allows you to quickly and immediately execute R code. You can experiment with functions here, or quickly print data for viewing.
    • Type next to the > and press Enter to execute.
  • Script Editor: In contrast to the Console, which quickly runs code, the Script Editor does not automatically execute code. The Script Editor allows you to save the code essential to your analysis. You can re-use that code in the moment, refer back to it later, or publish it for replication.
    • To open a new R Script go to File…New File…R Script
  • Top Right:
    • Environment: Lists all objects that are saved in memory.
    • History: Lists all commands recently used.
    • Connections: Allows you to connect to external data stores.
  • Bottom Right:
    • Files: Shows the files available to you in your working directory.
    • Plots: Graphical output goes here.
    • Packages: User library of all the packages you have installed.
    • Help: Find help for R packages and functions. Don’t forget you can type ? before a function name in the console to get info in the Help section.
    • Viewer: View locally stored web content (e.g., R Markdown .Rmd files knitted to .html, Shiny apps)
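As a quick illustration of working in the Console, you could type each of the lines below next to the > prompt and press Enter. The object name x and its values are arbitrary, chosen only for this sketch.

```r
# Arithmetic: the Console prints the result right away
2 + 2
# [1] 4

# Assignment with <- stores a value as an object;
# x then shows up in the Environment pane (top right)
x <- c(3, 5, 8)

# Call a function on the object
mean(x)

# Open the help page for mean() in the Help pane (bottom right)
?mean
```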

Working with Scripts and Data

Use R scripts to save your work for future analysis. Scripts are an essential part of reproducibility, whether for your collaborators or your future self. R script files end with the extension .R.

Open the R script that you downloaded and unzipped in advance of the workshop:
File…Open File…, navigate to the folder where you saved the workshop files, and open intro_R.R. Let’s look at our R script together.

Using packages

Recall that functions are the “verbs” that allow us to manipulate data. Packages contain functions, and all functions belong to packages. For example, read.csv belongs to the utils package.
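If you are ever unsure which package a function comes from, base R’s find() will tell you, and the :: operator lets you name the package explicitly when calling a function. A short sketch:

```r
# Which attached package defines read.csv()?
find("read.csv")
# [1] "package:utils"

# package::function calls a function from a specific package;
# both names below refer to the very same function
identical(utils::read.csv, read.csv)
```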

R comes with about 30 packages (“base R”). There are over 10,000 user-contributed packages, which you can discover on the Comprehensive R Archive Network (CRAN). You can find more packages in active development on GitHub and elsewhere. CRAN checks packages to a certain degree before publishing them; with packages on GitHub, “buyer beware.”

One widely used collection of packages is the tidyverse, which bundles several packages (such as ggplot2, dplyr, and readr) designed to work together. We will take a look at it in our script.

In order to use a package, you must install it first.

  • You can install packages via point-and-click: Tools…Install Packages…Enter tidyverse (or a different package name) then click on Install.
  • Or you can use this command in the console: install.packages("tidyverse")

Once you have installed the package, you must load the package in any new R session when you want to use that package: type library(tidyverse) to load the tidyverse package.

Install a package once, but load it every time!
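In a script, one common pattern (a sketch, not the only way) checks whether a package is installed with requireNamespace(), installs it only if it is missing, and then loads it with library() every time the script runs. The CRAN mirror URL below is an assumption; RStudio normally sets a default mirror for you.

```r
# Install tidyverse only if it is not installed yet (needed once per R installation)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org")
}

# Load tidyverse (needed in every new R session)
library(tidyverse)
```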

View and Change Your Working Directory

The working directory is the folder where R looks for files to read, and where it saves files you write, unless you give a full path. This is where your datasets, scripts, etc. live. It can be any folder location. (It doesn’t have to be the same folder where you installed R.)

R always has a working directory set. Get your working directory with this command in the console: getwd()

Let’s change the working directory to wherever you saved the R script and albemarle_homes_2020.csv. You can change the working directory a few ways.

  • You can set the working directory via point-and-click: Session (at the top)…Set Working Directory…Choose Directory
  • You can also set the working directory in the console with setwd().

To get your file path:

  • Windows users: Open File Explorer, navigate to the folder you want to use, and copy the path from the address bar at the top of the window. Note that Windows separates folders in file paths with \ (backslash), but inside an R string a backslash starts an escape sequence, so replace each \ with / (forward slash), or type each backslash twice (\\).
  • Mac users: In Finder, navigate to the folder you want to use, right-click the folder, hold down the Option key, and select “Copy [folder name] as Pathname.”
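To see why a single backslash causes trouble: inside an R string, one literal backslash has to be typed as two. The sketch below uses a made-up Windows path and converts it to forward slashes with gsub().

```r
# One literal backslash must be written as "\\" inside an R string
nchar("\\")   # 1 character, not 2

# A hypothetical Windows path, written both ways
win_path <- "C:\\Users\\me\\Documents"
fwd_path <- "C:/Users/me/Documents"

# Replace every backslash with a forward slash
# (the regular expression "\\\\" matches a single backslash)
converted <- gsub("\\\\", "/", win_path)
identical(converted, fwd_path)
```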

This is an example of how I set my working directory using the console: setwd("C:/Users/jah2ax.ESERVICES/Box Sync/_R/workshops/workshops_teaching/intro_to_R_Spring_2020")

Verify that you have the right directory with getwd(). Note that you can see the working directory listed at the top of the Console.
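Here is a small sketch of checking and changing the working directory from the console. It uses R’s temporary folder (tempdir()) as a stand-in for your own workshop folder, and saves the original directory first so it can switch back.

```r
# Where is R looking for files right now?
getwd()

# Remember the current directory so we can switch back later
old_wd <- getwd()

# Switch to another folder (R's temporary folder stands in for your workshop folder)
setwd(tempdir())
getwd()

# Relative file names are now looked up in the new working directory;
# this returns FALSE because the workshop CSV is not in the temporary folder
file.exists("albemarle_homes_2020.csv")

# Switch back to the original working directory
setwd(old_wd)
```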

Import data into R

We will be working with Albemarle County real estate data.

You can import almost any kind of data into R. Your best bet for figuring out how to import a dataset is to google “how to import [file type] into R.” You will likely have to install a package in order to do it. For example, haven is a popular package for importing Stata, SPSS, and SAS files. Remember you can find the official documentation of packages online at CRAN: https://cran.r-project.org/web/packages/haven/index.html
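Below is a self-contained sketch of importing a CSV file with base R’s read.csv(). So that it runs without the workshop files, it first writes the built-in mtcars dataset to a temporary CSV and reads it back. The commented lines show how the same call would look for the workshop file, and how haven’s readers are called (the .dta and .sav file names there are hypothetical).

```r
# Write a built-in dataset to a temporary CSV so this example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Import the CSV as a data frame
cars <- read.csv(csv_path)

# First look at the data
dim(cars)    # number of rows and columns
head(cars)   # first six rows
str(cars)    # column names and types

# With albemarle_homes_2020.csv in your working directory:
# homes <- read.csv("albemarle_homes_2020.csv")
# or, with the tidyverse loaded, readr's version:
# homes <- read_csv("albemarle_homes_2020.csv")

# Stata or SPSS files via haven (hypothetical file names):
# survey <- haven::read_dta("survey.dta")
# survey <- haven::read_sav("survey.sav")
```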

Jump back to our R script “Clear our workspace” (~line 173).

Organizing your project with R

Congrats on making it this far! How do you save your R files, and what should you save?

I know I just showed you setwd(), but it is actually terrible for reproducibility. Recall the fiddly part where I gave you a script with my working directory set, e.g., setwd("C:/Users/jah2ax/Box Sync/_R/workshops/intro_to_R_Spring_2020"). That line didn’t work for you, because the path only exists on my computer, at this point in time. Absolute paths in setwd() make it very hard to share your code with collaborators, and even with your future self if you move your files without updating your scripts. Don’t worry, there is a better way!

You can keep all the files associated with your project together in an R Project. To create one, click the icon that looks like an R in a cube in the upper right corner of RStudio (or go to File…New Project).

Image of Rstudio Project

Why is this helpful? Projects allow RStudio to leave notes for itself (e.g., history). But you also get:

  • a dedicated R process;
  • a file browser pointed to the Project directory; and
  • the working directory is set to the Project directory.

You don’t have to worry about setting the working directory at the top of a script, which is good for your collaborators. Remember, your most important collaborator is you six months from now!
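Inside a project, you can refer to files with paths relative to the project folder instead of calling setwd() with an absolute path. The sketch below builds a throwaway folder (my_project, with a data subfolder and a cars.csv made from mtcars; all hypothetical names) and uses setwd() only to simulate what opening the .Rproj file does for you.

```r
# Build a throwaway, project-like folder so the example runs anywhere
proj_dir <- file.path(tempdir(), "my_project")
dir.create(file.path(proj_dir, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(mtcars, file.path(proj_dir, "data", "cars.csv"), row.names = FALSE)

# Simulate opening the project: RStudio would set this working directory for you.
# setwd() returns the previous directory, which we keep to switch back at the end.
old_wd <- setwd(proj_dir)

# A relative path works on any computer that has a copy of the project folder
cars <- read.csv(file.path("data", "cars.csv"))   # same as "data/cars.csv"
nrow(cars)

setwd(old_wd)
```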

Additionally, you have the option of starting each new R session with a clean workspace. That way, you can be sure that you haven’t unintentionally pre-loaded something that won’t be available to your collaborator (or to you in a few months). Go to Tools…Global Options…General and make sure “Restore .RData into workspace at startup” is unchecked and “Save workspace to .RData on exit” is set to Never.

Image of Rstudio General Options

To make sure that your script stores the calculations, and not simply the results, restart R and rerun the script from scratch with this keystroke pattern:

  • Press Cmd/Ctrl + Shift + F10 to restart R.
  • Press Cmd/Ctrl + Shift + S to rerun (source) the current script.

What should you save? Your script (plus your raw, unprocessed data, of course) is the most important thing. If you have your script, you can reproduce everything. You’ll likely end up saving cleaned data, plots, etc., but those all flow from your script.
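A minimal sketch of saving outputs from a script. The results folder, the file names, and the small subset of mtcars standing in for cleaned data are all hypothetical; pdf() writes the plot to a file instead of the Plots pane.

```r
# Create an output folder inside the project (no warning if it already exists)
dir.create("results", showWarnings = FALSE)

# A stand-in for cleaned data: four-cylinder cars, two columns
clean_cars <- subset(mtcars, cyl == 4, select = c(mpg, hp))

# CSV: plain text, readable by almost any program
write.csv(clean_cars, file.path("results", "clean_cars.csv"), row.names = FALSE)

# RDS: R's own format, which restores the object exactly as it was
saveRDS(clean_cars, file.path("results", "clean_cars.rds"))

# A plot written to a PDF file instead of the Plots pane
pdf(file.path("results", "mpg_vs_hp.pdf"))
plot(clean_cars$hp, clean_cars$mpg, xlab = "Horsepower", ylab = "Miles per gallon")
dev.off()
```

Because every file here is regenerated by rerunning the script, deleting the results folder loses nothing.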

Jump back to RStudio and our R script: “Projects” (~line 379).

A basic R project file structure

Let’s take a look at how to organize your file structure. It’s a good idea to structure your files so that someone else (or future you!) can make sense of your analysis easily. Here’s a visual of a simple file structure.[2]

Image of simple file structure with R Project, image credit Martin Chan

Let’s take a look at a super-simplified project based on the work we just did today. Download and unzip the project_example.zip file, and let’s investigate!

Your projects are going to get more complex than this simple structure. Here’s another example where there are multiple datasets, scripts, and results.[3]

Image of more complex file structure with R Project, image credit Leon Jessen

Keeping R up to date

Since most people in our workshop have freshly installed R and RStudio, we hopefully didn’t encounter any issues with old versions today. But eventually, you will have to update R.

Remember that at the top of the Console, you will see session info, e.g., R version 3.6.2 (2019-12-12) – “Dark and Stormy Night”.

This tells you which version of R RStudio is using.

You can also check the version by typing version in the console:

version
##                _                           
## platform       x86_64-w64-mingw32          
## arch           x86_64                      
## os             mingw32                     
## system         x86_64, mingw32             
## status                                     
## major          3                           
## minor          6.2                         
## year           2019                        
## month          12                          
## day            12                          
## svn rev        77560                       
## language       R                           
## version.string R version 3.6.2 (2019-12-12)
## nickname       Dark and Stormy Night
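A few other base R ways to check versions from a script or the console are sketched below; packageVersion() works the same way for tidyverse or any other installed package.

```r
# Just the version string, e.g. "R version 3.6.2 (2019-12-12)"
R.version.string

# A version object you can compare against, handy for checks in scripts
getRversion()
getRversion() >= "3.6.0"

# Version of an installed package (utils ships with R)
packageVersion("utils")

# R version, operating system, and attached packages in one report
sessionInfo()
```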

After you load packages, you will sometimes see a warning message in red text below any conflict messages:

Warning message: package 'tidyverse' was built under R version [x.y.z]

If you see a message like that, it is time to update R. Updating R means that you have to download and install R again. By default, your computer will keep your old version of R, and you can decide if you want to delete it or not. RStudio will automatically recognize the new version of R. When you install a new version of R, you have to re-install your packages. Windows users can try the installr package.
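One way to make reinstalling less tedious is sketched below, using a hypothetical file name, my_packages.rds, saved in your working directory. Before updating, save the names of the packages you installed yourself (priority = "NA" leaves out the base and recommended packages that ship with R); after updating, install them again from that list.

```r
# BEFORE updating R: record the packages you installed yourself
my_pkgs <- unname(installed.packages(priority = "NA")[, "Package"])
saveRDS(my_pkgs, "my_packages.rds")

# AFTER installing the new R version, run this in the new version:
# install.packages(readRDS("my_packages.rds"))
```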

It is a good idea to occasionally check for package updates (e.g., tidyverse): Tools…Check for Package Updates.

It is a good idea to occasionally check for RStudio updates: Help…Check for Updates.

Keep learning

  • Come to more R Workshops at UVA Library!
    • Data Wrangling in R
    • Data Visualization with R (ggplot2)
    • Shiny Web Apps in R
    • Introductory Statistics with R
    • Linear Modeling with R
    • Modeling Count Data with R
    • New this year! Partnering with Research Computing on “R for Big Data” workshops
    • Also Intro to Git and GitHub (version control)
  • Register for the Research Data Services newsletter to be notified of future workshops.

In-person R communities and help!

If you are looking to build community or just want one-on-one support, we are lucky to have plenty of in-person/local opportunities.

Finding help online

The great thing about R is that you can very often find an answer to your question online.

Don’t forget the “official” help resources from R/RStudio.

  • Read official package documentation and vignettes, e.g., the tidyverse documentation
  • Use the RStudio Cheat Sheets
  • Use the RStudio Help viewer by typing ? before a function or package
  • Check out the Keyboard Shortcuts Help under Tools in RStudio for some good tips
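The built-in help is also available as ordinary functions you can run in the console, as in this sketch (read.csv, mean, and the search terms are just examples):

```r
# Same as ?read.csv: open the help page
help("read.csv")

# Show only the arguments a function takes
args(read.csv)

# Find functions whose names contain "read" on the search path
apropos("read")

# Run the examples at the bottom of a help page
example("mean")

# Search all installed help pages by keyword (same as ??regression)
help.search("regression")
```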

  1. Credit to Modern Dive for the R and RStudio analogies, and to Marieke Jones and David Martin’s HSL Intro to R Workshop.

  2. Credit to Martin Chan, RStudio Projects and Working Directories: A Beginner’s Guide.

  3. Credit to Leon Jessen, How to Organize a Project.