These are the goals for today’s workshop:
The intended audience is beginner-level, with no previous experience using R.
We will focus on using R for data analysis throughout this series.
R is the underlying statistical computing environment. You can think of this like the engine of a car. That makes RStudio like the dashboard.[1]
Basically, RStudio makes R easier to use because it gives you a convenient place to write and run code. After you install R and RStudio, you only need to open RStudio.
This is what RStudio looks like when you first open it:
RStudio shows four panes by default. The two most important are the Console (bottom left) and the Script Editor (top left).
In the Console, type code after the > prompt and press Enter to execute. Type ? before a function name in the console to get info in the Help section.
Use R scripts to save your work for future analysis. They are an essential part of reproducibility, whether for collaborators or for your future self. FYI, R script files end with “.R”.
Open the R script that you downloaded and unzipped in advance of the workshop:
File…Open File…Navigate to the folder where you saved the workshop files and open this file: intro_R.R
Let’s look at our R script together.
Recall that functions are the “verbs” that allow us to manipulate data. Packages contain functions, and all functions belong to packages. For example, read.csv belongs to the utils package.
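As a quick check of the function-to-package relationship, the sketch below asks R where read.csv is defined. It uses only base R and assumes nothing has masked read.csv in your session.

```r
# Ask R which package namespace defines read.csv
pkg_name <- environmentName(environment(read.csv))
print(pkg_name)

# The package::function form calls a function from a named package explicitly
same_function <- identical(utils::read.csv, read.csv)
print(same_function)
```

The package::function form is handy when two loaded packages both define a function with the same name.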
R comes with about 30 packages (“base R”). There are over 10,000 user-contributed packages; you can discover these online on the Comprehensive R Archive Network (CRAN). You can find more in active development on GitHub, etc. CRAN packages are validated to a certain degree; it is “buyer beware” with packages on GitHub.
A popular example is tidyverse, which is actually a collection of several packages. We will take a look at it in our script.
In order to use a package, you must install it first. In RStudio’s install dialog, type tidyverse (or a different package name), then click on Install. Alternatively, run install.packages("tidyverse") in the console.
Once you have installed the package, you must load it in every new R session in which you want to use it: type library(tidyverse) to load the tidyverse package.
Install a package once, but load it every time!
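One way to express “install once, load every time” in a script is sketched below. The helper name load_or_install is hypothetical and not part of the workshop script. It is demonstrated with stats, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper (not part of the workshop script):
# install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # runs only when the package is not yet installed
  }
  library(pkg, character.only = TRUE)  # runs every session
  invisible(pkg)
}

# Demonstrated with "stats", which ships with R, so nothing is downloaded here
load_or_install("stats")
```

Calling load_or_install("tidyverse") would work the same way, but the first call needs an internet connection to install the package.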
The working directory is where R pulls files to work with. This is where your datasets, scripts, etc. live. It can be any folder location. (It doesn’t have to be the same folder where you installed R.)
R always has a working directory set. Get your working directory with this command in the console: getwd()
Let’s change the working directory to wherever you saved the R script and albemarle_homes_2020.csv. You can change the working directory a few ways.
To get your file path:
Note that Windows uses \ (backslash) to separate folders in file paths, but in R you should use / (forward-slash).
This is an example of how I set my working directory using the console: setwd("C:/Users/jah2ax.ESERVICES/Box Sync/_R/workshops/workshops_teaching/intro_to_R_Spring_2020")
Verify that you have the right directory with getwd(). Note that you can also see the working directory listed at the top of the Console.
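The working-directory commands above can be tried safely with a small sketch like the following. The Windows path is a made-up example, and tempdir() stands in for your real workshop folder so the code runs on any computer.

```r
# A Windows-style path as it might be copied from File Explorer (made-up example).
# Backslashes are doubled inside an R string because "\" is an escape character.
windows_path <- "C:\\Users\\me\\workshop"

# Replace each backslash with a forward slash
r_path <- gsub("\\", "/", windows_path, fixed = TRUE)
print(r_path)

# setwd() returns the previous working directory invisibly,
# so it can be stored and restored afterwards
old_wd <- setwd(tempdir())
print(getwd())
setwd(old_wd)
```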
We will be working with Albemarle County real estate data.
You can import almost any kind of data into R. Your best bet for figuring out how to import a dataset is to google “how to import [file type] into R.” You will likely have to install a package in order to do it. For example, haven
is a popular package for importing Stata, SPSS, and SAS files. Remember you can find the official documentation of packages online at CRAN: https://cran.r-project.org/web/packages/haven/index.html
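As a minimal sketch of what importing looks like, the example below writes a tiny stand-in CSV to a temporary file and reads it back with read.csv. The column names and values are invented for illustration and are not taken from the Albemarle dataset.

```r
# Build a tiny stand-in dataset (invented columns, not the real Albemarle data)
example_homes <- data.frame(
  yearbuilt = c(1995, 2004, 1978),
  totalvalue = c(250000, 410000, 189000)
)

# Write it to a temporary CSV file, then import it the same way as a real file
csv_file <- tempfile(fileext = ".csv")
write.csv(example_homes, csv_file, row.names = FALSE)

homes <- read.csv(csv_file)
str(homes)

# Other formats follow the same pattern with a package-specific reader, e.g.
# haven::read_dta("file.dta") for Stata files (requires the haven package)
```

For the workshop data, the call would be read.csv("albemarle_homes_2020.csv"), assuming the file is in your working directory.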
Jump back to our R script “Clear our workspace” (~line 173).
Congrats on making it this far! How do we save our R files? What should we save?
I know I just showed you setwd(). This is actually terrible for reproducibility. Recall the fiddly part where I gave you a script with my working directory set, e.g., setwd("C:/Users/jah2ax/Box Sync/_R/workshops/intro_to_R_Spring_2020"). The script I gave you doesn’t work as-is, because that path is only good for me, on my computer, at this point in time. Using absolute paths like this with setwd() makes it very hard to share your work with collaborators, or with your future self if you move your files without updating your scripts. Don’t worry, there is a better way!
You can keep all the files associated with your project together in an R Project. Click on the icon that looks like an R in a cube in the upper right corner of RStudio.
Why is this helpful? Projects allow RStudio to leave notes for itself (e.g., history). But you also get:
You don’t have to worry about setting the working directory at the top of a script, which is good for your collaborators. Remember, your most important collaborator is you six months from now!
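Because RStudio sets the working directory to the project folder when you open a project, scripts can refer to files with paths relative to that folder. The sketch below assumes a hypothetical data subfolder inside the project; adjust the folder name to match your own layout.

```r
# Build a relative path; "data" is a hypothetical subfolder of the project
csv_path <- file.path("data", "albemarle_homes_2020.csv")
print(csv_path)

# With the project open, this would read the file on any computer
# that has the same project folder (left commented so the sketch runs anywhere):
# homes <- read.csv(csv_path)
```

The same script then works for a collaborator without editing any paths.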
Additionally, you have the option of clearing your workspace when you start a new R session. By doing this, you can be sure that you haven’t unintentionally pre-loaded something that won’t work for your collaborator (or you in a few months). Go to: Tools…Global Options…General - make sure “Restore .RData into workspace at startup” is unchecked and “Save workspace to .RData on exit” is set to Never.
You can make sure that you are storing calculations, and not simply the results. You can verify by using this keystroke pattern:
What should you save? Your script (plus your raw, unprocessed data, of course) is the most important thing to keep. If you have your script, you can reproduce everything. You’ll likely end up saving cleaned data, plots, etc., but those all flow from your script.
Jump back to RStudio and our R script: “Projects” (~line 379).
Let’s take a look at how to organize your file structure. It’s a good idea to structure your files so that someone else (or future you!) can make sense of your analysis easily. Here’s a visual of a simple file structure.[2]
Let’s take a look at a super-simplified project based on the work we just did today. Download and unzip the project_example.zip file, and let’s investigate!
Your projects are going to get more complex than this simple structure. Here’s another example where there are multiple datasets, scripts, and results.[3]
Since most people in our workshop have freshly installed R and RStudio, we hopefully didn’t encounter any issues with old versions today. But eventually, you will have to update R.
Remember that at the top of the Console, you will see session info, e.g. R version 3.6.2 (2019-12-12) – “Dark and Stormy Night”
This tells us which version of R RStudio is using.
You can also check the version with the version command:
version
##                _
## platform       x86_64-w64-mingw32
## arch           x86_64
## os             mingw32
## system         x86_64, mingw32
## status
## major          3
## minor          6.2
## year           2019
## month          12
## day            12
## svn rev        77560
## language       R
## version.string R version 3.6.2 (2019-12-12)
## nickname       Dark and Stormy Night
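If you want a script to check the R version for you, a minimal sketch using base R is shown below. The threshold 3.6.2 is just the version from the output above, used here as an example.

```r
# The full version string, as shown at the top of the Console
print(R.version.string)

# getRversion() returns a version object that can be compared safely
# (comparing version numbers as plain text would sort "3.10" before "3.6")
minimum_version <- "3.6.2"  # example threshold taken from the output above
is_recent_enough <- getRversion() >= minimum_version
print(is_recent_enough)
```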
After you load packages, sometimes you will see a Warning Message in red text below any conflicts:
Warning message: package 'tidyverse' was built under R version [x.y.z]
If you see a message like that, it is time to update R. Updating R means that you have to download and install R again. By default, your computer will keep your old version of R, and you can decide if you want to delete it or not. RStudio will automatically recognize the new version of R. When you install a new version of R, you have to re-install your packages. Windows users can try the installr package.
It is a good idea to occasionally check for package updates (e.g., tidyverse): Tools…Check for Package Updates.
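Package versions can also be checked from the console. The sketch below uses stats, which ships with R, as a stand-in for any installed package. The update commands need an internet connection, so they are left commented out.

```r
# Version of one installed package ("stats" ships with R, so it is always present)
stats_version <- packageVersion("stats")
print(stats_version)

# Compare against a minimum version, e.g. one a collaborator's script needs
print(stats_version >= "3.0.0")

# Listing or installing updates needs an internet connection, so these stay commented:
# old.packages()
# update.packages(ask = FALSE)
```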
It is a good idea to occasionally check for RStudio updates: Help…Check for Updates.
For those of you looking to build community or just wanting one-on-one support, we are lucky to have plenty of in-person/local opportunities.
The great thing about R is that you can very often find an answer to your question online.
Don’t forget the “official” help resources from R/RStudio.
Type ? before a function or package name.
Grolemund, G., and Wickham, H., R for Data Science, 2017.
Check out Stat545 - Data wrangling, exploration, and analysis with R. University of British Columbia.
Twitter:
Jennifer Bryan, Jim Hester. What They Forgot to Teach You About R. For more information about R Projects, file structure, and maintaining R.
Leon Jessen, How to Organize a Project, 2018 Feb 15.
[1] Credit to Modern Dive for the R and RStudio analogies, and to Marieke Jones and David Martin’s HSL Intro to R Workshop.
[2] Credit to Martin Chan, RStudio Projects and Working Directories: A Beginner’s Guide.
[3] Credit to Leon Jessen, How to Organize a Project.