Introduction to R and RStudio
|
|
Using R
|
R has the usual arithmetic operators and mathematical functions.
Use <- to assign values to variables.
Use ls() to list the objects in the current environment.
Use rm() to remove objects from the current environment.
Use install.packages() to install packages.
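
The points above can be tried directly at the console. This is a minimal sketch; the variable names are hypothetical and the package name in the last line is only an example.

```r
# Arithmetic operators and mathematical functions
total       <- 3 + 4 * 2     # multiplication is evaluated before addition
side_length <- sqrt(16)      # built-in mathematical function

# ls() lists the objects in the current environment
ls()

# rm() removes an object by name
rm(side_length)
exists("side_length")        # check whether the name can still be found

# Installing a package is needed only once per library and requires
# network access, so it is left commented out here.
# install.packages("dplyr")
```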
|
Data
|
The basic data types in R are double, integer, complex, logical, and character.
A vector is an ordered collection of values of the same type.
Create vectors with c().
A list is an ordered collection of elements that can be of any type.
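
A short sketch of the difference between vectors and lists, using made-up values. typeof() reports the basic type of each object.

```r
heights <- c(1.62, 1.75, 1.80)   # double vector
counts  <- c(2L, 5L, 9L)         # integer vector (the L suffix marks integers)
mixed   <- c(1, "a", TRUE)       # mixed input is coerced to a single type

typeof(heights)
typeof(counts)
typeof(mixed)

# A list can hold elements of different types and lengths
record <- list(name = "Ada", age = 36L, scores = c(90.5, 88))
str(record)
```

Because a vector holds only one type, c() silently converts mixed input; a list keeps each element as it was given.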
|
Getting help in R
|
|
Reading Data In
|
|
Tidy Data
|
For data to be tidy, each variable must be in its own column.
For data to be tidy, each case must be in its own row.
For data to be tidy, each value must be in its own cell.
|
Dataframes
|
Dataframes (or tibbles in the tidyverse) are lists where each element is a vector of the same length.
Use read_csv() to read comma separated files into a dataframe.
Use nrow(), ncol(), dim(), or colnames() to find information about a dataframe.
Use head(), tail(), summary(), or glimpse() to inspect a dataframe’s content.
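
The following sketch reads a small file and inspects it. The data and the file are hypothetical; the file is written to a temporary location first so that the example is self-contained. It assumes the readr and dplyr packages are installed.

```r
library(readr)   # read_csv()
library(dplyr)   # glimpse()

# Hypothetical data, written to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("species,mass_g,length_mm",
             "finch,18.5,120",
             "sparrow,24.0,150",
             "wren,9.8,95"),
           csv_path)

birds <- read_csv(csv_path)

# Size and column names
nrow(birds)
ncol(birds)
dim(birds)
colnames(birds)

# Content
head(birds, 2)
tail(birds, 2)
summary(birds)
glimpse(birds)
```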
|
Selecting columns
|
Use select() to choose variables from a dataframe.
Helper functions make it easier to select the correct columns.
Use rename() to rename variables without dropping columns.
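
As an illustration, the sketch below applies select(), one helper function, and rename() to a small hypothetical tibble. ends_with() is one of several helpers that dplyr provides.

```r
library(dplyr)

# Hypothetical example data
birds <- tibble(
  species   = c("finch", "sparrow", "wren"),
  mass_g    = c(18.5, 24.0, 9.8),
  length_mm = c(120, 150, 95),
  wing_mm   = c(60, 75, 48)
)

by_name   <- select(birds, species, mass_g)      # choose columns by name
by_helper <- select(birds, ends_with("_mm"))     # choose columns with a helper
dropped   <- select(birds, -mass_g)              # drop a column

# rename() keeps every column and changes only the given name (new = old)
renamed <- rename(birds, body_mass_g = mass_g)
colnames(renamed)
```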
|
Extract rows
|
|
Creating New Columns
|
Use mutate() to create new variables from old ones.
You can create new variables using any function that returns a vector with one value for each row of the data frame.
Use group_by() to group your data based on a variable.
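
A minimal sketch of mutate() with and without grouping, on hypothetical data. When the data are grouped, functions such as mean() are evaluated within each group.

```r
library(dplyr)

# Hypothetical example data
birds <- tibble(
  species = c("finch", "finch", "wren", "wren"),
  mass_g  = c(18, 20, 9, 11)
)

# New column computed row by row from an existing one
birds <- mutate(birds, mass_kg = mass_g / 1000)

# After group_by(), mean(mass_g) is the mean of each species
by_species <- group_by(birds, species)
centred    <- mutate(by_species, mass_diff = mass_g - mean(mass_g))
centred
```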
|
Summarise and Grouping
|
|
Adding and Combining Datasets
|
|
Putting it all together
|
Data analyses can be broken down into discrete stages
Most data analysis stages fit into a small number of types
Pipes pass their left-hand side to the function on their right-hand side as its first argument.
Pipes make your code more readable, but be careful of going overboard.
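
To illustrate, the sketch below writes the same steps twice, once as nested calls and once with the %>% pipe that dplyr makes available. The data are hypothetical.

```r
library(dplyr)

# Hypothetical example data
birds <- tibble(
  species   = c("finch", "sparrow", "wren"),
  mass_g    = c(18.5, 24.0, 9.8),
  length_mm = c(120, 150, 95)
)

# Nested calls are read from the inside out
nested <- head(mutate(select(birds, species, mass_g), mass_kg = mass_g / 1000), 2)

# The piped version reads top to bottom; each result becomes
# the first argument of the next function
piped <- birds %>%
  select(species, mass_g) %>%
  mutate(mass_kg = mass_g / 1000) %>%
  head(2)

all.equal(nested, piped)
```

A very long chain can be harder to debug than a few named intermediate objects, which is one way of "going overboard".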
|
Gather & Spread
|
Use the tidyr package to change the layout of dataframes.
Use gather() to go from wide to long format.
Use spread() to go from long to wide format.
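
A small round trip between the two layouts, on hypothetical counts. Note that recent tidyr releases mark gather() and spread() as superseded by pivot_longer() and pivot_wider(); the older functions still work and are used here to match the points above.

```r
library(tidyr)

# Hypothetical wide data: one column per year
wide <- data.frame(
  site  = c("A", "B"),
  y2019 = c(10, 7),
  y2020 = c(12, 9)
)

# Wide to long: the year columns become key/value pairs
long <- gather(wide, key = "year", value = "count", y2019, y2020)
long

# Long to wide: the keys become column names again
wide_again <- spread(long, key = year, value = count)
wide_again
```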
|
Cleaning Data
|
|
Writing Data
|
Intermediate data objects do not need to be written to disk
Write data in an appropriate format
Write data to the most useful location
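
As a sketch of these points, the example below writes only a final result to disk. write_csv() from readr is assumed here as the counterpart of read_csv(); the data and the output folder are hypothetical, and a temporary directory stands in for a real project location.

```r
library(readr)

# Hypothetical final result of an analysis
results <- data.frame(
  species     = c("finch", "wren"),
  mean_mass_g = c(19, 10)
)

# Hypothetical output location; a real project would use its own results folder
out_dir <- file.path(tempdir(), "results")
dir.create(out_dir, showWarnings = FALSE)
out_path <- file.path(out_dir, "mean_mass.csv")

# CSV is plain text and can be opened by most other tools
write_csv(results, out_path)
file.exists(out_path)
```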
|
Reproducibility
|
A script is a discrete unit of analysis
A script will be run in the context of an environment
Software (and compute) dependencies need to be considered
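
One possible way to record the context a script ran in is sketched below. It uses only base R; which details are worth recording is a judgement call for each project.

```r
# Working directory the script was run from
getwd()

# R version and attached packages at the time the script ran
info <- sessionInfo()
info$R.version$version.string
names(info$otherPkgs)    # attached non-base packages (NULL if none)
```

Saving this output next to the results makes it easier to rebuild the same software environment later.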
|