Tidy your data
Overview
Expected Time: 6 to 12 hours
Due Date: negotiablePrerequisites
completion of data school FOCUS units on git, Rstudio, R & tidyverse
access to genuine data you have permission to work with and can share
Objectives
Practice good project set up using R projects, folder structures, and git repositories
Develop problems solving skills using tidyverse verbs and functions
Appreciate how tidy data and the tidyverse relates to your own work
Background
For this assignment, you will go through the process of setting up a project, importing your data, tidying it, and tracking your work on git. Applying what you have learned to your own data is the best way to reinforce your learning, and ensure your new skills are relevant to your work.
Requirements
A data set that relates to your current or future work, that you have permission to work with. Your data set can be legacy data, data from a previous publication, or newly collected data that you are about to start working on. Ideally, your data set should:
- have between 4 and 10 variables;
- have between 20 and 100 observations for each variable
- be as ‘raw’ as possible (e.g., instead of reporting a mean value, it should include every individual observation that went into calculating that mean)
If your typical data set does not meet these suggested parameters, please discuss with your lead facilitator.
Task
1
Set up a new R project for your work. Make sure your project is under version control using git.
Hint:
2
Establish a suitable directory (folder) structure for organising your work.
Hint:
Here, we discuss project organisation and directory structures. This is a guide, not an authoritative list. Which folders to you absolutely need for your project? What folders do you think you might need in the future?
3
Setup github as a remote repository. Link your local and your remote repository.
- what is the difference between git and github?
- why do you want a remote repository?
Hint:
Instructions (Start reading from ‘Create a new repository on GitHub’)
4
Download your data to the appropriate directory.
- Do you want your data available on github? If not, add the path to your data to your .gitignore file
5
Set up a readme file Consider including:
- description of your data, project, and directory structure
- any licences or restrictions on your data
- how to obtain your data
6
Create a new R script that loads and tidies your data. Track your work using git.
- are there other files your want to stage and commit as well?
- where will you put your ‘tidied’ data?
- what format do you want to save your ‘tidied’ data in?
7
Once you are happy with your work, make sure your have pushed everything and submit the URL for your github repository to your lead facilitator via slack.
- how often do you think you should push your work?
- how often do you think you should commit your work?
Peer assessment
You will be given a colleagues’ URL for their github repository. Your job is to provide some feedback and reflection on their work. Please be kind!
- Clone their repository to your local machine. You can do this by clicking file > new project > versions control > git and following the prompts. Look closely at where you are cloning to. You don’t want to end up with nested git repositories or R projects!
- Create a new text file called ‘feedback.txt’ inside a docs folder (make the docs folder if it doesn’t exist already).
- Provide some feedback to your colleague. Consider the following:
- do you like their choice of project name? Why?
- does a readme exist, and do you understand it? What did you find really helpful? What information do you think is missing?
- is the directory structure logical? Can you suggest any changes?
- (if you have access to the data) try running the code. Do you get the output you expect?
- how easy is the code to understand? Are their comments throughout the code?
- look at their commit history (either through R studio or github). How frequently have they committed? How easy is it to understand their work from the commit messages and history?
- Save your feedback file, stage and commit, and push to github. Send a friendly slack message once you are done, so they can read your feedback.
Key Points