Writing Data
Overview
Teaching: 15 min
Exercises: 5 minQuestions
How do I write data to disk after working in R?
Objectives
Know when and how to write data
At some point, you will probably want to write some data out from R when you have finished your
analysis. Just as with the read_csv
function (or other read_XXX
functions we covered in the
data import lesson),
there are various write_XXX
functions to write data frames out in different formats. It is usually best to write
your data into a plain text format, particularly if you may need to use it again in later analysis.
What to write
Not all data needs to be written to a file when you have finished an analysis. Any data from intermediate steps in your analysis can always be recreated by running your code again, so does not need to be saved. Only keep the final results of your analysis, ie. the figures or tables you would show to other people to explain what you have found.
Where to keep it
Where your data should be stored will depend on how large it is. For small to moderately sized data, use a relative path to save it within your project’s structure. This allows your entire project - data, results, and the code to create the results - to be portable and shared easily if needed. It’s a good practice to keep your raw data and your modified data distinct, so make sure you set up a new folder to store your results.
Never overwrite your raw data with modified data
For large data, you will likely have a fixed location for storage. This could be a hard drive or a cloud storage server. Use an absolute path to make sure your results are being written to the correct location. But consider if saving the storage path as a variable is of benefit should you need to change the storage location in the future.
Challenge 1
Create a new folder called
processed_data
in your project folder. Write just the Australian gapminder data to a csv file in this folder. Open the created file in a text editor to confirm that it has written correctly.Solution to Challenge 1
aust_data <- gapminder %>% filter(country == "Australia") write_csv(aust_data, path = "processed_data/aust_gapminder.csv")
Key Points
Intermediate data objects do not need to be written to disk
Write data in an appropriate format
Write data to the most useful location