Project directories
Overview
Teaching: 15 min
Exercises: 20 minQuestions
How should I organise my project files?
Objectives
Understand the key features of a project directory
Design a project structure that separates components logically
Challenge 1 - understanding your filesystem
Download this data file and save it on your computer.
Files on your computer are stored in a kind of ‘heirachy’ or ‘tree’ structure called a filesystem.
On a piece of paper, using windows explorer (or any other tool of your choice), make a drawing that represents the location of your file on your computer.
Solution
An example might be:
“Users” folder -> “stephen” folder -> “Downloads” folder -> “gapminder.csv” file
If the previous lesson provided a bad example of project organisation, what would a good one look like? The specifc answer will depend on your project, but general idea is that someone else (probably yourself in six months) should be able to look at your project and understand what everything does. Some good principles to help achieve this can be found in Good Enough Practices for Scientific Computing and A Quick Guide to Organising Computational Biology Projects.
Challenge 2
Look at the example from Figure 1 of A Quick Guide to Organising Computational Biology Projects. The large text shows folder names and the smaller text file names. Can you explain the purpose of each folder and file?
Some specific recommendations to help achieve this goal are the following. Feel free to use them if they work for your projects.
Put each project in its own directory, which is named after the project.
All the files necessary for the project should be kept within this directory, and no unnecessary files should be kept here. This means that the project directory is a self-contained unit of analysis easily able to be backed up, transferred or managed without having to worry about the state of other files elsewhere on the computer.
Put text documents associated with the project in a doc
directory.
This location stores long form text documents describing and documenting the project as a whole. It is also a good place to store documents that start to interpret the results of a project, such as the preparation of a manuscript for publication.
Put raw data and metadata in the data
directory,
The raw data you use for a project should be be considered read only and not modified. If there are
steps required for data processing and cleaning, make sure to document those steps carefully (learning
to script your data processing will help here) and store the cleaned data separately (perhaps in a
derived_data
directory)
Put files generated during analysis in a results
directory.
Anything that could be considered output of our analysis goes in the results folder. Examples of files that might go here include figures, modelling output, summarised data tables, etc.
Put source for the project’s scripts and programs in the scripts
directory, and programs brought in from elsewhere or compiled locally in the bin
directory.
This distinction is most relevant when you are both writing your own code, as well as using external
programs in an analysis. For most simple projects however, you can make do with a single scripts
folder that stores your analysis code.
Name all files to reflect their content or function.
It’s no use having a sensible directory structure if your files are named cryptically. When you look
back on the project in the future, fig1.png
will likely be meaningless and you will have to open
several files to find the one you are after. Instead, a name like weight_comparison_plot.png
makes
it very clear what the file contains, allowing you to find the relevant file faster.
Challenge 3
Sketch out a proposed project directory for one of your own projects. Include some example file names that you would create in the course of the analysis.
Key Points
Project directories should contain a single self-contained unit of work
Clear and logical organisation and naming aids understanding of the project