Introduction
- Computing is essential in science and (almost) all data are digital
- A set of good enough practices can make you more efficient
- Future you will thank past you for adopting good practices
- Shared Principles: planning, modular organisation, names, documentation
Data Management
- Raw data is the data as originally generated – it should be kept read-only
- Raw data has to be backed up in more than one location
- Create the data you wish you had received
- Keeping track of your actions is a key part of data management
- A digital object identifier (DOI) is a unique identifier that permanently identifies data and makes it findable
- Finding a repository tailored to your data is key to making it findable and accessible by the broader community
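One way to act on the first points above is to remove write permission from each raw file and record a checksum so that later accidental changes can be detected. The following is a minimal sketch, not a prescribed workflow; the function names `sha256_of` and `protect_raw_file` are hypothetical, and on Windows `os.chmod` only toggles the read-only flag.

```python
import hashlib
import os
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def protect_raw_file(path: Path) -> str:
    """Record a checksum for a raw data file, then make the file read-only.

    Hypothetical helper: the returned checksum would normally be stored
    alongside the data (for example in a README) and in each backup location.
    """
    checksum = sha256_of(path)
    # Read permission for owner, group and others; no write or execute bits.
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return checksum
```

Comparing `sha256_of(path)` against the stored checksum later, on the original and on each backup copy, is a cheap way to confirm that the raw data is unchanged.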
Code and Software
- Any code that runs on your research data is research software
- Write your code to be read by other people, including future you
- Decompose your code into modules: scripts and functions, with meaningful names
- Be explicit about requirements and dependencies such as input files, arguments and expected behaviour
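The last two points above can be illustrated with a small script split into named functions, each stating its inputs and expected behaviour in a docstring, with the command-line arguments declared through `argparse`. This is only a sketch under assumed requirements: the CSV layout, the column name and the function names are hypothetical.

```python
import argparse
import csv
from pathlib import Path


def read_column(csv_path: Path, column: str) -> list[float]:
    """Read one numeric column from a CSV file that has a header row."""
    with csv_path.open(newline="") as handle:
        return [float(row[column]) for row in csv.DictReader(handle)]


def mean(values: list[float]) -> float:
    """Return the arithmetic mean; raise ValueError if the list is empty."""
    if not values:
        raise ValueError("cannot take the mean of an empty list")
    return sum(values) / len(values)


def parse_args(argv=None) -> argparse.Namespace:
    """Declare the input file and column explicitly as required arguments."""
    parser = argparse.ArgumentParser(
        description="Print the mean of one column in a CSV file."
    )
    parser.add_argument("csv_path", type=Path, help="input CSV file with a header row")
    parser.add_argument("column", help="name of the numeric column to summarise")
    return parser.parse_args(argv)


def main(argv=None) -> None:
    """Tie the pieces together: parse arguments, read the data, report the mean."""
    args = parse_args(argv)
    print(mean(read_column(args.csv_path, args.column)))
```

In a real script, `main()` would typically be called under the usual `if __name__ == "__main__":` guard, so that the functions can also be imported and tested on their own.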
Collaboration
- Create an overview of your project
- Create a shared “to-do” list
- Decide on communication strategies
- Make the license explicit
- Make the project citable
Project Organization
- A good file name suggests the file content
- Good project organization saves you time
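As an illustration of a file name that suggests its content, the sketch below builds names from a date, a project and a short description. The `date_project_description.ext` pattern and the helper `build_file_name` are assumptions chosen for the example, not a convention required by these practices.

```python
import re
from datetime import date


def build_file_name(day: date, project: str, description: str, extension: str) -> str:
    """Build a name of the form YYYY-MM-DD_project_description.ext (assumed pattern)."""

    def slug(text: str) -> str:
        # Lowercase the text and replace each run of characters that are not
        # ASCII letters or digits with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{day.isoformat()}_{slug(project)}_{slug(description)}.{extension.lstrip('.')}"


# Hypothetical project and description, for illustration only.
print(build_file_name(date(2024, 3, 5), "Birdsong", "Site counts", ".csv"))
# prints 2024-03-05_birdsong_site-counts.csv
```

Putting an ISO 8601 date first means that an alphabetical listing of the directory is also a chronological one.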
Keeping Track of Changes
- Small, frequent changes are easier to track
- Tracking changes systematically with checklists is helpful
- Version control systems help adhere to good practices
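Where a version control system is not in use, small and frequent changes can still be recorded by hand in a plain-text changelog. The following sketch adds a dated entry at the top of such a file; the file name `CHANGELOG.txt`, the entry layout and the helper `add_changelog_entry` are assumptions made for the example.

```python
from datetime import date
from pathlib import Path


def add_changelog_entry(changelog: Path, message: str, day: date) -> None:
    """Insert a dated entry at the top of a plain-text changelog (newest first)."""
    previous = changelog.read_text(encoding="utf-8") if changelog.exists() else ""
    # Assumed layout: a date line followed by one bullet describing the change.
    entry = f"## {day.isoformat()}\n* {message}\n\n"
    changelog.write_text(entry + previous, encoding="utf-8")
```

For example, `add_changelog_entry(Path("CHANGELOG.txt"), "Fixed unit conversion in cleaning step", date.today())` records one change. A version control system does this bookkeeping automatically and also stores the changed content itself.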
Manuscripts
- Have all authors agree on a workflow before the writing starts
- Email-based workflows work better with informative filenames and clear co-ordination
- Text-based documents with version control scale better, if co-authors are familiar with the tools
- Single Master Online approaches can be an effective compromise
What To Do Next
- Learning good practices is a long-term process
- Different people make different contributions to good practice
Agile
- The Agile approach is to break problems into smaller tasks and fully address them in short, iterative work cycles (sprints), with each cycle ending in review and discussion before planning the next cycle.
- Aspects of this approach may be useful in data science work.