Introduction to Literate Programming
Overview
Teaching: 10 min
Exercises: 20 minQuestions
What is literate programming?
Why would I want to use it?
Objectives
To explain the features of literate programming
To discuss the benefits of literate programming
Data analysis reports
Data analysts tend to write a lot of reports, describing their analyses and results, for their collaborators or to document their work for future reference.
Many new users begin by first writing a single R script containing all of the work. Then simply share the analysis by emailing the script and various graphs as attachments. But this can be cumbersome, requiring a lengthy discussion to explain which attachment was which result.
Writing formal reports with Word or LaTeX can simplify this by incorporating both the analysis report and output graphs into a single document. But tweaking formatting to make figures look correct and fix obnoxious page breaks can be tedious and lead to a lengthly “whack a mole” game of fixing new mistakes resulting from a single formatting change.
Creating a web page (as an html file) by using R Markdown makes things easier. The report can be one long stream, so tall figures that wouldn’t ordinary fit on one page can be kept full size and easier to read, since the reader can simply keep scrolling. Formatting is simple and easy to modify, allowing you to spend more time on your analyses instead of writing reports.
Discussion
What does your current document creation process look like?
Literate programming
Ideally, such analysis reports are reproducible documents: If an error is discovered, or if some additional subjects are added to the data, you can just re-compile the report and get the new or corrected results (versus having to reconstruct figures, paste them into a Word document, and further hand-edit various detailed results).
This sort of idea has been called “literate programming”. It links descriptive text blocks with code to produce a document that is both human and computer readable.
Markdown as universal format
We will be using R Markdown to create reports, which as the name suggests mixes R code with Markdown. Markdown is a light-weight mark-up language initially created for writing simple web pages. A number of other conversion options exist however, allowing us to turn Rmarkdown files into HTML, PDF, Word, and other document formats.
This allows us to produce outputs that are directly linked with our data and that can be recreated as needed if the input data changes.
Discussion
What are some of the potential benefits of this approach in your own work? Are there any challenges you can identify?
Know your audience
As these documents are not just code for yourself, but outputs that are meant to be shared with others, knowing who you are writing for becomes important. You may need to spend a lot of time explaining a complex figure if you are producing a document for a more general audience. Conversely, a more technical audience may want a lot more detail about the particulars of your analysis methods before they will be interested in seeing your results.
Key Points
Literate programming allows mixing text, code and outputs
It enables reproducible research by coupling analysis with reporting