As a statistical consultant, I often find myself repeating similar steps for data analysis projects. These steps follow a pattern of reading, cleaning, summarising, plotting and analysing data then producing a report. This is always an iterative process because many of these steps need to be repeated, especially when quality issues are present or overall goals change. Reproducibility becomes more difficult with increasing complexity.
For very small projects or toy examples, we may be able to do all analysis steps and reporting in a single markdown document. However, to increase efficiency for larger data analysis projects, a modular programming approach can be adopted. Each step in the process is then carried out using separate R syntax or markdown files. GNU Make automates the mundane task of regenerating output given dependencies between syntax, markdown and data files in a project. For instance, if we store results from time consuming analyses and radically change a report, we only need to rerun the R markdown file for reporting. On the other hand, if initial data are changed, we rerun everything. In both cases, we can set up our favourite IDE to use Make and simply press the 'build' button.
To extend Make for R, Rmarkdown, SAS and STATA, I have written pattern rules which are available on github. These are used by adding a single line to the project Makefile. An overall strategy and constructing a simple Makefile for a data analysis project will be briefly outlined and demonstrated.