Wednesday, June 29 • 1:36pm - 1:54pm
GNU make for reproducible data analysis using R and other statistical software

As a statistical consultant, I often find myself repeating similar steps across data analysis projects. These steps follow a pattern: reading, cleaning, summarising, plotting and analysing data, then producing a report. The process is always iterative because many of these steps need to be repeated, especially when data quality issues arise or the overall goals change. Reproducibility becomes harder as project complexity increases.

For very small projects or toy examples, all analysis steps and reporting can be done in a single markdown document. For larger data analysis projects, however, a modular programming approach is more efficient: each step in the process is carried out using a separate R syntax or markdown file. GNU Make then automates the mundane task of regenerating output according to the dependencies between the syntax, markdown and data files in a project. For instance, if we store results from time-consuming analyses and then radically change a report, we only need to rerun the R markdown file that produces the report. If, on the other hand, the initial data change, everything is rerun. In both cases, we can set up our favourite IDE to use Make and simply press the 'build' button.
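
The rebuild behaviour described above can be illustrated with a small Python sketch. This is a simplified model of Make's timestamp check, not GNU Make itself and not the speaker's pattern rules. The file names, the `RULES` table and the `out_of_date` function are all hypothetical, chosen only to mirror the read/clean, analyse and report steps.

```python
from typing import Dict, List, Set

# Hypothetical project layout: each target lists the files it depends on.
RULES: Dict[str, List[str]] = {
    "clean.RData": ["read_clean.R", "raw.csv"],
    "analysis.RData": ["analyse.R", "clean.RData"],
    "report.html": ["report.Rmd", "analysis.RData"],
}


def out_of_date(rules: Dict[str, List[str]], mtimes: Dict[str, float]) -> List[str]:
    """Return the targets that need rebuilding, in a valid build order.

    A target is rebuilt if it is missing, if any dependency is newer
    than it, or if any dependency is itself going to be rebuilt.
    """
    stale: Set[str] = set()
    order: List[str] = []
    visited: Set[str] = set()

    def visit(target: str) -> None:
        if target in visited:
            return
        visited.add(target)
        deps = rules.get(target)
        if deps is None:
            return  # a plain source file: nothing to build
        for dep in deps:
            visit(dep)  # decide on dependencies first
        missing = target not in mtimes
        if missing or any(
            dep in stale or mtimes.get(dep, float("inf")) > mtimes[target]
            for dep in deps
        ):
            stale.add(target)
            order.append(target)

    for target in rules:
        visit(target)
    return order


# Made-up modification times: only the report source is newer than its output.
times = {
    "raw.csv": 1, "read_clean.R": 1, "clean.RData": 2,
    "analyse.R": 1, "analysis.RData": 3,
    "report.Rmd": 5, "report.html": 4,
}
print(out_of_date(RULES, times))  # ['report.html']

# Touching the raw data makes every downstream target stale.
times_new_data = dict(times, **{"raw.csv": 6})
print(out_of_date(RULES, times_new_data))
# ['clean.RData', 'analysis.RData', 'report.html']
```

In a real project these dependencies would be declared in a Makefile and Make would perform the comparison itself; the sketch only shows why editing the report reruns one step while changing the raw data reruns all of them.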

To extend Make for R, R Markdown, SAS and Stata, I have written pattern rules, which are available on GitHub. They are used by adding a single line to the project Makefile. An overall strategy for a data analysis project, and the construction of a simple Makefile for it, will be briefly outlined and demonstrated.

Moderators
Hana Ševčíková

University of Washington

Speakers
Peter John Baker

Senior Lecturer/Statistician, University of Queensland
Statistical Consultant in Public Health. R user since the late 90s. Has written several R packages. Teaches statistics and R.

