Tuesday, June 28 • 2:30pm - 3:30pm
Writing a dplyr backend to support out-of-memory data for Microsoft R Server

Poster #10

Over the last two years, the dplyr package has become very popular in the R community for the way it streamlines and simplifies many common data manipulation tasks. A feature of dplyr is that it’s extensible; by defining new methods, one can make it work with data sources other than those it supports natively. The dplyrXdf package is a backend that extends dplyr functionality to Microsoft R Server’s xdf files, which are a way of overcoming R’s in-memory limitations. dplyrXdf supports all the major dplyr verbs and pipeline notation, and provides some additional features to make working with xdf files easier. In this talk, I’ll share my experiences writing a new backend for dplyr, and demonstrate how to use dplyr and dplyrXdf to carry out data wrangling tasks on large datasets that exceed the available memory.


Hong Ooi


Tuesday June 28, 2016 2:30pm - 3:30pm
Sponsor Pavilion, 326 Galvez Street, Stanford, CA 94305-6105

