Wednesday, June 29 • 10:48am - 11:06am
Size of Datasets for Analytics and Implications for R

With so much hype about "big data" and the industry pushing for distributed computing over traditional single-machine tools, one wonders about the future of R. In this talk I will argue that most data analysts/data scientists don't actually work with big data the majority of the time, so using immature "big data" tools is in fact counterproductive. I will show that, contrary to widespread belief, the growth in dataset sizes used for analytics has actually been outpaced over the last 10 years by the growth in memory (RAM), making single-machine tools ever more attractive. Furthermore, base R and several widely used R packages have undergone significant performance improvements (I will present benchmarks to quantify this), making R an ideal tool for data analysis on even relatively large datasets. In particular, R has access (via CRAN packages) to excellent high-performance machine learning libraries (benchmarks will be presented), while high-performance and parallel computing facilities have been part of the R ecosystem for many years. Nevertheless, the R community should of course continue pushing the boundaries and extending R with new and ever more performant features.

Moderators
Dirk Eddelbuettel

Debian and R Projects

Speakers
Szilard Pafka

Chief Data Scientist, Epoch
Szilard studied physics in the 90s and obtained a PhD using statistical methods to investigate the risk of financial portfolios. He then worked in a bank, quantifying and managing market risk. About a decade ago he moved to California to become the Chief Scientist of a credit card processing company, doing everything data (ETL, analysis, modeling, visualization, machine learning, etc.). He is also the founder/organizer of several data...


Wednesday June 29, 2016 10:48am - 11:06am
Econ 140 579 Serra Mall, Stanford, CA 94305

Attendees (140)