Thursday, June 30 • 1:35pm - 1:55pm
Scalable Machine Learning in R with H2O


The focus of this talk is scalable machine learning using the H2O R package. H2O is an open source, distributed machine learning platform designed for big data. It runs on a multi-node Hadoop or Spark cluster and is also easy to use on a single laptop. The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, and REST/JSON, as well as through a web interface.

Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of Generalized Linear Models, Gradient Boosting Machines, Random Forest, Deep Neural Nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others. H2O can also build stacked ensembles, or "Super Learners", from a collection of supervised base learners.

R code with H2O machine learning examples will be demoed live and made available on GitHub so attendees can follow along on their laptops. A demo of how to run H2O on a multi-node cluster on Amazon EC2 will also be given.

Moderators

Douglas Wood

Sr. Software Developer, Stanford University School of Medicine

Speakers

Thursday June 30, 2016 1:35pm - 1:55pm
Lane & Lyons & Lodato, 326 Galvez Street, Stanford, CA 94305-6105