Wednesday, June 29 • 2:30pm - 3:30pm
The Use of Ensemble Learning Methods in Open Source Data Challenges

Poster #9

As data collection grows in size and complexity across industries, open source data challenges are becoming more widespread. We present our experience developing prediction models for such challenges. To maximize predictive performance, we used ensemble learning methods to train our models. We demonstrate these methods using R packages such as h2o and h2oEnsemble, together with cloud computing platforms. To approximate our predictive ability before challenge submission, we developed wrapper code that performs cross-validation on the H2O ensembles. We also show our process for estimating how the trained model is expected to perform on external data sources.

References:

Spencer Aiello, Tom Kraljevic, Petr Maj and with contributions from the H2O.ai team (2015). h2o: R Interface for H2O. R package version https://CRAN.R-project.org/package=h2o

Erin LeDell (2016). h2oEnsemble: H2O Ensemble Learning. R package version 0.1.6. https://github.com/h2oai/h2o-3/tree/master/h2o-r/ensemble/h2oEnsemble-package
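The idea of wrapping an ensemble in cross-validation to estimate its performance before submission can be illustrated with a minimal sketch. It is not the authors' wrapper and does not use the h2o or h2oEnsemble API: it is plain Python, the ensemble is a simple average of two toy base learners standing in for an H2O ensemble, and every name in it (`fit_mean`, `fit_linear`, `fit_ensemble`, `cross_validate`) and the toy data are assumptions made for illustration.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Hypothetical base learner: always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Hypothetical base learner: least-squares line on one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_ensemble(xs, ys, learners=(fit_mean, fit_linear)):
    """Fit each base learner and average their predictions.

    Simple averaging is an assumption here; it stands in for the
    ensemble method actually used on the poster.
    """
    models = [fit(xs, ys) for fit in learners]
    return lambda x: mean(m(x) for m in models)


def cross_validate(xs, ys, fit, k=5, seed=0):
    """Return the mean squared error on each of k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Refit on the training folds only, so the held-out fold
        # plays the role of unseen, external data.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mean((model(xs[i]) - ys[i]) ** 2 for i in held_out))
    return scores


# Toy, noise-free data (an assumption for illustration only).
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

ensemble_scores = cross_validate(xs, ys, fit_ensemble, k=5)
linear_scores = cross_validate(xs, ys, fit_linear, k=5)
print("ensemble CV MSE:", mean(ensemble_scores))
print("linear CV MSE:", mean(linear_scores))
```

Because `cross_validate` only needs a function that takes training data and returns a predictor, the same wrapper can score a single learner or a whole ensemble, which is the property that makes this kind of wrapper useful for comparing candidates before a submission.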


Rebecca Zabel Krouse

Rho Inc., Federal Systems Division, Chapel Hill, NC

Wednesday June 29, 2016 2:30pm - 3:30pm
Sponsor Pavilion 326 Galvez Street Stanford, CA 94305-6105

