Wednesday, June 29 • 11:42am - 12:00pm
Classifying Murderers in Imbalanced Data Using randomForest

To allocate resources more effectively, with the goal of providing safer communities, R's randomForest algorithm was used to identify candidates who may commit or attempt murder. Although crime data within the general population may be highly imbalanced, one might expect the rate of murderers within a high-risk probationer population to be much less imbalanced. However, in the County of Los Angeles nearly 130 of nearly 17,000 probationers committed or attempted murder, a ratio close to 1:130. Classic methods were used to overcome class imbalance, including stratified under- and over-sampling and variable sampling per tree. The results were encouraging: model validation tests demonstrate an 87% overall accuracy rate at relatively low cost. randomForest outperformed the risk assessment tool the agency currently uses by up to 52% (both in overall accuracy and in the reduction of false positives). This work is based on research conducted by Berk, R. et al. (2009), originally published in the Journal of the Royal Statistical Society.
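
The stratified sampling idea mentioned in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than R so that it needs no packages, and it is not the speaker's code: the function name `balanced_bootstrap`, the per-class sample sizes, and the synthetic labels are all assumptions made for illustration. In R's randomForest, the analogous controls are the `strata` and `sampsize` arguments; the settings used in the talk are not stated here.

```python
import random
from collections import Counter

def balanced_bootstrap(labels, per_class, rng):
    """Draw one bootstrap sample with a fixed number of rows per class.

    labels:    list of class labels, one per row
    per_class: dict mapping class label -> rows to draw (with replacement)
    Returns a list of row indices, grouped in the order of per_class.
    """
    # Group row indices by their class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Sample each class separately so the rare class is not swamped.
    sample = []
    for label, n in per_class.items():
        sample.extend(rng.choices(by_class[label], k=n))
    return sample

# Synthetic labels with roughly the imbalance quoted in the abstract
# (130 positives among 17,000 rows); these are not the real data.
labels = [1] * 130 + [0] * 16870
rng = random.Random(0)

# Hypothetical per-tree sample sizes: 100 rare-class rows, 200 majority rows.
idx = balanced_bootstrap(labels, {1: 100, 0: 200}, rng)
counts = Counter(labels[i] for i in idx)
print(counts[1], counts[0])  # 100 200
```

Each tree in a forest would be grown on a fresh sample of this kind, so the rare class makes up a third of every tree's training rows instead of fewer than 1 in 100. The ratio between the two sample sizes is a tuning choice that trades false positives against false negatives.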

Moderators
Karthik Ram

co-founder, rOpenSci
Karthik Ram is a co-founder of rOpenSci and a data science fellow at the University of California's Berkeley Institute for Data Science. Karthik primarily works on a project that develops R-based tools to facilitate open science and access to open data.

Speakers
Jorge Alberto Miranda

Analyst, County of Los Angeles
I have been an R user since 2013, when I first started working with a data reporting team at the Los Angeles County Probation Department. One of my goals in life is to convert more of my colleagues into R users and make R part of the County toolkit. With nearly 100,000 employees, I realize that may take a while. However, as a huge promoter of R, I have managed to train a few others and make it part of some daily tasks. Looking forward to...


Wednesday June 29, 2016 11:42am - 12:00pm
McCaw Hall 326 Galvez Street Stanford, CA 94305-6105
