Predictive Enforcement of Hazardous Waste Regulations

The Environmental Protection Agency (EPA) regularly inspects facilities that handle hazardous materials. Of the 1,500 inspections the EPA conducts annually, approximately 30-40% find a violation. The process for prioritizing inspections varies between regions: some inspections are prescribed (for example, large quantity generators are supposed to be inspected at least once every five years), some are chosen based on national priorities (a certain chemical or industry might be of interest), and the remainder are chosen by regional managers, who use their domain knowledge to select facilities to inspect. The EPA wants to adopt a data-driven approach to inspection targeting, using historical inspection data to predict the risk of severe violations.

Using EPA data on reporting, monitoring, and enforcement, DSaPP developed and evaluated predictive models to identify likely violators. We used temporal cross-validation to evaluate our best models and found that, measured by precision in the top 5% of facilities ranked by predicted risk, our model performed nearly twice as well as the baseline. As a result, the EPA will be able to rank potential violators, better allocate inspection resources, and maximize the impact of each investigation to keep America's air and water clean. This improvement is projected to correspond to an additional reduction of 620,000 tons of pollution every year. DSaPP’s predictive model will also serve as a proof of concept for how the EPA can use predictive analytics in the future.
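The evaluation approach mentioned above can be illustrated with a short sketch. This is not DSaPP's actual code: the function names, the choice of calendar year as the time unit, and the default parameters are all assumptions made for illustration. The first function computes precision among the top-scoring fraction of facilities. The second yields train/test index splits in which a model is only ever trained on years earlier than the year it is tested on, which is the general idea behind temporal cross-validation.

```python
from typing import Iterator, List, Sequence, Tuple


def precision_at_top_k(
    scores: Sequence[float],
    labels: Sequence[int],
    fraction: float = 0.05,
) -> float:
    """Share of true violations (label 1) among the top-scoring fraction.

    `fraction=0.05` mirrors the "top 5%" metric; at least one item is
    always selected so the result is defined for small inputs.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    k = max(1, int(len(scores) * fraction))
    # Rank facilities from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def temporal_splits(
    years: Sequence[int],
    min_train_years: int = 1,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs, one per test year.

    Each test year is evaluated using only records from strictly earlier
    years for training, so no future information leaks into the model.
    The year-based granularity is an assumption of this sketch.
    """
    distinct = sorted(set(years))
    for i in range(min_train_years, len(distinct)):
        test_year = distinct[i]
        train_idx = [j for j, y in enumerate(years) if y < test_year]
        test_idx = [j for j, y in enumerate(years) if y == test_year]
        yield train_idx, test_idx
```

In use, a model would be fit on each training split, scored on the matching test split, and compared with a baseline by computing `precision_at_top_k` for both sets of scores on the same test records.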

DSaPP is currently working with the EPA and EPIC (University of Chicago Urban Labs) to implement a field trial to test the efficacy of the model, and hopes to start the trial in 2016. We are also working to improve the model by incorporating new data sets from the EPA and from external sources, generating features based on more refined spatial and temporal scales, and analyzing the model's performance at the regional level in addition to the national level.