Home 2018-05-20T05:45:37+00:00

Data. Policy. Impact.

The Center for Data Science and Public Policy (DSaPP) at the University of Chicago is a collaboration between the Harris School of Public Policy and the Computation Institute to further the use of data science in policy research and practice. Our work includes educating current and future policymakers; carrying out data science projects with government, nonprofit, academic, and foundation partners; and developing new methods and open-source tools that support and extend the use of data science for public policy and social impact. Our team brings together data scientists and researchers from computer science, statistics, and the social sciences, who contribute methods from each of these disciplines; software engineers, who turn our work into usable, deployable code; topic and policy experts, who provide context and relevance; and project managers, who help get things done.

We believe that effective use of data and computational methods is critical to making adaptive and personalized policies that improve the lives of everyone in a measurable, fair, and equitable manner.

Our Work


We run training programs, workshops, and tutorials for students, government agencies, nonprofits, foundations, and corporations. Some of our trainings include:
  • The value of data-driven decision-making (for managers and executives in government agencies and nonprofits)
  • How to scope data science projects
  • Assessing your data maturity
  • Hands-on technical trainings, including Applied Data Analytics for Public Policy (run jointly with NYU and UMD)
Our trainings for governments and nonprofits are designed for directors and executives of organizations as well as for analysts and policymakers.

Data Science Projects

We work with governments, nonprofits, and other organizations on data science projects across health, criminal justice, public safety, education, economic development, transportation, and more. Most of our projects tackle operational problems with tangible impact and produce software that our partner organizations (and others) can use for social impact and improved policies. Recent examples of our projects include:
  • Building Data-Driven Police Early Intervention Systems
  • Prioritizing Preventative Lead Hazard Inspections
  • Prioritizing Health and Safety Housing Inspections
  • Reducing incarceration by identifying at-risk individuals in need of social services

Research Areas

Our research initiatives are motivated by hands-on data science projects with governments, nonprofits, and other policy organizations. As we tackle policy problems, we identify open areas where existing methods from computer science, machine learning, artificial intelligence, or the social sciences fall short, and we formulate research initiatives to fill those gaps. We then push the results of our research back into our data science tools so they can be used across our projects and by our project partners. We are currently working on:
  • Auditing and correcting for bias and equity issues in data science systems
  • Increasing the interpretability and transparency of machine learning models used in policy decisions
  • Designing experimental validation methodologies for machine learning systems
  • Developing methods for monitoring and updating deployed data science systems

Data Science Pipelines and Tools

We believe in open and reusable code and tools. All of our (non-confidential) project code is available under an open-source license on our GitHub page, and all of our internal data science tools are available for other organizations to use. Examples of such tools include:
  • Triage: Our data science pipeline platform, used in many of our internal projects, with components for generating features, building machine learning models, and evaluating those models.
  • Entity Deduplication Tool (pgdedupe)
  • Post-Modeling Tools: For analyzing the models built and their feature importances, and for exploring model outputs before deployment.
  • Bias Audits: Tools for running bias audits on the outputs of machine learning models.
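A bias audit of the kind described above asks whether a model's errors fall unevenly across groups. The sketch below is a minimal illustration, not the interface of our Bias Audits tool: it assumes binary outcomes, a single score threshold, and one group attribute per record, and the helper names (`group_false_positive_rates`, `fpr_disparity`) and toy data are hypothetical. It computes each group's false positive rate and its ratio to a chosen reference group.

```python
from collections import defaultdict


def group_false_positive_rates(scores, labels, groups, threshold):
    """Return each group's false positive rate (FPR) at a score threshold.

    FPR = (actual negatives the model flags) / (all actual negatives),
    computed separately for every value of the group attribute.
    """
    flagged_negatives = defaultdict(int)
    negatives = defaultdict(int)
    for score, label, group in zip(scores, labels, groups):
        if label == 0:  # only actual negatives can become false positives
            negatives[group] += 1
            if score >= threshold:  # the model flags this record
                flagged_negatives[group] += 1
    return {g: flagged_negatives[g] / n for g, n in negatives.items()}


def fpr_disparity(rates, reference_group):
    """Ratio of each group's FPR to the reference group's FPR (1.0 = parity)."""
    ref = rates[reference_group]
    if ref == 0:
        raise ValueError("reference group has an FPR of zero; ratio undefined")
    return {g: r / ref for g, r in rates.items()}


# Hypothetical toy data: model risk scores, true outcomes, and a group attribute.
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.6, 0.4, 0.1]
labels = [1, 0, 0, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_false_positive_rates(scores, labels, groups, threshold=0.5)
disparity = fpr_disparity(rates, reference_group="B")
print(rates)      # group A: 2 of 3 negatives flagged; group B: 1 of 3
print(disparity)  # group A's negatives are flagged at twice group B's rate
```

Which error metric to audit depends on the policy context: when model scores drive inspections or interventions, a higher false positive rate means members of that group are more often targeted unnecessarily.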



  • Introducing the Training Provider Outcomes Toolkit (August 29, 2017)
    Trying to find a good job training program can be daunting. How do you know if the skills they teach are valuable in the job market? And how can you be sure they’re doing a good job of teaching…

  • Introducing the Data@Work Research Hub and Funding Opportunity (August 3, 2017)
    With the rapid growth and adoption of technologies like artificial intelligence having uncertain and disparate effects on labor markets, the need for high-quality interdisciplinary research into…

  • Data-Driven Inspections for Safer Housing in San Jose, California (July 14, 2017)
    The Multiple Housing team in San Jose’s Code Enforcement Office is tasked with protecting the occupants of properties with three or more units, such as apartment buildings, fraternities, sororities, …

  • Combining Data and Behavioral Science to Reduce Water Shut Offs (June 28, 2017)
    Behavioral science and psychology can help explain the underlying reasons for seemingly irrational behavior which cities aim to address, such as using payday loans or committing property crimes. …

  • Introducing pgdedupe! (June 21, 2017)
    Combining datasets and performing large aggregate analyses are powerful new ways to improve service across large populations. Critically important in this task is the deduplication of identities…

Our Project Partners


The future of public policy is open, adaptive, and scalable micro-policies that benefit everyone in a measurable, equitable, and fair manner. We can help get there.



Contact Us