Data. Policy. Impact.
The Center for Data Science and Public Policy (DSaPP) at the University of Chicago is a collaboration between the Harris School of Public Policy and the Computation Institute to further the use of data science in policy research and practice. Our work includes educating current and future policymakers, doing data science projects with government, nonprofit, academic, and foundation partners, and developing new methods and open-source tools that support and extend the use of data science for public policy and social impact. Our team brings together data scientists and researchers with backgrounds in computer science, statistics, and the social sciences, who draw on methods from all of these disciplines; software engineers, who make sure our work becomes usable, deployed code; and topic experts and project managers, who help get things done.
We run training programs, workshops, and tutorials for students, government agencies, nonprofits, foundations, and corporations. Some of our trainings include:
- The value of data-driven decision-making (for managers and executives in government agencies and nonprofits)
- How to scope data science projects
- Assessing your data maturity
- Hands-on technical trainings, including Applied Data Analytics for Public Policy (run jointly with NYU and UMD)
Our trainings for governments and nonprofits are designed for directors and executives of organizations as well as analysts and policymakers.
Data Science Projects
We work with governments, nonprofits, and other organizations on data science projects across health, criminal justice, public safety, education, economic development, transportation, and more. Most of our projects tackle operational problems that have tangible impact and result in software that our partner organizations (and others) can use for social impact and improved policies. Recent examples of our projects include:
- Building data-driven police early intervention systems
- Prioritizing preventative lead hazard inspections
- Prioritizing health and safety housing inspections
- Reducing incarceration by identifying at-risk individuals in need of social services
Our research initiatives are motivated by working on hands-on data science projects with governments, nonprofits, and other policy organizations. As we tackle policy problems, we identify open areas where existing methods from computer science, machine learning, artificial intelligence, or the social sciences are lacking, and we formulate our research initiatives to fill those gaps. We then push the results of our research back into our data science tools so they can be used across our projects and by our project partners. We are currently working on:
- Auditing and correcting for bias and equity issues in data science systems
- Increasing the interpretability and transparency of machine learning models used in policy decisions
- Designing experimental validation methodologies for machine learning systems
- Developing methods for monitoring and updating deployed data science systems
Data Science Pipelines and Tools
We believe in open and reusable code and tools. All of our (non-confidential) project code is available under an open-source license on our GitHub page. All of our internal data science tools are also available for other organizations to use. Examples of such tools include:
- Triage: our data science pipeline platform, used in many of our internal projects, with components for generating features, building machine learning models, and evaluating those models.
- Entity Deduplication Tool (pgdedupe)
- Post-Modeling Tools: for analyzing the models we build and their feature importances, and for exploring model outputs before deployment.
- Bias Audits: for running bias audits on the outputs of machine learning models.