Policymakers, researchers, practitioners, and community leaders rely on data to understand their communities’ needs and the effectiveness of programs and policies aimed at improving lives. Yet many data systems were created to meet immediate reporting requirements, not to serve multiple purposes or support long-term analysis. As a result, these systems often fall short in providing the evidence needed to inform public understanding and improve policy outcomes.
At the Urban Institute, we’ve partnered with local, state, and federal agencies to design and support more-sustainable data systems. Across a diverse range of projects, including national open data platforms and secure research environments, we’ve navigated a core challenge: balancing accessibility, privacy, and sustainability in data systems to expand the value of data over time.
In this post, we offer a behind-the-scenes look at our approach and the lessons we’ve learned, with the aim of helping other organizations and leaders build the foundations for data systems that unlock insights and improve outcomes.
- Start privacy and governance conversations early
Successfully managing risks requires treating privacy, governance, and disclosure protections as first-order design choices—not afterthoughts. Starting these conversations early helps align goals, clarify decisionmaking authority, and design processes that scale, especially in multiagency or cross-jurisdictional projects with significant legal and operational constraints. These early conversations can enable trust and efficiency down the line.
Urban’s work with the DC Education Research Collaborative demonstrates the importance of early governance. From the start, the collaborative was structured as a research-practice partnership, bringing together education agencies, researchers, and community stakeholders through formal governance bodies. These included a cross-sector advisory committee and a research council of academic and analytic partners.
Because governance and disclosure processes were established early and collaboratively, agencies can securely and easily share data with the collaborative just once. A central team then generates consistent, research-ready datasets that researchers across different organizations and teams can use. This approach, combined with regular meetings and feedback loops, reduces administrative burden, provides predictable data access, and keeps use aligned with the collaborative’s shared values.
- Design for change
Data systems must evolve as new data sources, users, and technologies emerge. The Education Data Portal, launched in 2018, was intentionally designed with this flexibility in mind. It uses an API-first, metadata-rich design to harmonize datasets over time and across sources and pairs those data with detailed, programmatically accessible documentation. This approach allows us to add new data sources and build downstream tools, such as programming libraries and interactive dashboards, without requiring changes to the underlying system.
Because the portal’s core architecture is modular and scalable, it’s remained resilient while expanding to support new use cases and audiences. Today, it’s evolving in response to federal policy shifts and advances in artificial intelligence. We are incorporating nonfederal data from states, for example, and integrating with initiatives like Google’s Data Commons to broaden access and usability. While Urban could not have anticipated these specific developments in 2018, the decision to prioritize an API-first design and curated metadata has enabled us to adapt the portal to new datasets, users, and tools without reengineering its foundation.
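To make the API-first, metadata-rich idea more concrete, here is a minimal Python sketch of how a metadata registry can drive request building, so that adding a data source means adding a registry entry rather than changing code. It is an illustration only, not the portal's implementation: the `ENDPOINTS` registry, the `build_url` helper, the year range, and the filter names are hypothetical, and the base URL and path layout are assumptions that should be checked against the portal's own documentation.

```python
from urllib.parse import urlencode

# Assumed base URL and path layout; verify against the portal's documentation.
BASE_URL = "https://educationdata.urban.org/api/v1"

# Hypothetical metadata registry: one entry per harmonized dataset.
# The years and filter names below are illustrative, not authoritative.
ENDPOINTS = {
    "school_directory": {
        "path": "schools/ccd/directory",
        "years": range(2000, 2021),
        "filters": {"fips", "leaid"},
    },
}


def build_url(name, year, **filters):
    """Build a request URL from metadata alone, validating inputs first."""
    meta = ENDPOINTS[name]
    if year not in meta["years"]:
        raise ValueError(f"{name} has no data for {year}")
    unknown = set(filters) - meta["filters"]
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    url = f"{BASE_URL}/{meta['path']}/{year}/"
    query = urlencode(sorted(filters.items()))
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(build_url("school_directory", 2015, fips=11))
```

Because validation and URL construction read only from the registry, downstream tools such as programming libraries or dashboards could share the same metadata instead of hard-coding each dataset.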
- Reuse and improve what works
Investing in reusable infrastructure saves time, reduces risk, and improves consistency. The Education-to-Workforce (EW) Framework Data Tool, which provides interactive visualizations to track and compare student success and system conditions over time and place, illustrates this principle in two ways. First, it relies on a shared set of indicators developed collaboratively and refined over time with input from more than 20 national and community organizations. Second, it draws data directly from the Education Data Portal, saving hundreds of hours of duplicative data collection, cleaning, and documentation. Together, these design choices allow our team to update the EW tool efficiently, scale it to new geographies, and sustain it with continued stakeholder buy-in.
This focus on reuse has also shaped Urban’s work developing synthetic datasets, which are designed to imitate a confidential dataset while limiting what they reveal about its individual records. Urban has iteratively developed and refined the open-source tidysynthesis R package and accompanying training materials to help agencies expand access to sensitive data without relying on proprietary software or indefinite external support. To date, several policy stakeholders have used tidysynthesis, including state longitudinal data systems such as the Nebraska Statewide Workforce & Educational Reporting System, local agencies including Allegheny County’s Department of Human Services, and federal agencies.
With Urban’s continued investment in this shared, open-source infrastructure, agencies at all levels can make sensitive data more accessible for research and policy analysis while protecting privacy and minimizing staff burden.
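For readers new to synthetic data, the basic idea can be sketched in a few lines of Python. This toy example is not how tidysynthesis works and does not use its API: it fits a simple summary to each column of a made-up "confidential" table and samples new records from those summaries. The `fit` and `synthesize` functions are hypothetical names, the approach ignores relationships between columns, and it offers no formal privacy guarantee.

```python
import random
import statistics


def fit(records):
    """Summarize each column: mean/stdev for numbers, level counts otherwise."""
    model = {}
    for col in records[0]:
        values = [row[col] for row in records]
        if all(isinstance(v, (int, float)) for v in values):
            model[col] = ("numeric", statistics.mean(values), statistics.stdev(values))
        else:
            levels = sorted(set(values))
            model[col] = ("categorical", levels, [values.count(v) for v in levels])
    return model


def synthesize(model, n, seed=0):
    """Draw n synthetic records from the fitted per-column summaries."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, spec in model.items():
            if spec[0] == "numeric":
                # Sample from a normal distribution with the fitted mean and stdev.
                row[col] = rng.gauss(spec[1], spec[2])
            else:
                # Sample a level in proportion to its observed frequency.
                row[col] = rng.choices(spec[1], weights=spec[2])[0]
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up records standing in for a confidential dataset.
    confidential = [
        {"age": 30, "sector": "public"},
        {"age": 40, "sector": "private"},
        {"age": 50, "sector": "public"},
    ]
    for row in synthesize(fit(confidential), n=3):
        print(row)
```

Production tools go well beyond this, for example by modeling variables jointly and evaluating both utility and disclosure risk, which is part of why shared, well-tested infrastructure is valuable.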
- Build a shared understanding of trade-offs
Collecting, storing, and using data always involves trade-offs. Lawmakers express requirements through statute, executives through policy, and technologists through implementation, so translation across these perspectives is essential. Without a shared understanding, laws can fall out of step with technology, and data stewards can make implementation decisions that drift from policy intent.
Urban’s work developing secure analytic environments for the DC Education Research Collaborative illustrates how intentional communication can help navigate these trade-offs. Rather than adopting the traditional, costly remote desktop model common in restricted computing environments, Urban built on-demand cloud-based workspaces using primarily open-source technologies, which reduced costs and lowered barriers to access for analysts across partner institutions.
Throughout the design and build, Urban involved the public agencies responsible for stewardship of the data to make sure our system met their goals, requirements, and constraints. While many stakeholders did not need to dive into technical details like serverless architectures or containerized applications, their close involvement in defining system parameters enabled collaborative security planning and ongoing conversations among agency staff and technologists to meet both policy and technical requirements.
- Expect to sustain the systems that sustain evidence
Building sustainable data systems takes time, resources, and coordination. Even with the latest tools and platforms, effective systems depend on ongoing attention, long-term investment, and champions across organizations who can maintain momentum as priorities shift and stakeholders change.
Urban’s long-standing partnership with the Statistics of Income (SOI) Division of the Internal Revenue Service illustrates this reality. For nearly a decade, Urban and SOI have worked to develop and implement modern privacy-preserving approaches to safely expand access to administrative data. Although these data are uniquely valuable for evaluating tax policy, modeling distributional effects, and studying mobility and inequality, they are also among the most sensitive data held by the federal government.
Recent milestones—including the internal release of synthetic public-use files and a prototype validation server—reflect meaningful progress, which rests on decades of leadership within SOI. Staff within SOI have consistently championed this work despite rapid technological change, advances in privacy-enhancing technologies, and shifting policy contexts.
Sustaining this work has also required significant financial investment. Urban’s work to develop privacy-enhancing technologies, for example, has been supported by millions of dollars over many years from government agencies, philanthropic foundations, and corporate partners: resources that have made long-term progress possible.
As policymakers and researchers increasingly look to AI and other emerging technologies to build and understand datasets, strong foundations for data systems matter more than ever. Through projects such as the Education Data Portal, the Education-to-Workforce Framework Data Tool, tidysynthesis, and the DC Education Research Collaborative, Urban continues to reinforce those foundations. Each reflects a different approach to the same goal: ensuring data can inform better decisions for many years to come.