Project Name: Building Capacity for State, Local, and Territorial Governments to Use Administrative Data for Evidence-Building
Contractor: BrightQuery, Inc.
Lessons Learned
- AI Backend Setup: The process of setting up the AI backend and website was a learning experience that will help streamline future developments.
- State Collaboration: Early engagement with state partners is critical to ensure project milestones are met on schedule.
- Data Sharing Agreements: We leveraged existing data sharing agreements wherever possible. Some states, such as CA, have data linked end to end and ready to use. Others, such as CT, have an MoU among participating agencies under which data must be linked and extracted for specific uses. Each arrangement has its own pros and cons, and each suits a different stage of maturity in insight generation. The long-term objective should be to generate insights at the state level that can then be aggregated at the federal level to support coordinated policymaking and evidence-building.
- Platform Development: The development of the UI and data exploration tools provides valuable experience for future iterations of the platform.
- Data Sharing Concerns: Concerns about sharing data between agencies are legitimate and rooted in legal and privacy considerations. They cannot be resolved through effort alone; new techniques will be needed, in combination with political and administrative solutions.
- Data Silos: Data siloed within individual states will always have a “visibility horizon” at the state’s borders.
- State Size: Larger states have an inherent advantage in collecting and linking data. Smaller states have a higher proportion of their population leaving and entering the state, so a larger share of their residents' work histories falls beyond the state's visibility horizon.
- Workforce Churn: People leave and re-enter the workforce, often in different states, for many reasons: seasonal work or skills, career breaks followed by a return to work, retirement, and so on. For example, seasonal activities such as moose hunting or fishing are possible only in summer, and some people in financial services did not seek jobs for several quarters after the 2008 stock market crash. In these cases, statistical linking is the only alternative.
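Statistical linking means matching records that lack a shared identifier by weighing how well several fields agree. The report does not say which linking method the project used; the sketch below shows one widely used approach, the Fellegi-Sunter model, under assumed inputs. The field names, the m/u probabilities, the thresholds, and the example records are all hypothetical. In practice the probabilities would be estimated from the data (for example, with the EM algorithm) and the thresholds tuned against clerically reviewed pairs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    m: float  # P(field agrees | the two records belong to the same person)
    u: float  # P(field agrees | the two records belong to different people)


# Hypothetical comparison fields and m/u probabilities (illustrative only).
PARAMS = {
    "last_name": FieldParams(m=0.95, u=0.01),
    "birth_year": FieldParams(m=0.98, u=0.05),
    "zip3": FieldParams(m=0.80, u=0.10),
    "industry": FieldParams(m=0.70, u=0.15),
}


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum Fellegi-Sunter log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, p in PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: no evidence either way
        if a == b:
            total += math.log2(p.m / p.u)  # agreement weight (positive)
        else:
            total += math.log2((1 - p.m) / (1 - p.u))  # disagreement weight (negative)
    return total


def classify(weight: float, upper: float = 8.0, lower: float = 0.0) -> str:
    """Two hypothetical thresholds split candidate pairs into three decision bands."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "clerical review"


# Hypothetical: the same worker seen in two states' wage records, different ZIP prefix.
state_a = {"last_name": "Rivera", "birth_year": 1975, "zip3": "995", "industry": "114"}
state_b = {"last_name": "Rivera", "birth_year": 1975, "zip3": "981", "industry": "114"}
w = match_weight(state_a, state_b)
print(round(w, 2), classify(w))
```

With these assumed parameters, two records that agree on name, birth year, and industry but differ on ZIP prefix (as for a worker who moved between seasons) still score above the match threshold, while pairs in the middle band are routed to clerical review instead of being forced into a decision.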
- Data Sharing & Governance
  - Existing Agreements Are Crucial: Leveraging pre-existing Data Sharing Agreements (DSAs) expedited project execution in some states (e.g., California). These agreements allowed for smoother integration and insight generation.
  - Different States, Different Models: States like CA have fully linked datasets, whereas others like CT rely on MoUs for data use on a case-by-case basis. Each model has its pros and cons depending on the state’s readiness and legal context.
  - Privacy & Legal Concerns Persist: Legal frameworks and privacy considerations remain a major barrier. Technical solutions alone are insufficient; policy-level collaboration is essential for broader data integration.
- Data Infrastructure & Standardization
  - Value of SDMX & .Stat Suite Adoption: Using international standards such as SDMX enabled consistent, interoperable data handling. These tools reduced manual errors and supported real-time data updates.
  - Scalability Through Open Standards: Open-source tools like .Stat Suite proved both cost-effective and scalable, making them well-suited for long-term use across varied government agencies.
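One practical benefit of SDMX is that data queries follow a standard REST syntax, so the same client code can work against any compliant endpoint, such as the SDMX web service that .Stat Suite is built around. The sketch below builds a data query URL following SDMX 2.1 REST conventions (`{base}/data/{agency,flow,version}/{key}` with optional `startPeriod`/`endPeriod`). The base URL, agency ID, dataflow ID, and dimension order are hypothetical; in a real deployment the dimension order comes from the dataflow's data structure definition (DSD).

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def sdmx_data_url(
    base: str,
    agency: str,
    flow: str,
    version: str,
    key_parts: Sequence[Sequence[str]],
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> str:
    """Build an SDMX 2.1-style REST data query URL: {base}/data/{flowRef}/{key}."""
    flow_ref = f"{agency},{flow},{version}"
    # One position per dimension, in DSD order: codes within a dimension are
    # OR-ed with '+', and an empty position acts as a wildcard.
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{base.rstrip('/')}/data/{flow_ref}/{key}"
    params = {}
    if start is not None:
        params["startPeriod"] = start
    if end is not None:
        params["endPeriod"] = end
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    # Hypothetical dataflow with dimensions STATE.FREQ.INDUSTRY.MEASURE
    print(sdmx_data_url(
        "https://stat.example.gov/rest",
        "DEMO", "DF_STATE_EMP", "1.0",
        [["CA"], ["Q"], [], ["EMP", "WAGE"]],
        start="2020-Q1", end="2023-Q4",
    ))
    # prints: https://stat.example.gov/rest/data/DEMO,DF_STATE_EMP,1.0/CA.Q..EMP+WAGE?startPeriod=2020-Q1&endPeriod=2023-Q4
```

Because the key syntax is standardized, switching to another state's series is a matter of changing the `CA` code rather than rewriting extraction logic, which is one way a shared standard can cut down on manual, error-prone handling.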
- Platform & Technology Development
  - Rapid Progress Through Prototyping: Developing the AI-powered platform and visualization tools significantly shortened the time needed to make data usable. These early-stage efforts lay a strong foundation for future expansion.
  - Natural Language Search Was a Win: A natural language interface helped make data accessible to non-technical users, promoting more inclusive engagement with the platform.
  - Website Deployment Provided Real-World Testing: Deploying a live version of the platform with California data allowed for valuable feedback and iteration, helping refine both backend and frontend components.
- Analytical Frameworks & Usability
  - Standardized Templates Are Transformative: Canned analysis templates promote repeatability and ease of use, especially for states with limited analytical capacity. These templates balance rigor with usability.
  - Need for Tailored Visual Outputs: Customizable visualizations and recommended methods make analysis outputs more actionable for policymakers.
- Strategic & Policy Insights
  - Interstate Comparisons Are Limited by Data Silos: Siloed data systems and the “visibility horizon” at state borders complicate cross-state analyses, especially for smaller states with high population mobility.
  - Workforce Churn Affects Longitudinal Studies: Seasonal labor patterns, economic shifts, and population movement complicate workforce tracking. Statistical linking remains the most viable long-term solution.
  - Larger States Have an Advantage: Bigger states naturally collect more comprehensive data, aiding longitudinal tracking and more complex analysis.
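The visibility horizon can be made concrete with a toy example. The sketch below uses hypothetical quarterly wage records for four workers and compares what one state (CT, chosen only for illustration) can see on its own with a pooled multi-state view. A state-only analysis treats every worker whose in-state records stop early as an exit; the pooled records show that some of those "exits" were moves to another state. All records, person IDs, and quarters are invented, and the pooled view assumes a person identifier shared across states, which is exactly what siloed systems lack.

```python
from collections import defaultdict

QUARTERS = ["2022Q1", "2022Q2", "2022Q3", "2022Q4"]

# Hypothetical pooled wage records: (person_id, quarter, state of employment).
# Assumes a shared person_id across states, which siloed systems usually lack.
RECORDS = [
    ("p1", "2022Q1", "CT"), ("p1", "2022Q2", "CT"),
    ("p1", "2022Q3", "NY"), ("p1", "2022Q4", "NY"),
    ("p2", "2022Q1", "CT"), ("p2", "2022Q2", "CT"),
    ("p2", "2022Q3", "CT"), ("p2", "2022Q4", "CT"),
    ("p3", "2022Q1", "MA"), ("p3", "2022Q2", "CT"),
    ("p3", "2022Q3", "CT"), ("p3", "2022Q4", "RI"),
    ("p4", "2022Q1", "CT"), ("p4", "2022Q2", "CT"),
]


def histories(records):
    """Map each person to {quarter: state} using the pooled records."""
    out = defaultdict(dict)
    for person, quarter, state in records:
        out[person][quarter] = state
    return out


def single_state_view(records, state):
    """Return (visible share, apparent exits, hidden moves) for one state.

    visible share: fraction of employed person-quarters, for anyone ever
    employed in `state`, that appear in `state`'s own records.
    apparent exits: people whose in-state records stop before the last quarter.
    hidden moves: apparent exits who are employed elsewhere in a later quarter.
    """
    pooled = histories(records)
    people = sorted({p for p, _, s in records if s == state})
    visible = total = 0
    exits, moves = [], []
    for person in people:
        hist = pooled[person]
        total += len(hist)
        in_state = [i for i, q in enumerate(QUARTERS) if hist.get(q) == state]
        visible += len(in_state)
        last = in_state[-1]
        if last < len(QUARTERS) - 1:
            exits.append(person)
            if any(q in hist for q in QUARTERS[last + 1:]):
                moves.append(person)
    return visible / total, exits, moves


share, exits, moves = single_state_view(RECORDS, "CT")
print(f"CT sees {share:.0%} of its workers' employed quarters")
print("apparent exits:", exits, "| actually moved:", moves)
```

In this toy data CT observes 10 of 14 employed quarters, and two of its three apparent exits are really moves. Even the pooled view cannot tell whether p4 left the workforce, took a seasonal break, or moved to a state outside the pool, which is the gap statistical linking is meant to address.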
Disclaimer: America’s DataHub Consortium (ADC), a public-private partnership, implements research opportunities that support the strategic objectives of the National Center for Science and Engineering Statistics (NCSES) within the U.S. National Science Foundation (NSF). These results document research funded through ADC and are being shared to inform interested parties of ongoing activities and to encourage further discussion. Any opinions, findings, conclusions, or recommendations expressed above do not necessarily reflect the views of NCSES or NSF. Please send questions to ncsesweb@nsf.gov.