Project Name: Artificial Intelligence for Enhancing Data Quality, Standardization, and Integration for Federal Statistics

Contractor: NORC at the University of Chicago

Lessons Learned

  • To enhance data quality, standardization, and integration in the federal statistical system, prioritize opportunities for (a) survey data, (b) public sector administrative records, (c) third-party or private sector data, and (d) geospatial data.
  • Availability of data documentation is critical for data use and integration across data types.
  • When survey, administrative, or third-party data are used for purposes other than those for which they were originally collected, such as to link records, determinations about data quality hinge on representativeness and fitness for use.
  • A review of existing AI tools for non-spatial (tabular) and spatial data revealed that machine learning and rule-based methods are more commonly applied than large language models (LLMs) or related techniques.
  • Considerations for the use of AI include balancing data privacy with utility, and disclosing the AI sources and methods used to ensure explainability in decision-making (see the privacy-utility sketch after this list).
  • Key mitigation techniques for AI risks involve keeping humans in the loop to direct and review AI performance in acquiring, describing, and transforming data for use (see the human-in-the-loop sketch after this list).
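
As an illustration of the privacy-utility balance noted above, the sketch below applies the Laplace mechanism from differential privacy to a single tabular count. This is a minimal, hypothetical example rather than a method prescribed by the project: the example count, the epsilon values, and the sensitivity assumption (a counting query with sensitivity 1) are all illustrative choices.

```python
"""Minimal sketch of the privacy-utility tradeoff via the Laplace
mechanism. Illustrative only: the count and epsilon values below are
assumptions chosen for demonstration, not project specifications."""
import numpy as np

rng = np.random.default_rng(seed=42)

def laplace_count(true_count: int, epsilon: float) -> float:
    """Release a noisy count satisfying epsilon-differential privacy.

    A counting query has sensitivity 1, so noise is drawn from
    Laplace(scale = 1 / epsilon). Smaller epsilon means stronger
    privacy protection but more noise, i.e., lower utility.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

true_count = 1_250  # hypothetical cell count from a tabular release

for epsilon in (0.1, 0.5, 1.0, 5.0):
    noisy = laplace_count(true_count, epsilon)
    print(f"epsilon={epsilon:>4}: noisy count = {noisy:9.1f} "
          f"(error = {abs(noisy - true_count):6.1f})")
```

Running the sketch makes the tradeoff concrete: the smallest epsilon (strongest privacy) produces the largest errors, so released statistics become less useful as protection increases.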
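In the same spirit, the human-in-the-loop sketch below assumes a hypothetical record-linkage model that emits match-probability scores: high- and low-confidence decisions are automated, while ambiguous cases are routed to a human reviewer. The thresholds, record identifiers, and scores are invented for illustration.

```python
"""Minimal human-in-the-loop sketch: route low-confidence AI decisions
(here, candidate record-linkage matches) to a human review queue.
Thresholds and example records are hypothetical."""
from dataclasses import dataclass

@dataclass
class CandidateMatch:
    record_a: str
    record_b: str
    score: float  # model-estimated probability that the records match

REVIEW_THRESHOLD_LOW = 0.60   # below this, auto-reject the match
REVIEW_THRESHOLD_HIGH = 0.95  # above this, auto-accept the match

def triage(match: CandidateMatch) -> str:
    """Accept, reject, or queue for human review based on score."""
    if match.score >= REVIEW_THRESHOLD_HIGH:
        return "auto-accept"
    if match.score < REVIEW_THRESHOLD_LOW:
        return "auto-reject"
    return "human-review"  # ambiguous cases get human judgment

candidates = [
    CandidateMatch("A-1001", "B-2001", 0.98),
    CandidateMatch("A-1002", "B-2002", 0.72),
    CandidateMatch("A-1003", "B-2003", 0.31),
]

for c in candidates:
    print(f"{c.record_a} ~ {c.record_b}: score={c.score:.2f} -> {triage(c)}")
```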

Disclaimer: America’s DataHub Consortium (ADC), a public-private partnership, implements research opportunities that support the strategic objectives of the National Center for Science and Engineering Statistics (NCSES) within the U.S. National Science Foundation (NSF). These results document research funded through ADC and is being shared to inform interested parties of ongoing activities and to encourage further discussion. Any opinions, findings, conclusions, or recommendations expressed above do not necessarily reflect the views of NCSES or NSF. Please send questions to ncsesweb@nsf.gov.