RFS: Measuring Large Language Model Understanding of Federal Statistical Data
On June 27, 2025, ATI published the following Request for Solutions (RFS): Measuring Large Language Model Understanding of Federal Statistical Data.
The submission deadline for the project is July 18, 2025, at 5 PM ET. Membership in ADC is not required for submission. However, if chosen, the selected organization must join ADC.
Be sure to explore our Consortia Member Training for helpful resources and support tools as you prepare your proposal.
Measuring Large Language Model Understanding of Federal Statistical Data
Generative AI applications offer transformative opportunities for how Americans interact with public data. By enabling interaction through natural language and multimodal prompts, these technologies facilitate more intuitive access to complex data collections through chat-based interfaces, reducing technical barriers and expanding the accessibility of public data to a broader range of users. To ensure that federal data are increasingly valuable in the training of generative AI applications, the federal government must optimize and enrich its data assets with the appropriate context for this rapidly evolving ecosystem.
This Request for Solutions (RFS) seeks to develop an empirical evaluation that measures the ability of large language models (LLMs) to accurately respond to questions that require an understanding of federal statistical open Government data assets and their associated metadata.[1] This will involve the creation of prompt-response pairs necessary to assess the accuracy, relevancy, and explainability of LLMs in federal statistical use cases. In addition, this effort will result in a tool that will evaluate LLM performance in response to these evaluation prompts, while also providing insight into how well federal statistical data assets are structured to support LLM interaction – highlighting opportunities to improve metadata quality, accessibility, and machine-readability. Ultimately, this RFS envisions the development of a tool that may be offered as part of a shared service within a future National Secure Data Service (NSDS) and lay the groundwork for replication and expansion across additional statistical subject-matter domains and agencies.
[1] Open government data assets are defined in statute (44 USC 3502(20)) and in M-25-05. Solutions should consider both datasets and their accompanying contextual information (e.g., metadata, documentation, and formatting) as this information can influence how an LLM interprets the dataset.
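As an illustration of the prompt-response evaluation described above, the following is a minimal sketch, not a method specified by the RFS. It assumes each evaluation item pairs a prompt with a numeric reference value and an accepted relative error. The names `EvalItem`, `is_correct`, and `accuracy` are hypothetical, and the model is represented by any function that maps a prompt string to an answer string. It covers accuracy only; relevancy and explainability would need separate scoring.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class EvalItem:
    """One prompt-response pair (hypothetical schema, not defined by the RFS)."""
    prompt: str
    expected: float   # reference value drawn from a published statistic
    tolerance: float  # accepted relative error for a numeric answer


def is_correct(item: EvalItem, answer: str) -> bool:
    """Return True if the answer parses to a number within the item's tolerance."""
    try:
        value = float(answer.replace(",", "").strip())
    except ValueError:
        return False  # non-numeric answers count as incorrect
    if item.expected == 0:
        # Relative error is undefined at zero, so fall back to absolute error.
        return abs(value) <= item.tolerance
    return abs(value - item.expected) / abs(item.expected) <= item.tolerance


def accuracy(items: Iterable[EvalItem], model: Callable[[str], str]) -> float:
    """Fraction of items the model answers correctly (0.0 for an empty set)."""
    items = list(items)
    if not items:
        return 0.0
    hits = sum(is_correct(item, model(item.prompt)) for item in items)
    return hits / len(items)


if __name__ == "__main__":
    # Placeholder prompts and values for illustration only, not real statistics.
    items = [
        EvalItem("placeholder question A", expected=100.0, tolerance=0.01),
        EvalItem("placeholder question B", expected=50.0, tolerance=0.0),
    ]
    canned = {"placeholder question A": "100.5", "placeholder question B": "n/a"}
    print(accuracy(items, lambda prompt: canned[prompt]))
```

Scoring against a tolerance rather than exact string match is one possible design choice: it lets an answer such as "1,000" or a lightly rounded figure count as correct while still rejecting non-numeric responses.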
A Teaming Speed Networking Event will also be held on July 8 at 1 PM ET. Learn more and register here.
Request for Solutions
RFS Release Date | Submission Due Date | Documents |
---|---|---|
June 27, 2025 | July 18, 2025 5 PM ET | |