NSDS Project Awards
Project Name | Award Date | Project Objective |
---|---|---|
AI-Ready Data Products to Facilitate Discovery and Use (AI-RD-24) | September 2024 | This project explores how to make agencies' statistical data products more readily ingestible by AI technologies. It will produce an AI readiness assessment as a shared resource for any agency looking to test the machine understandability of its public data products and an AI readiness prototype tool to transform public data products into machine-understandable, AI-ready data. The project ends in April 2026. |
Artificial Intelligence for Enhancing Data Quality, Standardization, and Integration for Federal Statistics (AI-DQSI-24) | September 2024 | This project aims to develop a set of data processing tools using AI to enhance data standardization and integration activities. The project will begin with interviews of key stakeholders in the federal statistical system to identify current best practices, data processing gaps, and confidentiality concerns. It will then prototype a user-friendly toolkit and user interface for a future NSDS, providing an accessible and unified system for agencies addressing data quality. The project ends in April 2026. |
Building Capacity for State, Local, and Territorial Governments to Use Administrative Data for Evidence-Building (ADEB-24) | August 2024 | This project explores how nonfederal administrative databases could be used to produce new data products. It will prototype a tool to help jurisdictional governments ingest, visualize, and explore their own administrative data, and it will provide a report that can be used as a roadmap for other state, local, and territorial governments. The project ends in February 2026. |
Creating and Validating Synthetic Data (NCSES/Census, Annual Business Survey) (ABSSyn-23) | September 2023 | This project explores two methods of producing synthetic versions of a large-scale restricted-use microdata file (NCSES’s Annual Business Survey). The two synthetic files will be compared for accuracy and quality, with one selected to undergo disclosure review for public release. This dataset will then be used in an evidence-building project and its accuracy tested using verification metrics. Lessons learned will inform future possibilities for creating synthetic data files to support a tiered access model for the NSDS. The project ends in March 2026. |
Creation of Synthetic Data and Development and Use of Verification Metrics (Survey of Earned Doctorates) (SEDSyn-23) | October 2023 | This project explores the creation of a synthetic data file, demonstrates examples of uses of synthetic data for evidence-building, and tests the use of verification metrics in validating estimates produced from synthetic data. NCSES’s Survey of Earned Doctorates, an annual census conducted since 1957 of all individuals receiving a research doctorate from an accredited U.S. institution in a given academic year, serves as the case study for this work. Lessons learned will inform future possibilities for creating synthetic data to support a tiered access model for the NSDS. The project ends in October 2025. |
Data Access Alternatives: Artificial Intelligence Supported Interfaces (DAA-24) | August 2024 | This project seeks to develop and pilot an AI “chatbot” that answers natural language user questions based on public data products from federal statistical agencies. In the first part of the pilot, the team is building a Retrieval Augmented Generation (RAG) based system that is compatible with and builds on the open-source framework behind Google’s Data Commons. The chatbot will focus on three types of data products that represent how statistical agencies publish public data: (1) public use files, (2) data tables, and (3) analytical reports. These features are designed to make public data more accessible, useful, and relevant for a broad range of users, including those in science, policy, journalism, and more. In addition to a pilot tool, this project will record lessons learned about the size of input data tables, making statistical data “AI ready”, and engineering issues encountered while building the pilot tool. The project ends in August 2025. |
Data Integration to Estimate Science, Technology, Engineering and Mathematics (STEM) Attrition and Workforce Supply: A Pilot Approach (STEM-24) | August 2024, September 2024 | This project seeks to develop an analytic approach that researchers, policymakers, and other interested parties can replicate when analyzing data from different sources (e.g., survey and administrative data, state and local data). This project uses an evidence-building question as a use case, seeking to understand the impact of STEM attrition on future STEM workforce supply. The project will result in a framework for replicating the study's approaches to using disparate data sources to answer a question. One project ends in March 2026 and the other ends in June 2026. |
Data Protection Toolkit Use Case Analysis (DPT-23) | July 2023 | This project conducted a use case analysis on the Federal Committee on Statistical Methodology's (FCSM’s) Data Protection Toolkit, holding interviews with 15 individuals working for federal agencies, state governments, and other institutions. The project resulted in feedback on the Data Protection Toolkit and recommendations for improvement. The project ended in January 2024. |
Engaging Policy Stakeholders to Inform a Future National Secure Data Service (EPS-24) | September 2024 | This project seeks to identify the data needs of federal policy stakeholders as future users of an NSDS using a human-centered design approach. It will conduct a landscape analysis of the data needs within the federal policy ecosystem and conduct a detailed case study with the National Science Board. This project will result in recommendations for the navigation and data concierge services needed by policy stakeholders and a prototype service framework or policy toolkit. The project ends in November 2025. |
Evaluation of Noise Infusion for Large-Scale Demographic Sample Survey (Survey of Doctorate Recipients) (SDRN-23) | September 2023 | This project seeks to evaluate noise infusion for a sample survey. It will investigate different methods for noise infusion to evaluate data quality with each method and explore public messaging surrounding noise infusion. The project will result in a noise-infused sample survey with documentation of methodology and data quality assessment. The project ends in August 2025. |
Expanding Equitable Access to Restricted-Use Data through Federal Statistical Research Data Centers (FSRDC-23) | October 2023 | This project explores strategies to expand access to the restricted-use data made available through Federal Statistical Research Data Centers (FSRDCs) beyond its traditional base of users at high research activity (R1) universities. This project is conducting a national survey and focus groups to identify barriers to data access and potential strategies to promote data access within the FSRDCs. It will result in a report informing future project phases, the FSRDCs, and the NSDS. The project ends in January 2025. |
Federated Data Usage Platform (DUP-23) | September 2023, October 2023 | These projects seek to prototype a data usage platform to illuminate instances of how federal data are being used across a wide variety of audiences and use cases. These prototypes will inform the development of a data usage platform dashboard that federal agencies can use as a shared service within the NSDS. Both projects end in September 2025. |
Foreign Born Scientists and Engineers in the Workforce (FBSE-22) | February 2022, March 2022 | The foreign-born scientists and engineers (FBSE) projects aim to integrate data sources to fill knowledge gaps and better understand this subpopulation in the U.S. workforce. The projects test novel approaches to create data sources and demonstrate the feasibility of acquiring, analyzing, and disseminating data files to inform this and other topics within a future NSDS. |
Informing Evidence-Building Capacity among State, Local, Territorial, and Tribal Governments within a National Secure Data Service (IEBC-24) | July 2024 | This project explores how an NSDS could support capacity for evidence building among state, local, territorial, and tribal governments. "Capacity building" here refers to skill building for staff, continuous learning opportunities, and/or access to infrastructure and tools. This project will conduct a needs analysis with all 50 states as well as local, tribal, and territorial governments. The project will produce three reports: (1) a needs analysis by group; (2) a gap analysis by group; and (3) recommendations for a future NSDS. The project ends in August 2026. |
Models for a Data Concierge Service for a National Secure Data Service (DCS-23) | September 2023 | This project explores models for a data concierge service, conducting an environmental scan of service request types that federal agencies receive and interviews of federal data providers and data users to inform a data concierge service. It will result in two or more models for a data concierge service as well as resource needs for each and potential staffing requirements. The project ends in March 2025. |
National Vital Statistics System Modernization — New Opportunities for Interoperable Data (NVSS-23) | August 2023 | This project explored the National Vital Statistics System (NVSS) ecosystem as a way to inform shared services in a future NSDS because of the system’s experience with data interoperability, implementation of governance considerations and authorized roles and responsibilities, and tiered data access structure. The project ended in September 2024 and resulted in a final report outlining considerations for a future NSDS. |
Privacy Preserving Technologies Phase 1: Environmental Scan (PPT-23) | July 2023 | This project conducted an environmental scan to understand the current landscape of privacy-enhancing technologies, resulting in a report documenting the analysis. The results of this project have informed project testing and piloting using privacy-enhancing technologies (such as privacy-preserving record linkage and synthetic data generation), which inform the NSDS secure compute environment and Capacity Building Center. The project ended in January 2024. |
Secure Compute Environment Scan (SCE-23) | September 2023 | This project conducted an environmental scan of secure compute environments. Over 20 federal stakeholders were interviewed to share perspectives on benefits, challenges, and requirements for successful utilization of a secure compute environment within the federal space. It produced a final report detailing findings to inform the requirements needed for the NSDS secure compute environment build. The project ended in July 2024. |
Secure Compute Environment Testbed for a National Secure Data Service (SCET-24) | July 2024 | This project builds a secure compute environment, a core component of the NSDS. The secure compute environment allows approved researchers to access, link, and analyze data for approved projects and enables testing and use of state-of-the-art privacy-enhancing technologies. The secure compute environment will undergo operational testing in early 2025 with an operational testbed available in summer of 2025. The project ends in August 2026. |
Synthetic Data Generation with Large, Real-World Data (DG-RWD-24) | September 2024 | This project explores how synthetic data generation, a type of privacy-enhancing technology, works with large real-world data (that is, datasets with over 30 billion rows of data) in a secure super compute environment. It will produce a framework to inform a synthetic data toolkit that will include, but not be limited to, methods to assess privacy risk and data utility as well as open-source AI methods for generating synthetic data. This is a joint project between the National AI Research Resource (NAIRR) pilot and the NSDS demonstration project. These are independent initiatives with expected synergies, as reflected in the CHIPS and Science Act requirement that the NSDS demonstration project consult with the NAIRR Task Force in NSDS development. The project ends in August 2026. |
Utilizing Privacy Preserving Record Linkage to Link Data from Two Federal Statistical Agencies (NCSES/NCHS) (PPRL1-23) | September 2023 | This project explores the development of a data sharing agreement between two federal statistical agencies that have not previously developed data sharing relationships, deploys a commercial privacy preserving record linkage (PPRL) tool to link data from these two agencies, and uses a secure environment to analyze the resulting linked data file. It will inform linkages across the federal government by developing agreements and deploying PPRL as a model to improve the availability, quality, accessibility, and interoperability of data sharing. The project ends in September 2025. |
Utilizing Privacy Preserving Record Linkage with Parent Agency Data and Statistical Agency to Inform Programs and Policies (NCSES/NSF) (PPRL2-23) | September 2023 | This project explores the development of a data sharing agreement between a federal statistical agency and its parent agency, deploys an open-source privacy-preserving record linkage (PPRL) tool to perform the linkage, and uses a secure environment to analyze the resulting linked data file. This project will inform linkages across the federal government, including within-agency collaborations, by developing agreements and deploying PPRL as a model to improve the availability, quality, accessibility, and interoperability of data sharing. The project ends in September 2025. |
Other NCSES Project Awards
Project Name | Award Date | Project Objective |
---|---|---|
Development of a Prototype for the Standard Application Process Portal (SAP-23) | September 2023 | These projects developed multiple prototypes of an online portal that allows users to search for confidential data held by federal statistical agencies and apply for access to that data, and that allows data providers to review those applications and render a decision. This portal supports the implementation of Section 3583 of the Foundations for Evidence-based Policymaking Act of 2018. The projects ended in September 2024. |