AMERICA’S DATAHUB CONSORTIUM LESSONS LEARNED
The Coleridge Initiative FBSE-22-05
May 2022 Report:
- We encountered challenges in contracting with our state partners due to clauses regarding the ownership of state data. These clauses flow down directly from ATI’s agreement with NCSES. We are working on a resolution (we will send revised statements to ATI, which will take them to NCSES for review). Planning is difficult during the contract negotiation period.
- The Coleridge team is continuing its initial review of the federal data sources related to Foreign Born Scientists and Engineers that will be used for benchmarking. The team obtained and analyzed tables from the American Community Survey (ACS): Aggregate Data and the Public Use Microdata Sample (PUMS). The analysis included advanced degree holders by nativity over time; foreign-born post-education advanced degree holders by origin and decade of immigration (aggregates); and socioeconomic characteristics (PUMS). These were obtained for New Jersey, Arkansas, and Kentucky, as well as the nation. ACS is nationally representative and available over time; it provides detailed origin, immigration, education, work, and socioeconomic status information. Moreover, it has a flag for S&E and S&E-related fields, which is useful for identifying scientists and engineers. However, ACS data do not allow us to capture whether individuals obtained their degree in the US or abroad. Therefore, administrative data from the states will be of great importance.
June 2022 Report:
- As mentioned in the last report, we encountered challenges in contracting with our state partners due to clauses regarding the ownership of state data. This slowed progress over the last month (e.g., state partners postponed sending invitations to the potential advisory panel members). This week, ATI informed us that the proposed changes were accepted, so contracting with the states can now be finalized.
August 2022 Report:
- Because the education and wage data share only a limited number of common attributes (aligned schema approach), the AR team has decided to test the record linkage methods using First Name, Last Name, and Middle Name on the administrative data in the initial phase. Because of this limitation, the truth dataset is generated using an SSN match.
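The evaluation described above can be illustrated with a minimal sketch. It assumes both files carry first, middle, and last name plus an SSN, links records on exact (case-insensitive) name agreement, and scores those links against pairs that share an SSN. The record layout, function names, and sample records are hypothetical and synthetic; this is not the AR team's actual pipeline, only one way to measure a name-only linkage against an SSN-derived truth set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    record_id: str
    first: str
    middle: str
    last: str
    ssn: str  # used only to build the truth set, never as a linkage feature


def name_key(p):
    """Case-insensitive (first, middle, last) key used for exact name linkage."""
    return (p.first.strip().lower(), p.middle.strip().lower(), p.last.strip().lower())


def pairs_sharing_key(education, wage, key):
    """Return every (education_id, wage_id) pair whose records share the same key."""
    index = {}
    for w in wage:
        index.setdefault(key(w), []).append(w.record_id)
    return {(e.record_id, wid) for e in education for wid in index.get(key(e), [])}


def precision_recall(predicted, truth):
    """Score predicted links against the truth set of linked pairs."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Synthetic records for illustration only (assumed layout, fake SSNs).
education = [
    Person("E1", "Ana", "Maria", "Lopez", "000-00-0001"),
    Person("E2", "John", "", "Smith", "000-00-0002"),
    Person("E3", "Wei", "", "Zhang", "000-00-0003"),
]
wage = [
    Person("W1", "ana", "maria", "lopez", "000-00-0001"),  # same person, same name
    Person("W2", "John", "", "Smith", "000-00-0009"),      # different person, same name
    Person("W3", "Zhang", "", "Wei", "000-00-0003"),       # same person, name order swapped
]

predicted = pairs_sharing_key(education, wage, name_key)
truth = pairs_sharing_key(education, wage, lambda p: p.ssn)
precision, recall = precision_recall(predicted, truth)
```

In this toy data the name-only linkage produces one false link (two different people named John Smith) and misses one true link (a swapped name order), which are the two error types the SSN-based truth set is meant to expose.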
September 2022 Report:
- Building the data infrastructure must be accompanied by a clear process through which users will request access. The request process should follow protocols similar to those for federal restricted data assets. A governance model should also be developed that provides representation for all data stewards of the requisite data assets used to develop the data infrastructure. We must be mindful of the population in question and the potential misuse of data.
- State UI wage records will likely be unable to provide the level of coverage needed since, to appear in them, immigrant workers must have both authorization to work in the U.S. and a Social Security number. Linkages to state and federal income tax data may be able to fill in these gaps.
- Linkages leveraging names must consider that U.S. reporting name fields may be unable to adequately account for the way in which individuals from different countries report their names.
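One way to reduce sensitivity to these name-reporting differences is to compare names as unordered sets of normalized tokens rather than as fixed first/middle/last fields. The sketch below is illustrative only: the function names are hypothetical, and the choices it makes (stripping diacritics, splitting on hyphens and other non-letters, ignoring field order) are assumptions, not the project's method. It does not address transliteration variants or nicknames.

```python
import unicodedata


def normalize_name(raw):
    """Split a raw name into lowercase tokens with diacritics removed."""
    decomposed = unicodedata.normalize("NFKD", raw)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat hyphens, apostrophes, and other non-letters as token separators.
    cleaned = "".join(ch if ch.isalpha() else " " for ch in without_marks.casefold())
    return cleaned.split()


def name_token_set(first, middle, last):
    """Pool all name fields into one unordered token set, ignoring field order."""
    return frozenset(normalize_name(first) + normalize_name(middle) + normalize_name(last))


def token_overlap(a, b):
    """Jaccard similarity between two token sets (0.0 when either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Family name recorded first in one file and last in the other.
swapped = token_overlap(name_token_set("Zhang", "", "Wei"),
                        name_token_set("Wei", "", "Zhang"))

# Compound surname with diacritics vs. the same name split across fields.
compound = token_overlap(name_token_set("José", "", "García-Márquez"),
                         name_token_set("Jose", "Garcia", "Marquez"))
```

Pooling tokens trades precision for recall: it recovers swapped or re-split names, but it also treats different people with the same tokens in different roles as identical, so it is better suited to generating candidate pairs than to making final match decisions.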
October 2022 Report:
- The “current state” data model needs to be kept separate from the aspirational data model, acknowledging what can be completed within state longitudinal data systems in the near term while documenting what stakeholders would like to see from this work in the future.
- Each federal agency (NSF, USCIS, BLS, etc.) defines S&E and STEM fields differently, and some definitions are narrower than others. We must be careful in defining the S&E fields so that the infrastructure can cover all populations of interest.
- Assessing match accuracy requires a known truth set. We are addressing this by looking at other state administrative data that may be leveraged for alternate identifying relationships (which will be withheld from the record linkage models being evaluated).
- Initial profiling of AR administrative data indicates that quarterly Unemployment Insurance (UI) wage data has the most limited set of available attributes, fewer than would traditionally be considered sufficient for linkage without a valid SSN. We are exploring options for mitigating this challenge, including assessing probabilistic record linkage for the population of interest, identifying alternative sources of supplemental identifying attributes, and assessing the potential value of current approaches to enhancing labor market information, such as state legislation and the Jobs and Employment Data Exchange (JEDx).
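For readers unfamiliar with probabilistic record linkage, the sketch below shows one common formulation, Fellegi-Sunter agreement weights, applied to name fields only. It is not necessarily the approach under assessment here. The m and u probabilities and the classification thresholds are hypothetical placeholders; in practice they would be estimated from the data or from a truth set.

```python
import math

# Hypothetical per-field probabilities (assumptions, not estimated values):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "first": (0.95, 0.01),
    "middle": (0.80, 0.05),
    "last": (0.95, 0.005),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over fields present in both records."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight, upper=8.0, lower=0.0):
    """Three-way decision; the thresholds are illustrative assumptions."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "clerical review"


# Synthetic records for illustration only.
rec_edu = {"first": "ana", "middle": "maria", "last": "lopez"}
rec_wage_same = {"first": "ana", "middle": "maria", "last": "lopez"}
rec_wage_typo = {"first": "ana", "middle": "", "last": "lopes"}
rec_wage_other = {"first": "john", "middle": "", "last": "smith"}

decisions = [
    classify(match_weight(rec_edu, rec_wage_same)),
    classify(match_weight(rec_edu, rec_wage_typo)),
    classify(match_weight(rec_edu, rec_wage_other)),
]
```

With only three name fields the total weight has a narrow range, which is one way to see why the limited attribute set in UI wage data makes linkage without an SSN difficult: a single disagreement can move a pair from a confident match into the review band.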