Data Linkage


Since the 1980s, the National Center for Health Statistics (NCHS) Data Linkage program has combined NCHS survey data with administrative sources from Medicare, Medicaid, the National Death Index, and Social Security records, supporting over 1,000 peer-reviewed publications. The program maximizes resource efficiency by using state-of-the-art data science matching techniques to link existing data, avoiding costly re-collection efforts. While most program resources require restricted-access permissions to ensure participant privacy, the program releases publicly available versions wherever possible, including through innovative synthetic data generation 1,2.

Updated: December 12th, 2026

Overview

The National Center for Health Statistics (NCHS) Data Linkage program addresses complex policy questions by combining detailed data from NCHS-run surveys with existing vital statistics and health-related administrative sources 1. This effort aims to maximize the usability of existing resources and avoid costly, time-consuming repeat data collection. Since the 1980s, the program has continually updated linkage sets with decades of follow-up and expanded linked datasets as new sources become available 1,2. To date, these efforts have supported epidemiological research published in over 1,000 peer-reviewed papers in top scientific journals 2.

To achieve these outcomes, the program focuses on maximizing linkage-eligible participants and survey-administrative record combinations while creating high-quality products with minimal errors through state-of-the-art data science techniques 1,2. However, participant eligibility requires NCHS survey respondents to provide consent and minimum necessary Personally Identifiable Information (PII) for record linking. Because PII collection requirements and consent procedures vary by survey and over time, different percentages of participants qualify for linkage attempts 2. Once eligible participants are identified, matched records undergo a two-stage assessment before being considered linked 3–5:

  1. Deterministic: Records are matched using participants’ Social Security Number (SSN) and Health Insurance Claim Number (HICN), with identifier formats varying by survey and year. Additionally, percentage matching is conducted across demographic fields including first and last name, middle initial, birth date components, 5-digit ZIP code, state of residence, and sex.

  2. Probabilistic: Deterministic pairs are weighted using an approach based on the Fellegi-Sunter paradigm. This method defines the relationship between agreement probabilities and their corresponding agreement/non-agreement weights for each identifier used in the record linkage process.
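
As a rough illustration of the probabilistic stage, the sketch below computes Fellegi-Sunter style agreement and non-agreement weights from per-identifier agreement probabilities and sums them into a score for one candidate pair. The field names, the m- and u-probabilities, and the choice of log base 2 are illustrative assumptions, not values taken from the NCHS methodology documents.

```python
import math
from typing import Mapping

# Hypothetical agreement probabilities per identifier (illustrative only).
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "birth_year": (0.98, 0.02),
    "sex": (0.99, 0.50),
}


def agreement_weight(m: float, u: float) -> float:
    """Weight contributed when a field agrees: log2(m / u)."""
    return math.log2(m / u)


def non_agreement_weight(m: float, u: float) -> float:
    """Weight contributed when a field does not agree: log2((1 - m) / (1 - u))."""
    return math.log2((1 - m) / (1 - u))


def pair_score(agreements: Mapping[str, bool]) -> float:
    """Sum the per-field weights for one candidate record pair."""
    score = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]
        if agrees:
            score += agreement_weight(m, u)
        else:
            score += non_agreement_weight(m, u)
    return score


# A pair that agrees on every field scores higher than one that differs on sex.
full_match = pair_score({"last_name": True, "birth_year": True, "sex": True})
partial_match = pair_score({"last_name": True, "birth_year": True, "sex": False})
```

Identifiers that rarely agree by chance (low u) contribute large positive weights when they agree, while identifiers such as sex add little, which is why highly discriminating fields dominate the score.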

Records are then classified as linked or non-linked based on successful linkage between survey and administrative data, with linked records further classified as matched or non-matched survey data. Non-linked records are analyzed to determine linkage failure reasons, including participant death between collection periods or non-receipt of administrative benefits 2. Detailed methods are available in Methodology and Analytics Considerations documents for each administrative record type 4,5.

NCHS’s Research Ethics Review Board (ERB) carefully assesses all linkage efforts for confidentiality risk and scientific justification. Consequently, linked data are typically restricted access only 4,5. However, the program provides limited public-use versions of qualifying datasets and is piloting synthetic data creation that maintains analytical properties while protecting privacy. Feasibility files with limited variables and linkage indicators are also available to help researchers assess whether sample sizes are adequate for their research questions 2.

Linked datasets are primarily organized by administrative dataset type, with all linked NCHS survey programs listed for each. Each administrative page contains details about the specific linkage effort, including methods, publicly available datasets, and data dictionaries. As of this writing, the available administrative pages include 6:

  • National Death Index (NDI) Mortality Files: Provides a central, computerized index of death record information for mortality ascertainment and follow-up studies 7.

  • Centers for Medicare & Medicaid Services (CMS): Contains administrative records on healthcare utilization, costs, and patient outcomes for Medicare and Medicaid beneficiaries 8,9.

  • Department of Housing and Urban Development’s (HUD) Housing Assistance Programs: Includes administrative records on individuals receiving housing assistance, focusing on their health and well-being 10.

  • Department of Veterans Affairs (VA): Covers healthcare utilization, health outcomes, and related factors for U.S. veterans 11.

  • United States Renal Data System (USRDS) End Stage Renal Disease (ESRD) Data: Consists of records on kidney disease progression, treatment, and patient outcomes for individuals with end-stage renal disease 12.

  • Social Security Administration (SSA) Old Age, Survivors and Disability Insurance (OASDI) and Supplemental Security Income (SSI) Benefit Records: Provides data on the economic and health status of individuals receiving social security benefits 13.

Each has been linked with a different set of qualifying NCHS survey programs. Additional information about some of these programs can be found on their respective pages, reachable from our Dataset listing page. Across all administrative datasets, the following NCHS surveys have been linked:

  • National Health Interview Survey (NHIS): Established in 1957, this survey collects comprehensive health data from the U.S. civilian population through face-to-face household interviews, providing nationally representative data on health trends, illness, disability, and healthcare utilization 14.

  • National Hospital Care Survey (NHCS): This survey collects data on healthcare utilization and patient care in U.S. hospitals to provide insights into hospital capacity, services, and treatment outcomes 15.

  • National Health and Nutrition Examination Surveys (NHANES): Assesses the health and nutritional status of adults and children in the United States through interviews, physical examinations, and laboratory tests, providing critical data on public health and nutrition 16.

  • NHANES I Epidemiologic Follow-up Study (NHEFS): Follows up with participants from NHANES I to study long-term health outcomes, including morbidity, mortality, and risk factors for chronic diseases 17,18.

  • National Nursing Home Surveys (NNHS): Collects data on nursing home facilities, services, and resident characteristics to inform policies and programs for long-term care 19.

  • The Second Longitudinal Study of Aging (LSOA II): Follows a cohort of older adults to study aging-related health changes, disability progression, healthcare utilization, and social factors affecting the elderly population between the 1980s and 1990s 20.

  • National Ambulatory Medical Care Survey (NAMCS) Health Center (HC) Component Data: Collects data on the utilization and provision of ambulatory care services (hospital emergency or outpatient departments) in health centers, providing insights into primary care practices and patient demographics 21.

Gaining Access

Do I Qualify?

All individuals seeking to use the data for statistical analysis and reporting purposes may access and download it.

Typical Timeline

There are no time constraints for accessing these data.

Step-by-Step Guide

The NCHS Data Linkage program offers three types of publicly available data, each with different considerations and intended uses. A given linkage may include any or all of these types:

  1. Limited Versions of Restricted-Use Datasets: Reduced variable sets from the full restricted files

  2. Synthetic Data: Files created to mimic the restricted-use files, maintaining analytical properties and protecting privacy without limiting which variables are accessible

  3. Research Feasibility Files: Limited variables and linkage indicators to help researchers assess whether sample sizes are adequate for their research questions

No specific steps are required to access these publicly available datasets. Users can navigate to the main NCHS Data Linkage Activities page and find available linked datasets listed in the left navigation bar. Data downloads are available through their respective pages 6.
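
To illustrate how a research feasibility file might be used to check sample size, the sketch below counts linkage-eligible and linked respondents in a small CSV. The column names (`respondent_id`, `eligible`, `linked`) and the inline sample rows are hypothetical stand-ins; actual feasibility files define their own variables in the accompanying data dictionaries.

```python
import csv
import io

# Hypothetical stand-in for a downloaded feasibility file (not real data).
SAMPLE = """respondent_id,eligible,linked
A1,1,1
A2,1,0
A3,0,0
A4,1,1
"""


def summarize(handle):
    """Return (total, eligible, linked) counts from a feasibility-style CSV."""
    total = eligible = linked = 0
    for row in csv.DictReader(handle):
        total += 1
        eligible += row["eligible"] == "1"  # True counts as 1
        linked += row["linked"] == "1"
    return total, eligible, linked


total, eligible, linked = summarize(io.StringIO(SAMPLE))
print(f"{linked} of {eligible} eligible respondents linked ({total} total)")
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle, and the counts would typically be broken out by the subgroups the research question needs.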

Do I Qualify?

Individuals seeking to use data for statistical analysis and reporting may submit research proposals to the NCHS Research Data Center (RDC) through the Standard Application Process (SAP) via the ResearchDataGov (RDG) portal 22,23.

Applicants must fulfill criteria during application review and, in some cases, after approval. The four core criteria below are summarized from the RDG User Guide 24. If additional post-approval steps are required, data-owning agencies contact applicants directly to initiate those processes.

  1. Identification: Researchers verify identity, job title, organizational affiliation, and skill level. Some agencies require U.S. citizenship confirmation.

  2. Training: Some agencies require post-approval training on data use, management, confidentiality, and cybersecurity.

  3. Agreements: Researchers sign agreements such as non-disclosure or data use agreements. Some data sources require security plans outlining data protection methods.

  4. Investigation: Researchers undergo background checks.

How Is My Application Assessed?

All applications submitted through SAP are evaluated against the same criteria regardless of the agency or unit, unless law or regulation requires otherwise. Full criteria are available in the RDG User Guide and summarized below 24:

  1. Statistical Purpose: Data is used solely for statistical purposes, not to identify individuals or businesses, nor for law enforcement, legal cases, or regulatory actions.

  2. Allowed Use: Researchers plan to use data in compliance with applicable rules and restrictions.

  3. Statistical Disclosure Limitation: Researchers must employ techniques that protect individual, organizational, or business privacy.

  4. Demonstrated Need: Researchers must demonstrate that confidential data is necessary for project goals and that publicly available data is insufficient.

  5. Feasibility: Researchers can achieve project goals with requested data. This is evaluated in three ways:

     1. Project Design: Detailed planned methods and how technical and logistical needs will be met.

     2. Agency or Unit Support: Confirmation that agencies can provide space, technical assistance, logistics, and data preparation.

     3. Applicant Ability: Applicants possess the knowledge, skills, and ability to execute the project.

  6. Maintaining Public Trust: Projects are expected to maintain public trust and confidence in the agency or unit.
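
As one simple illustration of the Statistical Disclosure Limitation criterion, the sketch below suppresses small cells in a frequency table before output is requested. The threshold of 10 and the table values are placeholders chosen for the example, not an NCHS or SAP rule; the applicable disclosure rules are set by the data-owning agency.

```python
SUPPRESSION_THRESHOLD = 10  # placeholder value, not an official rule


def suppress_small_cells(counts, threshold=SUPPRESSION_THRESHOLD):
    """Replace counts below the threshold with None so they are not reported."""
    return {
        cell: (n if n >= threshold else None)
        for cell, n in counts.items()
    }


# Hypothetical frequency table; the last cell falls below the threshold.
table = {"age 18-44": 152, "age 45-64": 47, "age 65+": 6}
safe_table = suppress_small_cells(table)
print(safe_table)
```

Primary suppression like this is only a starting point: when row or column totals are also published, additional (complementary) suppression is usually needed so that a hidden cell cannot be recovered by subtraction.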

Additionally, the NCHS Review Committee evaluates studies for well-defined public health research questions, benefits to data-providing agencies, and appropriateness of planned outputs (papers, articles, presentations) 25.

Important

NCHS does not evaluate studies for scientific merit or relevance (substantive, methodological, theoretical, or policy). Study output release is not guaranteed due to privacy concerns and application compliance requirements.

Typical Timeline

Figure: A summary of the application process, including information on when additional reviews may be necessary.

Upon receiving applications, assigned RDC analysts and the review committee (composed of data system representatives and the confidentiality officer) assess them and either approve them, disapprove them, or request revisions. If an application is denied, researchers receive a justification for the decision and may appeal through SAP 26.

Studies may require amendments to approved applications. Amendment approvals typically take up to four weeks, but complicated requests may require eight to twelve weeks.

Step-by-Step Guide

a. Prepare Your Application Before Completing the Form

The RDG portal provides access to applications for data from 13 agencies and 3 units within the Federal Statistical System (FSS), including NCHS. Each agency may have additional application requirements. While the RDG User Guide and RDC Application Process page contain detailed information, researchers are encouraged to contact relevant agencies before submitting applications 22,24.

Example applications, data dictionaries, and complete details are available on the RDC Application Process page 22. This summary provides an outline and directs researchers to relevant information sources. Contact the RDC at rdca@cdc.gov for further assistance 24.

  1. Confirm the study requires restricted data access and that FSS data will meet research needs. Contact all relevant agencies or units to verify that required data can be combined, whether restricted-use, publicly available, or non-FSS.

  2. Select a preferred access location and confirm eligibility. Non-U.S. citizens should contact the intended facility before writing proposals to confirm eligibility for data access. Refer to the Data Distribution Centers section for more details.

Warning

Data access methods and locations may vary, especially if the data comes from different agencies or units within the FSS.

  3. The NCHS requires additional documents as *.pdf or *.docx and additional concerns to be addressed beyond the base SAP application. Before starting the application, it is recommended to prepare the following:

     1. A data dictionary listing all requested variables from publicly available, restricted-use, and non-NCHS data sources, which can largely be created using SAP. Refer to the Providing the Public Use and Non-NCHS Data page for additional guidelines on preparing applications with these data types 27.

     2. A description of the research methodology and a list of supporting references.

     3. A detailed description of the intended data product (e.g., paper, journal article, or presentation).

     4. A timeline for managing the project within a three-year period.

     5. An explanation of how the project will benefit NCHS or other data-providing agencies for the Agency Benefits section of the application.

b. Complete the SAP Application Form

  1. In SAP, search for and select the datasets needed for the study and add them to the checkout basket. Any data added to the same basket will be included in a single application.

Note

Additional data can be added to the application if needed after submission.

  2. When ready to proceed, open the basket and select Start Application. It is recommended that the principal investigator (PI) or co-principal investigator (co-PI) create the application, as only they will be able to make edits after submission.

  3. In the application, the researcher will need to provide their information, a description of the project that demonstrates the need for restricted-use data, and upload the required documents as *.pdf or *.docx. The SAP application and some agency- or unit-specific requirements can be found in the RDG User Guide 24.

  4. Submit the application for review.

c. Review the Committee’s Decision and Finalize the Application

Upon receiving the application, the RDC director will assign the group an RDC analyst who will work with them throughout the entire process of applying for data, accessing the data, and finalizing the output of their results 25. The RDC analyst will:

  • Facilitate application review and accept NCHS-required confidentiality paperwork.
  • Accept payments incurred by data center access.
  • Create datasets by compiling data specified in data dictionaries and linking using designated variables. Analytical datasets are provided during data center visits.
  • Review results for disclosure risk and provide them once analysis is completed at data centers.

Researchers with approved applications must complete the following steps to prepare for data access and utilization.

  1. Discuss the committee’s approval with the assigned RDC analyst and address any revision requests as needed.

  2. Provide the RDC analyst with approved data dictionaries, public-use and non-NCHS data, descriptions of desired data linkages with intended final formats, and clearly defined derived variables, including arithmetic code or algorithms.

  3. Complete the Confidentiality and Disclosure training and forms, then provide them to the RDC analyst. Additional forms, documents, and tools are available on the RDC Reference Materials page 28,29.

  4. When invoiced, pay fees incurred by the request and applicable to data access at the intended location and frequency. Complete details are in Fees and Invoicing 30.

Amendments for project changes are possible but require additional approval prior to implementation. Common reasons for an amendment include, but are not limited to, changes in the research team, requests for additional variables, new methods, or requests for additional types of output 22. However, if the scope or research question changes, a new application is required instead.

Contact the RDC directly at rdca@cdc.gov for further guidance.

d. Accessing the Data and Publication Expectations

  1. After the steps in Part C are complete, the RDC analyst will prepare the dataset. Once it is ready, schedule an appointment, at least one week in advance, to access the data at the chosen RDC location.

Note

Different data centers have different access procedures and may incur additional fees depending on frequency of data access. Refer to the Data Distribution Centers section and associated links for additional guidance.

  2. Upon completion of analysis, submit an output request to the RDC analyst for review and approval. Full details about requirements and expectations are on the Output Policies and Procedures page 31.

Warning

NCHS does not guarantee that study-generated output will be released due to concerns related to, but not limited to, privacy and alignment with the approved application.

  3. When output is nearly complete, send it to the RDC analyst for review before submitting for publication or distribution. Full details about requirements and expectations are on the Publishing Guidelines page 32.

Data Distribution Centers

NCHS data is available at two types of data centers: the Census Bureau’s Federal Statistical Research Data Centers (FSRDC) and the NCHS Research Data Centers (RDCs). Additional details about preparing to access data at one of these sites can be found on the RDC Location of Access page or their respective subpages: FSRDC and NCHS RDC 22,33–35.

Important

Submit an electronic copy of any notes or reference materials needed to the RDC analyst prior to visiting a center. Hard copies of these materials are not allowed. Electronic communication devices, such as phones, pagers, and laptops, are also not permitted in the RDC.

Different software products or add-ons can be requested, though not all requests will be approved. Be aware that requesting additional software accommodations may delay project approval. Contact the specific RDC where data will be accessed for further guidance.

Federal Statistical Research Data Centers

Researchers must be affiliated with a university or agency to qualify for FSRDC access. They must also meet physical and information security requirements, including obtaining Census Bureau Special Sworn Status (SSS) and passing a background investigation. Non-U.S. citizens are generally encouraged to use FSRDCs.

Researchers are assigned both an NCHS RDC analyst and an FSRDC administrator from the location they intend to visit. The NCHS RDC analyst roles were described previously under Part C of the Step-by-Step Guide. The FSRDC (Census RDC) administrator will:

  • Be available to answer questions pertaining to SSS, access and entry to an FSRDC location, software availability, and additional Census fees.
  • Help develop the application, when available.
  • Transfer output to the NCHS after researchers complete their analysis at a data center. The RDC analyst will review the output for disclosure risk and provide researchers with the results.

When an FSRDC is used, the data must be transferred from NCHS, which happens only after researchers complete all requirements outlined by the assigned RDC analyst seven days prior to the intended visit. These additional steps incur extra costs beyond the NCHS RDC data access fee and may delay data access.

  • In-person: Census Bureau facilities at partner institutions.
  • Remote access: A secure Virtual Desktop Interface (VDI) may be available.
  • Software Available: Anaconda, Gurobi, Intel Composer, Knitro, MADD, Mathematica, MATLAB, OpenGeoDa, R and RStudio, SAS, Stat/Transfer, Stata, Stata-MP, SUDAAN, and Tomlab.

NCHS Research Data Centers

  • In-person: Facilities located in Hyattsville, MD; Atlanta, GA; and Rockville, MD require appointments scheduled at least one week in advance.
  • Remote access: Not available because the computers are disconnected from the internet.
  • Software Available: Microsoft Office products, R, SAS, Stata, Python, SPSS (v. 19), and SAS-callable SUDAAN.

Required Documentation

When accessing the data, researchers must provide the following documentation in addition to an approved application:

  • Proof of identification, such as a REAL ID
  • Curriculum vitae (CV) for each applicant
  • For student projects, an agreement form completed by the student’s advisor 36
  • Permission to use proprietary data
  • Table shells for requested output
  • Data dictionary listing all necessary variables (restricted-use, public-use, and external) for the research project (refer to agency dataset data dictionaries for available variables)

Publications

This section presents a selection of PubMed articles that use the dataset and are authored by individuals affiliated with Yale University. These articles are provided to inspire researchers and students to use the data in their own work.


References

1. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). Realizing the power of data. https://www.cdc.gov/nchs/data-linkage/datalinkagestory.htm (2021).
2. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). Webinar: The NCHS Data Linkage Program. (2020).
3. National Center for Health Statistics (NCHS) & Centers for Disease Control and Prevention (CDC). Public-use linked mortality files. (2025).
4. Division of Analysis and Epidemiology, National Center for Health Statistics (NCHS) & Centers for Disease Control and Prevention (CDC). The linkage of National Center for Health Statistics survey data to Medicare enrollment, claims/encounters and assessment data (2014-2018): Linkage methodology and analytic considerations. (2023).
5. Division of Analysis and Epidemiology, National Center for Health Statistics (NCHS) & Centers for Disease Control and Prevention (CDC). The linkage of the National Center for Health Statistics (NCHS) survey data to U.S. Department of Housing and Urban Development (HUD) administrative data: Linkage methodology and analytic considerations. (2023).
6. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NCHS data linkage activities. https://www.cdc.gov/nchs/data-linkage/index.htm (2025).
7. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NCHS data linked to NDI mortality files. https://www.cdc.gov/nchs/data-linkage/mortality.htm (2024).
8. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NCHS data linked to CMS Medicare data files. https://www.cdc.gov/nchs/data-linkage/medicare.htm (2024).
9. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NCHS data linked to CMS Medicaid enrollment and claims files. https://www.cdc.gov/nchs/data-linkage/medicaid.htm (2023).
10. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NCHS data linked to HUD housing assistance program files. https://www.cdc.gov/nchs/data-linkage/hud.htm (2023).
11. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NCHS data linked to Department of Veterans Affairs administrative data files. https://www.cdc.gov/nchs/data-linkage/va.htm (2022).
12. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NCHS data linked to USRDS end-stage renal disease files. https://www.cdc.gov/nchs/data-linkage/esrd.htm (2023).
13. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NCHS data linked to SSA social security benefit history files. CDC archive. https://archive.cdc.gov/www_cdc_gov/nchs/data-linkage/ssa.htm (2023).
14. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). About NHIS. National Health Interview Survey. https://www.cdc.gov/nchs/nhis/about/index.html (2025).
15. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). About NCHS. National Center for Health Statistics. https://www.cdc.gov/nchs/about/index.html (2025).
16. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). About NHANES. National Center for Health Statistics (2024).
17. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NHANES I (1971-1974). National Center for Health Statistics.
18. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NHANES I - epidemiologic followup study (NHEFS). National Center for Health Statistics.
19. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). National nursing home survey homepage. CDC archive. https://archive.cdc.gov/www_cdc_gov/nchs/nnhs/index.htm (2015).
20. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). The second longitudinal study of aging (LSOA II). CDC archive. https://archive.cdc.gov/www_cdc_gov/nchs/lsoa/lsoa2.htm (2025).
21. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). National hospital ambulatory medical care survey. National Hospital Ambulatory Medical Care Survey. https://www.cdc.gov/nchs/nhamcs/about/index.html (2024).
22. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). Application process. Research Data Center (2025).
23. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NHANES data release and access policy.
24. Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan & National Center for Science and Engineering Statistics (NCSES). ResearchDataGov.org (RDG) user guide. 3–22 (2025).
25. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). Preparing for application submission. Research Data Center (2024).
26. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). Application review and committee decision. Research Data Center (2024).
27. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). Providing the public use and non-NCHS data. Research Data Center (2024).
28. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). RDC reference materials. Research Data Center (2025).
29. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). Confidentiality and disclosure. Research Data Center (2024).
30. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). Fees and invoicing. Research Data Center (2024).
31. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). Output policies and procedures. Research Data Center (2024).
32. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). Publishing guidelines. Research Data Center (2024).
33. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). Location of access. Research Data Center (2024).
34. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). Federal statistical RDC. Research Data Center (2024).
35. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). NCHS RDC locations. Research Data Center (2024).
36. Centers for Disease Control and Prevention (CDC) & Research Data Center (RDC). NCHS research data center (RDC) student advisor agreement. (2022).
37. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). NCHS public-use synthetic linked data. https://www.cdc.gov/nchs/data-linkage/synthetic.data.htm (2025).
38. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). Data user agreement. National Center for Health Statistics (2024).
39. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). Using linked data products. https://www.cdc.gov/nchs/data-linkage/access.htm (2019).
40. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). What’s new. https://www.cdc.gov/nchs/data-linkage/new-linkage.htm (2025).
41. Centers for Disease Control and Prevention (CDC) & National Center for Health Statistics (NCHS). Data linkage webinar. https://www.cdc.gov/nchs/data-linkage/datalinkage-webinar.htm (2022).