
Health Data


Many of the sources listed on the International page may also be useful. Additional sources can be found on the Data and Statistical Services Health page.


Secondary Data Sources for Public Health. Cambridge University Press, 2007. (F and Stokes) RA409.B66 2007 Excellent overview of major datasets related to health.
National Center for Health Statistics Data Briefs   Statistical publications that provide information about current public health topics.
National Health Statistics Reports from the National Health Interview Survey   Longer, detailed statistical publications that provide information about current public health topics.
Centers for Disease Control and Prevention Publications   Publications on a variety of health-related topics (see publications by topic at the bottom of the CDC page).
Health, United States. U.S. Dept. of Health and Human Services, 1975- RA407.3.U57a (Current volume in DSS) Presents national trends in United States health status.
CDC Wonder   Provides AIDS cases by metropolitan and rural area (1981-2002), natality (1995+), cancer statistics (1999+), environmental data, mortality data (1968+), infant deaths (1995+), tuberculosis data (1993+), STD morbidity (1984+), vaccine adverse reporting (1990+), and population estimates (1990+).
NCHHSTP AtlasPlus   Provides an interactive platform for accessing data collected by CDC’s National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention (NCHHSTP). Provides interactive maps, graphs, tables, and figures showing geographic patterns and time trends of HIV, AIDS, viral hepatitis, tuberculosis, chlamydia, gonorrhea, and primary and secondary syphilis surveillance data, typically down to the county level.
US Health Map   Health trends in the United States at the county level for both sexes in:
  • Life expectancy between 1985 and 2014. 
  • Hypertension in 2001 and 2009.
  • Obesity from 2001 to 2011.
  • Physical activity from 2001 to 2011.
State Health Policy Research Dataset (SHEPRD): 1980-2010   Developed to study trends in the adoption of state public health laws. Specifically, the dataset covers annual trends in seatbelt laws, speed limits for passenger vehicles on rural interstates, minimum legal drinking ages, drunk driving laws, laws prohibiting the purchase of alcohol on Sundays, regulations for registering purchased kegs and/or prohibitions against selling kegs, beer taxes and total alcohol tax revenues, motorcycle and bicycle helmet laws, cigarette taxes, cigarette advertising bans, bans on workplace smoking, bans on smoking in restaurants and bars, and tobacco taxes (total revenue). Contains information about these laws for each year between 1980 and 2010, inclusive. In addition, it contains variables that describe the social, economic, demographic, health care, political, and crime characteristics of the states in each of these years.
State Marketplace Statistics Online Kaiser Family Foundation database of state-by-state statistics on the Affordable Care Act (Obamacare).
Source Book of Health Insurance Data. Health Insurance Association of America, 1959-2002. (Ceased publication) (Firestone Noncirculating) HG9396 .S72 Collection of statistical data on health insurance and medical care in the United States. Includes health insurance coverage, managed care, the insurance market, medical care costs, utilization, medical care providers, and morbidity and mortality trends.
State Health Access Data Assistance Center (SHADAC) Online Allows users to customize tables and graphs of health insurance coverage estimates within a pre-defined set of parameters. The Data Center is a user-friendly and easily accessible way to get health insurance coverage estimates from the Current Population Survey's Annual Social and Economic Supplement (CPS) and the American Community Survey (ACS).
Health & Medical Care Archive. Robert Wood Johnson Foundation Online Primarily includes large-scale surveys of the American public about public health, attitudes towards health reform, and access to medical care; surveys of health care professionals and organizations, public health professionals, and nurses; evaluations of innovative programs for the delivery of health care, and many other topics and populations of interest.
Portrait of Health in the United States. Bernan, 2001. Online Charts the health status of the United States presenting data on health correlates, conditions, care, and consequences. Also includes a guide to other resources.  Useful for historical statistics.
Reforming the Health Care System: State Profiles. American Association of Retired Persons, 1990-2005. RA407.3.R43 (Older issues) State data on issues of concern to the elderly.
State-Level Databook on Health Care Access and Financing. Urban Institute. (Firestone Noncirculating) RA410.53.L67 1998; RA410.53.L67 1995 Provides state-level data on health insurance coverage; the uninsured; indices of health status; and health care costs, access, and utilization, primarily for 1988-1995.
Health Statistics Online Guide to sources from the National Library of Medicine.



The following sources offer microdata. For a comparison of the major sources, see the Comparative Microdata tab.

Medical Expenditure Panel Survey (MEPS) 1996+
Third (and most recent) in a series of national probability surveys conducted by the Agency for Healthcare Research and Quality (AHRQ) on the financing and utilization of medical care in the United States. The series began in 1977 with the National Medical Care Expenditure Survey and continued in 1987 with the National Medical Expenditure Survey (NMES).
Scope: provides information on health expenditures by or on behalf of families and individuals, the financing of these expenditures, and each person's use of services.
Sample Size and Makeup: (1) Households: national probability sample of the noninstitutionalized USA civilian population. (2) Institutional Population: sample of nursing and personal care homes and facilities for persons with intellectual disabilities, and of residents admitted to those facilities. (3) American Indians and Alaska Natives living on or near federal reservations.
How segmented: age, marital status, student status, veteran status, race, ethnicity
History: 1977: National Medical Care Expenditure Survey; 1980: National Medical Care Utilization and Expenditure Survey; 1987: National Medical Expenditure Survey. Succeeded by the Medical Expenditure Panel Survey (1996+).
Where do I get the data? AHRQ; for historical series, see ICPSR. Also available through IPUMS.
Questions: What is the average length of wait to see a doctor in a given specialty? Do physicians follow their own advice about smoking? Does a person's income determine how often they get a mammogram? How often do doctors recommend only necessary surgery? What is the cost of X procedure?


National Center for Health Statistics
The National Center for Health Statistics (NCHS) is a part of the Centers for Disease Control and Prevention (CDC), U.S. Department of Health and Human Services. The mission of the NCHS is to provide statistical information that will guide actions and policies to improve the health of the American people. NCHS data systems include data on vital events as well as information on health status, lifestyle and exposure to unhealthy influences, the onset and diagnosis of illness and disability, and the use of health care. These data are used by policy makers in Congress and the Administration, by medical researchers, and by others in the health community. Princeton participates in the NCHS Public-Use data program.

National Ambulatory Medical Care Survey Series (NAMCS)
Years covered:
1973-1981, 1985, 1989+.
Scope: contains data on medical care provided in physicians' offices, including the number of office visits by selected physician characteristics. Data describing the nature of office visits include the patient's problem, prior visit status, referral status, major reason for the visit, physician's diagnosis, diagnostic and therapeutic services provided, and duration of the visit.
Sample Size and Makeup: continuously sampled survey based on a nationwide multistage probability sample of patient records. The physicians from whom records are obtained are nonfederally employed and primarily involved in office-based patient care; physicians specializing in radiology, pathology, or anesthesiology are excluded.
How segmented: patient (age, race, sex); physician (geographic location, type of practice, specialization)
Where do I get the data? See ICPSR. ICPSR is usually a year or two behind; more recent datasets may be available on the NCHS site.
Questions: How do patients pay? Does the patient smoke? Does drug or alcohol abuse affect the number of doctor's visits? Does race, sex, or age affect the number of doctor's visits? Which demographic groups tend to use X prescription? Which are the most popular specialties? What percentage of the population sees a physician for X illness?

National Hospital Ambulatory Medical Care Survey Series
Years covered:
1992+. More recent datasets may be available on the NCHS site.
Scope: contains data on visits to hospital outpatient departments and emergency departments. Obtains data on demographics, triage, complaints, diagnosis, services, medications and immunizations provided, and waiting time.
Sample Size and Makeup: visits to the emergency and outpatient departments of noninstitutional general and short-stay hospitals within the 50 states and the District of Columbia that had an average length of stay of less than 30 days, or of hospitals whose specialty was general (medical or surgical) or children's general. Excluded were federal hospitals, hospital units within institutions, and hospitals with fewer than six beds staffed for patient use.
How segmented: patient (age, race, sex of patient); geographic location of facility.

National Health Interview Survey Series (NHIS)
Years covered:
1975+. For 1963-1975, see Health Interview Surveys.
Scope: basic purpose is to obtain information about the amount and distribution of illness, its effects in terms of disability and chronic impairments, and the kinds of health services people receive. Information on the utilization of medical care facilities is also available in the form of data on medical and dental care, hospitalization, preventive care, nursing care, prosthetic appliances, and self-care. The Core variables are contained in the files for household, person, condition, doctor visit, and hospital data. Each year additional batteries of questions are asked which focus on specific topics.
Sample Size and Makeup: representative sample of the civilian, noninstitutionalized population of the USA. The family core collects information on health insurance and basic demographics. Additional questions are asked of a randomly selected adult in the household and about a randomly selected child under the age of 18 (if present).
How segmented: type of living quarters, size of family, geographic region, age, sex, race, marital status, veteran status, education, income, industry, occupation codes, and limits on activity.
Where do I get the data? See ICPSR. Also see the Integrated Health Interview Series (IHIS), which harmonizes NHIS data from 1969 onward.
Questions: Does income status affect the health services received? Does the patient have Medicare, Medicaid, or private health insurance? Does race affect the prevalence of certain illnesses? Do certain areas of the country have higher incidences of certain diseases? What variables (age, race, sex, education, etc.) affect one's health? Does your occupation affect your health?

National Hospital Discharge Survey
Years covered:
Scope: part of a continuing sample of hospital discharge records that supplies medical and demographic information used to calculate statistics on hospital utilization. The data collection consists of data abstracted from the face sheets of the medical records for sampled inpatients discharged from a national sample of nonfederal short-stay hospitals. The variables include information on the patient's demographic characteristics (sex, age, date of birth, race, marital status), dates of admission and discharge, status at discharge, diagnoses, procedures performed, source of payment, and hospital characteristics, such as bed size, ownership, and region of the country. Replaced by the National Hospital Care Survey.
Sample Size and Makeup: national probability sample of inpatient discharges from noninstitutional general and short-stay hospitals. USA hospitals that had an average length of stay of less than 30 days, or whose specialty was general (medical or surgical) or children's general, were eligible. Excluded were federal hospitals, hospital units within institutions, and hospitals with fewer than six beds staffed for patient use.
How segmented: See scope note
Questions: How does one pay for hospital visits? Does payment coverage differ by age, sex, region? How long does one stay in the hospital? Does treatment differ by type of hospital?

Public Patient Discharge Data
Producer: California. Office of Statewide Health Planning and Development
Former Titles: California Patient Discharge Data (1983-2000); Also called California Hospital Discharge.
Years Covered: Annual data holdings begin in 1983. Latest is 2014. Princeton also has the Ambulatory Surgery Data (2010-2014) and Emergency Department Data (2010-2014). 
Sample Size and Makeup: each inpatient discharged from California hospitals, including General Acute Care Hospitals, Acute Psychiatric Hospitals, Chemical Dependency Recovery Hospitals, Psychiatric Health Facilities, and State-operated hospitals.
How segmented: hospital facility number; age, race, sex, and 5-digit ZIP code; length of stay, day of the week on which patient was admitted along with the quarter of the year and the year admitted; source of admission; type of admission; principal diagnosis and up to 4 other diagnoses; principal procedure along with a few other procedures; disposition of the patient; expected principal source of payment; days from admission to each procedure; Diagnosis Related Group (DRG); Major Diagnostic Category (MDC); total charges; principal external cause of injury and up to 4 other external causes of injury; and the patient's county of residence.
Where is the documentation: (DSS) RA981.C3 C35 and RA981.C3 P824
Where do I get the data? DSS Study #2947 is restricted and stored on a secure server. Access will be granted after authorization: please sign the form and give it to the Data Librarian. Summary data for 1999+ are available through the OSHPD website.
Questions: How do people pay for health care? Which diagnoses and treatments are linked to which illnesses? What is the average length of stay for a visit?

Healthcare Cost and Utilization Project (HCUP)
United States. Agency for Health Care Policy and Research
Years covered: 1988+
Sample Size and Makeup: uses a stratified probability sample of community hospitals, with sampling probabilities proportional to the number of U.S. community hospitals in each stratum. Sampling weights were used to obtain national estimates for discharges, average length of stay, and average total charges. Includes American Hospital Association codes.
How segmented: region, location, teaching status, ownership/control, and bed size
Where is the documentation: (DSS) RA981.A2 H38
Where do I get the data? HCUP is the largest collection of longitudinal hospital care data in the United States, with all-payer, discharge-level information beginning in 1988. Micro-level HCUP data are available from DSS; speak to a librarian, as a confidentiality agreement must be signed before the data can be made available. The state of the hospital, and sometimes the city, are identified through 2011; all geography is removed as of 2012. Princeton has the Nationwide Inpatient Sample/National Inpatient Sample (NIS) for 1988-2020; the Nationwide Emergency Department Sample (NEDS) for 2006-2020 (which has never included geography); Hospital Market Structure (1997, 2000, 2003, 2006, 2009); and the Nationwide Readmissions Database (NRD) for 2012-2020. Princeton also has the highly restricted State Inpatient Databases (SID) for Arizona (2002-2017), Florida (2009-2016), Georgia (2012-2017), Kentucky (2012-2017), Maryland (2006-2016), New Jersey (2005-2017), New York (2009-2016), Oregon (2002-2004), and Washington (2006-2011). California (1983-2014) and Texas (2015-2021Q3) data are available separately from the overall program. Documentation for NIS: (DSS) RA981.A2 H38; documentation for NEDS: (DSS) RA875.5.E5 N374. Stata setup files (2004+), SPSS setup files (1993+), and SAS setup files (1997+) can be found on the AHRQ site, along with excellent overviews by topic. For the restricted state files, only the 2014+ files are available in Stata format, while all years are available in SAS.
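The sampling weights mentioned under Sample Size and Makeup turn a sample of discharges into national figures: the estimated number of discharges is the sum of the weights, and a national average (such as length of stay or total charges) is the weighted sum divided by that total. The sketch below illustrates only that arithmetic and is not HCUP's own procedure. The record fields `weight`, `los_days`, and `total_charges` are hypothetical stand-ins for the discharge weight, length of stay, and total charges variables in the actual files; check the HCUP documentation for the real variable names. It produces point estimates only; standard errors require the stratum and cluster design variables.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Discharge:
    """One sampled discharge record (hypothetical field names, for illustration)."""
    weight: float         # sampling weight: discharges in the universe this record represents
    los_days: float       # length of stay, in days
    total_charges: float  # total charges, in dollars


def national_estimates(records: Iterable[Discharge]) -> dict:
    """Weighted point estimates of total discharges, average LOS, and average charges."""
    records = list(records)
    total_weight = sum(r.weight for r in records)
    if total_weight <= 0:
        raise ValueError("records must have a positive total weight")
    # Estimated national discharge count = sum of the weights.
    # Each weighted average divides a weighted sum by that same total.
    avg_los = sum(r.weight * r.los_days for r in records) / total_weight
    avg_charges = sum(r.weight * r.total_charges for r in records) / total_weight
    return {
        "discharges": total_weight,
        "avg_los": avg_los,
        "avg_charges": avg_charges,
    }


if __name__ == "__main__":
    sample = [Discharge(5.0, 3, 10_000.0), Discharge(3.0, 5, 20_000.0)]
    print(national_estimates(sample))
    # {'discharges': 8.0, 'avg_los': 3.75, 'avg_charges': 13750.0}
```

In this toy example, two sampled records stand in for eight discharges nationally, so each record's stay and charges count in proportion to its weight rather than equally.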

National Health and Nutrition Examination Survey (NHANES) and Followup Series (1971+)
The National Health and Nutrition Examination Surveys (NHANES I, II, III, Hispanic HANES, and the NHANES I Epidemiologic Followup Study [NHEFS]) were designed to obtain information on the health and nutritional status of the United States population. The NHANES I and NHANES II datasets were formerly titled Health and Nutrition Examination Surveys. This series succeeds the National Health Examination Survey, which was collected from 1959 to 1970. All of the NHANES datasets use complex, multistage, stratified, clustered samples of civilian noninstitutionalized populations. All of the files within each study are linkable to each other. NHANES I (1971-1975) interviewed persons aged 1-74 years. The sample was selected so that certain population groups thought to be at high risk of malnutrition (persons with low incomes, preschool children, women of childbearing age, and the elderly) were oversampled at preset rates. On completion of the survey, 23,808 of the interviewed sample had been given a medical examination, and this information is also part of the NHANES I data collections. The NHANES I Epidemiologic Followup Study (NHEFS) is a longitudinal study designed to investigate the relationships between clinical, nutritional, and behavioral factors assessed in NHANES I and subsequent morbidity, mortality, and hospital utilization, as well as changes in risk factors, functional limitation, and institutionalization. The NHEFS cohort includes all persons aged 25-74 who completed a medical examination for NHANES I (N = 14,407).
NHANES II (1976-1980) was designed to continue the measurement and monitoring of the nutritional status and health of the United States population. From the sample of 27,801 persons aged 6 months to 74 years, 25,286 people were interviewed and 20,322 were both interviewed and examined. Because children and persons classified as living at or below the poverty level were assumed to be at special risk of having nutritional problems, they were sampled at rates substantially higher than their proportions in the general population. NHANES III (1988-1994) contains information on a sample of 33,994 persons aged 2 months and older. A home examination was employed for the first time in order to obtain examination data for very young children and for the elderly. The Hispanic HANES (HHANES) was conducted to obtain sufficient numbers to produce estimates of the health and nutritional status of Hispanics in general, as well as specific data for Puerto Ricans, Mexican Americans, and Cuban Americans. The latest data can be found on the CDC website.

National Longitudinal Study of Adolescent to Adult Health, Waves I-V, 1994-2018 (Add Health)

Longitudinal study of a nationally representative sample of U.S. adolescents in grades 7 through 12 during the 1994-1995 school year. The Add Health cohort was followed into young adulthood with four follow-up interviews, the most recent conducted in 2016-2018 when the sample was aged 33-43. Add Health combines longitudinal survey data on respondents' social, economic, psychological, and physical well-being with contextual data on the family, neighborhood, community, school, friendships, peer groups, and romantic relationships.

Add Health Wave I data collection took place between September 1994 and December 1995, and included both an in-school questionnaire and in-home interview. The in-school questionnaire was administered to more than 90,000 students in grades 7 through 12, and gathered information on social and demographic characteristics of adolescent respondents, education and occupation of parents, household structure, expectations for the future, self-esteem, health status, risk behaviors, friendships, and school-year extracurricular activities. All students listed on a sample school's roster were eligible for selection into the core in-home interview sample. In-home interviews included topics such as health status, health-facility utilization, nutrition, peer networks, decision-making processes, family composition and dynamics, educational aspirations and expectations, employment experience, romantic and sexual partnerships, substance use, and criminal activities. A parent, preferably the resident mother, of each adolescent respondent interviewed in Wave I was also asked to complete an interviewer-assisted questionnaire covering topics such as inheritable health conditions, marriages and marriage-like relationships, neighborhood characteristics, involvement in volunteer, civic, and school activities, health-affecting behaviors, education and employment, household income and economic assistance, parent-adolescent communication and interaction, parent's familiarity with the adolescent's friends and friends' parents.

Add Health data collection recommenced for Wave II from April to August 1996, and included almost 15,000 follow-up in-home interviews with adolescents from Wave I. Interview questions were generally similar to Wave I, but also included questions about sun exposure and more detailed nutrition questions. Respondents were asked to report their height and weight during the course of the interview, and were also weighed and measured by the interviewer.

From August 2001 to April 2002, Wave III data were collected through in-home interviews with 15,170 Wave I respondents (now 18 to 26 years old), as well as interviews with their partners. Respondents were administered survey questions designed to obtain information about family, relationships, sexual experiences, childbearing, and educational histories, labor force involvement, civic participation, religion and spirituality, mental health, health insurance, illness, delinquency and violence, gambling, substance abuse, and involvement with the criminal justice system. High School Transcript Release Forms were also collected at Wave III, and these data comprise the Education Data component of the Add Health study.

Wave IV in-home interviews were conducted in 2008 and 2009 when the original Wave I respondents were 24 to 32 years old. Longitudinal survey data were collected on the social, economic, psychological, and health circumstances of respondents, as well as longitudinal geographic data. Survey questions were expanded on educational transitions, economic status and financial resources and strains, sleep patterns and sleep quality, eating habits and nutrition, illnesses and medications, physical activities, emotional content and quality of current or most recent romantic/cohabiting/marriage relationships, and maltreatment during childhood by caregivers. Dates and circumstances of key life events occurring in young adulthood were also recorded, including a complete marriage and cohabitation history, full pregnancy and fertility histories from both men and women, an educational history of dates of degrees and school attendance, contact with the criminal justice system, military service, and various employment events, including the date of first and current jobs, with respective information on occupation, industry, wages, hours, and benefits. Finally, physical measurements and biospecimens were also collected at Wave IV. These included anthropometric measures of weight, height, and waist circumference; cardiovascular measures such as systolic blood pressure, diastolic blood pressure, and pulse; metabolic measures from dried blood spots assayed for lipids, glucose, and glycosylated hemoglobin (HbA1c); and measures of inflammation and immune function, including high-sensitivity C-reactive protein (hsCRP) and Epstein-Barr virus (EBV).

Wave V data collection took place from 2016 to 2018, when the original Wave I respondents were 33 to 43 years old. For the first time, a mixed mode survey design was used. In addition, several experiments were embedded in early phases of the data collection to test response to various treatments. A similar range of data was collected on social, environmental, economic, behavioral, and health circumstances of respondents, with the addition of retrospective child health and socio-economic status questions.

Documentation can also be found on the Add Health site, the Add Health Navigator, and the Codebook Explorer.


Behavioral Risk Factor Surveillance System (1984+) (BRFSS)
Tracks health risks in the United States. Monitors state-level prevalence of the major behavioral risks among adults associated with premature morbidity and mortality. Collects data on actual behaviors, rather than on attitudes or knowledge, that are especially useful for planning, initiating, supporting, and evaluating health promotion and disease prevention programs. Data for 2006-2021 have been converted into a cumulative Stata file.

Youth Risk Behavior Surveillance System 1991+ (YRBS)
Monitors health risk behaviors that contribute markedly to the leading causes of death, disability, and social problems among youth and adults in the United States. These behaviors, often established during childhood and early adolescence, include tobacco use; unhealthy dietary behaviors; inadequate physical activity; alcohol and other drug use; sexual behaviors that contribute to unintended pregnancy and sexually transmitted diseases, including HIV infection; and behaviors that contribute to unintentional injuries and violence.