Health Data


Many of the sources found on the international page may be useful.  Some additional sources can be found on the Data and Statistical Services Health Page.


Secondary Data Sources for Public Health. Cambridge University Press, 2007. (F and Stokes) RA409.B66 2007 Excellent overview of major datasets related to health.
National Center for Health Statistics Data Briefs   Statistical publications that provide information about current public health topics.

National Health Statistics Reports from the National Health Interview Survey   Long, detailed statistical publications that provide information about current public health topics.
Centers for Disease Control and Prevention Publications   Publications on a variety of health-related topics (see publications by topic at the bottom of the CDC page).
Health United States. U.S. Dept. of Health and Human Services, 1975- RA407.3.U57a (Current volume in DSS) Presents national trends in United States health status.  Also see Women's Health USA and Child Health USA.
CDC Wonder   Provides AIDS cases by metropolitan and rural area (1981-2002), natality (1995+), cancer statistics (1999+), environmental data, mortality data (1968+), infant deaths (1995+), tuberculosis data (1993+), STD morbidity (1984+), vaccine adverse reporting (1990+), and population estimates (1990+).
NCHHSTP AtlasPlus   Provides an interactive platform for accessing data collected by CDC’s National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention (NCHHSTP).  Provides interactive maps, graphs, tables, and figures showing geographic patterns and time trends of HIV, AIDS, viral hepatitis, tuberculosis, chlamydia, gonorrhea, and primary and secondary syphilis surveillance data typically to the county level.
US Health Map   Health trends in the United States at the county level for both sexes in:
  • Life expectancy between 1985 and 2014. 
  • Hypertension in 2001 and 2009.
  • Obesity from 2001 to 2011.
  • Physical activity from 2001 to 2011.
State Health Policy Research Dataset (SHEPRD): 1980-2010   Developed to study trends in the adoption of state public health laws. Specifically, the dataset covers annual trends in seatbelt laws, speed limits for passenger vehicles on rural interstates, minimum legal drinking ages, drunk driving laws, laws prohibiting the purchase of alcohol on Sundays, regulations for registering purchased kegs and/or prohibitions against selling kegs, beer taxes and total alcohol tax revenues, motorcycle and bicycle helmet laws, cigarette taxes, cigarette advertising bans, bans on workplace smoking, bans on smoking in restaurants and bars, and tobacco taxes (total revenue). Contains information about these laws for each year between 1980 and 2010, inclusive. In addition, it contains variables that describe the social, economic, demographic, health care, political, and crime characteristics of the states in each of these years.
State Marketplace Statistics Online Kaiser database showing state-by-state statistics for the Affordable Care Act (Obamacare).
Source Book of Health Insurance Data. Health Insurance Association of America, 1959-2002. (Ceased publication) (Firestone Noncirculating) HG9396 .S72 Collection of statistical data on health insurance and medical care in the United States. Includes health insurance coverage, managed care, the insurance market, medical care costs, utilization, medical care providers, and morbidity and mortality trends.
State Health Access Data Assistance Center (SHADAC) Online Allows users to customize tables and graphs of health insurance coverage estimates within a pre-defined set of parameters. The Data Center is a user-friendly and easily accessible way to get health insurance coverage estimates from the Current Population Survey's Annual Social and Economic Supplement (CPS) and the American Community Survey (ACS).
Health & Medical Care Archive. Robert Wood Johnson Foundation Online  
Portrait of Health in the United States. Bernan, 2001. Online Charts the health status of the United States presenting data on health correlates, conditions, care, and consequences. Also includes a guide to other resources.  Useful for historical statistics.
Reforming the Health Care System: State Profiles. American Association of Retired Persons, 1990-2005. RA407.3.R43 (Older issues) State data on issues of concern to the elderly.
State-Level Databook on Health Care Access and Financing. Urban Institute. (Firestone Noncirculating) RA410.53.L67 1998; RA410.53.L67 1995 Provided state-level data on health insurance coverage; the uninsured; indices of health status; and health care costs, access, and utilization, primarily for 1988-1995.
Health Statistics Online Guide to sources from the National Library of Medicine.



The following sources offer microdata.  For a comparison of the major sources, see the Comparative Microdata tab.


ICPSR Health and Medical Care Archive
The HMCA preserves and disseminates health care data collected by researchers. Subjects covered include health care providers, cost/access to health care, substance abuse and health, chronic health conditions, and others.

Medical Expenditure Panel Survey (MEPS) 1996+
Third and most recent in a series of national probability surveys conducted by AHRQ on the financing and utilization of medical care in the United States. The series began in 1977 as the National Medical Care Expenditure Survey and later became the National Medical Expenditure Survey (NMES).


National Medical Expenditure Survey (NMES)
Years covered: 1977, 1980, 1987.
Scope: provides information on health expenditures by or on behalf of families and individuals, the financing of these expenditures, and each person's use of services.
Sample Size and Makeup: (1) Households: national probability sample of the noninstitutionalized USA civilian population (2) Institutional Population: sample of nursing and personal care homes and facilities for the mentally retarded and residents admitted to those facilities. (3) American Indians and Alaskan Natives living on or near federal reservations.
How segmented: Age, Marital status, student status, veteran status, race, ethnicity
History: 1977: National Medical Care Expenditure Survey; 1980: National Medical Care Utilization and Expenditure Survey; 1987: National Medical Expenditure Survey. Succeeded by the Medical Expenditure Panel Survey (see earlier in this guide).
Where do I get the data? See ICPSR Direct
Questions: What is the average length of wait to see a doctor in a given specialty? Do physicians follow their own advice about smoking? Does a person's income determine how often they get a mammogram? How often do doctors only recommend necessary surgery? What is the cost of X procedure?

National Center for Health Statistics
The National Center for Health Statistics (NCHS) is a part of the Centers for Disease Control and Prevention (CDC), U.S. Department of Health and Human Services. The mission of the NCHS is to provide statistical information that will guide actions and policies to improve the health of the American people. NCHS data systems include data on vital events as well as information on health status, lifestyle and exposure to unhealthy influences, the onset and diagnosis of illness and disability, and the use of health care. These data are used by policy makers in Congress and the Administration, by medical researchers, and by others in the health community. Princeton participates in the NCHS Public-Use data program.

National Ambulatory Medical Care Survey Series (NAMCS)
Years covered: 1973-1981, 1985, 1989+.
Scope: contains data on medical care provided in physicians' offices. Obtains data on the number of office visits by selected physician characteristics. Data describing the nature of office visits include the patient's problem, prior visit status, referral status, major reason for the visit, physician's diagnosis, diagnostic and therapeutic services provided, and duration of the visit.
Sample Size and Makeup: continuously sampled survey based on a nationwide multistage probability sample of patient records. The physicians from whom records are obtained are nonfederally employed and primarily involved in office-based patient care, but not engaged in the specialties of radiology, pathology, or anesthesiology.
How segmented: patient (age, race, sex of patient); physician (geographic location, type of practice, specialization)
Where do I get the data? See ICPSR Direct. ICPSR is usually a year or two behind.
Questions: How do patients pay? Does the patient smoke? Does drug or alcohol abuse impact the # of doctor's visits? Does race, sex, or age impact the # of doctor's visits? Which demographic groups tend to use X prescription? Which are the most popular specialties? What % of the population is most likely to see a physician for X illness?

National Hospital Ambulatory Medical Care Survey Series
Years covered:
Scope: contains data on visits to hospital outpatient departments and emergency departments. Obtains data on demographics, triage, complaints, diagnosis, services, medications and immunizations provided, and waiting time.
Sample Size and Makeup: visits to the emergency and outpatient departments of noninstitutional general and short-stay hospitals within the 50 states and the District of Columbia, which had an average length of stay of less than 30 days, or to hospitals whose specialty was general (medical or surgical) or children's general. Excluded were federal hospitals, hospital units within institutions, and hospitals with fewer than six beds staffed for patient use.
How segmented: patient (age, race, sex of patient); geographic location of facility.

National Health Interview Survey Series (NHIS)
Years covered: 1975-. For 1963-1975, see Health Interview Surveys.
Scope: basic purpose is to obtain information about the amount and distribution of illness, its effects in terms of disability and chronic impairments, and the kinds of health services people receive. Information on the utilization of medical care facilities is also available in the form of data on medical and dental care, hospitalization, preventive care, nursing care, prosthetic appliances, and self-care. The Core variables are contained in the files for household, person, condition, doctor visit, and hospital data. Each year additional batteries of questions are asked which focus on specific topics.
Sample Size and Makeup: representative sample of the civilian, noninstitutionalized population of the USA.  Family core answers questions on health insurance and basic demographics.  A randomly selected adult from the household is asked more questions along with a randomly selected child under the age of 18 (if present).
How segmented: type of living quarters, size of family, geographic region, age, sex, race, marital status, veteran status, education, income, industry, occupation codes, and limits on activity.
Where do I get the data? See ICPSR Direct. Also see the Integrated Health Interview Series which integrates 1969+.
Questions: Does income status impact health service received? Does the patient have Medicare, Medicaid, private health insurance? Does race impact prevalence of certain illnesses? Do certain areas of the country have higher incidences of certain diseases? What variables (age, race, sex, education, etc) impact one's health? Does your occupation impact your health?

National Hospital Discharge Survey
Years covered:
Scope: part of a continuing sample of hospital discharge records that supplies medical and demographic information used to calculate statistics on hospital utilization. The data collection consists of data abstracted from the face sheets of the medical records for sampled inpatients discharged from a national sample of nonfederal short-stay hospitals. The variables include information on the patient's demographic characteristics (sex, age, date of birth, race, marital status), dates of admission and discharge, status at discharge, diagnoses, procedures performed, source of payment, and hospital characteristics, such as bed size, ownership, and region of the country. Replaced by the National Hospital Care Survey.
Sample Size and Makeup: national probability sample of inpatient discharges from noninstitutional general and short-stay hospitals. USA hospitals that had an average length of stay of less than 30 days, or hospitals whose specialty was general (medical or surgical) or children's general were eligible. Excluded were federal hospitals, hospital units within institutions, and hospitals with fewer than 6 beds staffed for patient use.
How segmented: See scope note
Questions: How does one pay for hospital visits? Does payment coverage differ by age, sex, region? How long does one stay in the hospital? Does treatment differ by type of hospital?

Public Patient Discharge Data
Producer: California. Office of Statewide Health Planning and Development
Former Titles: California Patient Discharge Data (1983-2000); Also called California Hospital Discharge.
Years Covered: Annual data holdings begin in 1983. Latest is 2014. Princeton also has the Ambulatory Surgery Data (2010-2014) and Emergency Department Data (2010-2014). 
Sample Size and Makeup: each inpatient discharged from California acute care hospitals which includes General Acute Care Hospitals, Acute Psychiatric Hospitals, Chemical Dependency Recovery Hospitals, Psychiatric Health Facilities, and State-operated hospitals.
How segmented: hospital facility number; age, race, sex, and 5-digit ZIP code; length of stay, day of the week on which patient was admitted along with the quarter of the year and the year admitted; source of admission; type of admission; principal diagnosis and up to 4 other diagnoses; principal procedure along with a few other procedures; disposition of the patient; expected principal source of payment; days from admission to each procedure; Diagnosis Related Group (DRG); Major Diagnostic Category (MDC); total charges; principal external cause of injury and up to 4 other external causes of injury; and the patient's county of residence.
Where is the documentation: (DSS) RA981.C3 C35 and RA981.C3 P824
Where do I get the data? DSS Study #2947 is restricted and stored on a secure server. Access will be granted after authorization. Please sign the form and give it to the Data Librarian. Summary data for 1999+ are available through their web site.
Questions: How do people pay for health care? Which diagnoses and treatments are linked to which illnesses? What is the average length of stay for a visit?

Healthcare Cost and Utilization Project (HCUP)
United States. Agency for Health Care Policy and Research
Years covered: 1988+
Sample Size and Makeup: uses a stratified probability sample of community hospitals, with sampling probabilities proportional to the number of U.S. community hospitals in each stratum. Sampling weights were used to obtain national estimates for discharges, average length of stay, and average total charges. Includes American Hospital Association codes.
How segmented: region, location, teaching status, ownership/control, and bed size
Where is the documentation: (DSS) RA981.A2 H38
Where do I get the data? See the Main Catalog for study numbers and access. Princeton has the Nationwide Inpatient Sample (NIS) data for 1988-2014, the Nationwide Emergency Department Sample (NEDS) for 2006-2011 and 2013, and State Inpatient Databases (SID) for Arizona (2002-2011), Maryland (2006-2011), New Jersey (2005-2011), Oregon (2002-2004), and Washington (2006-2011). Documentation for NIS: (DSS) RA981.A2 H38. Documentation for NEDS: (DSS) RA875.5.E5 N374. Excellent overviews by topic can be found on the AHRQ site.
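Because HCUP is a weighted sample, national figures for discharges, average length of stay, and average total charges come from applying each record's sampling weight instead of counting records directly. The following is a minimal sketch of that arithmetic. The field names (`weight`, `length_of_stay`, `total_charges`) and the sample values are hypothetical and are not taken from the HCUP file layout; consult the documentation for the actual variable names.

```python
from dataclasses import dataclass


@dataclass
class Discharge:
    # All field names are illustrative assumptions, not HCUP variable names.
    weight: float         # number of national discharges this record represents
    length_of_stay: int   # days in hospital
    total_charges: float  # dollars


def national_estimates(records):
    """Return weighted national estimates from sampled discharge records."""
    total_weight = sum(r.weight for r in records)
    if total_weight == 0:
        raise ValueError("records must carry positive total weight")
    # Weighted means: each record counts in proportion to its weight.
    mean_los = sum(r.weight * r.length_of_stay for r in records) / total_weight
    mean_charges = sum(r.weight * r.total_charges for r in records) / total_weight
    return {
        "discharges": total_weight,
        "mean_length_of_stay": mean_los,
        "mean_total_charges": mean_charges,
    }


# Made-up records for illustration only.
sample = [
    Discharge(weight=5.0, length_of_stay=3, total_charges=12000.0),
    Discharge(weight=5.0, length_of_stay=7, total_charges=30000.0),
    Discharge(weight=10.0, length_of_stay=2, total_charges=8000.0),
]
estimates = national_estimates(sample)
print(estimates)
```

Point estimates are only part of the picture: standard errors for a stratified design need the stratum and hospital identifiers as well, which this sketch does not attempt.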

National Health and Nutrition Examination Survey (NHANES) and Followup Series (1971+)
The National Health and Nutrition Examination Surveys (NHANES I, II, III, Hispanic HANES, and NHANES I Epidemiologic Followup Study [NHEFS]) were designed to obtain information on the health and nutritional status of the United States population. The NHANES I and NHANES II datasets were formerly titled Health and Nutrition Examination Surveys. This series succeeds the National Health Examination Survey, which was collected from 1959 to 1970. All of the NHANES datasets use complex, multistage, stratified, clustered samples of civilian noninstitutionalized populations. All of the files within each study are linkable to each other.
NHANES I (1971-1975) interviewed persons aged 1-74 years. The sample was selected so that certain population groups thought to be at high risk of malnutrition (persons with low incomes, preschool children, women of childbearing age, and the elderly) were oversampled at preset rates. On completion of the survey, 23,808 of the interviewed sample were given a medical examination, and this information is also part of the NHANES I data collections.
The NHANES I Epidemiologic Followup Study (NHEFS) is a longitudinal study designed to investigate the relationships between clinical, nutritional, and behavioral factors assessed in NHANES I and subsequent morbidity, mortality, and hospital utilization, as well as changes in risk factors, functional limitation, and institutionalization. The NHEFS cohort includes all persons aged 25-74 who completed a medical examination for NHANES I (N = 14,407).
NHANES II (1976-1980) was designed to continue the measurement and monitoring of the nutritional status and health of the United States population. From the sample of 27,801 persons aged 6 months to 74 years, 25,286 people were interviewed and 20,322 were both interviewed and examined. Because children and persons classified as living at or below the poverty level were assumed to be at special risk of having nutritional problems, they were sampled at rates substantially higher than their proportions in the general population.
NHANES III (1988-1994) contains information on a sample of 33,994 persons aged 2 months and older. A home examination was employed for the first time in order to obtain examination data for very young children and for the elderly.
The Hispanic HANES (HHANES) was conducted to obtain sufficient numbers to produce estimates of the health and nutritional status of Hispanics in general, as well as specific data for Puerto Ricans, Mexican Americans, and Cuban Americans.
The latest data can be found on the CDC website.
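Oversampling has a practical consequence for analysis: unweighted tabulations overstate whatever is more common in the oversampled groups. The sketch below illustrates the idea with a simplified design weight, taken as the inverse of each group's sampling rate. The group names, sampling rates, and respondents are invented for illustration. The weights distributed with the actual NHANES files typically include further adjustments and should be used in place of anything computed this way.

```python
def design_weights(sampling_rates):
    """Inverse-probability weights: a group sampled at rate r gets weight 1/r."""
    return {group: 1.0 / rate for group, rate in sampling_rates.items()}


def unweighted_prevalence(respondents):
    """Share of respondents with the condition, ignoring the sample design."""
    return sum(1 for _, has_condition in respondents if has_condition) / len(respondents)


def weighted_prevalence(respondents, weights):
    """Share of the represented population with the condition."""
    total = sum(weights[group] for group, _ in respondents)
    cases = sum(weights[group] for group, has_condition in respondents if has_condition)
    return cases / total


# Hypothetical design: the low-income group is sampled at four times the rate.
rates = {"low_income": 0.04, "other": 0.01}
weights = design_weights(rates)

# (group, has_condition) pairs; made-up data.
respondents = [
    ("low_income", True), ("low_income", True),
    ("low_income", False), ("low_income", False),
    ("other", True), ("other", False),
    ("other", False), ("other", False),
]

raw = unweighted_prevalence(respondents)
adjusted = weighted_prevalence(respondents, weights)
print(f"unweighted: {raw:.3f}, weighted: {adjusted:.3f}")
```

In this toy example the unweighted share is higher than the weighted one because the oversampled group has the higher prevalence.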