Categories: Newsletter Issue 2020:1

Data Resources on Medical Providers

By: Alice Chen, Michael Richards, and Kosali Simon[1]

We provide an overview of some commonly used data sources for research on medical providers. These focus mostly on physicians, though some extend to other provider types (e.g., nurse practitioners and physician assistants). We largely proceed by introducing and describing the data sources in order of relative ease of access for an interested health economics researcher; however, it’s worth remembering that circumstances will vary across researchers’ institutions and professional networks. Download links are embedded in the text and summarized at the end.

This is not meant to be a comprehensive list; we encourage interested researchers to continue the conversation on Twitter with the hashtag #MedicalProviderResearchData and share more details about these or other resources they may be aware of (as well as any of their own publications that use them!).

Please also refer to the column on hospital financial data in the last ASHEcon issue, and stay tuned for details on patient-level data in the next issue.

Physician Compare/NPPES

By: Anupam Jena

Until recently, detailed data on physician characteristics have been relatively limited. This has precluded analyses that relate to how labor market outcomes and patient outcomes vary with physician sex, experience, and other factors (e.g., specialty, training location).

Historically, the CMS National Plan and Provider Enumeration System (NPPES) contained some of these data elements, but in recent years, new repositories have emerged, including the CMS Physician Compare National Downloadable File. Physician Compare includes information on physician name, sex, age, address, medical school attended, and specialization. The database also includes the National Provider Identifier (NPI), which creates a useful opportunity for direct linkage to other databases.
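As a rough illustration of the NPI linkage described above, the sketch below joins two small extracts on NPI using only the Python standard library. The column names (`sex`, `medical_school`, `total_claims`), the NPI values, and the helper functions are hypothetical and do not reflect the actual file layouts.

```python
import csv
import io

# Hypothetical extracts; column names and values are illustrative only.
PHYSICIAN_COMPARE = """npi,sex,medical_school
1000000001,F,School A
1000000002,M,School B
"""
OTHER_FILE = """npi,total_claims
1000000001,120
1000000003,45
"""

def read_by_npi(text):
    """Index the rows of a CSV extract by NPI, kept as a string."""
    return {row["npi"].strip(): row for row in csv.DictReader(io.StringIO(text))}

def link_on_npi(left, right):
    """Inner join: keep only NPIs that appear in both files."""
    return {npi: {**left[npi], **right[npi]} for npi in left.keys() & right.keys()}

linked = link_on_npi(read_by_npi(PHYSICIAN_COMPARE), read_by_npi(OTHER_FILE))

# NPIs in the second file with no demographic record; worth inspecting before analysis.
unmatched = sorted(set(read_by_npi(OTHER_FILE)) - set(linked))
```

Keeping the NPI as a string avoids accidental numeric reformatting, and tabulating the unmatched identifiers is a simple check on how much of the sample survives the merge.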

While these datasets are publicly available, a clear downside is that the data are primarily restricted to demographic and training information. Data on social networks aren’t readily available unless one constructs them from where physicians trained, which is feasible but takes effort. Data on physician attitudes that might be available from large scale physician surveys are also not captured in these datasets.

Recent publication examples are listed as references 1-6.

Medicare Part B/Part D Public Use Files and Open Payments

By: Thuy Nguyen

CMS has recently made available provider-identified public use data for Medicare Part D and Part B. These data are available from CY2013 onwards. The data contain the NPI, are aggregated to the provider level, and offer an alternative to the restricted-access microdata available through CMS/ResDAC. However, a limitation to be aware of is that for Part B, only Medicare fee-for-service (FFS) claims are included, at least so far. For Part D, the universe of claims is included (i.e., Medicare FFS + Medicare Advantage enrollees). Another drawback is CMS’s data truncation: Part B files do not include data for services performed on 10 or fewer beneficiaries, and Part D records with 10 or fewer claims are suppressed to protect beneficiary privacy.
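To see why the suppression rule matters, the sketch below applies a "10 or fewer claims" cutoff to made-up provider-drug records and compares provider totals before and after. The record layout and counts are hypothetical; in the real public use files the suppressed rows are never observed, so this is only a way to reason about the potential size of the bias.

```python
# Hypothetical provider-drug claim counts; the real files have a different layout.
records = [
    {"npi": "1000000001", "drug": "A", "claims": 250},
    {"npi": "1000000001", "drug": "B", "claims": 8},
    {"npi": "1000000002", "drug": "A", "claims": 10},
    {"npi": "1000000002", "drug": "C", "claims": 11},
]

THRESHOLD = 10  # records with 10 or fewer claims are suppressed

def apply_suppression(rows, threshold=THRESHOLD):
    """Keep only the rows that would survive the suppression rule."""
    return [r for r in rows if r["claims"] > threshold]

def totals_by_npi(rows):
    """Sum claims within provider."""
    totals = {}
    for r in rows:
        totals[r["npi"]] = totals.get(r["npi"], 0) + r["claims"]
    return totals

true_totals = totals_by_npi(records)
observed_totals = totals_by_npi(apply_suppression(records))

# Share of each provider's claims that disappears under suppression.
share_missing = {
    npi: 1 - observed_totals.get(npi, 0) / true_totals[npi] for npi in true_totals
}
```

In this toy example the low-volume prescriber loses a much larger share of claims than the high-volume one, which suggests the truncation is unlikely to be neutral across provider types.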

Recent publication examples are listed as references 7-8 (for Part D and B respectively).

Sunshine Act (Open Payments) data record all payments and transfers between drug and medical device manufacturers and certain health care providers (physicians and teaching hospitals). These data include details such as the amount of payment, date of payment, covered product, and nature of payment (for example, consulting fee or meal). Civil monetary penalties may be imposed on manufacturers for failure to report these payments. These data are available for free for 2013 onwards. One limitation of Open Payments data is that product samples and educational materials intended for patient use are excluded.

Note that these data do not contain the NPI, but matching techniques based on name and address can be used. For a recent paper using Open Payments data, see reference 9.
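One simple way to approach the name-and-address matching mentioned above is an exact match on a normalized key. The sketch below is a minimal version that assumes first name, last name, and 5-digit ZIP code as a crude stand-in for the full address; the field names, records, and helper functions are all hypothetical, and real applications typically need fuzzier rules and manual review.

```python
import re

def normalize(text):
    """Uppercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^A-Z0-9 ]", "", text.upper())
    return re.sub(r"\s+", " ", text).strip()

def match_key(first, last, zip_code):
    """Exact-match key built from the name and the 5-digit ZIP code."""
    return (normalize(first), normalize(last), zip_code.strip()[:5])

# Hypothetical reference file that carries the NPI.
npi_reference = [
    {"npi": "1000000001", "first": "Jane", "last": "O'Neil", "zip": "47405-1234"},
    {"npi": "1000000002", "first": "John", "last": "Smith", "zip": "40208"},
    {"npi": "1000000003", "first": "John", "last": "Smith", "zip": "40208"},
]
# Hypothetical payment records without an NPI.
payments = [
    {"payment_id": "p1", "first": "JANE", "last": "ONeil", "zip": "47405"},
    {"payment_id": "p2", "first": "John", "last": "Smith", "zip": "40208"},
    {"payment_id": "p3", "first": "Ann", "last": "Lee", "zip": "90001"},
]

def build_index(reference):
    """Map each key to the set of NPIs that share it."""
    index = {}
    for row in reference:
        key = match_key(row["first"], row["last"], row["zip"])
        index.setdefault(key, set()).add(row["npi"])
    return index

def assign_npi(payment_rows, index):
    """Assign an NPI only when the key identifies exactly one physician."""
    out = {}
    for row in payment_rows:
        candidates = index.get(match_key(row["first"], row["last"], row["zip"]), set())
        out[row["payment_id"]] = next(iter(candidates)) if len(candidates) == 1 else None
    return out

matches = assign_npi(payments, build_index(npi_reference))
```

Leaving ambiguous keys unmatched (here, the two John Smiths in the same ZIP code) trades some sample size for fewer false links; whether that trade is appropriate depends on the research question.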

State Discharge Data

By: Elizabeth Munnich

When researchers look for patient information linked to identifiable clinical providers, state hospital discharge surveys present good options.  Several states collect hospital discharge data for inpatient and outpatient services, which include identifiers that make it possible to track physicians over time and link to other physician data sets. These data typically include information about patient diagnoses and demographics, medical procedures, and hospital charges. Data are available for purchase (after submitting proposals and data security protocols) through state websites. Some examples include:

Inclusion of physician identifiers makes it possible to observe all care provided by physicians in inpatient (and some outpatient) settings within a state. Most often, the identifier available is the NPI, which enables the researcher to connect to other physician-level information found outside of the discharge data. Importantly, state discharge data are publicly available and often less expensive than many claims data sets. Occasionally, reduced fees are available for academic researchers as well, and students can sometimes benefit from further reduced prices. Of course, not all states provide these data, and the data may not include patient identifiers to track patients across medical encounters. Finally, while these data are often less expensive than other sources, they do come at a cost. Prices can range from $10 to $4,500 for one year of statewide data for the states mentioned above.

Recent publication examples are listed as references 10-17.

All Payer Consortium

By: Ellie Prager

An emerging opportunity for provider-focused studies comes from all-payer claims databases (APCDs). APCDs are typically compiled at the state level by drawing data from all large private health insurance carriers in the state, making the data quite comprehensive for privately insured consumers. They also have the benefit of reporting allowed amounts that insurers actually pay to providers, in addition to charge prices. This feature allows health economists to explore actual spending implications. Prominent examples of APCDs that have gained the attention and analytic effort of health economists come from Maine, Massachusetts, Arkansas, and New Hampshire. Each state has a formal data application procedure for researchers, although one crucial and somewhat unique step in this process is setting up a secure computing environment that meets all the specifications of the state.

A sometimes underappreciated benefit of APCDs is that the data documentation is often more complete and systematic than when getting data directly from private sources. One drawback is that the uninsured cannot be studied: as the name reveals, these are payer-focused medical care transactions. Additionally, variable coding is not always consistent across payers. This applies both to whether a variable is populated at all (e.g., provider network status) and to how values are coded (e.g., DRG version, use of decimal points in ICD codes, etc.). In a similar fashion, provider IDs can be poorly standardized across the various payers included within the database, which necessitates costly cleaning before they become analytically useful. Researchers should also keep in mind that while they observe the universe of transactions across the included payers, they typically won’t know detailed insurance plan characteristics.
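As one small example of the cross-payer cleaning described above, the sketch below harmonizes ICD codes that differ only in decimal points, case, or stray whitespace. The function name and sample codes are hypothetical, and the sketch does not address harder problems such as differing ICD or DRG versions.

```python
def normalize_icd(code):
    """Strip whitespace and decimal points and uppercase, so 'e11.9' and 'E119' agree."""
    return code.strip().replace(".", "").upper()

# Hypothetical diagnosis codes as two payers might report the same conditions.
payer_a = ["E11.9", "I10", " j45.909"]
payer_b = ["E119", "I10", "J45909"]

harmonized_a = [normalize_icd(c) for c in payer_a]
harmonized_b = [normalize_icd(c) for c in payer_b]
```

After normalization the two lists are identical, so diagnosis-based sample definitions can be applied uniformly across payers.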

Recent publication examples are listed in references 19-20.

SK&A Physician Office Survey

By: Sayeh Nikpay

Physician practice business models and ownership structures are becoming more complex and consequently reshaping the incentives facing a given provider. For these and related reasons, there has been considerable interest in data assets that can offer health economists more detailed organizational information attached to a given practice and provider, especially with a longitudinal dimension. The SK&A physician office survey has been able to fill this need for a variety of health economics and health policy studies in recent years. The data include hundreds of thousands of physicians working all across the US, with some practices now tracked for over a decade.

The SK&A data were recently purchased by IQVIA, a data analytics firm. Beginning in 2020, IQVIA will collect the SK&A data, and these data will no longer be available to purchase as a standalone product. Instead, IQVIA plans to bundle SK&A together with other IMS Health data from the new parent company into a product called “OneKey.” This may prove to be an even better research resource, though pricing models and such remain unclear at this point.

A clear strength is the richness of the data, which include practice-level information on patient volume, physician staffing and specialization, health IT capabilities, ACO participation, and public payer (i.e., Medicare and Medicaid) participation. Crucially, they also contain information on hospital ownership, system membership, and physician group membership. These latter elements provide useful measures of vertical and horizontal integration within and across physician markets.
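To make the integration measures concrete, the sketch below computes one possible vertical integration statistic: the physician-weighted share of physicians working in hospital-owned practices, by year. The practice-year records and field names are hypothetical and are not the SK&A variable names; other definitions (e.g., practice-weighted shares) are equally plausible.

```python
from collections import defaultdict

# Hypothetical practice-year records; field names are illustrative only.
practices = [
    {"year": 2015, "practice_id": "a", "physicians": 4, "hospital_owned": False},
    {"year": 2015, "practice_id": "b", "physicians": 6, "hospital_owned": True},
    {"year": 2016, "practice_id": "a", "physicians": 4, "hospital_owned": True},
    {"year": 2016, "practice_id": "b", "physicians": 6, "hospital_owned": True},
    {"year": 2016, "practice_id": "c", "physicians": 10, "hospital_owned": False},
]

def hospital_owned_share(rows):
    """Physician-weighted share in hospital-owned practices, by year."""
    owned = defaultdict(int)
    total = defaultdict(int)
    for r in rows:
        total[r["year"]] += r["physicians"]
        if r["hospital_owned"]:
            owned[r["year"]] += r["physicians"]
    return {year: owned[year] / total[year] for year in sorted(total)}

share_by_year = hospital_owned_share(practices)
```

In this toy panel, practice "a" becomes hospital-owned in 2016, yet the overall share falls because a large independent practice enters the data, a reminder that changes in sample composition can drive such measures as much as changes in ownership.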

The data also have individual physician identifiers (e.g., NPIs), which allow researchers to track the same physician across multiple practices as well as link to other physician-level data. The data are also as close to a census of office-based physician practices as one could probably get, at least among known datasets to date. That said, they are still not a true census of physicians (e.g., more hospital-oriented physicians will often be unobserved), so they may not be appropriate for every research question. Additionally, the answers are self-reported and therefore may be measured with error. Although those who survey practices for SK&A are supposed to ask questions of a practice manager, the person who responds may not be aware of all the nuances related to a given question, e.g., those tied to ownership versus affiliation arrangements.

Finally, price points under the OneKey model are yet to be determined; however, historically the SK&A data have not been cheap: perhaps $10,000 or more for a single year of data, depending on which variables are included, how many years are requested, and the overall negotiation process.

Recent publication examples are listed in references 21-22.

Medicare Claims Data

By: Chris Whaley

A longstanding workhorse of health economics research is Medicare claims data. The public insurance program covers tens of millions of individuals in a given year and accounts for roughly one out of every five medical spending dollars annually. Thus, it is no surprise that researchers pay special attention to how policy and market changes influence providers’ treatment behavior toward Medicare beneficiaries.

While these data have been used for decades, gaining access to Medicare claims data is a more formalized process than getting access to many other sources of data. Briefly, interested researchers submit a proposal to the Research Data Assistance Center (ResDAC), the CMS contractor that assists researchers with accessing CMS data. The proposal includes a description of the project, the funding source, a Data Use Agreement (DUA), and a data safeguarding plan for how the research team will ensure data confidentiality. CMS then reviews the application and determines whether the CMS data requested are appropriate for the study.

A salient appeal of the Medicare claims data is the ability to get a comprehensive view of medical service flows and spending tied to one of the most prominent and policy-targeted social programs within the US. This facilitates more holistic analyses and nuanced questions, related to things such as provider coordination across care domains, incentive alignment between providers, and ownership arrangements across provider types (e.g., hospital ownership of physicians). Another strength of the data is the fact that they are widely available to a variety of researchers, which helps ‘un-silo’ the data so that they can be used to answer a wide variety of health and policy relevant research questions.

That said, an immediate drawback is that the researcher can only observe the Medicare fee-for-service population (i.e., one market within the mixed economy). The data also often materialize with a significant lag, which can be longer than for other data sources and can matter a great deal for emerging research topics as well as very recent policy shifts. Lastly, purchasing Medicare data can come with considerable costs that can easily necessitate external funding; however, there are options to reuse Medicare claims data held by other researchers within the same institution, which can help hold down the total costs of a particular project.

A recent publication example is listed as reference 23, where the authors use Medicare data to examine the impact of vertical integration between physicians and hospitals/health systems on referral patterns.

Table 1: Download URLs for Selected (Free or Fairly Low Cost) Data Sources

Data Source | Download URL
CMS National Plan and Provider Enumeration System | NPPES
CMS Physician Compare National Downloadable File | CMS Physician Compare National Downloadable File
Part B | Part B
Part D | Part D
Sunshine Act (Open Payments) data | Sunshine Act (Open Payments) data
Hospital discharge data with physician identifiers |
All-payer data | Maine, Massachusetts, Arkansas, and New Hampshire


Other data sources: A characteristic of clinical provider data is that—because of several public reporting requirements—there are a number of other specialized purposes for which data are collected (e.g. data on DEA Registration & Drug Addiction Treatment Act (DATA) Waivers). Other sources of data which we do not cover in this column include malpractice reporting through the National Practitioner Data Bank, physician prescribing patterns in ProPublica’s DEA-NPI Dataset, the Department of Health and Human Services’ Office of Inspector General exclusions data for fraud, and CMS’ Medicare Data on Provider Practice and Specialty (MD-PPAS) which links NPIs to tax identification numbers (TIN).

Further Reading:

Moghtaderi A, Viragh T, Berrateo N, and Black B. Individual Healthcare Provider Crosswalk Database, 1999-2019. Working Paper. Available upon request.

Doshi, Jalpa A et al. “Data, Data Everywhere, but Access Remains a Big Issue for Researchers: A Review of Access Policies for Publicly-Funded Patient-Level Health Care Data in the United States.” EGEMS (Washington, DC) vol. 4,2 1204. 31 Mar. 2016, doi:10.13063/2327-9214.1204


  1. Tsugawa Y, Jena AB, Figueroa JF, Orav EJ, Blumenthal DM, Jha AK. “Comparison of hospital mortality and readmission rates for Medicare patients treated by male vs female physicians.” JAMA Internal Medicine, Dec 2016.
  2. Tsugawa Y, Jena AB, Orav EJ, Jha AK. “Quality of care delivered by general internists in US hospitals who graduated from foreign versus US medical schools: observational study.” BMJ, February 2017.
  3. Tsugawa Y, Jha AK, Newhouse JP, Zaslavsky AM, Jena AB. “Variation in physician spending and association with patient outcomes.” JAMA Internal Medicine, March,
  4. Tsugawa Y, Newhouse JP, Zaslavsky A, Blumenthal DM, Jena AB. “Physician age and outcomes in elderly patients in hospital in the US: observational study.” BMJ, 357:j1797, 2017.
  5. Blumenthal DM, Olenski AR, Tsugawa Y, Jena AB. “Association Between Treatment by Locum Tenens Internal Medicine Physicians and 30-Day Mortality Among Hospitalized Medicare Beneficiaries.” JAMA 318(21), pp. 2119-2129, 2017.
  6. Tsugawa Y, Blumenthal DM, Jha AK, Orav EJ, Jena AB. “Association between physician US News & World Report medical school ranking and patient outcomes and costs of care: observational study.” BMJ 362, k3640, 2018.
  7. Part D: Nguyen, T. D., Bradford, W. D., & Simon, K. I. (2019). Pharmaceutical payments to physicians may increase prescribing for opioids. Addiction, 114(6), 1051-1059
  8. Part B: Ko, Joan S., Heather Chalfin, Bruce J. Trock, Zhaoyong Feng, Elizabeth Humphreys, Sung-Woo Park, H. Ballentine Carter, Kevin D. Frick, and Misop Han. “Variability in medicare utilization and payment among urologists.” Urology 85, no. 5 (2015): 1045-1051.
  9. Carey, Colleen, Ethan M.J. Lieber, and Sarah Miller. 2020. “Drug Firms’ Payments and Physicians’ Prescribing Behavior in Medicare Part D.” NBER Working Paper No. 26751.
  10. Guldi, Melanie, Elizabeth L. Munnich, and Steven Talbert. 2020. “Changes in the Air Ambulance Market and Effects on Individual Health Outcomes.” Working Paper.
  11. David, Guy, and Mark D. Neuman. 2011. “Physician division of labor and patient selection for outpatient procedures.” Journal of Health Economics, 30(2): 381-391.
  12. Durrance, Christine Piette, and Scott Hankins. 2018. “Medical malpractice liability exposure and OB/GYN physician delivery decisions.” Health Services Research, 53(4): 2633-2650.
  13. Gabel, Jon R., Cheryl Fahlman, Ray Kang, Gregory Wozniak, Phil Kletke, and Joel W. Hay. 2008. “Where Do I Send Thee? Does Physician-Ownership Affect Referral Patterns to Ambulatory Surgery Centers?” Health Affairs, 27(Suppl1): w165-w174.
  14. Geruso and Richards. 2020. “Trading Spaces: Medicare’s Regulatory Spillovers on Treatment Setting for Non-Medicare Patients.” Working Paper.
  15. Hu, Tianyan and Karoline Mortensen. 2016. “Mandatory Statewide Medicaid Managed Care in Florida and Hospitalizations for Ambulatory Care Sensitive Conditions.” Health Services Research.
  16. Munnich, Elizabeth L. and Michael R. Richards. 2020. “Treatment Flows After Outsourcing Public Insurance Provision: Evidence from Florida Medicaid.” Working Paper.
  17. Munnich, Elizabeth L. and Michael R. Richards. 2020. “Medicare payment reform effects on outpatient procedure supply and competition.” Working Paper.
  18. Yee, Christine A. 2011. “Physicians on board: An examination of physician financial interests in ASCs using longitudinal data.” Journal of Health Economics, 30(5): 904-18.
  19. Prager, Elena. “Health Care Demand under Simple Prices: Evidence from Tiered Hospital Networks”. American Economic Journal: Applied Economics, 2020, forthcoming.
  20. Prager, Elena and Nicholas Tilipman. “Disagreement Payoffs and Negotiated Prices: Evidence From Out-of-Network Hospital Payments.” Working Paper.
  21. Richards, Michael R., Sayeh S. Nikpay, and John A. Graves. “The Growing Integration of Physician Practices.” Medical care 54, no. 7 (2016): 714-718.
  22. Nikpay, Sayeh S., Michael R. Richards, and David Penson. “Hospital-physician consolidation accelerated in the past decade in cardiology, oncology.” Health Affairs 37, no. 7 (2018): 1123-1127.
  23. Damberg, Cheryl, Michael Richards, Xiaoxi Zhao, and Christopher Whaley. “Impact of vertical integration between physicians and hospitals/health systems on referral patterns.” Working Paper.

Alice Chen is an Assistant Professor of Public Policy at the University of Southern California.

Michael Richards is an Associate Professor of Economics at Baylor University.

Kosali Simon is the Associate Vice Provost for Health Sciences and the Herman B Wells Endowed Professor of Public and Environmental Affairs at Indiana University.

Anupam Jena is the Ruth L. Newhouse Associate Professor of Health Care Policy and Medicine at Harvard University.

Thuy Nguyen is a Postdoctoral Fellow at the Paul H. O’Neill School of Public and Environmental Affairs at Indiana University.

Martin Hackman is an Assistant Professor of Economics at the University of California Los Angeles.

Elizabeth Munnich is an Assistant Professor of Economics at the University of Louisville.

Ellie Prager is an Assistant Professor of Strategy at Northwestern University.

Sayeh Nikpay is an Assistant Professor of Health Policy at Vanderbilt University.

Chris Whaley is an Associate Policy Researcher at the RAND Corporation and a Professor at the Pardee RAND Graduate School.

[1] The authors would like to thank Carol Kane, Kurt Lavetti, Tony LoSasso and Sean Nicholson for helpful comments.