Catalyzing the Future of Medicaid Research: The T-MSIS Analytic Files

Assistant Professor, Division of Health Policy and Economics, Department of Population Health Sciences, Joan & Sanford I. Weill Medical College, Cornell University

The Medicaid program currently enrolls over 86 million people, making it the single largest payer for health care services in the United States. Medicaid spending represents 16% of national health expenditures and, on average, 20% of state budgets. Medicaid covers 40% of births and 38% of children. It disproportionately enrolls individuals from racial and ethnic minority populations and other subgroups that have historically faced structural inequity in the United States, making it an important policy lever for addressing disparities in health care access, utilization, and health outcomes. Yet health economics research on basic policy questions in Medicaid has largely lagged behind research on Medicare and commercial insurance markets. This has been attributed, in part, to the federalist structure of the Medicaid program and underinvestment in efforts to reconcile Medicaid data across states. While national Medicaid administrative claims data have historically been available for research purposes through the Medicaid Analytic eXtract (MAX) produced by the Centers for Medicare & Medicaid Services (CMS), broad use of the MAX files in research has long been hampered by quality issues, inconsistencies in coding across states, and a long lag in data availability.

MAX files are derived from the Medicaid Statistical Information System (MSIS), through which states have long provided their claims data to CMS to comply with reporting requirements. In 2011, CMS began a years-long effort to launch the Transformed Medicaid Statistical Information System (T-MSIS), which was designed to better standardize claims data collection across states and improve the usability of the data for both oversight and research purposes. In 2019, CMS released the first T-MSIS extract produced for research use, the T-MSIS Analytic Files (TAF). Because states transitioned from MSIS to T-MSIS at different times, TAF data are first available for a subset of states beginning in 2014 and for all states in 2016. TAF data for 2020 are currently available by request from CMS, and the availability lag appears to be decreasing with time as states, CMS, and data contractors increase their familiarity with T-MSIS.

TAF data reflect utilization in both fee-for-service Medicaid and Medicaid managed care. A notable limitation is that the amounts paid by managed care plans to providers are currently redacted. The TAF are organized into five primary files. The demographic and eligibility file includes detailed beneficiary-level characteristics, including age, gender, race and ethnicity, income, and other eligibility, benefits, and enrollment information. The other four files contain claims data for (1) inpatient hospital care; (2) long-term care; (3) prescription drug utilization; and (4) other services, which include physician, outpatient hospital, home health, and other utilization not captured in the first three claims files. Supplemental annual provider and plan files are also available with covariates on provider and managed care plan characteristics, respectively.

Although the TAF represent a major advance in usability over the MAX files, quality issues remain and call for significant caution when the data are used for research purposes. For example, inconsistent enrollment numbers, low claims volume, and missingness in data elements may mean that certain states and years of data cannot be used for particular research questions. The Data Quality (DQ) Atlas, an online dashboard produced by Mathematica under contract with CMS, is a key resource for understanding data limitations in the TAF. The DQ Atlas evaluates TAF quality across a wide variety of data elements, by state and year, and compares the TAF to external benchmarks when available. Data attributes examined in the DQ Atlas include aggregate enrollment, enrollment patterns over time, beneficiary information, claims completeness, spending, service use information, and provider information, among others. The DQ Atlas grades data elements, by state and year, offering assessments that range from “low concern” to “unusable.” While in many cases data elements with grades of low and medium concern may be of sufficiently high quality to support analysis, thresholds for usability should likely vary based on the research question and the potential for bias due to misreporting or missingness. To provide a sense of variation in quality by data element, the DQ Atlas found that total Medicaid enrollment in the 2019 TAF data exhibits low concern in 48 states (including the District of Columbia), medium concern in two states, and high concern in one state. In comparison, concern about inpatient claims volume is low in 35 states, medium in 10 states, and high in six states.
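To make the idea of question-specific usability thresholds concrete, the following is a minimal Python sketch of how a researcher might screen states for a given year using grades like those described above. Everything about the data layout is an assumption made for illustration: the dictionary keyed by (state, year, topic), the topic names, the placeholder state codes, and the function name `usable_states` are hypothetical and do not reflect the DQ Atlas's actual export format. The four concern labels follow the levels mentioned in the text.

```python
from typing import Dict, Iterable, List, Tuple

# Concern levels ordered from least to most severe, following the labels
# mentioned in the text ("low concern" through "unusable").
CONCERN_ORDER = ["low", "medium", "high", "unusable"]
RANK = {label: i for i, label in enumerate(CONCERN_ORDER)}

# Hypothetical layout: one grade per (state, year, topic).
GradeKey = Tuple[str, int, str]


def usable_states(
    grades: Dict[GradeKey, str],
    year: int,
    topics: Iterable[str],
    max_concern: str = "medium",
) -> List[str]:
    """Return states whose grade on every required topic in `year`
    is no worse than `max_concern`. A missing grade counts as not usable."""
    threshold = RANK[max_concern]
    required = list(topics)
    states = {state for (state, yr, _topic) in grades if yr == year}
    keep = []
    for state in sorted(states):
        labels = [grades.get((state, year, topic)) for topic in required]
        if all(lab is not None and RANK[lab] <= threshold for lab in labels):
            keep.append(state)
    return keep


# Made-up grades for placeholder states "AA", "BB", "CC" (not real data).
example_grades = {
    ("AA", 2019, "enrollment"): "low",
    ("AA", 2019, "inpatient_volume"): "medium",
    ("BB", 2019, "enrollment"): "low",
    ("BB", 2019, "inpatient_volume"): "high",
    ("CC", 2019, "enrollment"): "medium",
}

# A study of inpatient utilization needs both topics to be acceptable.
print(usable_states(example_grades, 2019, ["enrollment", "inpatient_volume"]))
```

The `max_concern` argument is where the research question enters: an analysis that is sensitive to undercounted claims might tighten it to "low", while a descriptive enrollment tabulation might tolerate "medium". Treating a missing grade as unusable is a conservative design choice in this sketch, not a DQ Atlas rule.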

Much of the focus on TAF quality has centered on the usability of beneficiary race and ethnicity information. Because race and ethnicity data are self-reported by beneficiaries and the information is not required for enrollment, many states have high levels of missingness. According to the DQ Atlas, in 2019, the data are deemed to have low concern in only 15 states and medium concern in an additional 14 states, based on missingness and on benchmarking against the American Community Survey five-year estimates. The data are described as “unusable” in five states. A number of organizations have further interrogated the race and ethnicity data in the TAF, including the Kaiser Family Foundation, the Medicaid and CHIP Payment and Access Commission, NORC at the University of Chicago, and the State Health Access Data Assistance Center (SHADAC) at the University of Minnesota. TAF quality, including the quality of the beneficiary race and ethnicity indicators, has already improved substantially since the launch of the dataset, and the general expectation is that quality will continue to improve with time as CMS engages in ongoing quality improvement efforts with states.

Access to the TAF requires a data-use agreement with CMS. Researchers can either purchase physical TAF data, with the price varying based on the number of files and the size of the beneficiary cohort requested, or work with the TAF in the CMS Virtual Research Data Center for a set annual fee. In addition to the DQ Atlas, other important resources for getting started with the data include the technical documentation, user guide, and codebooks. AcademyHealth’s Evidence-Informed State Health Policy Institute, with funding from the Commonwealth Fund and Robert Wood Johnson Foundation, has launched the Medicaid Data Learning Network (MDLN), which is designed to provide a forum for academic users of the TAF to develop and disseminate best practices. Learnings and consensus standards from the MDLN will be shared publicly in an effort to standardize basic methodologies for working with the TAF, including defining eligibility categories, measuring spending, leveraging the race and ethnicity data, implementing quality measures, and more.

The advent of the TAF represents a major opportunity for health economists to answer a variety of canonical, timely, and policy-relevant questions about the Medicaid delivery system. With a concerted effort by CMS, states, and other stakeholders to improve the data and develop best practices for using the data in the coming years, TAF may do much to catalyze research on America’s health care safety net.