Categories: Newsletter Issue 2025:1


Data Corner Episode 1: National Vital Statistics System

By James Flynn, Assistant Professor, Department of Economics at Miami University and IZA

Question 1: Tell us about a project or projects you’ve used this data for. What stage are these in (published, working paper, work-in-progress)?

I have used different versions of National Vital Statistics System (NVSS) data for three projects. The first was my job market paper, which is forthcoming at the Journal of Human Resources. In this paper, I used natality records linked with infant death records to investigate the impact of the Colorado Family Planning Initiative (CFPI) on infant health outcomes. The CFPI was a privately funded family planning intervention that gave thousands of long-acting reversible contraceptives to mostly lower-income women in Colorado. By giving the most disadvantaged women in the state the ability to avoid unwanted pregnancies, the intervention dramatically reduced the rates of both extremely preterm births (births before 28 weeks of gestation) and infant deaths. The geographic precision of the birth records allowed me to demonstrate that these reductions occurred only in counties that had Title X family planning clinics, where the CFPI was implemented, which rules out statewide confounding variables as an explanation.

In the second project, along with my coauthors, I am using the NVSS mortality files to look at the impact of the Louisiana Hepatitis C Elimination Plan on mortality due to Hepatitis C-related illness. This program implemented a first-of-its-kind subscription-based model, which gave the state’s public health agency unlimited access to the lifesaving but prohibitively expensive class of direct-acting antiviral drugs, which clear Hepatitis C infection with almost perfect efficacy. We show that this intervention, which led to an immediate increase in utilization of these drugs, also caused meaningful reductions in mortality from conditions related to Hepatitis C infection. This project benefited from the detailed cause-of-death codes, which allowed us to look specifically at deaths from Hepatitis C-related causes and to test the robustness of our results by dropping deaths related to alcohol and drug use.

In the third project, I am in the early stages of using the NVSS natality files to assess whether raising the minimum legal sale age of tobacco products from 18 to 21 reduced the number of expectant mothers in that age group who smoked during pregnancy. The natality records include responses to questions about the mother’s smoking behavior before pregnancy and during each trimester of the pregnancy. This will allow me to measure the impact of these laws on the treated group of mothers, and also to compare their trends with those of a placebo group of mothers aged 24-27 in the same states, for whom these laws could not have had an impact. Finally, I will be able to trace the impacts on smoking through to actual birth outcomes to determine whether the laws improve birth weights and gestational length.
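As a rough illustration of the treated-versus-placebo comparison described above, here is a minimal Python sketch of a difference-in-differences calculation. Everything in it is an assumption made for illustration: the counts are invented, the field names (`age_group`, `period`, `smoked`) are hypothetical rather than actual NVSS variable names, and the simple two-period comparison stands in for whatever specification the project ultimately uses.

```python
from statistics import mean


def make_records(age_group, period, smokers, total):
    """Build synthetic birth records: `smokers` of `total` report smoking."""
    return [
        {"age_group": age_group, "period": period, "smoked": 1 if i < smokers else 0}
        for i in range(total)
    ]


def smoking_rate(records, age_group, period):
    """Share of births in a group and period where the mother reported smoking."""
    flags = [
        r["smoked"]
        for r in records
        if r["age_group"] == age_group and r["period"] == period
    ]
    return mean(flags)


def diff_in_diff(records, treated="18-20", placebo="24-27"):
    """Change in the treated group's rate minus the change in the placebo group's."""
    change_treated = (
        smoking_rate(records, treated, "post") - smoking_rate(records, treated, "pre")
    )
    change_placebo = (
        smoking_rate(records, placebo, "post") - smoking_rate(records, placebo, "pre")
    )
    return change_treated - change_placebo


# Invented counts, purely for illustration (not results from any data).
records = (
    make_records("18-20", "pre", 12, 100)
    + make_records("18-20", "post", 8, 100)
    + make_records("24-27", "pre", 10, 100)
    + make_records("24-27", "post", 9, 100)
)

estimate = diff_in_diff(records)
print(f"Difference-in-differences estimate: {estimate:+.3f}")
```

Subtracting the placebo group's change removes any trend in smoking that affected all mothers in the state, which is the role the 24-27 comparison group plays in the design described above.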

Question 2: What was the application process like? How long did it take to write up the application? Did it require any revisions? How long was the waiting period (application to approval, approval to receiving the data)?

The approval process is relatively simple. There is a six-page form that researchers need to fill out to apply for the data. Most of the form covers personal information and assurances of compliance with data privacy protocols. As there are many datasets within the NVSS, part of the application is specifying exactly which dataset you are requesting, as well as providing a justification for why your project cannot be completed with the publicly available versions. The application requires a one- to two-paragraph explanation of the project, including the research question and why it is important. You also need to describe your plans to present and publish results from the data, and describe the security measures that will be in place to protect the data. Researchers should review the NVSS data-release policy to determine whether their data request can be accommodated. Across the three times I have applied for data through the NVSS, approval has taken between two and six weeks, and it has then taken another two to four weeks to actually receive the data from NVSS.

Question 3: How is the data accessed (sent and stored locally, via VPN)? What kind of security measures are required?

Researchers are able to store and access the data on a secure computer system at their affiliated organization or institution. You need to briefly describe your institution’s data protection procedures. In practice, I have simply reached out to IT at my institution and asked them for the language to use in responding to this question. The data are then sent through a VPN, for which you get a temporary password to log in and download the files. Researchers are required to destroy the data at the end of the approved project unless they apply for and are given an extension. Being able to access the data on my local computer has been a big advantage of using NVSS data, as opposed to other data sources that require access through a restricted data center (RDC).

Question 4: What aspects of the data are particularly useful (variables, sample size, timeframe, etc.)? In general and for you specifically?

I have found the sample size, with the statistical power it offers, to be the most useful feature of the data, along with the granular detail. It is incredibly useful for measuring the impact of health interventions at either the state or county level.

Question 5: Are there limitations to keep in mind, or that researchers might find surprising?

In the interests of protecting privacy, these data do not typically include precise birth dates or dates of death. They provide the month and year, which is often sufficient for analyzing policies that change over time in specific places, but there are particular identification strategies that can be complicated by this limitation. For example, this paper, which uses a regression discontinuity design to look at the impact of turning 21 on alcohol consumption and mortality, would be subject to additional data scrutiny in order to access the precise dates of the subjects’ 21st birthdays and of their deaths. Similarly, research designs that use a regression discontinuity strategy to look at whether being born just before or after New Year’s Day impacts health outcomes, due to the changes in tax treatment for births on either side of the cutoff, would require additional levels of approval.
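To make the month-and-year limitation concrete, here is a minimal Python sketch, under the assumption that each date is available only as a (year, month) pair. The function names are hypothetical. It shows that an event can be placed before or after an age threshold in most cases, but not when it falls in the same calendar month as the threshold birthday, which is exactly the window a regression discontinuity design relies on.

```python
def months_between(start, end):
    """Whole calendar months from `start` to `end`, each a (year, month) pair."""
    (start_year, start_month), (end_year, end_month) = start, end
    return (end_year - start_year) * 12 + (end_month - start_month)


def side_of_threshold(birth, event, threshold_years):
    """Classify an event relative to an age threshold using month-level dates.

    Returns "before", "after", or "ambiguous". The ambiguous case arises when
    the event happens in the calendar month of the threshold birthday, because
    the day of the month is unknown for both dates.
    """
    gap = months_between(birth, event)
    threshold_months = threshold_years * 12
    if gap < threshold_months:
        return "before"
    if gap > threshold_months:
        return "after"
    return "ambiguous"


# Hypothetical examples: a person born in January 2000 and the age-21 threshold.
before = side_of_threshold((2000, 1), (2020, 12), 21)
unclear = side_of_threshold((2000, 1), (2021, 1), 21)
after = side_of_threshold((2000, 1), (2021, 2), 21)
print(before, unclear, after)
```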

Question 6: Is there something interesting/cool/useful about this data that is likely unknown to most researchers? Anything else you want to share?

One thing that is particularly useful about the birth records is that they can be linked to infant death records, meaning that if you have an intervention which you expect to impact health both in utero and after birth, you can link the records in order to trace out the impact on the health of the pregnancy and the child that is subsequently born.
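The idea of linking the two record types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shared identifier `record_id` and the fields `gestation_weeks` and `age_at_death_days` are hypothetical names, not the actual NVSS file layout, and the records are invented.

```python
def link_births_to_deaths(births, deaths, key="record_id"):
    """Attach infant death information to each birth record, where one exists."""
    deaths_by_id = {d[key]: d for d in deaths}
    linked = []
    for birth in births:
        death = deaths_by_id.get(birth[key])
        linked.append(
            {
                **birth,
                "died_in_infancy": death is not None,
                "age_at_death_days": death["age_at_death_days"] if death else None,
            }
        )
    return linked


def infant_deaths_per_1000(linked):
    """Infant deaths per 1,000 births in the linked file."""
    return 1000 * sum(r["died_in_infancy"] for r in linked) / len(linked)


# Invented records, purely for illustration.
births = [
    {"record_id": 1, "gestation_weeks": 26},
    {"record_id": 2, "gestation_weeks": 39},
    {"record_id": 3, "gestation_weeks": 40},
    {"record_id": 4, "gestation_weeks": 38},
]
deaths = [{"record_id": 1, "age_at_death_days": 3}]

linked = link_births_to_deaths(births, deaths)
rate = infant_deaths_per_1000(linked)
print(f"Infant deaths per 1,000 births: {rate:.1f}")
```

Because every birth is kept in the linked file whether or not a death record matches, birth characteristics such as gestational length and later survival can be analyzed on the same set of records.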