the medical expenditure panel survey - household component (meps-hc) contains data laid out a few different ways. the consolidated file has one-record-per-person with all the complex sample survey variables. start there. the eight event files contain one-record-per-person-per-event, and (except for the supplies/vision table) those events have some sort of dates. crikey. there are tables with one-record-per-person-per-medical-condition, one-record-per-job, even a one-record-per-person-per-interview-per-private-health-plan table for anyone who wants to spend less time with his or her family. if you merge anything to the consolidated file, make sure you understand the difference between setting the parameter all.x = TRUE versus all.x = FALSE -- some respondents have zero records in the non-consolidated files, others have multiple. hot tip: you probably want to aggregate non-consolidated files somehow. you might use tapply and aggregate, but i prefer aggregation using sql.
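here's a minimal sketch of the merge-and-aggregate point above, using toy data frames rather than real meps files. DUPERSID is the meps person identifier, but EXPENDITURE, AGE, and every value below are made up for illustration. the collapse uses base r's aggregate only to stay dependency-free; the same step could just as well be written in sql.

```r
# toy stand-in for the consolidated file: one record per person
consolidated <- data.frame(
  DUPERSID = c("101", "102", "103"),
  AGE = c(34, 51, 8)
)

# toy stand-in for an event file:
# person 101 has two events, person 102 has one, person 103 has none
# (EXPENDITURE is a hypothetical column name)
events <- data.frame(
  DUPERSID = c("101", "101", "102"),
  EXPENDITURE = c(120, 80, 45)
)

# collapse the event file to one record per person before merging
per.person <- aggregate(EXPENDITURE ~ DUPERSID, data = events, FUN = sum)

# all.x = TRUE keeps everyone in the consolidated file;
# people with zero events come back as NA, recoded to zero here
kept <- merge(consolidated, per.person, by = "DUPERSID", all.x = TRUE)
kept$EXPENDITURE[is.na(kept$EXPENDITURE)] <- 0

# all.x = FALSE (the default) silently drops person 103
dropped <- merge(consolidated, per.person, by = "DUPERSID", all.x = FALSE)

# skipping the aggregation step duplicates person 101
unaggregated <- merge(consolidated, events, by = "DUPERSID", all.x = TRUE)

print(kept)
print(nrow(dropped))
print(nrow(unaggregated))
```

the thing to check after any merge like this is the row count: if it doesn't match the number of records in the consolidated file, the weights no longer add up to the population.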
everything can be read in as a sas transport file (.ssp) using read.xport from the foreign package, but if you like making things harder than they have to be (i.e. if you ride a fixie), you can also follow the example buried in the ?read.SAScii documentation.
ahrq draws the meps sample from the national health interview survey, interviews about thirty-five thousand individuals per year, and keeps everyone in the panel for two years. half of the respondents are in their first of two years of interviews, half are in their second. capisce? meps generalizes to the us civilian, non-institutionalized population (so no active-duty military).
this new github repository contains three scripts:
1996-2010 household component - download all microdata.R
- loop through every year and every file type, download, then rename according to a pattern
- save each file as an r data file (.rda) and (if specified by the user) sas transport (.ssp), comma-separated value (.csv), and stata-readable (.dta)
- download the codebook and documentation, if available
2010 consolidated - analyze with brr.R
- load the r data file (.rda) created by the download script (above)
- set up the balanced repeated replication design outlined in this document
- perform a boatload of analysis examples (spoiler: there will be barplots)
2010 consolidated - analyze with tsl.R
- load the r data file (.rda) created by the download script (above)
- set up a taylor-series linearization survey design outlined in this document
- perform the same boatload of analysis examples
click here to view these three scripts
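for a feel of what the taylor-series setup looks like before opening the scripts, here's a rough sketch using the survey package on simulated data. it is not the code from the repository. VARSTR, VARPSU, PERWT10F, and TOTEXP10 are assumed to be the stratum, cluster, person-weight, and total-expenditure columns on the 2010 consolidated file, so check the codebook before trusting those names, and the toy data frame is invented.

```r
library(survey)

# simulated stand-in for the consolidated file:
# three strata, two psus per stratum, ten people per psu
set.seed(1)
toy <- data.frame(
  VARSTR   = rep(1:3, each = 20),
  VARPSU   = rep(rep(1:2, each = 10), times = 3),
  PERWT10F = runif(60, 1000, 5000),
  TOTEXP10 = round(rexp(60, rate = 1 / 4000))
)

# taylor-series linearization design
# nest = TRUE because psu numbers repeat across strata
meps.tsl.design <- svydesign(
  id = ~VARPSU,
  strata = ~VARSTR,
  weights = ~PERWT10F,
  data = toy,
  nest = TRUE
)

# weighted mean and total of expenditures, with design-based standard errors
print(svymean(~TOTEXP10, meps.tsl.design))
print(svytotal(~TOTEXP10, meps.tsl.design))
```

the point estimates are plain weighted statistics; the design object only earns its keep in the standard errors, which account for the strata and clusters.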
for more detail about the medical expenditure panel survey - household component (meps-hc), visit:
- the agency for healthcare research and quality's medical expenditure panel survey homepage
- the meps insurance component homepage (microdata not publicly available)
- a younger version of myself giving an introduction to online query tools with mepsnet at slide ten
notes:
if you don't know which analysis method to use, choose the replicate weights. replicate weighting requires slightly more ram, but taylor-series designs don't allow the computation of a confidence interval around quantile statistics (like the median).
this repository doesn't include a script to replicate the meps taylor-series linearization or replicate-weighted methods of variance calculation, because i wrote the original journal article with meps. it's legit.
if you just want a one-off statistic and can't bear to get your typing fingers dirty, try their fabulous table-building website, mepsnet.
confidential to sas, spss, stata, sudaan users: why are you still making calls with two tin-cans and a string now that we've created cell phones? time to transition to r. :D