analyze the national survey on drug use and health (nsduh) with r

the national survey on drug use and health (nsduh) monitors illicit drug, alcohol, and tobacco use with more detail than any other survey out there.  if you wanna know the average age at first chewing tobacco dip, the prevalence of needle-sharing, the family structure of households with someone abusing pain relievers, even the health insurance coverage of peyote users, you are in the right place.  the substance abuse and mental health services administration (samhsa) contracts with the north carolinians over at research triangle institute to run the survey, but the university of michigan's substance abuse and mental health data archive (samhda) holds the keys to this data castle.

nsduh in its current form only goes back to 2002, when samhsa renamed the survey, redesigned the methodology, and started paying respondents thirty bucks a pop.  before that, look for its predecessor - the national household survey on drug abuse (nhsda) - with public use files available back to 1979 (included in these scripts).  be sure to read those changes in methodology carefully before you start trying to trend smokers' virginia slims brand loyalty back to 1999.

although (to my knowledge) only the national health interview survey contains r syntax examples in its documentation, the friendly folks at samhsa have shown promise.  since their published data tables were run on a restricted-access data set, i requested that they run the same sudaan analysis code on the public use files to confirm that this new r syntax does what it should.  they delivered, i matched, pats on the back all around.

if you need a one-off data point, samhda is overflowing with options to analyze the data online.  you might even find some restricted statistics that won't appear in the public use files.  still, that's no substitute for getting your hands dirty.  when you tire of menu-driven online query tools and you're ready to bark with the big data dogs, give these puppies a whirl.  the national survey on drug use and health targets the civilian, noninstitutionalized population of the united states aged twelve and older.  this new github repository contains three scripts:


download all microdata.R
  • authenticate the university of michigan's "i agree with these terms" page
  • download, import, save each available year of data (with documentation) back to 1979
  • convert each pre-packaged stata do-file (.do) into r, run the damn thing, get NAs where they belong

2010 single-year - analysis examples.R
  • load a single year of data
  • limit the table to the variables needed for an example analysis
  • construct the complex sample survey object
  • run enough example analyses to make a kitchen sink jealous

replicate samhsa puf.R
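the usual r tool for a complex sample survey object is the survey package.  for a rough look under the hood, here's a base-r sketch (not the scripts' own code) of the taylor-linearized mean and standard error that a stratified, clustered, weighted design produces.  it assumes the nsduh public use file design variables are `vestr` (strata), `verep` (clusters, numbered within strata), and `analwt_c` (weights).  double-check those names against the codebook for your year.  `linearized_mean()` and the `toy` data are hypothetical, made up for illustration.  records coded NA drop out of the estimate, but their clusters still count toward the variance, which is how a subset of a survey design behaves.

```r
# a minimal sketch, not the scripts' own code: taylor-linearized mean and
# standard error for a stratified, clustered, weighted sample, treating
# first-stage clusters as sampled with replacement (no finite population correction).
# linearized_mean() is a hypothetical helper, not part of any package.
linearized_mean <- function(y, w, strata, psu) {
  ok <- !is.na(y)
  # weighted mean over the non-missing records
  est <- sum(w[ok] * y[ok]) / sum(w[ok])
  # linearized (influence) values for the ratio sum(w * y) / sum(w);
  # missing records contribute zero, but their clusters still count
  z <- ifelse(ok, w * (y - est) / sum(w[ok]), 0)
  # clusters are numbered within strata (the equivalent of nest = TRUE)
  psu_id <- interaction(strata, psu, drop = TRUE)
  z_psu <- tapply(z, psu_id, sum)
  psu_stratum <- tapply(as.character(strata), psu_id, function(s) s[1])
  v <- 0
  for (h in unique(psu_stratum)) {
    zh <- z_psu[psu_stratum == h]
    nh <- length(zh)
    if (nh < 2) stop("stratum ", h, " has only one cluster")
    # between-cluster variance within stratum h, scaled by n_h / (n_h - 1)
    v <- v + nh / (nh - 1) * sum((zh - mean(zh))^2)
  }
  c(mean = est, se = sqrt(v))
}

# toy data, not real nsduh records: two strata with two clusters each
toy <- data.frame(
  vestr    = c(1, 1, 2, 2),
  verep    = c(1, 2, 1, 2),
  analwt_c = c(1, 1, 1, 1),
  used     = c(0, 1, 0, 1)
)
linearized_mean(toy$used, toy$analwt_c, toy$vestr, toy$verep)
# mean 0.5, se about 0.354

# with the survey package installed, the analogous design and estimate would be:
# library(survey)
# des <- svydesign(id = ~verep, strata = ~vestr, weights = ~analwt_c,
#                  data = toy, nest = TRUE)
# svymean(~used, des)
```

the stop() on single-cluster strata mirrors the survey package's default behavior, which refuses to guess a variance for a "lonely" cluster unless you tell it how.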



click here to view these three scripts



for more detail about the national survey on drug use and health, visit:


notes:

the 'download all microdata' program intentionally breaks unless you complete the clearly defined, one-step instruction to authenticate that you have read and agree with the download terms.  the script will download the entire public use file archive, but only after this step has been completed.  if you contact me for help without reading those instructions, i reserve the right to tease you mercilessly.  also: thanks to the great hadley wickham for figuring out how to authenticate in the first place.

confidential to sas, spss, stata, and sudaan users: did you know that you don't have to stop reading just because you've run out of candlewax?  maybe it's time to switch to r.  :D