nsduh in its current form only goes back about a decade, when samhsa re-designed the methodology and started paying respondents thirty bucks a pop. before that, look for its predecessor - the national household survey on drug abuse (nhsda) - with public use files available back to 1979 (included in these scripts). be sure to read those changes in methodology carefully before you start trying to trend smokers' virginia slims brand loyalty back to 1999.
although (to my knowledge) only the national health interview survey contains r syntax examples in its documentation, the friendly folks at samhsa have shown promise. since their published data tables were run on a restricted-access data set, i requested that they run the same sudaan analysis code on the public use files to confirm that this new r syntax does what it should. they delivered, i matched, pats on the back all around.
if you need a one-off data point, samhda is overflowing with options to analyze the data online. you might even find some restricted statistics that won't appear in the public use files. still, that's no substitute for getting your hands dirty. when you tire of menu-driven online query tools and you're ready to bark with the big data dogs, give these puppies a whirl.
the national survey on drug use and health targets the civilian, noninstitutionalized population of the united states aged twelve and older.
this new github repository contains three scripts:
download all microdata.R
- authenticate through the university of michigan's "i agree with these terms" page
- download, import, save each available year of data (with documentation) back to 1979
- convert each pre-packaged stata do-file (.do) into r, run the damn thing, get NAs where they belong
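for anyone wondering what "get NAs where they belong" looks like in practice, here's a minimal base r sketch. it is not the code from the download script. the helper name `recode_missing`, the column names, and the missing-value codes are all made up for illustration; the real codes for each variable live in that year's codebook.

```r
# hypothetical helper (not from the actual scripts):
# overwrite any value that matches a missing-value code with NA
recode_missing <- function(x, missing_codes) {
  x[x %in% missing_codes] <- NA
  x
}

# toy data: both column names and all values are invented
toy <- data.frame(
  age_first_use = c(15, 991, 17, 994, 21),
  cigs_per_day = c(10, 5, 997, 20, 998)
)

# assumed codes for "never used", "refused", "blank", and so on.
# check the codebook before trusting any of these
made_up_codes <- c(991, 994, 997, 998)

# apply the recode to every column while keeping the data.frame shape
toy[] <- lapply(toy, recode_missing, missing_codes = made_up_codes)

# the average now ignores the coded values instead of averaging them in
mean(toy$age_first_use, na.rm = TRUE)
```

skip this step and a code like 991 gets averaged in as if somebody lit their first cigarette at age nine hundred and ninety-one.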
2010 single-year - analysis examples.R
- load a single year of data
- limit the table to the variables needed for an example analysis
- construct the complex sample survey object
- run enough example analyses to make a kitchen sink jealous
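the steps above can be sketched on a toy table with the survey package. treat this as an illustration under assumptions, not a copy of the analysis script: the column names `vestr`, `verep`, and `analwt_c` are my best recollection of the nsduh public use file's variance stratum, variance replicate, and person-level weight, so confirm them against the codebook. every number in the toy table is invented.

```r
library(survey)

# toy stand-in for a single year of data, already limited to the
# columns needed for the analysis. all values are invented
toy <- data.frame(
  vestr = rep(1:3, each = 4),                 # assumed variance stratum
  verep = rep(rep(1:2, each = 2), times = 3), # assumed replicate (cluster) within stratum
  analwt_c = rep(c(1000, 1500), times = 6),   # assumed person-level weight
  cigmon = c(1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0) # hypothetical past-month smoking flag
)

# construct the complex sample survey object.
# nest = TRUE because the cluster ids repeat across strata
toy_design <- svydesign(
  id = ~verep,
  strata = ~vestr,
  data = toy,
  weights = ~analwt_c,
  nest = TRUE
)

# weighted share of smokers, with a design-based standard error
toy_mean <- svymean(~cigmon, toy_design)
toy_mean

# weighted count of smokers
svytotal(~cigmon, toy_design)
```

the point estimate from `svymean` is just the weighted mean. what the design object buys you is a standard error that respects the strata and clusters.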
replicate samhsa puf.R
- load a single year of data
- limit the table to the variables needed for an example analysis
- construct the complex sample survey object
- print statistics and standard errors matching the target replication table
click here to view these three scripts
for more detail about the national survey on drug use and health, visit:
- the substance abuse and mental health services administration's nsduh homepage
- research triangle institute's nsduh homepage
- the university of michigan's nsduh homepage
notes:
the 'download all microdata' program intentionally breaks unless you complete the clearly-defined, one-step instruction to authenticate that you have read and agree with the download terms. the script will download the entire public use file archive, but only after this step has been completed. if you contact me for help without reading those instructions, i reserve the right to tease you mercilessly. also: thanks to the great hadley wickham for figuring out how to authenticate in the first place.
confidential to sas, spss, stata, and sudaan users: did you know that you don't have to stop reading just because you've run out of candlewax? maybe it's time to switch to r. :D