archived asdfree: analyze the youth risk behavior surveillance system (yrbss) with r

the youth risk behavior surveillance system is the high school edition of the behavioral risk factor surveillance system (brfss), a scientific study of good kids who do bad things. questions are mostly about sex, drugs, rock and roll, and populate a veritable bouquet of cdc reports, fact sheets, and journal articles. want to know how many american teenagers rode with a drunk driver or carried a gun to school or tried ecstasy for the fortieth time? reading over the questionnaire makes me think rebel without a cause on steroids (steroid use can be found at question 56). for a more professional introduction, check out the cdc's yrbss in brief page. or keep reading.

most states (and even two dozen urban areas) conduct their own yrbsses (that's plural for yrbss), but state participation is kinda all over the place. if you need state- or locale-specific data, you can send in a data request form and maybe modify my syntax (perhaps starting the importation with read.dta or read.spss). but if nationwide estimates are all you're after, just analyze the cdc's publicly available files with the syntax described below. the yrbss weights generalize to all public and private school students in grades 9-12 in the fifty united states plus dc. this new github repository contains three scripts:

download all microdata.R

download decades' worth of data with no muss and also zero fuss.
flip a few strings around in the cdc's sas importation scripts so they're sascii-compliant
save everything to your local drive for easy loading later.
look at this page and thank your lucky stars that everything has been automated for you
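
for a rough feel of the import-then-save pattern, here's a minimal sketch. big caveat: it swaps in base r's read.fwf for the sascii-driven import the download script actually uses, and the three-column layout, the toy records, and the file names are all made up for illustration.

```r
# toy fixed-width records standing in for a cdc ascii file (made-up values)
fwf_lines <- c("1011 23.5", "1022 41.0", "2011 17.2")

# write them to a temporary file so read.fwf has something to read
tf <- tempfile(fileext = ".dat")
writeLines(fwf_lines, tf)

# hypothetical layout: stratum (3 characters), psu (1), weight (5)
x <- read.fwf(
  tf,
  widths = c(3, 1, 5),
  col.names = c("stratum", "psu", "weight")
)

# save to a local .rda so later scripts can load() it without re-importing
rda_path <- file.path(tempdir(), "yrbs_toy.rda")
save(x, file = rda_path)

# start fresh, then pull the data frame back in from disk
rm(x)
load(rda_path)
```

the real files are far wider than three columns, which is why letting the cdc's sas importation script describe the layout beats typing widths by hand.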

2011 single-year - analysis examples.R

load the latest r data file (.rda) created by the download script (above)
set up a taylor-series linearization survey design outlined in this document
perform enough analysis examples to quench even the most insatiable of statistical appetites
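
the design setup can be sketched with the survey package on a toy data frame. assumptions to flag: the column names (stratum, psu, weight, rode_with_drinker) and every value below are invented for illustration, so check the names against the real file before borrowing any of this.

```r
library(survey)

# made-up toy records; the real files have far more rows and columns
x <- data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu = c(1, 1, 2, 2, 1, 1, 2, 2),
  weight = c(10, 12, 9, 11, 20, 18, 22, 19),
  rode_with_drinker = c(1, 0, 0, 1, 0, 0, 1, 0)
)

# taylor-series linearization is svydesign's default variance method.
# nest = TRUE because the psu ids repeat across strata in this toy data
y <- svydesign(
  ids = ~psu,
  strata = ~stratum,
  weights = ~weight,
  data = x,
  nest = TRUE
)

# weighted share of students who rode with a drinking driver
est <- svymean(~rode_with_drinker, y)

coef(est) # point estimate
SE(est)   # linearization-based standard error
```

once the design object exists, the other svy* functions (svytotal, svyby, svyglm, and friends) take it the same way svymean does.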

replicate cdc software for analysis of yrbs data publication.R

load an older r data file (.rda) and construct a few different complex sample survey designs
extract statistics and standard errors that precisely match the sudaan output in this cdc publication
extract confidence intervals that precisely match the stata rows from the same document
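
one common reason confidence intervals differ between packages is the reference distribution: normal quantiles versus t quantiles on the design degrees of freedom. the sketch below shows both on invented toy data. it is not lifted from the replication script, and the column names and values are hypothetical.

```r
library(survey)

# made-up toy records with two strata and two psus per stratum
x <- data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu = c(1, 1, 2, 2, 1, 1, 2, 2),
  weight = c(10, 12, 9, 11, 20, 18, 22, 19),
  carried_weapon = c(0, 1, 0, 0, 1, 0, 0, 1)
)

y <- svydesign(
  ids = ~psu,
  strata = ~stratum,
  weights = ~weight,
  data = x,
  nest = TRUE
)

est <- svymean(~carried_weapon, y)

# normal-based interval (df = Inf is the default for svystat objects)
ci_z <- confint(est)

# t-based interval on the design degrees of freedom (psus minus strata)
ci_t <- confint(est, df = degf(y))

ci_z
ci_t
```

with only two design degrees of freedom in this toy example, the t-based interval comes out noticeably wider than the normal-based one.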

click here to view these three scripts

for more detail about the youth risk behavior surveillance system, visit:

the frequently asked questions page. it's loaded with useful descriptions. read it plz n thx.
the cdc's yrbss online table creator, for those of us not interested in coding
the yrbss wikipedia page. who doesn't love wikipedia?

notes:

depending on your propensity for detecting statistical software comparisons, you may or may not have noticed that the centers for disease control and prevention document replicated by my third yrbs script is - some may say - the rosetta stone of complex sample survey statistical analysis. in fact, dr. thomas lumley (author of the r survey package) wrote an entire extension of an older (2007) version of that document (currently 2009) to prove that r's survey package is every bit as rough and tumble as any other statistical language out there (it is). dr. lumley wrote much more detail about how r stacks up against those other languages than you'll find in my silly little replication script. if you're a comparison addict like me, read it like you svymean it.

confidential to sas, spss, stata, and sudaan users: why are you huddled around that space heater for warmth when we can just snuggle instead? time to transition to r. ;)