the downloadable american community survey ships as two distinct comma-separated value (.csv) files, one household-level and one person-level. merging the two just rectangulates the data, since each person in the person-file has exactly one matching record in the household-file. for analyses of small, smaller, and microscopic geographic areas, choose the one-, three-, or five-year pooled files, respectively. use as few pooled years as you can, unless you like sentences that start with, "over the period of 2006-2010, the average american ... [insert yer findings here]."
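to make that one-to-one structure concrete, here's a minimal sketch in r, assuming both files have already been downloaded and unzipped. the file names are placeholders (real pums downloads arrive split into parts like ss11husa.csv and ss11pusa.csv), and the column names follow the pums data dictionary:

```r
# read in the household- and person-level files
# (placeholder file names -- real pums downloads arrive
#  split into parts like ss11husa.csv and ss11pusa.csv)
hh <- read.csv( "ss11hus.csv" , stringsAsFactors = FALSE )
pp <- read.csv( "ss11pus.csv" , stringsAsFactors = FALSE )

# every person has exactly one matching household,
# so this merge simply rectangulates the data
merged <- merge( hh , pp , by = "SERIALNO" )

# nobody should get lost along the way
stopifnot( nrow( merged ) == nrow( pp ) )

# drop group quarters respondents (relp values of 16 and 17),
# who receive an abridged questionnaire -- see the next paragraph
merged <- subset( merged , !( RELP %in% 16:17 ) )
```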
the acs weights generalize to the whole united states population, including individuals living in group quarters, but group quarters respondents get an abridged questionnaire, so most (not all) analysts exclude records with a relp value of 16 or 17 right off the bat. this new github repository contains three scripts:
download all microdata.R
- initiate a monetdblite database in the current working directory
- download, unzip, and import each file for every year and size specified by the user
- create and save household- and merged/person-level replicate weight complex sample designs (see the sketch after this list)
2011 single-year - analysis examples.R
- run the well-documented block of code to re-initiate the monetdblite database
- load the r data file (.rda) containing the replicate weight designs for the single-year 2011 file
- perform the standard repertoire of analysis examples, only this time using monetdblite
replicate census estimates.R
- run the well-documented block of code to re-initiate the monetdblite database
- load the r data file (.rda) containing the replicate weight designs for the single-year 2011 file
- match every nationwide statistic on the census bureau's estimates page
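as promised in the list above, here's a rough sketch of the replicate weight design construction and the analysis pattern, re-using the `merged` data.frame from the earlier sketch. the real scripts keep these tables inside monetdblite rather than in memory, but the survey-package syntax is the same idea. the acs publishes eighty replicate weights for successive difference replication, which translates to scale = 4 / 80, rscales of all ones, and mse = TRUE. object and file names here are placeholders, not the exact ones the scripts use:

```r
library(survey)

# construct the merged/person-level replicate weight design.
# pwgtp is the person weight and pwgtp1 through pwgtp80
# are the replicate weights
acs.m.design <-
	svrepdesign(
		weights = ~PWGTP ,
		repweights = "PWGTP[1-9]" ,
		scale = 4 / 80 ,
		rscales = rep( 1 , 80 ) ,
		mse = TRUE ,
		type = "other" ,
		data = merged
	)

# tack on a constant, to make population totals easy
acs.m.design <- update( acs.m.design , one = 1 )

# save the design object for later sessions
save( acs.m.design , file = "acs2011_1yr.rda" )

# ..then, in some later session, load it right back..
load( "acs2011_1yr.rda" )

# ..and run the standard repertoire:
# average age, with a correctly-computed standard error
svymean( ~AGEP , acs.m.design )

# total population count
svytotal( ~one , acs.m.design )
```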
click here to view these three scripts
for more detail about the american community survey (acs), visit:
- the us census bureau's acs homepage
- the american factfinder homepage
- the american community survey's wikipedia page
- the census bureau's acs frequently asked questions page
notes:
if you're just looking for a couple of data points, you ought to give the census bureau's american factfinder a whirl. it's a table creator (click here to watch me blab about table creators), so it's easy to use but inflexible. here's a li'l tip: if you run a statistic using american factfinder and then the same statistic using these scripts, they will be close but won't match exactly. that's not a mistake, and both are methodologically correct: factfinder tables come off the census bureau's full internal microdata, while these scripts run on the public use microdata sample -- a subsample with extra disclosure protections.
every now and then, grumpy lawmakers threaten to defund the acs because, well, it's expensive. use it or lose it.
unless the question's phrasing specifies otherwise, most acs variables should be treated as point-in-time, as opposed to either annualized or ever-during-the-year measures. this distinction is particularly important for health insurance coverage. think about these three distinct statistics --
- the number of americans who will lack health insurance at some point during the year
- the number of americans without health insurance right now
- the number of americans who will lack health insurance for the entire year
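for example: the pums hicov variable captures coverage at the moment of interview, so a quick tabulation estimates the middle statistic only -- americans without health insurance right now -- and says nothing about the other two. a sketch, re-using the placeholder design object from above:

```r
# hicov == 2 flags respondents without health insurance
# at the time of interview, a point-in-time measure
svytotal( ~ I( HICOV == 2 ) , acs.m.design )

# ..or, as a share of the population
svymean( ~ I( HICOV == 2 ) , acs.m.design )
```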
although the automated ftp download program for this data set only retrieves files back as far as 2005, a nationwide version of the american community survey has been conducted since 2000. i skipped those years for two reasons --
- the sample size (the true strength of the modern acs) wasn't very large on the older files (the 2004 and 2011 single-year person-level files are 54mb and 580mb, respectively). files as small as the older ones fit comfortably in memory, so there's no reason to import them into a monet database.
- the replicate weights weren't implemented until 2005, so creating a complex sample survey object isn't possible for the earlier files. if you need to calculate standard errors for those years, you'll have to rely on a pita generalized variance formula instead (sketched just below). evidence: the published estimates prior to 2005 don't include error columns.
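if you do end up down that road, census bureau generalized variance formulas typically boil the design information down to two published parameters. a purely illustrative sketch -- this is the common a-and-b functional form, and the parameter values below are made up, so pull the real ones from the relevant year's accuracy of the data documentation:

```r
# approximate the standard error of an estimated total x
# from published gvf parameters a and b
gvf.se <- function( x , a , b ) sqrt( a * x^2 + b * x )

# made-up parameter values, for illustration only
gvf.se( 1000000 , a = -0.000005 , b = 2500 )
```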
confidential to sas, spss, stata, sudaan users: the decennial census is enshrined in our constitution. your statistical software isn't. time to transition to r. :D