if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the `meanit` sas macro buried in the 2004 scf codebook (search for `meanit` - you'll need the sas iml add-on). or you might blow the dust off this website referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the 2010 codebook (search for `imputation`), then give these scripts a whirl. they've got that new r smell.
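to make the five-implicates business concrete, here's a minimal sketch in base r. it is not the code from the scripts. it fakes a tiny stacked file (the columns `id`, `implicate`, `wgt`, and `networth` are all made up for illustration), computes a weighted median inside each implicate, then pools the five answers with the textbook version of rubin's rules. the per-implicate sampling variances are placeholders, since real ones would come from the replicate weights, and the scf codebook describes its own variation on the pooling step, so treat this as the general idea only.

```r
# toy stacked file: 4 households x 5 implicates (all names and values hypothetical)
set.seed(1)
stacked <- data.frame(
  id        = rep(1:4, each = 5),
  implicate = rep(1:5, times = 4),
  wgt       = rep(c(100, 200, 150, 50), each = 5)
)
# imputed answers wobble a little from implicate to implicate
stacked$networth <- rep(c(10, 50, 90, 400), each = 5) + rnorm(20, sd = 5)

# weighted median: smallest value whose cumulative weight share reaches one half
weighted_median <- function(x, w) {
  o <- order(x)
  x[o][which(cumsum(w[o]) / sum(w) >= 0.5)[1]]
}

# break the single file into five, one record per household in each piece
implicates <- split(stacked, stacked$implicate)

# one point estimate per implicate
est <- sapply(implicates, function(d) weighted_median(d$networth, d$wgt))

# textbook rubin's rules: average the estimates, then add the
# between-implicate variance (inflated by 1 + 1/m) to the average
# within-implicate sampling variance
pool <- function(est, within_var) {
  m <- length(est)
  b <- var(est)
  total <- mean(within_var) + (1 + 1 / m) * b
  c(estimate = mean(est), se = sqrt(total))
}

# placeholder sampling variances, one per implicate (assumed, not computed)
within_var <- rep(4, 5)
pooled <- pool(est, within_var)
print(pooled)

# the weights on the stacked file add up to five times the population
weight_ratio <- sum(stacked$wgt) / sum(implicates[[1]]$wgt)
print(weight_ratio)
```

the pooled standard error can never drop below what the sampling variance alone would give, which is the whole point: ignoring the spread across implicates makes your intervals too tight.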
the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.
this new github repository contains three scripts:
download all microdata.R
- initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
- loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
- break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
- save the five implicates and replicate weights to an r data file (.rda) for rapid future loading
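the split, merge, and save steps above might look roughly like this in base r. it's a sketch on fake data, not the contents of the download script: the data frames `main`, `extract`, and `rw` and every column in them are stand-ins, and i'm assuming the convention that the record id `y1` equals ten times the household id `yy1` plus the implicate number. check the codebook before trusting that on real files.

```r
# fake inputs standing in for the imported stata files (all hypothetical)
# ids 11..15, 21..25, 31..35: three households, five implicates apiece
main <- data.frame(y1 = as.vector(outer(1:5, 10 * (1:3), "+")))
main$yy1 <- main$y1 %/% 10
main$x <- seq_len(nrow(main))
extract <- data.frame(y1 = main$y1, networth = 1000 * main$yy1)
rw <- data.frame(yy1 = 1:3, repwgt1 = c(1, 2, 3))

# under the assumed id convention, the last digit of y1 is the implicate number
main$implicate <- main$y1 %% 10

# five data frames, one record per household in each,
# with the matching extract columns merged on
imps <- lapply(1:5, function(i) {
  merge(main[main$implicate == i, ], extract, by = "y1")
})
names(imps) <- paste0("imp", 1:5)

# save the implicates and replicate weights together for rapid future loading
rda <- file.path(tempdir(), "scf_sketch.rda")
save(imps, rw, file = rda)

# later on: wipe the objects and pull everything back in with one call
rm(imps, rw)
load(rda)
rows_per_implicate <- sapply(imps, nrow)
print(rows_per_implicate)
```

keeping the five implicates in one list makes it easy to run the same command over each of them, which is what any multiply-imputed analysis ends up doing anyway.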
analysis examples.R
- prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
- load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
- demonstrate how to access the properties of a multiply-imputed survey design object
- cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
- run a quick t-test and regression, but only because you asked nicely
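for a feel of what replicate weights buy you, here's a generic base r sketch of a replicate-weight standard error for a weighted mean. everything in it is an assumption for illustration: the data are simulated, the replicate weights are faked with a simple resampling multiplier, and the `scale` argument defaults to a plain average of squared deviations. the right scaling depends on how the replicate weights were built, and the scf-centric variance quirks handled by the scripts are not reproduced here.

```r
# simulated data and fake replicate weights (all hypothetical)
set.seed(2)
n <- 50
n_reps <- 20
d <- data.frame(x = rnorm(n, 100, 15), wgt = runif(n, 50, 150))

# each replicate: the main weight times how often the record was resampled
repw <- sapply(seq_len(n_reps), function(r) {
  d$wgt * tabulate(sample(n, n, replace = TRUE), nbins = n)
})

wmean <- function(x, w) sum(w * x) / sum(w)

# full-sample estimate, then the same estimate under every replicate weight
theta <- wmean(d$x, d$wgt)
theta_r <- apply(repw, 2, function(w) wmean(d$x, w))

# replicate variance: scaled sum of squared deviations from the full-sample estimate
rep_var <- function(theta, theta_r, scale = 1 / length(theta_r)) {
  scale * sum((theta_r - theta)^2)
}

se <- sqrt(rep_var(theta, theta_r))
print(c(estimate = theta, se = se))
```

the same recipe works for medians, ratios, or regression coefficients: compute the statistic once with the main weight, once per replicate weight, and measure the spread.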
replicate FRB SAS output.R
- reproduce each and every statistic provided by the friendly folks at the federal reserve
- create a multiply-imputed, replicate-weighted survey design object
- re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences
click here to view these three scripts
for more detail about the survey of consumer finances (scf), visit:
- the federal reserve board of governors' survey of consumer finances homepage
- the 2013 scf chartbook, to browse what's possible. (spoiler alert: everything.)
- the survey of consumer finances wikipedia entry
- the official frequently asked questions
notes:
nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there are also a few biases between these two data sources that you ought to consider.
the survey methodologists at the federal reserve take their job seriously, as evidenced by this working paper trail. write a thank-you in their guestbook. one can never receive enough of those.
confidential to sas, spss, stata, and sudaan users: the eighties called. they want their statistical languages back. time to transition to r. :D