before you start crosstabbing and svymeaning, it'd be smart to spend ten minutes reading exhibit 6.2 of the user's guide so you understand how all the data tables (the ones the download automation script imports for you) fit together. simpler analyses might only require the respondent and activity summary files, but once you want to determine who was with the respondent at soccer practice, you had better merge like a champ. before any of that, of course, you'll need to decide which activity codes you actually want to capture. time spent calf-roping or cattle-riding? code 130121. commuting to the vet? code 180807. pumping gas? 070102. tired of me guessing for you? check out the activity coding lexicons. this new github repository contains four scripts:
download all microdata.R
- decipher the bls ftp site to download each year-specific (or multi-year) table
- unzip whatcha need, then import the microdata in a jiffy with read.csv
- save each file as an r data file (.rda) into neatly-sorted atus directories (one pass through this loop is sketched below)
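if you'd like a taste of what that first script automates, here's a minimal sketch of a single pass. the zip url is my assumption based on the bls data files page, so confirm it before you run anything.

```r
# one pass through the download-unzip-import-save loop.
# note: the url below is an assumption, check the bls atus data files page
tf <- tempfile() ; td <- tempdir()

# download one year-specific table (the 2012 respondent file here)
download.file( "https://www.bls.gov/tus/special.requests/atusresp_2012.zip" , tf , mode = 'wb' )

# unzip whatcha need into a temporary directory
fns <- unzip( tf , exdir = td )

# import the comma-separated microdata in a jiffy with read.csv
atusresp <- read.csv( fns[ grep( "\\.dat$" , fns ) ] )

# save the data.frame as an r data file (.rda) for instant re-loading
save( atusresp , file = "atusresp_2012.rda" )
```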
2012 single-year - analysis examples.R
- load the activity, respondent, roster, and replicate weights files into working memory
- aggregate activity events to the respondent at the top-tier activity level, then reshape into one record per person
- convert minutes to hours, merge all files into one data.frame, recode a smidgen
- create a replicate-weighted survey design object, with the bls-specified fay's adjustment
- perform one fine slew of analysis examples, including quite a few of these bls statistics (the design-construction step is sketched below)
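the heart of that analysis script is the design-construction step. here's a hedged sketch: the tufinlwgt and finlwgt* column names are my assumptions about the 2012 files, and hrs.sleeping is a hypothetical top-tier variable, so match everything against your own merged data.frame before trusting a number.

```r
library(survey)

# x contains one record per respondent, with minutes already converted to hours

# construct the replicate-weighted survey design object,
# with the bls-specified fay's adjustment of 0.5
atus.design <-
    svrepdesign(
        weights = ~ tufinlwgt ,           # final person-day weight (assumed name)
        repweights = "finlwgt[0-9]+" ,    # regular expression matching the replicate weight columns (assumed)
        type = "Fay" ,
        rho = ( 1 - 1 / sqrt( 4 ) ) ,     # fay's adjustment of 0.5
        data = x
    )

# hypothetical example: average hours per day spent in one top-tier category
svymean( ~ hrs.sleeping , atus.design )
```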
replicate bls standard error - 2007.R
- load the activity, activity-summary, respondent, and replicate weights files into working memory
- subset the activity summary table to only the television-related events
- aggregate the activity table to the respondent-level as an example of an alternative to the previous method
- merge the minutes-spent-watching-television table with the respondent and replicate weights tables
- create a replicate-weighted survey design object, with the bls-specified fay's adjustment
- precisely replicate the bureau of labor statistics' standard error of hours per day spent watching the teevee (the subset-and-aggregate steps are sketched below)
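the subset-and-aggregate route looks roughly like this. 120303 and 120304 are the two television codes from the lexicon; tucaseid, trcodep, and tuactdur24 are my assumptions about the activity file's column names, so verify them against the data dictionary.

```r
# subset the activity file to only television-related events.
# column names here (tucaseid, trcodep, tuactdur24) are assumptions
tv <- act[ act$trcodep %in% c( 120303 , 120304 ) , ]

# aggregate to the respondent level: total television minutes per person
tv.per.person <- aggregate( tuactdur24 ~ tucaseid , data = tv , sum )

# merge onto the respondent file, treating non-watchers as zero minutes
y <- merge( resp , tv.per.person , all.x = TRUE )
y[ is.na( y$tuactdur24 ) , 'tuactdur24' ] <- 0

# convert minutes to hours before computing the standard error
y$tv.hours <- y$tuactdur24 / 60
```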
replicate bls example one - 2006.R
- load the activity and respondent data tables into working memory
- subset the activity table to only care of household children events (as prescribed by the 2006 lexicon)
- aggregate that activity table to the respondent level, then merge those minutes onto the respondent data
- just run a weighted.mean that skips any variance calculation but hits the bls example one on the nose (sketched below)
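that final calculation really is a one-liner. in this sketch, z stands for the merged data.frame, kidcare.minutes is a hypothetical name for the aggregated childcare column, and tufinlwgt is assumed to be the final weight.

```r
# point estimate only, no variance: weighted mean hours per day
# spent caring for household children (column names are hypothetical)
weighted.mean( z$kidcare.minutes / 60 , w = z$tufinlwgt )
```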
click here to view these four scripts
for more detail about the american time use survey, visit:
- the questionnaire, transmogrified for public dissemination
- summary charts and tables provided by the bureau of labor statistics
notes:
just like the medical expenditure panel survey draws its sample from the national health interview survey, the american time use survey is a subsample of current population survey (cps) respondents. in fact, the microdata include a handy atus-cps mergefile. unlike the cps, it's not a household survey: only one individual at least 15 years of age gets selected from each sampled household. another important difference from the cps: the atus should not be used to draw state-level conclusions. atus generalizes to the united states non-institutionalized, non-active-duty military population aged fifteen or older, but don't zoom in on geographies smaller than census regions.
when you see the svytotal function used in the analysis example script, you'll notice overall sums around ninety billion. that's because the survey weights in this data set generalize to person-days rather than persons. divide those ninety billion person-days by the 365 days in a year and you'll almost precisely hit the `sixteen and older` row of the `2010 column` of table 1 on this census bureau age by sex table. so at ease, everybody. at ease.
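here's that back-of-the-envelope check, assuming the atus.design object from the analysis example script:

```r
# tack a column of ones onto the replicate-weighted design
atus.design <- update( atus.design , one = 1 )

# total weighted person-days: right around ninety billion
svytotal( ~ one , atus.design )

# divide by the days in a year to land near the population count
# shown on that census bureau table
coef( svytotal( ~ one , atus.design ) ) / 365
```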
confidential to sas, spss, stata, and sudaan users: if you want to impress people at parties with an antiquated skill, learn morse code. at least it's rhythmic. time to transition to r. :D