pnad has been on the shelves for forty-four years, investigating all the good stuff: migration, fertility, marriage, health, and food security. microdata are available back to 2001, and starting in 2004, pnad began including the rural north (the amazon state, among others).
the sample design is self-weighted with three selection stages: primary sampling units are municipalities stratified by population size and selected systematically with probability proportional to size (pps); secondary and tertiary sampling units are enumeration areas, then households. the weights also need to be post-stratified to the 2010 official brazilian census. all in all, a pretty straightforward methodology.
let the code do all the setup for you so you can worry about the more exciting questions and then clock out for the day. by the way, in brazil, do they call happy hour cappy hour?
this new github repository contains three scripts:
download all microdata.R
- download the fixed-width file containing household and person records
- merge 'em together into a rectangular file at the person-level
- create an adjusted weight and a new variable - one - in the data table
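the merge step above can be sketched in base r on toy data. this is only an illustration, not the repository's code: the data frames and column names (`hh_id`, `person_num`, `region`, `age`) are made up, and the adjusted weight is left out because its formula isn't spelled out here.

```r
# toy household records: one row per household (column names are hypothetical)
households <- data.frame(
  hh_id = c(1, 2, 3),
  region = c("north", "north", "south"),
  stringsAsFactors = FALSE
)

# toy person records: one row per person, keyed to a household
persons <- data.frame(
  hh_id = c(1, 1, 2, 3, 3, 3),
  person_num = c(1, 2, 1, 1, 2, 3),
  age = c(34, 31, 58, 40, 12, 9)
)

# copy the household columns onto every person record
person_level <- merge(persons, households, by = "hh_id", all.x = TRUE)

# every person record should survive the merge exactly once
stopifnot(nrow(person_level) == nrow(persons))

# add a column of ones, handy for counting records and for weighted totals
person_level$one <- 1

# unweighted count of persons per region
tapply(person_level$one, person_level$region, sum)
```

a column of ones looks silly until you need a population count: the weighted total of `one` is the estimated number of people.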
single-year - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, post-stratifying using a custom-built function
- perform a boatload of analysis examples
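for a feel of what the design-then-analyze steps look like, here's a minimal in-memory sketch with the r survey package. assumptions, loudly: the toy data, the column names (`stratum`, `psu`, `pre_wgt`, `post_stratum`), and the population totals are all invented, and it uses the package's standard `postStratify()` on a plain data frame rather than the custom database-backed function these scripts rely on.

```r
library(survey)

# toy person-level file: 2 strata, 2 psus per stratum, 2 persons per psu
toy <- data.frame(
  stratum = rep(c(1, 2), each = 4),
  psu = rep(c(1, 2, 3, 4), each = 2),
  pre_wgt = rep(10, 8),
  post_stratum = rep(c("a", "b"), times = 4),
  age = c(20, 30, 40, 50, 25, 35, 45, 55),
  one = 1
)

# complex sample design: clusters nested in strata, with a starting weight
design <- svydesign(
  ids = ~psu,
  strata = ~stratum,
  weights = ~pre_wgt,
  data = toy
)

# made-up population totals for each post-stratum
pop <- data.frame(post_stratum = c("a", "b"), Freq = c(60, 40))

# rescale the weights so each post-stratum sums to its known total
ps_design <- postStratify(design, ~post_stratum, pop)

# weighted population count and weighted mean age
svytotal(~one, ps_design)
svymean(~age, ps_design)
```

after post-stratification the weighted count of `one` lands on the sum of the `Freq` column, which is a quick sanity check worth running on the real thing too.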
replicate IBGE estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, post-stratifying using a custom-built function
- precisely match the sas-sudaan output provided by analysts at ibge (as seen in the script directory)
click here to view these three scripts
for more detail about the annual pesquisa nacional por amostra de domicilios, visit:
- the official brazilian statistics agency's homepage (english) (portuguese) (even espanhol)
- the ibge ftp site with all your favorite microdatos
- the translated-into-english pnad wikipedia entry
notes:
most years, the pnad includes a limited-time-only supplemental questionnaire (complete list here), so you'd be wise to give that page a glance and see if there's a special questionnaire asked (just once!) of the entire microdata sample that's of interest to you and your research. supplemental questionnaires: the shamrock shake of the survey world.
these scripts perform all manipulations inside monetdblite and rely on database-backed survey objects. the post-stratification function in the current implementation of the r survey package does not work on database-backed survey design objects. therefore, with little fanfare, i've written one that does. you'll find it getting pulled in at the source_url() line. exciting.
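whatever the back end, post-stratification boils down to the same weight arithmetic: within each post-stratum, scale the weights so they add up to a known population total. here's a base r sketch of that arithmetic on toy data. it is not the function pulled in by source_url(), and every name in it (`pre_wgt`, `post_stratum`, `post_wgt`, the totals in `pop`) is hypothetical.

```r
# toy person-level records with a pre-stratification weight
x <- data.frame(
  post_stratum = c("a", "a", "b", "b", "b"),
  pre_wgt = c(10, 30, 5, 5, 10),
  stringsAsFactors = FALSE
)

# made-up population totals for each post-stratum
pop <- data.frame(
  post_stratum = c("a", "b"),
  Freq = c(60, 40),
  stringsAsFactors = FALSE
)

# current weighted total within each post-stratum
sample_totals <- aggregate(pre_wgt ~ post_stratum, data = x, FUN = sum)
names(sample_totals)[2] <- "sample_total"

# ratio of the known total to the weighted sample total
adj <- merge(sample_totals, pop, by = "post_stratum")
adj$ratio <- adj$Freq / adj$sample_total

# scale each record's weight by the ratio for its post-stratum
x$post_wgt <- x$pre_wgt * adj$ratio[match(x$post_stratum, adj$post_stratum)]

# the adjusted weights now sum to the known totals
tapply(x$post_wgt, x$post_stratum, sum)
```

the same group-by-and-rescale logic translates naturally to sql, which is why it can be done without pulling the whole table into memory.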
confidential to sas, spss, stata, sudaan users: yes, and bicycles with training wheels might be easier to ride, but that doesn't make them a long-term solution. time to transition to r. :D