before you read any more about the microdata, look at http://www.statcompiler.com/ - this online table creator might give you access to every statistic that you need, without the fuss, muss, or missing values of a person-level table. (bonus: click here to watch me describe dhs-style online table creation from a teleprompter.) why should you use statcompiler? because it's quick, it's easy, and it puts aggregated statistics for every country at your fingertips.
if that doesn't dissuade you from digging into an actual data set, one more point of order: you'll likely only be given access to a small number of countries. so when applying for access, it'd be smart to ask for whichever country you are interested in _and also_ for malawi 2004. that way, you will be able to muck around with my example syntax using the data tables that it was written for. if you have already registered, no fear: you can request that malawi be added to your existing project.
i tried requesting every data set. i failed. the data archivists do not grant access to more than a few countries unless you provide a legitimate research question that requires each dataset, and as i was only testing scripts, i received access to just a few countries. also note that some surveys require permission from the implementing organization in the individual country, so access to those restricted surveys is at that organization's discretion. while some surveys are restricted, these are generally public data: so long as you have a legitimate research question, you'll be granted access to the majority of the datasets without cost.
this new github repository contains three scripts:
download and import.R
- pretend you're a real boy and log into dhsprogram.com's data download service
- detect which countries, years, and survey data sets you've been granted access to
- download, import, save each of those files onto your local computer
analysis examples.R
- load the 2004 malawi individual recodes file into working memory
- construct the complex sample survey object
- run a battery of stats, standard errors, subsets, etc. that re-create some rows of pdf page 324 of this publication
replication.R
- load the 2004 malawi individual recodes file into working memory
- re-create some of the old school-style strata described in this forum
- match a single row from pdf page 324 all the way across, deft and all.
click here to view these three scripts
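to give a flavor of the "construct the complex sample survey object" and "deft and all" steps listed above, here is a minimal sketch using the r survey package on simulated data. it is not the code from the repository. the column names follow the standard dhs recode convention (v005 = sample weight multiplied by one million, v021 = primary sampling unit, v022 = sample strata, v012 = age), but every value below is made up, and whether v022 is the right strata variable for a given survey is an assumption you should check against the documentation, since older files sometimes need their strata rebuilt.

```r
# install.packages( "survey" ) if you do not have it yet
library(survey)

set.seed(2004)

# simulated stand-in for an individual recodes file:
# 40 clusters of 25 respondents, nested within 4 strata
n_psu <- 40
per_psu <- 25

toy <- data.frame(
  v021 = rep(seq_len(n_psu), each = per_psu),                 # cluster (psu)
  v022 = rep(rep(1:4, each = n_psu / 4), each = per_psu),     # stratum
  v005 = rep(sample(500000:1500000, n_psu), each = per_psu),  # weight x 1,000,000
  v012 = sample(15:49, n_psu * per_psu, replace = TRUE)       # age in years
)

# dhs weights are stored with six implied decimal places
toy$weight <- toy$v005 / 1000000

# the complex sample survey object
toy_design <-
  svydesign(
    ids = ~ v021,
    strata = ~ v022,
    weights = ~ weight,
    data = toy
  )

# weighted mean, its standard error, and the design effect
age_mean <- svymean(~ v012, toy_design, deff = TRUE)

est <- coef(age_mean)
se <- SE(age_mean)

# deft is the square root of the design effect (deff)
deft <- sqrt(deff(age_mean))

print(age_mean)
print(deft)

# the same statistic on a subset: respondents under 25
print(svymean(~ v012, subset(toy_design, v012 < 25)))
```

subsetting the design object with `subset()` rather than filtering the data frame first keeps the full cluster and strata structure available for the standard error calculation.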
for more detail about the demographic and health surveys (dhs), visit:
- the hand calculations for rates, risks, and other epidemiological measures
- same as above, but in printable pdf form
- the icfi landing page for dhs, perhaps you'll prefer their summary to mine
notes:
next to the main survey microdata set, you'll see a roman numeral ranging from one through six. this number indicates which version of the survey manual that particular dataset corresponds to. different versions have different questions, structures, and microdata files, so read the entire "general description" section (only about ten pages) of the matching manual before you even file your request for data access.
these microdata are complex, confusing, occasionally strangely-coded, and often difficult to reconcile with historical versions. (century month codes? wowza.) that's understandable, and the survey administrators deserve praise for keeping everything as coherent as they have after thirty years of six major questionnaire revisions of ninety countries of non-english-speaking respondents across this crazy planet of ours. if you claw through the documentation and cannot find an explanation, you'll want to engage the user forum. they are thoroughly responsive, impressively knowledgeable, and will help you get to the bottom of it - whatever `it` may be. before you ask a question there, or really anywhere in life, have a solid answer to whathaveyoutried. and for heavens' sakes,* prepare a reproducible example for them.
* my non-denominational way of saying heaven's sake.
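about those century month codes: a cmc counts the months elapsed since the start of 1900, so january 1900 is 1 and cmc = ( year - 1900 ) * 12 + month. the two helper functions below are hypothetical names of my own, not part of any dhs tool, and they assume dates recorded in the gregorian calendar.

```r
# convert a year and month into a century month code
to_cmc <- function(year, month) (year - 1900) * 12 + month

# convert a century month code back into a year and month
from_cmc <- function(cmc) {
  year <- 1900 + (cmc - 1) %/% 12
  month <- (cmc - 1) %% 12 + 1
  data.frame(year = year, month = month)
}

to_cmc(2004, 1)            # january 2004 is cmc 1249
from_cmc(c(1249, 1260))    # january 2004 and december 2004
```

both functions are vectorized, so they can be applied to an entire column of interview or birth dates at once.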
confidential to sas, spss, stata, and sudaan users: i would shake your hand but you've yet to adopt the statistical equivalent of coughing into your sleeve. time to transition to r. :D