analyze the demographic and health surveys (dhs) with r

professors of public health 101 probably cite the results of the demographic and health surveys (dhs) more than all other data sources combined.  funded by the united states agency for international development (usaid) and administered by the technically-savvy analysts at icf international, this collection of multinational surveys enters its third decade as the authoritative source of international development indicators.  want a sampler of what that all means?  load up the dhs homepage and watch the statistics fly by: 70% of beninese children younger than five sleep under an insecticide-treated bednet / more than a third of kyrgyz kids aged 6-59 months have anemia / only 35% of guinean households have a place to wash yer hands.  this is the front-and-center toolkit for professional epidemiologists who want to know who/what/when/where/why to target a public health intervention in any of these nations.

before you read any more about the microdata, look at http://www.statcompiler.com/ - this online table creator might give you access to every statistic that you need, and without the fuss, muss, or missing values of a person-level table.  (bonus: click here to watch me describe dhs-style online table creation from a teleprompter.)  why should you use statcompiler?  because it's quick, easy, and has aggregated statistics for every country at your fingertips.

if that doesn't dissuade you from digging into an actual data set, one more point of order: you'll likely only be given access to a small number of countries.  so when applying for access, it'd be smart to ask for whichever country you are interested in _and also_ for malawi 2004.  that way, you will be able to muck around with my example syntax using the data tables it was written for.  if you have already registered, no fear: you can request that malawi be added to your existing project.

i tried requesting every data set.  i failed.  the data archivists do not grant access to more than a few countries unless you provide a legitimate research question that requires each dataset, and as i was only testing scripts, i received access to just a few countries.  also note that some surveys are restricted: access to those requires permission from the implementing organization in the individual country, and is granted at that organization's discretion.  still, these are generally public data: so long as you have a legitimate research question, you'll be granted access to the majority of the datasets without cost.

this new github repository contains three scripts:


download and import.R

analysis examples.R

replication.R
  • load the 2004 malawi individual recodes file into working memory
  • re-create some of the old school-style strata described in this forum
  • match a single row from pdf page 324 all the way across, deft and all.
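
here's a minimal sketch of the kind of setup a replication like that needs, run on made-up data rather than the real malawi file.  it assumes the `survey` package is installed and that your file uses the standard recode column names `v005` (sample weight), `v021` (primary sampling unit), and `v022` (strata).  the toy values and the `anemic` column are invented for illustration, and older surveys may not ship a ready-made strata column at all.  this is not the code from the repository, just the general shape:

```r
library(survey)

set.seed(1)

# toy stand-in for an individual recodes file (all values are made up)
toy <- data.frame(
  v021 = rep(1:20, each = 5),   # primary sampling unit
  v022 = rep(1:4, each = 25),   # strata (five psus inside each stratum)
  v005 = sample(c(500000, 1000000, 1500000), 100, replace = TRUE),
  anemic = rbinom(100, 1, 0.35) # hypothetical 0/1 outcome
)

# dhs weights are stored as integers with six implied decimal places
toy$weight <- toy$v005 / 1000000

# declare the complex sample design: clusters, strata, weights
des <- svydesign(id = ~v021, strata = ~v022, weights = ~weight, data = toy)

# weighted proportion, its standard error, and the design effect
result <- svymean(~anemic, des, deff = TRUE)
print(result)

# deft is the square root of the design effect
deft <- as.numeric(sqrt(deff(result)))
print(deft)
```

if your point estimate matches the published table but the standard error or deft does not, the strata definition is the first thing to double-check.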



click here to view these three scripts



for more detail about the demographic and health surveys (dhs), visit:


notes:

next to the main survey microdata set, you'll see some roman numerals ranging from one through six.  this number indicates which version of the survey manual that particular dataset corresponds to.  different versions have different questions, structures, and microdata files: read the entire "general description" section (only about ten pages) of the manual before you even file your request for data access.

these microdata are complex, confusing, occasionally strangely-coded, and often difficult to reconcile with historical versions.  (century month codes? wowza.)  that's understandable, and the survey administrators deserve praise for keeping everything as coherent as they have after thirty years, six major questionnaire revisions, and ninety countries of non-english-speaking respondents across this crazy planet of ours.  if you claw through the documentation and cannot find an explanation, you'll want to engage the user forum.  the folks there are thoroughly responsive, impressively knowledgeable, and will help you get to the bottom of it - whatever `it` may be.  before you ask a question here, or really anywhere in life, have a solid answer to whathaveyoutried.  and for heavens' sakes,* prepare a reproducible example for them.

* my non-denominational way of saying heaven's sake.
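
about those century month codes: dhs stores dates as a count of months since the start of 1900, so january 1900 is cmc 1.  here's a minimal base-r sketch of converting back and forth.  the function names are made up for illustration, and surveys fielded in countries that run on a non-gregorian calendar need extra care that this does not handle:

```r
# convert a century month code to a calendar year and month
cmc_to_year_month <- function(cmc) {
  year <- 1900 + (cmc - 1) %/% 12
  month <- (cmc - 1) %% 12 + 1
  data.frame(year = year, month = month)
}

# convert a calendar year and month back to a century month code
year_month_to_cmc <- function(year, month) {
  (year - 1900) * 12 + month
}

# january 2004 is cmc 1249
interview_cmc <- year_month_to_cmc(2004, 1)

# a hypothetical child born in march 2001
birth_cmc <- year_month_to_cmc(2001, 3)

# the payoff of the coding scheme: age in months is plain subtraction
age_in_months <- interview_cmc - birth_cmc

print(cmc_to_year_month(c(interview_cmc, birth_cmc)))
print(age_in_months)
```

strange as the codes look, that last subtraction is why they exist: intervals between events fall out without any date arithmetic.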


confidential to sas, spss, stata, and sudaan users: i would shake your hand but you've yet to adopt the statistical equivalent of coughing into your sleeve.  time to transition to r.  :D