archived asdfree: analyze the panel study of income dynamics (psid) with r

the panel study of income dynamics (psid) is a one-trick pony. better than anything else out there, this survey allows you to answer the question, "where are they now?" because the survey has tracked the same nationally-representative cohort of americans (and their children, and their children's children) for more than forty years, its microdata allows researchers to investigate how individual americans have changed over time. even though this is the towering achievement among longitudinal studies, many papers foolishly use psid to estimate point-in-time demographic statistics. here's an example question that should not be answered with the psid (instead try the current population survey): "how many americans were below the poverty line in 2010?" here's an example question that can only be answered with the panel study of income dynamics: "among americans who were living below the poverty line in 1970, what percent are still below the poverty line in 2010?" do you see the difference? if it walks like a panel and quacks like a panel, it probably is a panel.

the folks at penn state helped out their friends in the big ten by writing the best introduction to what's available in the psid. and before you get any microdata dust all over your typing fingers, read pdf pages thirteen to seventeen of the latest official user guide. the survey administrators at the university of michigan are serious about preserving the panel: preventing attrition, re-contacting, doing just about everything they can to retain the original respondents from 1968. think of them like a seal with a bowling pin on its nose: keeping this data generalizable for the united states population at every interview point is one hell of a balancing act. but judging from these cross-sectional benchmarks, they've succeeded. if you ever find yourself in ann arbor, throw a mackerel at the institute for social research building for a job well done.

before you start digging through all of the data sets i've automatically imported for you, run your literature review with this search engine. the michigan website will also invite you to play with these tutorials, but i'd steer clear - they're outdated, excel-based, impossible-to-replicate. instead, check out my replication of the techniques described in this paper and - when you're ready to grind your data axe - figure out which columns you need with this variable search tool. this new github repository contains three scripts:

download all microdata.R

after you've registered for a (free) account, log in to the university of michigan's server
loop through every year and every file, download, then unzip it. unzip? sexy.
import each data file directly into an r data.frame object with a custom-built function
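
the import step can be sketched with nothing but base r. this is only a rough illustration, not the custom-built function from the actual script: the file names, column names, and column widths below are all made up, and two tiny fixed-width text files written to a temporary folder stand in for the unzipped downloads.

```r
# toy stand-in for unzipped psid files: two tiny fixed-width text files.
# the file names, column names, and widths are all invented for illustration.
tmp_dir <- tempfile( "psid_sketch_" )
dir.create( tmp_dir )

writeLines( c( "0001011500" , "0002022750" ) , file.path( tmp_dir , "fam1999.txt" ) )
writeLines( c( "0001011800" , "0002023100" ) , file.path( tmp_dir , "fam2001.txt" ) )

# hypothetical layout: four-digit family id, two-digit sequence number, four-digit income
col_widths <- c( 4 , 2 , 4 )
col_names <- c( "family_id" , "seq_no" , "income" )

# loop through every file in the folder and import each one into its own data.frame
psid_files <- list.files( tmp_dir , pattern = "\\.txt$" , full.names = TRUE )

psid_tables <-
    lapply(
        psid_files ,
        function( fn ) read.fwf( fn , widths = col_widths , col.names = col_names )
    )

# name each list element after its source file, minus the extension
names( psid_tables ) <- tools::file_path_sans_ext( basename( psid_files ) )

str( psid_tables )
```

with real files, the widths and names would have to come from the layout that ships with each download rather than being typed by hand.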

longitudinal analysis examples.R

open up the data sets specified by michigan's tutorial number three
limit each data table to only the columns necessary for the analysis at-hand
merge, recode, subset, and construct a data file for the analysis at hand. not rocket science.
create a complex sample survey object, using a taylor-series linearization design
reproduce the table shown in the last tab of michigan's tutorial number three answer sheet. well, actually, don't reproduce it: they've changed everything, so you'll have to trust me.
conduct some other analysis examples just for laughs
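
here's a minimal sketch of the merge, recode, subset, and survey design steps, run on toy data. every column name, every value, the poverty threshold, and the single weight column are invented for illustration, so don't mistake this for the actual script or for real psid variables. the design-based part only runs if the survey package is installed.

```r
# toy data: two interview waves for the same eight people.
# every column name, value, and the poverty threshold below is invented.
early_wave <- data.frame( person_id = 1:8 , income_early = c( 8 , 9 , 30 , 7 , 45 , 6 , 9 , 40 ) )

late_wave <-
    data.frame(
        person_id = 1:8 ,
        income_late = c( 9 , 28 , 35 , 6 , 50 , 7 , 25 , 42 ) ,
        stratum = c( 1 , 1 , 1 , 1 , 2 , 2 , 2 , 2 ) ,
        psu = c( 1 , 1 , 2 , 2 , 1 , 1 , 2 , 2 ) ,
        weight = c( 100 , 150 , 120 , 200 , 130 , 110 , 160 , 140 )
    )

# limit each table to the needed columns, then merge on the person identifier
x <- merge( early_wave[ , c( "person_id" , "income_early" ) ] , late_wave , by = "person_id" )

# recode: flag poverty in each wave against a hypothetical threshold
poverty_line <- 10
x$poor_early <- as.numeric( x$income_early < poverty_line )
x$poor_late <- as.numeric( x$income_late < poverty_line )

# weighted share of the early-wave poor who are still poor later (point estimate only)
still_poor <- with( subset( x , poor_early == 1 ) , weighted.mean( poor_late , weight ) )
print( still_poor )

# the same estimate with a design-based standard error, if the survey package is installed
if ( requireNamespace( "survey" , quietly = TRUE ) ) {

    # svydesign() sets up a taylor-series linearization design
    psid_design <-
        survey::svydesign(
            ids = ~ psu ,
            strata = ~ stratum ,
            weights = ~ weight ,
            data = x ,
            nest = TRUE
        )

    # subset the design object, not the data.frame, so the variance reflects the full sample
    print( survey::svymean( ~ poor_late , subset( psid_design , poor_early == 1 ) ) )
}
```

this is the "among people below the poverty line then, what percent are still below it now?" question in miniature. subsetting the design object instead of the data.frame matters because the standard error for a subpopulation should still account for the strata and clusters of the whole sample.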

replicate umich.R

open up the 2009 cross-year individual file (note that's not the latest available)
match this umich researcher-provided output, but only because i used the 2009 cross-year individual file. here's my output. you won't match precisely if you use the most current cross-year individual file available. but that's cool.

click here to view these three scripts

for more detail about the panel study of income dynamics (psid), visit:

michigan's psid homepage
michigan's psid faq, a great beach read
the psid wikipedia page
the cornell page of psid-spinoffs, sister-surveys, surveys inspired by psid, etc. etc.

notes:

make no mistake, this is a terribly complicated data set both for michigan to construct and for you to analyze. respondents and responses from many years ago must be connected with the same people in the present day. if it's your first time working with the psid, you'd be smart to conduct a thorough literature review, contact the authors of related research, ask other data users why they did what they did, and - before any publication of consequence - run the analysis plan by michigan's helpdesk. glhf.

confidential to sas, spss, stata, and sudaan users: enough with the trampolines, go for a skydive. time to transition to r. :D