analyze the general social survey (gss) with r

the general social survey (gss) has served as america's mood ring since 1972.  data-driven social scientists can compare political beliefs across demographic groups, track attitude trends, and make emile durkheim and max weber (pronounced durk-veber) proud.  in contrast to the high-frequency tracking polls that capture newspaper headlines, the gss has sustained a (now biennial) set of questions over four decades.

most analysts start with the cumulative, cross-sectional file (interviews conducted 1972 - present).  given the sprawling nature of that cumulative data set, you'd better read the documentation and understand the eccentricities of each of the variables you want to use before you send anything off for peer review.  for example, many of the five thousand variables include missing values due to split-sample questions.  not to say it's bad data - it's damn useful.  you try administering a survey that stays relevant for almost half a century.  otherwise, leave it to the national opinion research center (norc) at the university of chicago...and the national science foundation to foot the bill.
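if you want a quick sense of how sparsely one of those split-ballot items was actually asked, a couple of lines of base r will do it.  this is just a sketch - the data.frame name (gss.df) and the example variable (natenvir, national spending on the environment) are my assumptions, so substitute whichever columns you actually care about:

# count missing versus non-missing responses for one split-ballot item
table( is.na( gss.df$natenvir ) )

# share missing by survey year, to spot the waves where the question wasn't asked
tapply( is.na( gss.df$natenvir ) , gss.df$year , mean )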

on the main gss page, norc offers two online query tools - nesstar and sda - meaning you can point-and-click your way to some basic statistics.  the nesstar system smells like a fixer-upper, but berkeley's sda (survey documentation and analysis) site offers a great way to confirm that you're broadly analyzing the data correctly before you start writing r code to laser-focus on your research question.

the general social survey is only administered to noninstitutionalized adults, because everyone already knows what kids' political beliefs are: more candy, no homework.  this new github repository contains two scripts:

cumulative cross-sectional - analysis examples.R
  • download, import, save the latest cross-sectional table onto your local computer
  • load it back up (so the downloading and importing can be skipped next time)
  • limit the table to the variables needed for an example analysis
  • create weight and primary sampling unit variables based on berkeley's specifications
  • construct the complex sample survey object (a sketch follows this list)
  • run a treasure trove of political analyses
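to give a flavor of what those last few steps look like, here's a minimal sketch of the survey object construction with r's survey package.  the data.frame name (gss.df), the weight column (wtssall), the design columns (vpsu and vstrat), and the example analysis variable (partyid) are my assumptions - the script itself follows berkeley's specifications, so trust that over this:

library(survey)

# some strata in the cumulative file contain a single primary sampling unit,
# so tell the survey package how to handle lonely psus
options( survey.lonely.psu = "adjust" )

# construct the complex sample survey object
# (gss.df assumed to be the cumulative table, limited to the variables you need)
gss.design <-
    svydesign(
        id = ~vpsu ,           # primary sampling unit
        strata = ~vstrat ,     # stratification variable
        weights = ~wtssall ,   # person-level weight
        data = gss.df ,
        nest = TRUE            # psus are nested within strata
    )

# weighted distribution of party identification, as one example analysis
svymean( ~factor( partyid ) , gss.design , na.rm = TRUE )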

replicate berkeley sda.R
  • download, import, save the 1972-2010 (no typo there - this is not the more current 1972-2012) cross-sectional table onto your local computer
  • load it back up (so the downloading and importing can be skipped next time)
  • limit the table to the variables needed for an example analysis
  • create weight and primary sampling unit variables based on berkeley's specifications
  • construct the complex sample survey object
  • print statistics and standard errors matching the target replication table
  • loop through each confidence interval on that table as well, using shiny new software born from this thread (a sketch of these last two steps follows this list)
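once the survey object exists (here called gss.design, as in the sketch above), the point estimates, standard errors, and confidence intervals all come straight out of the survey package.  the analysis variable polviews is my assumption - the replication script uses whatever berkeley's target table uses:

# point estimates and standard errors for a categorical variable
est <- svymean( ~factor( polviews ) , gss.design , na.rm = TRUE )
est

# matching confidence intervals, using the design degrees of freedom
confint( est , df = degf( gss.design ) )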


click here to view these two scripts



for more detail about the general social survey (gss), visit:

notes:

berkeley's sda website currently hosts release #1 of the 1972-2012 cross-sectional gss file, which is why the replication code above won't match their posted quick tables exactly.  i kept bugging them until they ran the 1972-2010 release #2 data set through that same code; their output is available in my github repository, and those numbers match.  squeaky wheel, baby.


confidential to sas, spss, stata, and sudaan users: why are you still dialing up to the internet after we've discovered fiber optics?  time to transition to r.  :D