analyze the european social survey (ess) with r

with more than a decade of microdata aimed at gauging the political mood across european nations, the european social survey (ess) allows scientists like you to examine socio-demographic shifts among broad groups all the way down to pirate party (piratpartiet) voters in sweden.  with much of the same scope as the united states' general social survey (gss), this biennial survey gives demographers one of the clearest windows into political opinion and behavior across the continent.

run out of city university london and six other centres, this survey sets its sample universe at all persons aged 15 and over resident within private households in the participating countries, regardless of nationality, citizenship, language or legal status.  however, it's smart, dare i say very smart, to check the documentation report (here's round five) and confirm that the statistics you're coming up with actually generalize to the resident populations you think they do.

after enduring a few spammy e-mails from me, daniel oberski agreed to co-author this post and all of the code.  dan spent a handful of years in catalonia at upf's ess competence centre, so in addition to being able to disentangle and simplify this survey's tricky methodology for us, he's also provided a wicked starter script on structural equation modeling (sem) with complex sample survey data, using his very own lavaan.survey package.  so tell him thanks.  this new github repository contains four scripts:


download all microdata.R
  • after you register for an account, plop `your.email` at the top of this script and let 'er rip
  • automatically log in and determine which countries and rounds are currently available
  • for each available round, cycle through every available file: download, unzip, and import it
  • save everything on the local disk as a convenient data.frame object

analysis examples.R

structural equation modeling examples.R

replication.R



click here to view these four scripts



for more detail about the european social survey (ess), visit:

notes:

some analysts blindly start with the integrated, multi-country data set for each round.  that file stacks all countries into a single data table and includes the appropriate within-country weights, so you'll get correct point estimates (means, medians, percents).  unfortunately, the integrated file does not contain other sample design information such as clusters and strata, which affect standard errors and statistical tests.  so if you're itching to calculate a confidence interval, a standard error, or any kind of honest statistical test, you'll generally need the country-specific files and their associated sample design data files (sddf).

a classical approximation is to take the standard error you get without accounting for the survey design and multiply it by the square root of the "design effect".  the norwegian social science data services have created this tutorial on how to calculate design effects for linear functions of the data such as means and totals, but if that's over your head, or you want to estimate something other than means or totals, just use our scripts instead.
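to make the design-effect shortcut concrete, here's a minimal base-r sketch on a made-up toy data set (every value below is invented).  the column names `cntry` and `dweight` are meant to echo the ess's country and design-weight variables, but treat them as assumptions and check the codebook.  the sketch computes per-country weighted means (the part the integrated file gets right), then inflates a naive standard error by the square root of a design effect.  for that design effect we swapped in kish's textbook approximations, `n * sum(w^2) / sum(w)^2` for unequal weighting multiplied by `1 + (b - 1) * rho` for clustering, which may not match the norwegian tutorial's exact formulas.  `b` (average interviews per cluster) and `rho` (intraclass correlation) are hypothetical inputs you'd have to pull from the sddf or the documentation.

```r
# a made-up, stacked toy file standing in for an integrated ess round.
# `cntry` and `dweight` mimic ess variable names (assumption: check the codebook).
ess <- data.frame(
  cntry   = c("SE", "SE", "SE", "SE", "NO", "NO", "NO", "NO"),
  dweight = c(0.5, 1.5, 1.0, 1.0, 2.0, 1.0, 0.5, 0.5),
  y       = c(3, 5, 4, 6, 7, 5, 6, 8)
)

# point estimates: weighted mean of y within each country
weighted_means <- sapply(
  split(ess, ess$cntry),
  function(d) weighted.mean(d$y, d$dweight)
)

# kish's design effect from unequal weighting: n * sum(w^2) / sum(w)^2
deff_weighting <- function(w) length(w) * sum(w^2) / sum(w)^2

# kish's design effect from clustering: 1 + (b - 1) * rho, where b is the
# average number of interviews per cluster and rho the intraclass correlation
deff_clustering <- function(b, rho) 1 + (b - 1) * rho

# approximate standard error of a mean: the simple-random-sampling standard
# error multiplied by the square root of the combined design effect
adjusted_se <- function(y, w, b, rho) {
  se_srs <- sd(y) / sqrt(length(y))  # ignores weights, clusters, and strata
  deff   <- deff_weighting(w) * deff_clustering(b, rho)
  se_srs * sqrt(deff)
}

# hypothetical usage for sweden: b and rho are placeholders, not ess values
se_sweden <- with(
  subset(ess, cntry == "SE"),
  adjusted_se(y, dweight, b = 2, rho = 0.05)
)
print(weighted_means)
print(se_sweden)
```

the `b = 2, rho = 0.05` values are placeholders.  for anything beyond a rough approximation, building a proper design object with the survey package's `svydesign()` using the real clusters and strata from the sddf is the more honest route.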

confidential to sas, spss, stata, and sudaan users: unless you're a paleontologist, forget those fossils and transition to r.  :D