run out of city university london and six other centres, this survey defines its sample universe as all persons aged 15 and over resident within private households, regardless of nationality, citizenship, language or legal status in the participating countries. however, it's smart (dare i say very smart) to check the documentation report (here's round five) and confirm that the statistics you're producing actually generalize to the resident populations you think they do.
after enduring a few spammy e-mails from me, daniel oberski agreed to co-author this post and all of the code. dan spent a handful of years in catalonia at upf's ess competence centre, so in addition to being able to disentangle and simplify this survey's tricky methodology for us, he's also provided a wicked starter script on structural equation modeling (sem) with complex sample survey data, using his very own lavaan.survey package. so tell him thanks. this new github repository contains four scripts:
download all microdata.R
- after you register for an account, plop `your.email` at the top of this script and let 'er rip
- automatically log in and determine which countries and rounds are currently available
- for each round available, cycle through every available file, then download, unzip, and import each one
- save everything to the local disk as convenient data.frame objects
analysis examples.R
- load a country-specific data set, merge on the survey design data file, remove unnecessary columns
- construct a survey design object producing taylor series linearized standard errors
- use that survey design object to run examples of every summary statistic you're likely to need
structural equation modeling examples.R
- load, merge, and construct a german survey design object producing taylor series linearized standard errors
- fit a latent variable model (first without, then with survey adjustment) to imitate the confirmatory factor analysis model in this multidimensionality of welfare attitudes paper
- load, merge, and stack the german and spanish files, then construct a german plus spanish tsl design and fit a latent variable model (first without, then with survey adjustment) to imitate the metric cross-country invariance test in this schwartz human values measurement paper
- use the same german plus spanish stacked design to run an unadjusted and then a survey-adjusted test of whether the relationship between two latent variables in a structural equation model from this support for immigration paper is equal across countries
replication.R
- load the integrated all-country round five file and match some rudimentary nesstar output
- load a country-specific data set, merge on the survey design data file, and quietly construct a tsl design
- by hand, start reconstructing some country-specific statistics from the official ess survey design analysis document
- in one fell swoop, compute the design effect once again, but this time using the survey package
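if you're curious what a taylor series linearized (tsl) standard error and a design effect actually involve, here's a minimal python sketch (the scripts themselves are r and lean on the survey package; this is not their code). it assumes a weighted mean, strata and primary sampling unit (psu) ids with psus nested within strata, and the with-replacement "ultimate cluster" approximation that the survey package uses when no finite population correction is supplied. the function names `tsl_weighted_mean` and `design_effect` are hypothetical, and the simple-random-sampling baseline used for the design effect (weighted element variance divided by n) is one simple convention, not necessarily the ess documentation's definition.

```python
import math
from collections import defaultdict


def tsl_weighted_mean(y, w, strata, psu):
    """Return (weighted mean, Taylor-linearized standard error).

    Assumes a stratified cluster design with PSU ids nested within strata
    and the with-replacement (ultimate cluster) variance approximation.
    """
    if not (len(y) == len(w) == len(strata) == len(psu)):
        raise ValueError("y, w, strata and psu must have the same length")
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearized (influence) values of the ratio estimator sum(w*y) / sum(w).
    z = [wi * (yi - mean) / total_w for wi, yi in zip(w, y)]

    # Add up the linearized values within each PSU, keeping strata separate.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for zi, h, c in zip(z, strata, psu):
        psu_totals[h][c] += zi

    variance = 0.0
    for clusters in psu_totals.values():
        totals = list(clusters.values())
        n_h = len(totals)
        if n_h < 2:
            # Single-PSU ("lonely") strata are simply skipped in this sketch;
            # R's survey package offers several alternative treatments.
            continue
        zbar = sum(totals) / n_h
        variance += n_h / (n_h - 1) * sum((t - zbar) ** 2 for t in totals)
    return mean, math.sqrt(variance)


def design_effect(y, w, strata, psu):
    """Design-based variance divided by an SRS variance of the same mean.

    The SRS baseline is the weighted element variance divided by n, which is
    one simple convention chosen for this sketch.
    """
    mean, se = tsl_weighted_mean(y, w, strata, psu)
    n = len(y)
    total_w = sum(w)
    s2 = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y)) / total_w * n / (n - 1)
    return se ** 2 / (s2 / n)
```

with equal weights, one stratum, and every respondent as their own psu, the tsl standard error collapses to the textbook s/√n and the design effect to 1; when respondents within a psu give similar answers, the design effect climbs above 1, which is exactly why the integrated file's missing cluster ids matter.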
click here to view these four scripts
for more detail about the european social survey (ess), visit:
- the european social survey home page
- the online query tool (nesstar) page
- the online usage tutorial page
notes:
some analysts blindly start with the integrated, multi-country data set for each round. that file stacks all countries into a single data table and includes the appropriate within-country weights, so you'll get the correct point estimates (means, medians, percents). unfortunately, the integrated file does not contain other sample design information such as clusters and strata, which influence standard errors and statistical tests. so if you're itching to calculate a confidence interval, a standard error, or any kind of honest statistical test, you'll generally need the country-specific files and their associated sample design data files (sddf).
a classical approximate correction is to multiply the standard error you get when ignoring the survey design by the square root of the "design effect". the norwegian social science data services have created this tutorial on how to calculate design effects for linear functions of the data such as means and totals, but if that's over your head or you want to estimate something other than means or totals, just use our scripts instead.
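the classical correction boils down to se_adjusted = se_naive × √deff. here's a minimal python sketch, assuming you already have a design effect for the statistic in hand (from the nsd tutorial or from replication.R, say); the function names and the 1.96 normal critical value for a 95% interval are our own illustrative choices, not anything taken from the ess documentation. like the tutorial, treat it as reasonable only for linear statistics such as means and totals.

```python
import math


def srs_standard_error(y):
    """Naive standard error of a mean, pretending the data are a simple random sample."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    return math.sqrt(s2 / n)


def deff_adjusted_se(se_naive, deff):
    """Classical approximation: inflate the naive SE by the square root of the design effect."""
    if deff <= 0:
        raise ValueError("a design effect must be positive")
    return se_naive * math.sqrt(deff)


def normal_ci(estimate, se, z=1.96):
    """Approximate 95% confidence interval using a normal critical value."""
    return estimate - z * se, estimate + z * se
```

for example, a design effect of 4 doubles the naive standard error, and the resulting confidence interval is twice as wide as the one your unadjusted software would have reported.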
confidential to sas, spss, stata, and sudaan users: unless you're a paleontologist, forget those fossils and transition to r. :D