6 Data documentation

Simply making data available is not sufficient to ensure that it is re-usable (see e.g., Kidwell et al., 2016). Providing documentation (often referred to as ‘metadata’, ‘codebooks’, or ‘data dictionaries’) alongside data files helps ensure that other researchers, and future you, can understand what values the data files contain and how those values correspond to the findings presented in the research report. This documentation should describe the variables in each data file in a format that is both human- and machine-readable (e.g., csv, rather than docx or pdf).4 Ideally, a codebook is organized so that each row represents one variable and each column represents one piece of information about that variable. Extraneous information that cannot be read by software (e.g., colors, formatting) should not be included in the codebook. For an example of a codebook based on survey data, see the one by Kai Horstmann (https://osf.io/e4tqy/); for an example based on experimental data, see the codebook in our example OSF project (https://osf.io/up4xq/).
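
As an illustration of the one-row-per-variable, one-column-per-attribute layout, the following minimal Python sketch writes a small codebook as plain CSV and reads it back. The variable names (`participant_id`, `gender`), the column names, and the `-99` missing-value code are hypothetical choices made for this example, not a required schema.

```python
import csv
import io

# Hypothetical column set: one column per piece of information about a variable.
FIELDS = ["name", "description", "units", "values", "range", "missing", "derivation"]

# Hypothetical codebook: one row (dict) per variable in the data file.
codebook = [
    {"name": "participant_id", "description": "Anonymous participant identifier",
     "units": "", "values": "", "range": "1-500", "missing": "", "derivation": ""},
    {"name": "gender", "description": "Self-reported gender",
     "units": "", "values": "1 = Female; 2 = Male", "range": "1-2",
     "missing": "-99", "derivation": ""},
]


def write_codebook(rows, fieldnames):
    """Serialize codebook rows to CSV text (plain text: no colors or formatting)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = write_codebook(codebook, FIELDS)

# Reading the text back shows that another program can parse every entry.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(parsed[1]["values"])  # 1 = Female; 2 = Male
```

To save the codebook next to the data, the same `DictWriter` calls can write to a file opened with `open("codebook.csv", "w", newline="")`. Because the result is plain CSV, it stays readable in a spreadsheet as well as in R, Python, or any other software.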

Codebooks should include the following information for each variable: the name, a description of the variable, units of measurement, coding of values (e.g., “1 = Female”, “2 = Male”), the possible options or range within which the data points can fall (e.g., “1 = Not at all” to “7 = Very much”), the value(s) used to code missing data, and whether and how the variable was derived from other variables in the dataset (e.g., “bmi was derived from body_weight \(m\) and body_height \(l\) as \(BMI = \frac{m}{l^{2}}\).”). Other relevant information in a codebook entry can include the source of a measure, instructions for a questionnaire item, information about translation, or the scale to which an item belongs.5
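
To make the list above concrete, here is a hedged Python sketch of codebook entries for the BMI example. It checks that every entry carries each required attribute, and it derives `bmi` as \(m / l^{2}\) while propagating the documented missing-value code. The attribute keys, the plausible ranges, and the `-99` code are assumptions made for illustration. The missing-value code is read from the codebook itself rather than hard-coded, which is one practical payoff of keeping the codebook machine-readable.

```python
# Required attributes, following the list above; the key names are hypothetical.
REQUIRED = ("name", "description", "units", "values", "range", "missing", "derivation")

# Hypothetical entries; ranges and the -99 missing code are illustrative only.
codebook = {
    "body_weight": {"name": "body_weight", "description": "Body weight",
                    "units": "kg", "values": "", "range": "30-250",
                    "missing": "-99", "derivation": ""},
    "body_height": {"name": "body_height", "description": "Body height",
                    "units": "m", "values": "", "range": "1.0-2.5",
                    "missing": "-99", "derivation": ""},
    "bmi": {"name": "bmi", "description": "Body mass index",
            "units": "kg/m^2", "values": "", "range": "",
            "missing": "-99",
            "derivation": "bmi = body_weight / body_height**2"},
}


def incomplete_entries(book):
    """Return the names of variables whose entry lacks any required attribute."""
    return sorted(name for name, entry in book.items()
                  if any(key not in entry for key in REQUIRED))


# Read the missing-value code from the codebook instead of hard-coding it.
MISSING = int(codebook["body_weight"]["missing"])  # -99


def derive_bmi(weight_kg, height_m):
    """Compute BMI = m / l^2, passing the missing-value code through unchanged."""
    if weight_kg == MISSING or height_m == MISSING:
        return MISSING
    return weight_kg / height_m ** 2


print(incomplete_entries(codebook))      # []
print(round(derive_bmi(70.0, 1.75), 2))  # 22.86
print(derive_bmi(-99, 1.75))             # -99
```

An empty string is used here for attributes that do not apply (e.g., `values` for a continuous measure), so that a missing key signals a documentation gap rather than a deliberate "not applicable".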


  4. Codebooks can be generated from the dataset metadata in popular statistical software, including SPSS (http://libguides.library.kent.edu/SPSS/Codebooks), Stata (http://www.stata.com/manuals13/dcodebook.pdf), or R (http://www.martin-elff.net/knitr/memisc/codebook.html; https://cran.r-project.org/web/packages/codebook/index.html), or with data publishing tools (e.g., http://www.nesstar.com/software/publisher.html). The author of the codebook package for R (Arslan, 2018) has also created a web app for creating codebooks from SPSS, Stata, or RDS files: https://rubenarslan.ocpu.io/codebook/www/

  5. For those interested in metadata and codebooks, the Digital Curation Centre provides a helpful overview (see http://www.dcc.ac.uk/resources/metadata-standards). Common metadata standards include the basic, general-purpose Dublin Core and the more social-science-focused Data Documentation Initiative (DDI), which was originally developed for survey data (Leeper, 2014).