6 Data documentation

Simply making data available is not sufficient to ensure that it is re-usable (see, e.g., Kidwell et al., 2016). Providing documentation (often referred to as ‘metadata’, ‘codebooks’, or ‘data dictionaries’) alongside data files helps other researchers, and future you, understand what values the data files contain and how those values correspond to the findings presented in the research report. This documentation should describe the variables in each data file in a format that is both human- and machine-readable (e.g., csv, rather than docx or pdf).4 Ideally, a codebook is organized so that each row represents one variable and each column represents one type of information about that variable. Extraneous information that machines cannot read (e.g., colors, formatting) should not be included in the codebook. For an example of a codebook based on survey data, see the codebook by Kai Horstmann (https://osf.io/e4tqy/); for an example based on experimental data, see the codebook in our example OSF project (https://osf.io/up4xq/).
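To make the "one row per variable, one column per type of information" layout concrete, here is a minimal Python sketch that writes a codebook as plain csv using the standard-library `csv` module. The column names (`name`, `description`, `unit`, and so on), the example variables, and the missing-value code `-99` are illustrative assumptions, not a required schema.

```python
import csv
import io

# Hypothetical column names: one column per type of information about a variable.
FIELDS = ["name", "description", "unit", "values", "range", "missing", "derivation"]

# One dictionary (= one row) per variable; the entries are illustrative only.
codebook = [
    {"name": "gender", "description": "Self-reported gender", "unit": "",
     "values": "1 = Female; 2 = Male", "range": "1-2", "missing": "-99",
     "derivation": ""},
    {"name": "body_weight", "description": "Body weight", "unit": "kg",
     "values": "", "range": "30-250", "missing": "-99", "derivation": ""},
]

def write_codebook(rows, fh):
    """Write the codebook as plain csv: text only, no colors or formatting."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Write to an in-memory buffer here; in practice, pass a file opened with newline="".
buffer = io.StringIO()
write_codebook(codebook, buffer)
print(buffer.getvalue())
```

Because the result is ordinary csv, the same file can be opened in a spreadsheet by a human reader or loaded with `csv.DictReader` (or any statistics package) by a script.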

Codebooks should include the following information for each variable: the name, a description of the variable, units of measurement, the coding of values (e.g., “1 = Female”, “2 = Male”), the possible options or range within which the data points can fall (e.g., “1 = Not at all” to “7 = Very much”), the value(s) used to code missing data, and information on whether and how the variable was derived from other variables in the dataset (e.g., “bmi was derived from body_weight \(m\) and body_height \(l\) as \(BMI = \frac{m}{l^{2}}\).”). Other relevant information in a codebook entry can include the source of a measure, the instructions for a questionnaire item, information about translation, or the scale to which an item belongs.5
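As a sketch of how a documented derivation and a missing-value code fit together, the Python snippet below computes `bmi` from `body_weight` and `body_height` exactly as the codebook entry states, \(BMI = \frac{m}{l^{2}}\). It assumes, for illustration only, that weight is in kilograms, height is in metres, and missing values are coded as `-99`; the function name `derive_bmi` and the fields of `bmi_entry` are hypothetical.

```python
# Hypothetical missing-value code, as it would be declared in the codebook.
MISSING = -99

def derive_bmi(body_weight, body_height):
    """BMI = m / l**2 (m in kg, l in m); returns None if either input is missing."""
    if body_weight == MISSING or body_height == MISSING:
        return None
    return body_weight / body_height ** 2

# Codebook entry for the derived variable (field names are illustrative).
bmi_entry = {
    "name": "bmi",
    "description": "Body mass index",
    "unit": "kg/m^2",
    "missing": "NA",
    "derivation": "bmi = body_weight / body_height**2",
}

print(round(derive_bmi(70.0, 1.75), 1))  # 70 / 3.0625 = 22.857... -> prints 22.9
print(derive_bmi(MISSING, 1.75))         # prints None
```

Keeping the derivation in the codebook and in the analysis code identical lets a reader recompute the derived variable and check it against the shared data file.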


The contents of this website are licensed under . When referring to this website, please cite
Klein, O., Hardwicke, T. E., Aust, F., Breuer, J., Danielsson, H., Hofelich Mohr, A., … Frank, M. C. (2018). A Practical Guide for Transparency in Psychological Science. Collabra: Psychology, 4(1), 20. https://doi.org/10.1525/collabra.158