5 Folder structure

Typically a “project” on the OSF, or on any other repository, will be associated with one or more studies as reported in a paper. The folder structure will naturally depend on what you wish to share, and there is no commonly accepted standard. Folders can, for example, be organized by study, by file type (analysis scripts, data, materials, paper), or by data type (raw vs. processed). Other structures may be justified by the nature of the study, and some archives may also require a specific structure. One example is the BIDS format for OpenNeuro/OpenfMRI (https://doi.org/10.1038/sdata.2016.44). The structure we suggest here is inspired by the DRESS Protocol of Project TIER (http://www.projecttier.org/tier-protocol/dress-protocol/). See Long (2009) for other examples of folder and file structures.

5.1 Root folder

The root folder contains a readme file providing general information on the studies and on the folder structure. It should include:

  • Short description of the study

  • A description of the folder structure

  • Time and location of data collection for the studies reported

  • Software required to open or run any of the shared files

  • The license(s) under which the files are shared (see section on licenses in the main paper)

  • Information on the publication status of the studies

  • Contact information for the authors

  • A list of all the shared files
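The last item, the list of shared files, is easy to let drift out of date when it is maintained by hand. As a minimal sketch, assuming the project lives in a local folder, the list could be generated with the Python standard library. The function names `list_shared_files` and `file_list_section` and the heading text are our own illustrative choices, not part of any repository's tooling.

```python
from pathlib import Path


def list_shared_files(root):
    """Return the relative paths of all files under root, sorted."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    )


def file_list_section(root):
    """Format the file list as plain-text lines for pasting into a readme."""
    lines = ["Shared files:"]
    lines += ["  " + name for name in list_shared_files(root)]
    return "\n".join(lines)
```

Calling `print(file_list_section("path/to/project"))` (a placeholder path) prints one line per file, which can be pasted into the readme before uploading.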

5.2 Study Protocol or Preregistration

The repository should contain a description of the study protocol. This can coincide with the preregistration document or the method section of the research report. In the example project (https://osf.io/xf6ug/), which is a registered report, we provide the full document as accepted in-principle at stage 1. If the study protocol or the preregistration consists of multiple files (e.g., analysis scripts or protocols of power analyses), these documents can be placed in a “Study protocol” folder together with the description of the study protocol.

5.3 Materials

If possible, this folder includes all the material presented to the participants (or as-close-as-possible reproductions thereof) as well as, e.g., the software used to present the stimuli and user documentation. The source of this material should be documented, and any licensing restrictions should be noted in the readme file. In the example project (https://osf.io/xf6ug/), we provide the experimental software used for stimulus presentation and response collection, and the stimuli that we are legally able to share. License information on reuse is included in the README file.

5.4 Raw data

This folder includes the original data, in the “rawest” possible form. These could, for example, be individual E-Prime files, databases extracted from online survey software, or scans of questionnaires. If this form is not directly usable, a processed version (e.g., in CSV format) that can be imported by any user should be included, in an appropriately labeled folder. For example, raw questionnaire responses as encoded could be made available in this format. Ideally, both versions of the data (i.e., before and after being made “importable”) are included. In the example project (https://osf.io/xf6ug/), we provide raw text files saved by the experimental software for each participant. A file containing a description of each dataset should also be included (see section on data documentation).
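Making raw files “importable” can be as simple as merging per-participant files into one CSV file. The sketch below is illustrative only: it assumes, hypothetically, that each raw file is a tab-delimited `.txt` file with an identical header row, which will not hold for every experimental software. The function name `combine_raw_files` and the added `source_file` column are our own choices.

```python
import csv
from pathlib import Path


def combine_raw_files(raw_dir, out_csv, delimiter="\t"):
    """Merge per-participant text files into a single CSV file.

    Assumption (hypothetical): every *.txt file in raw_dir is a delimited
    text file with the same header row. A 'source_file' column records
    which raw file each row came from. Returns the number of data rows.
    """
    raw_files = sorted(Path(raw_dir).glob("*.txt"))
    n_rows = 0
    writer = None
    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        for raw_file in raw_files:
            with open(raw_file, newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f, delimiter=delimiter)
                for row in reader:
                    if writer is None:
                        # Take the column names from the first file read.
                        fieldnames = ["source_file"] + list(reader.fieldnames)
                        writer = csv.DictWriter(out, fieldnames=fieldnames)
                        writer.writeheader()
                    row["source_file"] = raw_file.name
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

The raw files are only read, never modified, so both versions of the data can be shared side by side.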

5.5 Processed data

This folder contains the cleaned and processed data files used to generate the results reported in the paper as well as descriptions of the datasets. If data processing is extensive and complex, this can be the most efficient way to enable data re-use by other researchers. Nevertheless, in order to ensure full analytic reproducibility, it is always important to provide raw data in addition to processed data unless constraints prevent this (e.g., identifiable information embedded in the raw data). In the example project (http://doi.org/10.17605/OSF.IO/XF6UG), we provide the processed datasets in the native R Data format. A file containing a description of each dataset should also be included (see section on data documentation).

5.6 Analysis

This folder includes detailed descriptions of analysis procedures or scripts used for transforming the raw data into processed data, for running the analyses, and for creating figures and tables. Instructions for reproducing all analyses in the report can be included in the README or in a separate instruction document in this folder. If parts of the analyses are computationally expensive, this folder can also contain intermediate (“cached”) results if this facilitates fast (partial) reproduction of the original results. In the example project (http://doi.org/10.17605/OSF.IO/XF6UG), we provide the R Markdown file used to create the research report (including the appendix), and cached results in the native R Data format. For convenience we also provide R-script versions of the R Markdown files, which can be executed in R without rendering the manuscript. The folder also contains a subfolder “Analysis functions”, which contains custom R functions that are loaded and used in the R Markdown files.
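The idea of shipping cached intermediate results can be illustrated with a small load-or-compute helper. This is a generic sketch in Python, not the mechanism used in the example project (which stores cached results in the R Data format). The helper name `cached` and the use of `pickle` as the storage format are assumptions made for illustration.

```python
import pickle
from pathlib import Path


def cached(cache_file, compute):
    """Return the result stored in cache_file, computing it only if absent.

    `compute` is a zero-argument function that runs the expensive step.
    """
    cache_file = Path(cache_file)
    if cache_file.exists():
        # Fast path: reuse the shared intermediate result.
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    result = compute()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_file, "wb") as f:
        pickle.dump(result, f)
    return result
```

A reader who wants a quick partial reproduction keeps the shared cache files. A reader who wants to verify the expensive step deletes them, which forces the computation to run again.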

5.7 Research Report

A write-up of the results, in the form of a preprint/postprint or the published paper, is included here. In our example project, the data and analysis folder contains an R Markdown document that includes the text of the paper interleaved with the R code to process the raw data and perform all reported analyses. When rendered, it generates the research report (in APA manuscript style) using a dedicated package, papaja (https://github.com/crsh/papaja; Aust & Barth, 2017). The advantage of this approach is that all values presented in the research report can be directly traced back to their origin, creating a fully reproducible analysis pipeline and helping to avoid copy-and-paste errors.