7 Analytic Reproducibility

Below we provide more detailed guidance on a number of topics in analytic reproducibility.

7.1 Document hardware and software used for analyses

The more detailed the documentation of analyses, the more likely they are to be fully reproducible. The hardware, the operating system, and the software compiler used during the installation of some statistical software packages can affect analytical results (e.g., Glatard et al., 2015; Gronenschild et al., 2012). Any nonstandard hardware requirements, such as large amounts of RAM or support for parallelized or distributed computing, should be noted.

Similarly, analysis software is subject to change. Software updates may introduce algorithmic changes or modifications to input and output formats and produce diverging results. Hence, it is crucial to document the analysis software that was used including version numbers (American Psychological Association, 2010; Eubank, 2016; Gronenschild et al., 2012; Keeling & Pavur, 2007; Piccolo & Frampton, 2016; Rokem et al., 2017; Sandve, Nekrutenko, Taylor, & Hovig, 2013; Xie, 2015). If analyses involve any add-ons to the base software they, too, should be documented including version numbers.
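As an illustration, such documentation can be generated by the analysis script itself. The following is a minimal Python sketch using only the standard library; the function name `describe_environment` and the package list passed to it are assumptions made for this example, not part of any cited tool.

```python
import json
import platform
from importlib import metadata


def describe_environment(package_names):
    """Collect hardware, operating system, interpreter, and add-on versions."""
    packages = {}
    for name in package_names:
        try:
            packages[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            packages[name] = None  # add-on is not installed in this environment
    return {
        "machine": platform.machine(),  # processor architecture
        "operating_system": platform.platform(),
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "packages": packages,
    }


if __name__ == "__main__":
    # "pip" is only a placeholder; list the add-ons the analysis actually imports.
    print(json.dumps(describe_environment(["pip"]), indent=2))
```

The printed record can be stored next to the analysis scripts so that readers can compare it with their own setup.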

The utility of a detailed documentation of the employed software is limited to a large extent by the availability of the software and its previous versions. An interested reader may not have the license for a given commercial software package or may be unable to obtain the specific version used in the reported analysis from the distributor. In contrast to commercial software, open source software is usually free of charge, can be included in shared software environments, and previous versions are often much easier to obtain. For these and other reasons, open source software should be preferred to commercial closed source solutions (Huff, 2017; Ince, Hatton, & Graham-Cumming, 2012; Morin et al., 2012; Rokem et al., 2017; Vihinen, 2015).

Consider sharing software environments

Beyond a list of software, there are convenient technical solutions that allow researchers to share the software environment they used to conduct their analyses. The shared environments may consist of the analysis software and any add-ons but can even include the operating system (e.g., Piccolo & Frampton, 2016).

A software environment is organized hierarchically with the operating system at its base. The operating system can be extended by operating system libraries and hosts the analysis software. In addition, some analysis software can be extended by add-ons that are specific to that software. Technical solutions for sharing software environments are available at each level of the hierarchy. Moving from the top to the base of the hierarchy, the number of obstacles to reproducibility decreases, but the technical solutions become more complex and less convenient. Choosing between dependency management systems, software containers, and virtual machines therefore involves a trade-off between convenient implementation and degree of computational reproducibility.

Open source analysis software, such as R and Python, supports rich ecosystems of add-ons (so-called packages or libraries) that enable users to perform a large variety of statistical analyses. Typically, multiple add-ons are used for a research project. Because the needed add-ons often depend on several other add-ons, recreating such software environments to reproduce an analysis can be cumbersome. Dependency management systems, such as packrat (Ushey, McPherson, Cheng, Atkins, & Allaire, 2016) and checkpoint (Microsoft Corporation, 2017) for R, address this issue by tracking which versions of which packages the analyst used. Critically, reproducers can use this information and the dependency management systems to automatically install the correct versions of all packages from the Comprehensive R Archive Network (CRAN).
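The core idea of version tracking can be illustrated for Python with the standard library alone. The sketch below only checks whether the locally installed add-ons match a recorded list; unlike a dependency management system, it does not install anything. The function name `find_version_mismatches` and the format of the recorded list are assumptions made for this example.

```python
from importlib import metadata


def find_version_mismatches(recorded_versions):
    """Compare recorded add-on versions with those installed locally.

    Returns a dict that maps each deviating package name to a tuple
    (recorded version, installed version); the installed version is None
    if the package is missing.
    """
    mismatches = {}
    for name, recorded in recorded_versions.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != recorded:
            mismatches[name] = (recorded, installed)
    return mismatches
```

A reproducer could call the function with the analyst's recorded versions and inspect the returned dictionary before running any analysis; an empty dictionary indicates that all listed add-ons match.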

Software containers, such as Docker (Boettiger, 2015) or ReproZip (Chirigati, Rampin, Shasha, & Freire, 2016), are a more comprehensive solution to sharing software environments than add-on dependency management systems. Software containers can bundle operating system libraries, analysis software, including add-ons, as well as analysis scripts and data into a single package that can be shared (Huff, 2017; Piccolo & Frampton, 2016). Because the operating system is not included, these packages are of manageable size and require only limited computational resources to execute. With Docker, software containers can be set up automatically using a configuration script, the so-called Dockerfile. Dockerfiles constitute an explicit documentation of the software environment and can be shared along with data and analysis scripts instead of packaging them into a single but comparably large file (as ReproZip does). A drawback of software containers is that they are not independent of the hosting operating system and may not support all needed analysis software.

Virtual machines allow sharing the complete software environment including the operating system. This approach eliminates most technical obstacles to computational reproducibility. Common virtualization software, such as VirtualBox (https://www.virtualbox.org/), bundles an entire operating system with analysis software, scripts, and data into a single package (Piccolo & Frampton, 2016). This file can be shared but is of considerable size. Moreover, execution of a virtual machine requires more computational resources than a software container. Similar to Docker, workflow tools, such as Vagrant (https://www.vagrantup.com/), can set up virtual machines including the operating system automatically based on a configuration script, which constitutes an explicit documentation of the environment and facilitates sharing the software environment.

7.2 Automate or thoroughly document all analyses

Most importantly, analytic reproducibility requires that all steps necessary to produce a result are documented (Hardwicke et al., 2018; Sandve et al., 2013) and, hence, documentation of analyses should be considered from the outset of a research project (Donoho, 2010, p. 386). The documentation could be a narrative guide that details each analytical step including parameters of the analysis (e.g., variable coding or types of sums of squares; Piccolo & Frampton, 2016). Ideally, however, an interested reader can reproduce the results in an automated way by executing a shared analysis script. Hence, if possible, the entire analysis should be automated (Huff, 2017; Kitzes, 2017; Piccolo & Frampton, 2016). Any manual execution of analyses via graphical user interfaces should be documented by saving the corresponding analysis script or by using workflow management systems (Piccolo & Frampton, 2016; Sandve et al., 2013).

If possible the shared documentation should encompass the entire analytic process. Complete documentation ideally begins with the raw data and ends with the reported results. If possible, steps taken to visualize results should be included in the documentation. All data manipulation, such as merging, restructuring, and transforming data should be documented. Manual manipulation of raw data should be avoided because errors introduced at this stage are irreversible (e.g., Sandve et al., 2013).
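A scripted analysis that covers the path from raw data to results might be organized as in the following Python sketch. The column names (`participant`, `score`), the cleaning rule, and all function names are hypothetical and serve only to illustrate the structure: the raw file is read but never modified, and every manipulation is expressed in code.

```python
import csv
import statistics


def load_raw(raw_file):
    """Read the raw data without modifying the file on disk."""
    with open(raw_file, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def clean(rows):
    """Scripted data manipulation: drop rows with a missing score, convert types."""
    return [
        {"participant": row["participant"], "score": float(row["score"])}
        for row in rows
        if row["score"].strip() != ""
    ]


def analyze(rows):
    """Compute the reported statistics from the cleaned data."""
    scores = [row["score"] for row in rows]
    return {"n": len(scores), "mean_score": statistics.mean(scores)}


def run_pipeline(raw_file, cleaned_file):
    """Run every step from raw data to results; the raw file is only read."""
    cleaned = clean(load_raw(raw_file))
    with open(cleaned_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["participant", "score"])
        writer.writeheader()
        writer.writerows(cleaned)
    return analyze(cleaned)
```

Because the cleaned data are written to a separate file, an error in the cleaning step can be corrected by editing the script and rerunning it on the untouched raw data.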

7.3 Use UTF-8 character encoding

Character encodings are systems used to represent symbols such as numbers and text, usually in a numeral system such as binary (zeros and ones) or hexadecimal. Not all character encoding systems are compatible, and these incompatibilities are a common cause of error and nuisance. Text files contain no information about the underlying character encoding and, hence, the software either makes an assumption or guesses. If an incorrect character encoding is assumed, characters are displayed incorrectly and the contents of the text file may be (partly) indecipherable. UTF-8 is a widely used character encoding system that implements the established Unicode standard. It can represent symbols from most of the world’s writing systems and maintains backward compatibility with the previously dominant ASCII encoding scheme. Its standardization, wide adoption, and symbol richness make UTF-8 suitable for sharing and long-term archiving. When storing text files, researchers should ensure that UTF-8 character encoding is applied.
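In Python, for example, the encoding can be stated explicitly whenever a text file is written or read. The sketch below uses a made-up file name and made-up text; it also shows what happens when a reader wrongly assumes Latin-1.

```python
import os
import tempfile

text = "Müller, naïve, Ångström"

# Write the file with an explicit encoding instead of the platform default.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

# Reading with the same explicit encoding recovers the text exactly.
with open(path, "r", encoding="utf-8") as handle:
    recovered = handle.read()

# Reading under a wrong assumption (here Latin-1) garbles non-ASCII characters.
with open(path, "r", encoding="latin-1") as handle:
    garbled = handle.read()

print(recovered == text)  # True
print(garbled == text)    # False
```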

7.4 Avoid “works on my machine” errors

When a fully automated analysis fails to execute on the computer of someone who wants to reproduce it although the original analyst can execute it flawlessly, the reproducer may be experiencing a so-called “works on my machine” error (WOMME). In the political sciences the rate of WOMME has been estimated to be as high as 54% (Eubank, 2016). Trivially, the reproducer may be missing files necessary to run the analysis. As discussed above, WOMME can also be caused by hardware and software incompatibilities. Moreover, the file locations specified in analysis scripts are a common source of WOMME. Spaces and other special characters in file and directory names can cause errors on some operating systems and should be avoided. Similarly, absolute file paths to a specific location (including hard drive and user directory) are a likely source of WOMME. Hence, researchers should use file paths to a location relative to the current working directory if possible (e.g., Eubank, 2016; Gandrud, 2013a; Xie, 2015) or load files from a permanent online source. To guard against WOMME, researchers should verify that their analyses work on a computer other than their own, prefer open source analytical software that is available on all major operating systems, and ideally share the entire software environment used to conduct their analyses (see the Sharing software environments section). Another option to avoid WOMME is to share data and code via cloud-based platforms, such as Code Ocean (https://codeocean.com/) or RStudio Cloud (https://rstudio.cloud/), that ensure computational reproducibility by running the analysis code in a cloud environment instead of locally on a user’s computer.
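The advice on file paths can be sketched in Python as follows. The directory and file names are hypothetical, and `is_portable_path` is a helper written for this example with a deliberately conservative rule (relative path; only letters, digits, period, underscore, hyphen, and forward slash); projects may reasonably choose a different rule.

```python
import re
from pathlib import Path, PurePosixPath, PureWindowsPath

# Relative path, built from components so the separator fits any operating system.
raw_data_file = Path("data") / "raw" / "study1.csv"


def is_portable_path(path_string):
    """Return True for relative paths without spaces or other special characters."""
    is_absolute = (
        PurePosixPath(path_string).is_absolute()
        or PureWindowsPath(path_string).is_absolute()
    )
    has_only_safe_characters = (
        re.fullmatch(r"[A-Za-z0-9._/-]+", path_string) is not None
    )
    return not is_absolute and has_only_safe_characters
```

Such a check could be run over all paths that appear in the analysis scripts before the materials are shared.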

7.5 Share intermediate results for complex analyses

Some analyses can be costly to reproduce due to non-standard hardware requirements, because they are computationally expensive, or both. Besides pointing out the costliness of such analyses, researchers can facilitate reproducibility of the simpler analysis steps by sharing intermediate results. For example, when performing simulations, such as the simulation of a statistical model’s joint posterior distribution in Bayesian analyses, it can be helpful to store and share the simulation results. This way, interested readers can reproduce all analyses that rely on the simulated data without having to rerun a computationally expensive simulation.
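One way to implement this is to let the analysis script reuse stored results when they are available and to run the expensive step only otherwise. In the Python sketch below, `expensive_simulation` is a trivial stand-in for a costly computation, and the JSON file layout is an assumption made for this example.

```python
import json
import random
from pathlib import Path


def expensive_simulation(n_draws, seed):
    """Stand-in for a costly simulation (e.g., posterior sampling)."""
    generator = random.Random(seed)
    return [generator.gauss(0.0, 1.0) for _ in range(n_draws)]


def load_or_simulate(results_file, n_draws, seed):
    """Reuse shared intermediate results if present; otherwise simulate and store them."""
    results_file = Path(results_file)
    if results_file.exists():
        with results_file.open(encoding="utf-8") as handle:
            return json.load(handle)["draws"]
    draws = expensive_simulation(n_draws, seed)
    with results_file.open("w", encoding="utf-8") as handle:
        json.dump({"n_draws": n_draws, "seed": seed, "draws": draws}, handle)
    return draws
```

Storing the seed and the number of draws alongside the results documents how the shared file was produced, so that readers with sufficient resources can still rerun the simulation itself.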

7.6 Set and record seeds for pseudorandom number generators

Some statistical methods, such as the calculation of bootstrap statistics, permutation tests in large samples, maximum likelihood estimation using optimization algorithms, Monte Carlo simulations, Bayesian methods that rely on Markov chain Monte Carlo sampling, or jittering of data points in plots, require the generation of random numbers. Many statistical applications employ algorithmic pseudorandom number generators (PRNG). These methods are called pseudorandom because the underlying algorithms are deterministic but produce sequences of numbers that have statistical properties similar to those of truly random sequences. PRNG apply an algorithm to a numerical starting point (a number or a vector of numbers), the so-called seed. The resulting sequence of numbers is fully determined by the seed: every time the PRNG is initialized with the same seed, it will produce the same sequence of pseudorandom numbers. Whenever an analysis involves statistical methods that rely on PRNG, the seeds should be recorded and shared to ensure computational reproducibility of the results (Eubank, 2016; Sandve et al., 2013; Stodden & Miguez, 2014), ideally by setting them at the top of the analysis script.
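The determinism described above can be demonstrated with a small bootstrap in Python. The data, the seed values, and the function `bootstrap_means` are made up for this sketch.

```python
import random
import statistics


def bootstrap_means(data, n_resamples, seed):
    """Return bootstrap means of `data`; the seed fully determines the resamples."""
    generator = random.Random(seed)
    return [
        statistics.mean(generator.choices(data, k=len(data)))
        for _ in range(n_resamples)
    ]


scores = [4.1, 5.3, 2.8, 6.0, 4.7, 5.5]  # made-up data

same_seed_a = bootstrap_means(scores, n_resamples=200, seed=20180101)
same_seed_b = bootstrap_means(scores, n_resamples=200, seed=20180101)
other_seed = bootstrap_means(scores, n_resamples=200, seed=20180102)

print(same_seed_a == same_seed_b)  # True: identical seed, identical sequence
```

Running the function with a different seed (`other_seed`) yields a different set of resamples, which is why a reader who does not know the seed cannot reproduce the exact bootstrap results.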

Practical Implementation:

Note that the analysis software or add-ons to that software may provide more than one PRNG and each may require its own seed. In principle, any whole number is a valid seed for a PRNG but in practice larger numbers sometimes yield better sequences of pseudorandom numbers in the sense that they are harder to distinguish from truly random sequences. A good way to generate a PRNG seed value is to use a true random number generator, such as https://www.random.org/integers/.

7.6.1 SPSS

SPSS provides the multiplicative congruential (MC) generator, which is the default PRNG, and the Mersenne Twister (MT) generator, which was added in SPSS 13 and is considered to be a superior PRNG—it is the default in SAS, R, and Python. The MC generator can be selected and the seed value set as follows:

SET RNG=MC SEED=301455.

For the MC generator the seed value must be any whole number between 0 and 2,000,000. The MT generator can be selected and the seed value set as follows:

SET RNG=MT MTINDEX=158237730.

For the MT generator the seed value can be any real number. To select the PRNG and set the seed value in the graphical user interface choose from the menus Transform > Random Number Generators.

7.6.2 SAS

SAS relies on the MT generator. The seed value can be set to any whole number between 1 and 2,147,483,647 as follows:

call streaminit(663562138);

7.6.3 R

R provides seven different PRNG but by default relies on the MT generator. The MT generator can be selected explicitly and the seed value set to any whole number as follows:

set.seed(seed = 923869253, kind = "Mersenne-Twister")

Note that some R packages may provide their own PRNG and rely on seed values other than the one set by set.seed().

7.6.4 Python

Python, too, relies on the MT generator. The seed value can be set to any whole number, a string of letters, or bytes as follows:

import random
random.seed(a=879005879)

Note that some Python libraries may provide their own PRNG and rely on seed values other than the one set by random.seed().
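This point can be illustrated with the standard library alone. In the sketch below, `draw_with_own_generator` is a hypothetical stand-in for a library routine that maintains its own PRNG; its output depends on its own seed and not on the one set by random.seed().

```python
import random


def draw_with_own_generator(own_seed):
    """Stand-in for a library routine that uses its own PRNG instance."""
    generator = random.Random(own_seed)
    return [generator.random() for _ in range(3)]


random.seed(879005879)
draws_a = draw_with_own_generator(own_seed=1)

random.seed(879005879)  # same module-level seed as before ...
draws_b = draw_with_own_generator(own_seed=2)  # ... but a different seed for the separate PRNG

random.seed(42)  # different module-level seed ...
draws_c = draw_with_own_generator(own_seed=1)  # ... but the same seed for the separate PRNG

print(draws_a == draws_b)  # False
print(draws_a == draws_c)  # True
```

In practice, this means that the seed of every PRNG involved in an analysis has to be set and recorded separately.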

7.7 Make your analysis documentation easy to understand

It is important that readers of a narrative documentation or analysis scripts can easily connect the described analytical steps to interpretative statements, tables, and figures in a report (e.g., Gandrud, 2013a; Sandve et al., 2013). Correspondence between analyses and reported results can be established by adding explicit references to section headings, figures, or tables in the documentation and by documenting analyses in the same order in which the results are reported. Additionally, it can be helpful to give an overview of the results produced by the documented analysis (see the Project Tier DRESS Protocol). Additional analyses that are not reported can be included in the documentation but should be discernible (e.g., by adding a comment “not reported in the paper”). A brief justification why the analyses were not reported should be added as a comment.

Best practices in programming discourage extensive commenting of analysis scripts because comments have to be diligently revised together with analysis code; failing to do so yields inaccurate and misleading comments (e.g., Martin, 2009). While extensive commenting can be useful during analysis, it is recommended to delete obscure or outdated comments once a script is finalized to reduce confusion (Long, 2009). Comments should explain the rationale or intent of an analysis, provide additional information (e.g., preregistration documents or standard operating procedures, Lin & Green, 2016), or warn that, for example, particular analyses may take a long time (Martin, 2009). If comments are needed to explain how a script works, researchers should check whether they can instead rewrite the code to be clearer. Researchers can facilitate the understanding of their analysis scripts by adhering to other common best practices in programming, such as using consistent, descriptive, and unambiguous names for variables, labels, and functions (e.g., Kernighan & Plauger, 1978; Martin, 2009) or avoiding reliance on defaults by explicitly setting optional analysis parameters. Extensive narrative documentation is not necessary in a script file (Eglen et al., 2017), and is better suited to dynamic documents (see below).
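The following Python sketch contrasts a terse call that relies on defaults with a version that uses a descriptive name and sets the optional parameters explicitly. The data are made up; `n=4` and `method="exclusive"` are the documented defaults of `statistics.quantiles`, so both calls return the same values.

```python
import statistics

reaction_times_ms = [512, 430, 618, 475, 590, 455, 533, 601]  # made-up data

# Relies on defaults: the reader must look up how the quartiles are computed.
q = statistics.quantiles(reaction_times_ms)

# Descriptive name; optional parameters are set explicitly.
reaction_time_quartiles_ms = statistics.quantiles(
    reaction_times_ms, n=4, method="exclusive"
)

print(q == reaction_time_quartiles_ms)  # True
```

The explicit version remains correct and readable even if a future software release were to change the defaults.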

As a final note, it can be beneficial to split the analysis documentation into parts (i.e., files and directories) in a way that suits the research project. A basic distinction applicable to most cases is between processing of raw data—transforming original data files into restructured and cleaned data—and data analysis and visualization (see, e.g., http://www.projecttier.org/tier-protocol/specifications/).

7.8 Dynamic documents

Dynamic documents constitute a technically sophisticated approach to connect analytical steps and interpretative statements (e.g., Gandrud, 2013a; Knuth, 1984; Kluyver et al., 2016; Welty, Rasmussen, Baldridge, & Whitley, 2016; Xie, 2015). Dynamic documents intertwine automated analysis scripts and narrative reporting of results. When a document is compiled, all embedded analysis scripts are executed and the results are inserted into the text. The mix of analysis code and prose creates explicit links between the reported results and the underlying analytical steps and makes dynamic documents well suited for documentation and sharing. It is possible to extend this approach to write entire research papers as dynamic documents (e.g., Aust & Barth, 2017; Allaire et al., 2017b). When sharing, researchers should include both the source file, which contains the executable analysis code, and the compiled file, preferably in HTML or PDF format.
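The compile step can be illustrated with a toy example in Python. The `{{ ... }}` placeholder syntax and the function `compile_document` are invented for this sketch and do not reflect how any existing dynamic document software is implemented; the data are made up.

```python
import re
import statistics

# Source document: prose with embedded expressions in {{ ... }} (made-up syntax).
SOURCE = (
    "We collected {{ len(scores) }} scores. "
    "The mean score was {{ round(statistics.mean(scores), 2) }}."
)


def compile_document(source, variables):
    """Evaluate each embedded expression and insert its result into the prose."""
    def evaluate(match):
        # eval() is acceptable only because this toy source is fully trusted.
        return str(eval(match.group(1), {"statistics": statistics}, variables))
    return re.sub(r"\{\{\s*(.+?)\s*\}\}", evaluate, source)


scores = [3, 5, 4, 6]  # made-up data
print(compile_document(SOURCE, {"scores": scores}))
# We collected 4 scores. The mean score was 4.5.
```

Because the reported numbers are computed at compile time, changing the data and recompiling updates the text without any manual copying of results.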

Below we provide a brief overview of three software solutions for creating dynamic documents: R Markdown (Allaire et al., 2017a), Jupyter (Kluyver et al., 2016), and StatTag (Welty et al., 2016).

7.8.1 R Markdown

rmarkdown is an R package that provides comprehensive functionality to create dynamic documents. R Markdown files consist of a front matter that contains meta information as well as rendering options and is followed by prose in Markdown format mixed with R code chunks. Markdown is a formatting syntax that was designed to be easy to read and write (e.g., *italic* yields italic) and has gained considerable popularity in a range of applications. When the document is compiled, the R code is executed sequentially and the resulting output (including figures and tables) is inserted into the document before it is rendered into an HTML, Word, or PDF document. Although R Markdown is primarily intended for R, other programming languages, such as Python or Scala, have limited support.

R Markdown uses customizable templates that control the formatting of the compiled document. The R package papaja (Aust & Barth, 2017) provides templates that are specifically designed to create manuscripts in APA style and functions that format analysis results in accordance with APA guidelines. Additional document templates that conform to specific journal or publisher guidelines are available in the rticles package (Allaire et al., 2017b).

The freely available integrated development environment RStudio provides good support for R Markdown and can be extended to, e.g., count words (Marwick, n.d.) or search and insert citations from a BibTeX file or Zotero library (Aust, 2016).

7.8.2 Jupyter

Jupyter is a web application for creating dynamic documents that supports one or multiple programming languages, such as Python, R, Scala, and Julia. Like R Markdown, Jupyter relies on the Markdown formatting syntax for prose. While the primary output format for dynamic documents is HTML, Jupyter documents can be rendered to other formats with document templates, albeit less conveniently. Like R Markdown, Jupyter can be extended, e.g., to search and insert citations from a Zotero library.

7.8.3 StatTag

StatTag can be used to create dynamic Word documents. It supports integration with R, SAS, and SPSS by inserting the contents of variables defined in the analysis scripts into the Word document. Other document formats are not supported.

7.8.4 Comparison

StatTag may be the most beginner-friendly but currently least flexible option, and it is the only one of the three presented options that supports SAS and SPSS. Jupyter is the recommended alternative for researchers using Python, Scala, and Julia, or for researchers whose workflows combine multiple programming languages including R. While Jupyter is well suited for data exploration, interactive analysis, and analysis documentation, R Markdown is better suited for writing PDF and Word documents including journal article manuscripts. In contrast to Jupyter, R Markdown relies entirely on text files, works well with any text editor or integrated development environment, and is better suited for version control systems such as git. Technical requirements and personal preferences aside, R Markdown, Jupyter, and StatTag are all well suited for documenting and sharing analyses.