Replication & Reproducibility
Issues, Concepts and an Introduction to Code Versioning
What is the replicability crisis? How widespread is it? Where does it come from, and what can we do about it? This seminar covers the fundamentals of the replication crisis and the current debate around the tools used to navigate it, from statistical roots and questionable research practices to preregistration, registered reports, and code versioning.
At a glance
Requirements
- Read the assigned research articles each week
- Give one 30-min presentation in class, followed by a 30-min discussion
- Submit one question to the presenter before each class
Presentation guidelines
Each presentation (30–35 min, two presenters per session) should cover:
- Relevant background and motivation — what the question is and why it is interesting
- The study design and methods, i.e. what was done: for empirical studies, the number of subjects, task description, conditions, and duration; for simulations, the simulation details
- An explanation of any concepts that may be unfamiliar to your colleagues
- Hypotheses — what specifically is expected in the data
- Results — what they are and how they were obtained
- A one-slide summary that flags open questions or issues
Discussion guidelines
Each discussion (25–30 min) should be an interactive in-class session that engages with the topic of the presentation and uses the submitted questions as input. Possible formats include round tables for different questions, hands-on demonstrations, short presentations, or discussion of a related paper.
Sessions & readings
Background and foundations
Understand the basic premises of modern science.
- Chalmers, A. F. (2013). Falsificationism and progress. In What is this thing called science? (4th ed., pp. 66–83). Hackett Publishing.
- National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and replicability in science (Executive Summary). The National Academies Press.
Statistical roots
Understand how statistical power, bias, and flexible analyses affect the rates of false positives and false negatives.
- Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124.
- Greenland, S., Senn, S. J., Rothman, K. J., et al. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.
- Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.
- Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E. J. (2016). Is there a free lunch in inference? Topics in Cognitive Science, 8(3), 520–547.
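As a rough illustration of how power, the significance threshold, and bias interact, the sketch below computes the positive predictive value (PPV) of a significant finding, following the formula given in Ioannidis (2005). The function name `ppv` and the example values are illustrative assumptions and are not part of the course materials.

```python
def ppv(power, alpha, prior_odds, bias=0.0):
    """Positive predictive value of a 'significant' finding.

    power      -- probability of detecting a true effect (1 - beta)
    alpha      -- significance threshold (type I error rate)
    prior_odds -- pre-study odds R that a tested relationship is true
    bias       -- proportion u of analyses that would not have been
                  significant but are reported as such anyway
    """
    beta = 1.0 - power
    # True relationships reported as significant (per unit of false relationships)
    true_positives = power * prior_odds + bias * beta * prior_odds
    # False relationships reported as significant
    false_positives = alpha + bias * (1.0 - alpha)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical scenarios, chosen only to show the direction of the effects
    print("well powered, even odds:     ", round(ppv(0.8, 0.05, 1.0), 3))
    print("low power, long-shot odds:   ", round(ppv(0.2, 0.05, 0.25), 3))
    print("well powered, 10% bias:      ", round(ppv(0.8, 0.05, 1.0, bias=0.1), 3))
```

Lower power, lower pre-study odds, and more bias each reduce the share of significant findings that reflect true relationships.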
Empirical evidence of a "crisis" in psychology (and elsewhere)
Get a sense of the scale and scope of the reproducibility crisis.
- Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
- Camerer, C. F., et al. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644.
Case study: Does putting a pen in your mouth make you happy?
Understand how the discussion is shaped by multiple perspectives.
- Wagenmakers, E. J., Beek, T., Dijkhoff, L., et al. (2016). Registered replication report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11(6), 917–928.
- Noah, T., Schul, Y., & Mayo, R. (2018). When both the original study and its failed replication are correct. Journal of Personality and Social Psychology, 114(5), 657.
Computational reproducibility
Understand the issues that complicate reproducing computational pipelines.
- Stodden, V., Leisch, F., & Peng, R. D. (2016). Enhancing reproducibility for computational methods. Science, 354(6317), 1240–1241.
- Obels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A. (2020). Analysis of open data and computational reproducibility in registered reports in psychology. Advances in Methods and Practices in Psychological Science, 3(2), 229–237.
Impact of post-hoc decisions
Identify p-hacking, HARKing, and selective reporting; understand systemic incentives.
- Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
- Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS ONE, 4(5), e5738.
Research culture and beliefs around reproducibility
Which incentive structures could play a role in science?
- Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452–454.
- John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532.
Preregistration, registered reports & replications
Different ways to do science and publish results — preregistration, preprints, eLife, executable notebooks.
- Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632–638.
- Chambers, C. D. (2013). Registered reports: A new publishing initiative at Cortex. Cortex, 49(3), 609–610.
- Ankel-Peters, J., Fiala, N., & Neubauer, F. (2025). Is economics self-correcting? Replications in the American Economic Review. Economic Inquiry, 63(2), 463–485.
Changing p-values
Understand how changing the significance threshold (e.g., from p < 0.05 to p < 0.005, as Benjamin et al. propose) would affect the literature.
- Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., et al. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6–10.
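One practical consequence of a stricter threshold is the sample size needed to keep power constant. The sketch below uses the standard normal approximation for a two-sided, two-group comparison of means; the function name `n_per_group` and the effect size of 0.5 are illustrative assumptions.

```python
from statistics import NormalDist


def n_per_group(effect_size, alpha, power=0.8):
    """Approximate sample size per group for a two-sided two-sample test.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the standardized mean difference (assumed, not estimated).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the threshold
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return 2 * ((z_alpha + z_power) / effect_size) ** 2


if __name__ == "__main__":
    d = 0.5  # hypothetical medium-sized effect
    n_old = n_per_group(d, alpha=0.05)
    n_new = n_per_group(d, alpha=0.005)
    print("n per group at alpha = 0.05: ", round(n_old))
    print("n per group at alpha = 0.005:", round(n_new))
    print("ratio:", round(n_new / n_old, 2))
```

Under this approximation the ratio does not depend on the effect size and comes out at roughly 1.7, in line with the increase of about 70% that Benjamin et al. (2018) report for maintaining 80% power.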
Bayesian approaches; AIC vs. BIC
Evaluate Bayesian approaches as alternatives.
- Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.
- Vrieze, S. I. (2012). Model selection and psychological theory: A discussion of the differences between the AIC and the BIC. Psychological Methods, 17(2), 228–243.
Sample sizes
- Lakens, D. (2022). Sample size justification. Collabra: Psychology, 8(1), 33267.
- Button, K. S., Ioannidis, J. P. A., Mokrysz, C., et al. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376.
Hands-on example
Analyze one example data set and compare the consistency and agreement of the results across analyses: a p-hacking contest.
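To get a feel for why analytic flexibility matters before the contest, the following sketch simulates one simple form of p-hacking: measuring several outcomes when there is no true effect and reporting a result whenever any of them is significant. All names, the group size, and the number of simulations are illustrative assumptions; the test is a z-test with known standard deviation, chosen to keep the example free of external dependencies.

```python
import random
from statistics import NormalDist, mean


def p_value_two_groups(a, b):
    """Two-sided z-test p-value for a difference in means (known SD = 1)."""
    standard_error = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_per_group=20, n_sims=2000,
                        alpha=0.05, seed=1):
    """Share of null experiments with at least one significant outcome."""
    rng = random.Random(seed)
    hits = 0
    for _sim in range(n_sims):
        p_values = []
        for _outcome in range(n_outcomes):
            # Both groups come from the same distribution: no true effect
            group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(p_value_two_groups(group_a, group_b))
        # The "p-hacker" reports the study if any outcome is significant
        if min(p_values) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, "outcome(s):", false_positive_rate(k))
```

With independent outcomes, the expected rate is 1 - (1 - 0.05)^k, so about 5% for one outcome and about 23% for five. The simulated rates should land close to those values.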
Hands-on: Git, OSF, containers & Open Science
Practical introduction to code versioning and open-science tooling.
Discussion
Short take-away presentations: your most important lesson from the course, or something that wasn't covered.