Replication & Reproducibility

Issues, Concepts and an Introduction to Code Versioning

What is the replication crisis? How widespread is it? Where does it come from, and what can we do about it? This seminar covers the fundamentals of the replication crisis and the current debate around the tools used to address it, from statistical roots and questionable research practices to preregistration, registered reports, and code versioning.

At a glance

Programme: Psychologischer Wahlbereich (B), MSc Psychology, Institute of Psychology, Universität Hamburg
Instructor: Prof. Schuck
Format: weekly 90-minute seminar during the winter term

Requirements

Read the assigned research articles each week
Give one presentation in class (about 30 min), followed by a discussion of about 30 min
Submit one question to the presenter before each class

Presentation guidelines

Each presentation (30–35 min, two presenters per session) should cover:

Relevant background and motivation — what the question is and why it is interesting
The study design and methods (what was done): for empirical studies, the number of subjects, task description, conditions, and duration; for simulations, the simulation details
An explanation of any concepts that may be unfamiliar to your colleagues
Hypotheses — what specifically is expected in the data
Results — what they are and how they were obtained
A one-slide summary that flags open questions or issues

Discussion guidelines

Each discussion (25–30 min) should be an interactive in-class session that engages with the topic of the presentation and uses the submitted questions as input. Possible formats include round tables on different questions, hands-on demonstrations, short presentations, or discussion of a related paper.

Sessions & readings

Session 1

Background and foundations

Understand the basic premises of modern science.

Chalmers, A. F. (2013). Falsificationism and progress. In What is this thing called science? (4th ed., pp. 66–83). Hackett Publishing.
National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and replicability in science (Executive Summary). The National Academies Press.

Session 2

Statistical roots

Understand how statistical power, bias, and flexible analyses affect the rates of false positives and false negatives.

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124.
Greenland, S., Senn, S. J., Rothman, K. J., et al. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.
Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.
Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E. J. (2016). Is there a free lunch in inference? Topics in Cognitive Science, 8(3), 520–547.
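To make the session goal concrete, the positive predictive value (PPV) formula from Ioannidis (2005) can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the assigned material: the function name `ppv`, its parameter names, and the example values are our own assumptions.

```python
def ppv(prior_odds, alpha=0.05, power=0.80, bias=0.0):
    """Positive predictive value of a 'significant' finding (Ioannidis, 2005).

    prior_odds -- R, the ratio of true to null relationships among those tested
    alpha      -- type I error rate
    power      -- 1 - beta, the probability of detecting a true relationship
    bias       -- u, the share of analyses that would not have been findings
                  but end up reported as findings anyway
    """
    beta = 1.0 - power
    r, u = prior_odds, bias
    # Numerator: true relationships that are reported as findings.
    true_findings = (1 - beta) * r + u * beta * r
    # Denominator: everything that is reported as a finding.
    all_findings = r + alpha - beta * r + u - u * alpha + u * beta * r
    return true_findings / all_findings


if __name__ == "__main__":
    # Hypothetical scenarios, chosen only for illustration.
    print("well powered, 1:1 prior odds:", round(ppv(1.0, power=0.80), 3))
    print("low power, 1:10 prior odds:  ", round(ppv(0.1, power=0.20), 3))
    print("well powered, with bias 0.3: ", round(ppv(1.0, power=0.80, bias=0.3), 3))
```

With no bias the expression reduces to (1 − β)R / ((1 − β)R + α), so a well-powered test of a plausible hypothesis gives a PPV of about 0.94, while a low-powered test of an unlikely hypothesis gives about 0.29.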

Session 3

Empirical evidence of a "crisis" in psychology (and elsewhere)

Get a sense of the scale and scope of the reproducibility crisis.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Camerer, C. F., et al. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644.

Session 4

Case study: Does putting a pen in your mouth make you happy?

Understand how the discussion is shaped by multiple perspectives.

Wagenmakers, E. J., Beek, T., Dijkhoff, L., et al. (2016). Registered replication report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11(6), 917–928.
Noah, T., Schul, Y., & Mayo, R. (2018). When both the original study and its failed replication are correct. Journal of Personality and Social Psychology, 114(5), 657.

Session 5

Computational reproducibility

Understand the issues that complicate reproducing computational pipelines.

Stodden, V., Leisch, F., & Peng, R. D. (2016). Enhancing reproducibility for computational methods. Science, 354(6317), 1240–1241.
Obels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A. (2020). Analysis of open data and computational reproducibility in registered reports in psychology. Advances in Methods and Practices in Psychological Science, 3(2), 229–237.

Session 6

Impact of post-hoc decisions

Identify p-hacking, HARKing, and selective reporting; understand systemic incentives.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS ONE, 4(5), e5738.
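One form of undisclosed flexibility discussed by Simmons et al. (2011) is optional stopping: testing repeatedly while data come in and stopping at the first significant result. The simulation below is a minimal sketch of that idea, not a reproduction of their analysis. It assumes normally distributed data with known unit variance (so a z-test suffices), and all names and settings (`false_positive_rate`, the sample sizes, the number of simulations) are hypothetical choices.

```python
import math
import random


def z_test_p(sample):
    """Two-sided p-value for H0: mean = 0, assuming a known SD of 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek, n_sims=2000, n_start=20, n_max=50,
                        step=10, alpha=0.05, seed=1):
    """Share of null simulations that yield at least one p < alpha.

    peek=False: test once, at the final sample size n_max.
    peek=True:  test at n_start, n_start + step, ..., n_max and
                count a 'finding' if any of these looks is significant.
    """
    rng = random.Random(seed)
    looks = list(range(n_start, n_max + 1, step)) if peek else [n_max]
    hits = 0
    for _ in range(n_sims):
        # The null hypothesis is true: the population mean is exactly 0.
        data = [rng.gauss(0.0, 1.0) for _ in range(n_max)]
        if any(z_test_p(data[:n]) < alpha for n in looks):
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("single test at n = 50:", false_positive_rate(peek=False))
    print("peeking every 10 obs.:", false_positive_rate(peek=True))
```

Because both calls use the same seed, they analyse identical simulated data sets; the only difference is how often the data are tested, which is what inflates the false-positive rate above the nominal 5%.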

Session 7

Research culture and beliefs around reproducibility

Which incentive structures could play a role in science?

Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533, 452–454.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532.

Session 8

Preregistration, registered reports & replications

Compare different ways to do science and publish results: preregistration, preprints, eLife, executable notebooks.

Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632–638.
Chambers, C. D. (2013). Registered reports: A new publishing initiative at Cortex. Cortex, 49(3), 609–610.
Ankel-Peters, J., Fiala, N., & Neubauer, F. (2025). Is economics self-correcting? Replications in the American Economic Review. Economic Inquiry, 63(2), 463–485.

Session 9

Changing p-values

Understand how changing the significance threshold (e.g. from p < 0.05 to p < 0.005, as Benjamin et al. propose) would affect the literature.

Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., et al. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6–10.

Session 10

Bayesian approaches; AIC vs. BIC

Evaluate Bayesian approaches as alternatives.

Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.
Vrieze, S. I. (2012). Model selection and psychological theory: A discussion of the differences between the AIC and the BIC. Psychological Methods, 17(2), 228–243.
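As a small preview of the two readings, the sketch below computes AIC and BIC from a model's log-likelihood and converts a BIC difference into an approximate Bayes factor, following the BIC approximation described by Wagenmakers (2007). The log-likelihoods, parameter counts, and sample size are made-up numbers for illustration, and the function names are our own.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * log_lik + k * math.log(n)


def bf01_from_bic(bic_h0, bic_h1):
    """Approximate Bayes factor in favour of H0: exp((BIC_H1 - BIC_H0) / 2)."""
    return math.exp((bic_h1 - bic_h0) / 2.0)


# Hypothetical fits to the same n = 100 observations:
# H0 has 1 free parameter, H1 has 2 and fits slightly better.
n = 100
loglik_h0, k_h0 = -150.0, 1
loglik_h1, k_h1 = -148.0, 2

aic_h0, aic_h1 = aic(loglik_h0, k_h0), aic(loglik_h1, k_h1)
bic_h0, bic_h1 = bic(loglik_h0, k_h0, n), bic(loglik_h1, k_h1, n)
bf01 = bf01_from_bic(bic_h0, bic_h1)

if __name__ == "__main__":
    print("AIC prefers:", "H1" if aic_h1 < aic_h0 else "H0")
    print("BIC prefers:", "H1" if bic_h1 < bic_h0 else "H0")
    print("approximate BF01:", round(bf01, 2))
```

In this made-up example the two criteria disagree: AIC prefers the more complex model, while BIC, whose complexity penalty grows with ln n, prefers the simpler one, and the resulting Bayes factor is only weakly in favour of H0.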

Session 11

Sample sizes

Lakens, D. (2022). Sample size justification. Collabra: Psychology, 8(1), 33267.
Button, K., Ioannidis, J., Mokrysz, C., et al. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376.
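A rough a-priori power calculation shows why small effects demand large samples. The sketch below uses the standard normal approximation for a two-sided, two-group comparison of means, n per group ≈ 2((z_{1−α/2} + z_{1−β}) / d)², where d is the standardized effect size. It is an approximation that slightly underestimates the sample size an exact t-test calculation would give, and the function name `n_per_group` is our own.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample test.

    d     -- standardized mean difference (Cohen's d) to be detected
    alpha -- two-sided type I error rate
    power -- desired probability of detecting an effect of size d
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    for d in (0.8, 0.5, 0.2):
        print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Halving the effect size roughly quadruples the required sample, which is the arithmetic behind the power problems described in the readings.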

Session 12

Hands-on example

Analyze one example data set and compare the consistency and agreement of the resulting analyses, framed as a p-hacking contest.

Session 13

Hands-on: Git, OSF, containers & Open Science

Practical introduction to code versioning and open-science tooling.

Session 14

Discussion

Short take-away presentations: your most important lesson from the course, or something that wasn't covered.