Correctness Isn't Compelling
This post originally appeared on the Software Carpentry website.
The final report from the ICERM workshop on Reproducibility in Computational and Experimental Mathematics is now available, and its appearance has prompted me to explain why we don't put more emphasis on reproducibility in Software Carpentry. Long story short, it's because scientists don't care: they're not rewarded for doing so. Here's the math:
- Assume five million scientific papers were published in the decade 1990–2000. (The actual number depends on what counts as a "paper", but the reasoning below holds regardless.)
- Of those, perhaps a hundred have been retracted because of honest computational irreproducibility ("honest", because fraud isn't part of this argument).
- That means the odds that a scientist will have to retract a particular paper because someone noticed that her calculations couldn't be reproduced are one in fifty thousand.
- So if the average paper takes eight months to produce, and scientists work six-day weeks, each paper represents roughly 208 working days, or about six million seconds at eight hours a day. One fifty-thousandth of that is about two minutes, which is therefore all it's worth spending per paper on reproducibility as insurance (see the sketch after this list).
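Here's that arithmetic as a short Python sketch. The paper count, retraction count, and eight-hour working day are assumptions (the list above only specifies six-day weeks), not measurements, and tweaking any of them shifts the answer by only a few seconds either way:

```python
# Back-of-the-envelope: how much is reproducibility worth as insurance?
papers = 5_000_000            # papers published 1990-2000 (assumed)
retractions = 100             # honest computational retractions (assumed)
odds = retractions / papers   # chance any one paper must be retracted

months_per_paper = 8
days_per_week = 6
hours_per_day = 8             # assumed; not stated in the list above

working_days = months_per_paper * (52 / 12) * days_per_week  # ~208 days
seconds_per_paper = working_days * hours_per_day * 3600      # ~6.0e6 s

# Expected cost of a retraction = odds of retraction x cost of redoing
# the work; that's the break-even insurance budget per paper.
break_even = odds * seconds_per_paper
print(f"{break_even:.0f} seconds (~{break_even / 60:.1f} minutes)")
# -> roughly 120 seconds, i.e. about two minutes per paper
```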
Different assumptions and models will naturally produce different answers, but they won't change the conclusion: given the system we have today, investing extra time to make work reproducible as insurance against error isn't economical. Advocates of reproducible research (RR) may respond, "That's why we're trying to change the system," but chicken-and-egg traps are notoriously difficult to break out of: if people don't care about the reproducibility of their own work, they're unlikely to check it when reviewing other people's, and around and around we go. Trying to get them to be early adopters of new practices (which aren't yet rewarded consistently by their peer group) is therefore a very hard sell.
This is more than just speculation. When we first started teaching Software Carpentry at Los Alamos National Laboratory in 1998, we talked a lot about the importance of testing to see if code was correct. People nodded politely, but for the most part didn't actually change their working practices. Once we started telling them that testing improved productivity by reducing re-work, though, we got significantly more uptake. Why? Because if you cut the time per paper (or other deliverable) from eight months to seven or six, you've given people an immediate, tangible reward, regardless of what their peers may or may not do.
So here's my advice to advocates of reproducible research: talk about how it helps the individual researcher get more done in less time. Better yet, measure that, and publish the results. Scientists have been trained to respect data; if you can show them how much extra effort RR takes using today's tools, versus how much re-work and rummaging around it saves, they'll find your case much more compelling.