Does Continuous Publication Require Continuous Attention?
This post originally appeared on the Software Carpentry website.
I read this post by Martin Fenner a couple of weeks ago. His thesis is that scientific publication is still very much a manual process, which makes publications relatively infrequent (and fairly painful) events. Instead, we ought to strive for continuous delivery: production of the "paper" (including release of associated code and data) should be fully automated so that authors can ship whenever they want with relatively little effort.
Continuous delivery is popular among software developers, who frequently use diagrams like this to argue that it's more efficient:
What's missing from this picture, though, is the cost to customers (or in the case of publishing, readers). Every time Mozilla releases an update to Firefox, millions of people have to wait thirty seconds for it to download, install itself, and restart. And despite the best efforts of a world-class release engineering team, every update will destabilize somebody, somewhere.
Similarly, every time someone updates a pre-print on arXiv.org, everyone who has read the original has to choose between ignoring the changes and re-reading the paper. In the first case they risk missing results, but in the second they pay an opportunity cost, just as users do when companies update software over and over.
Things are worse for the readers of scientific papers. Release engineers can check that upgrades work for common configurations before shipping them, but there's no equivalent for semantic changes to papers. And as more scientists start communicating via blogs and Twitter, keeping up to date with changes to things previously read will only become harder.
What science may be moving toward is a "continuous patch" model rather than a "continuous release" model. Lex Nederbragt wrote in a comment on an early draft of this post:
What I think may happen is that research is released not through arXiv anymore, but on either personal or central sites where researchers add and update results. Each small but significant change will be summarised in some sort of..."release note". Authors also may use their blog to give an overview of recent changes. Discussions of the work will ensue through the release site comment section, or issues a la github, and perhaps another researcher who wants to add an analysis [will fork] the repo and submit a pull request... At certain points, researchers will want to write up a somewhat larger overview paper, which actually may be submitted and published through the traditional journal...
To which Pauline Barmby replied:
From an author point-of-view though, there is something to be said for having an end goal: my experience to date is that I never really finish a research project, I just get sick of it. While you could always improve something, at some point you just have to stop and move on to the next thing... From a reader point-of-view: I already struggle to keep up with the literature in my field as it currently exists. It's not obvious where I would find the time to "check for updates" when I am referring to existing work. If my own current project might have dependencies on published work then I might do so, although again that delays getting my project done. So I agree... there is a cost to incremental publishing, and I think it applies to both reader and writer.
The tradeoff here is not new. Encapsulated context gives you the whole story at once: you need to carry less around in your head, but you have to notice and synthesize changes. Incremental context gives you the changes, and is quicker to assimilate if (and only if) you're keeping track of the current state of the conversation. Diff and merge tools do a decent job of translating encapsulated context into incremental context for simple text files like programs, but are oblivious to semantics. More complicated tools have mostly failed in practice, leaving the burden of comprehension on the reader.
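To make the point about diff tools concrete, here is a minimal sketch using Python's standard `difflib` module. The two drafts, the file labels `draft_v1` and `draft_v2`, and the helper name `incremental_view` are all invented for illustration; they do not come from any real paper or tool. The sketch turns two whole versions (encapsulated context) into a list of changes (incremental context):

```python
import difflib


def incremental_view(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines describing how old_text became new_text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="draft_v1",  # hypothetical labels, not real files
            tofile="draft_v2",
            lineterm="",
        )
    )


# Two invented drafts: one change reverses the result, the other is cosmetic.
draft_v1 = """We measured 40 samples.
The effect is significant (p = 0.04).
Code is available on request."""

draft_v2 = """We measured 40 samples.
The effect is not significant (p = 0.40).
Code is available online."""

changes = incremental_view(draft_v1, draft_v2)
for line in changes:
    print(line)
```

The diff reports both edits in exactly the same way, as a removed line and an added line. Nothing in the output says that one edit reverses the paper's conclusion while the other only changes where the code lives, so that judgment is still left to the reader.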
And of course (and as always), there's the problem of attribution. If I make a small incremental improvement in your work, how should it be cited and credited? Returning to the conversation about the first draft of this article:
Pauline Barmby wrote: I see that (given 20 years or so) some kind of new recognition model can be developed.
W. Trevor King wrote: Until then, I think you just have to highlight your contributions in your CV, and talk about how awesome folks think your release process is in your research statement ;).