Data Provenance Challenge

This post originally appeared on the Software Carpentry website.

John's summary of our discussion about what to teach scientists who already believe reproducible research is a good thing and want to start doing it reminded me that I never posted about the Provenance Challenge. It has been run twice so far; each time, authors of tools that track the provenance (or lineage) of scientific data have to implement some workflows, then answer questions about where data came from, what was done to it, and so on. The results of the first challenge are described system-by-system in these papers (sorry, but they're behind a paywall; if you google for combinations of the authors' names, you can find PDF preprints). This is a very cool research area, and I hope one of my incoming grad students will want to do something with it.