Citing Versions

This post originally appeared on the Software Carpentry website.

We got mail yesterday from a workshop participant saying, "My question is how does one show in a research paper that the underlying data and the software is version controlled?" Cameron Neylon's answer, slightly edited, was:

My approach in an ideal world would be to have all of my data (or links to it) under version control along with the code. When the version to be used for the publication is clear I would give it a tag (I'm a Git user but there is similar functionality in all version control systems) and then push that to an online repository. You can then give a link or reference to the appropriate repository version online. If you don't want to put your main repository online then you could just put up the version from the publication.
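
As an illustration of the tag-and-push step (a minimal sketch, not part of the original answer), the two Git commands involved can be wrapped in a short Python script. The tag name "paper-v1", the remote name "origin", and the helper names tag_commands and tag_and_push are all hypothetical choices made for this example.

```python
import subprocess


def tag_commands(tag, message, remote="origin"):
    """Return the git commands that create an annotated tag and publish it.

    The tag and remote names are placeholders chosen for illustration.
    """
    return [
        # Create an annotated tag on the current commit.
        ["git", "tag", "-a", tag, "-m", message],
        # Push only that tag to the online repository.
        ["git", "push", remote, tag],
    ]


def tag_and_push(tag, message, remote="origin", repo_dir="."):
    """Run the commands inside repo_dir, stopping at the first failure."""
    for command in tag_commands(tag, message, remote):
        subprocess.run(command, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so the script is safe to try.
    for command in tag_commands("paper-v1", "Version used for the publication"):
        print(" ".join(command))
```

Calling tag_and_push("paper-v1", "Version used for the publication") from inside a repository would run both commands; the paper can then cite the tag name together with the repository's address.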

Of course this is not so easy if you are doing it in retrospect. Your data may be elsewhere, in systems that aren't under proper version control. If the data is small enough I would grab a copy and put it in the repository version you are using for publication. If it's big and stored remotely then you are a bit limited. In that case I would try to refer to a specific version if it is possible, or if you can't do that then you can try to get a checksum.
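
For the checksum fallback, one possible approach (a sketch under assumptions, not something specified in the original answer) is to record a SHA-256 digest of each data file and quote it in the paper. The choice of SHA-256, the function name sha256_of_file, and the chunk size are assumptions made for this example.

```python
import hashlib
import sys


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at path.

    The file is read in chunks so that large data files do not have to
    fit in memory. SHA-256 is an assumed choice; any strong hash would do.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage: python checksum.py data_file [data_file ...]
    for name in sys.argv[1:]:
        print(sha256_of_file(name), name)
```

A reader who downloads the same file can run the script and compare digests; a mismatch means the remote data has changed since publication.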

But basically the main thing is to create and refer to a specific version of your repository and make sure it is available in a useful form to people who want to check it out.
