Make It Easier to (Re)use Your Data

This post originally appeared on the Software Carpentry website.

Software Carpentry has focused on computing for most of its 14 years (primarily because that's what I'm most familiar with), but it's increasingly clear that we need to tackle other parts of the research cycle. One is the new ideas clustered around publication, discovery, and metrics, which I'll discuss in a future post. The other is data management: we only touch on the topic right now, but it's as important to most scientists as crunching numbers, and how best to do it is changing rapidly. Luckily, a few of our friends have written a guide for the perplexed:

Ethan P. White, Elita Baldridge, Zachary T. Brym, Kenneth J. Locey, Daniel J. McGlinn, and Sarah R. Supp: "Nine simple ways to make it easier to (re)use your data". Ideas in Ecology and Evolution, 6(2):1-10. DOI:10.4033/iee.2013.6b.6.f

Sharing data is increasingly considered to be an important part of the scientific process. Making your data publicly available allows original results to be reproduced and new analyses to be conducted. While sharing your data is the first step in allowing reuse, it is also important that the data be easy to understand and use. We describe nine simple ways to make it easy to reuse the data that you share and also make it easier to work with it yourself. Our recommendations focus on making your data understandable, easy to analyze, and readily available to the wider community of scientists.

Their nine specific recommendations (elaborated at readable length in the paper) are:

  1. Share your data.
  2. Provide metadata.
  3. Provide an unprocessed form of the data.
  4. Use standard data formats.
  5. Use good null values.
  6. Make it easy to combine your data with other datasets.
  7. Perform basic quality control.
  8. Use an established repository.
  9. Use an established and liberal license.
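
To make a few of these concrete, here is a minimal sketch of what recommendations 4, 5, and 7 might look like in practice. It is not taken from the paper: the table, the column names (`site`, `species`, `count`), and the helper functions are all hypothetical, and treating a blank cell as the null value is just one reasonable convention. It uses only Python's standard library.

```python
import csv
import io

# Hypothetical example data (not from the paper): a small survey table in
# plain CSV, where a blank cell means "value not recorded".
RAW = """site,species,count
A,Dipodomys merriami,12
A,Dipodomys merriami,
B,dipodomys merriami,7
B,Chaetodipus baileyi,-3
"""


def load_rows(text):
    """Parse CSV text into dictionaries, turning blank cells into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: (value.strip() or None) for key, value in row.items()}
            for row in reader]


def bad_counts(rows):
    """Return file line numbers whose count is present but is not a
    non-negative integer. Line 1 is the header, so data starts at line 2."""
    return [line for line, row in enumerate(rows, start=2)
            if row["count"] is not None and not row["count"].isdigit()]


def spelling_variants(rows, column):
    """Group the values in a column case-insensitively and return the groups
    that were written in more than one way."""
    seen = {}
    for row in rows:
        value = row[column]
        if value is not None:
            seen.setdefault(value.casefold(), set()).add(value)
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}


rows = load_rows(RAW)
print(bad_counts(rows))
print(spelling_variants(rows, "species"))
```

On this sample the checks flag line 5 (a negative count) and the two capitalizations of "Dipodomys merriami", while the blank count on line 3 is accepted as a legitimate missing value rather than reported as an error.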

It's a great outline for a half-day introduction to data management as part of an "extended play" Software Carpentry course, particularly when combined with William Stafford Noble's "A Quick Guide to Organizing Computational Biology Projects". We hope to turn the pair into lessons by September.
