Data Management Plans: A Role for Software and Data Carpentry
This post originally appeared on the Software Carpentry website.
I spent the better part of the last three weeks working on an NSF-IOS Doctoral Dissertation Improvement Grant (DDIG) proposal. Pretty much daily, I consulted this list of publicly available grant proposals in the biological sciences to look at other people's proposals. It's an awesome resource if you want to see how people write their project description, but there are no links to example data management plans, facilities, summaries, etc. Where does one go for examples of or advice on these supplementary documents?
At least part of the answer is "here". The last page of NSF's information about Data Management Plan requirements, updated on October 1, urges readers to check out Data Carpentry and Software Carpentry for resources and training. This is a huge shout-out (see these tweets), so how can SWC and Data Carpentry do more?
First, here are the advice and example data management plans that I found, along with my thoughts on how the Software and Data Carpentry community can contribute.
The DMPTool seems to be the most commonly recommended tool, as it has templates that walk you step-by-step through creating a funding agency-specific plan. I followed its guidelines and suggestions as I revamped my PI's data management plan to fit my proposal.
This blog post from 2012 includes links to five NSF data management plans (for CO2 data, ground water data, HDF files, bug collection, and copepod growth).
These are useful, but I really wanted some examples for Next Generation Sequencing data. If you know of some, please share.
In the meantime, some other things we can do include:
Continue teaching Git and GitHub. I learned how to use Git during my first workshop and I never looked back. I mentioned GitHub for sharing and disseminating data in my proposal.
Create a page of data management plans. It seems like it shouldn't be that hard to have a webpage with lots of examples of data management plans covering a wide range of data types in the natural sciences and beyond.
Host an Instructor Retreat session on Data Management. If someone is passionate and knowledgeable about data management, it might be useful to broadcast and archive a brief tutorial about data management plans for current and future researchers.
The genomics community is growing rapidly, but our data sets are growing even faster. We need to work together to meet this challenge, so please reply if you have some resources to share or are interested in continuing the discussion. Happy data collecting!