My Favorite Tool is Git
This post originally appeared on the Data Carpentry website
I only use them for work I care about. Examples include lesson development for R workshops, my recent performance review packet, and collaborative projects on species distribution modeling. Oh, and Software Carpentry workshop websites (obviously).
Why I like Git:
The version control system and the companion remote host system (GitHub or Bitbucket or CloudForge, etc.) provide a great versioning and collaboration platform, the highlights of which have been enumerated many times over and in such depth that I won’t talk about them here.
The reason I like this dynamic duo is that it reinforces best practices. I should say that using version control won’t necessarily make you 100% compliant with everyone’s idea of best practices, but with a little consideration of a workflow, it can go a long way. Here is why:
Reproducibility: By ignoring my
output
folder in pretty much all my Git repositories, it forces all figures & analyses to be completely reproducible from materials that are in the folders that are tracked.Offsite backup: Rather than lugging my aging laptop to and fro,
pull-add-commit-push
allows me to preserve my work in a location accessible from any internet-enabled terminal. This has the added benefit of protecting against natural disasters and the inevitable bricked hard drive (mark my words, death, taxes, and a failed HD are the only certainties in life now).Sharing: Sure, some of what I currently work on is not ready to be released, so I use the private repository option. But when I am ready to share my code and data, it’s literally one to two mouseclicks and my work is open for re-use by the community. The visibility of platforms like GitHub and Bitbucket make work that much more discoverable.
Documentation: Everybody’s favorite part of software development is … not likely writing documentation (granted there are some of you out there). Because good documentation is imperative for re-use and evaluation, the little reminders from GitHub (“Help people interested in this repository understand your project by adding a README”) further encourage best practices for open research. The support for markdown rendering on GitHub makes it especially nice for writing professional-looking documentation of your work.
Sure, I struggled struggle sometimes with Git syntax and concepts, but 98% of the time I only use four
commands (pull-add-commit-push
, remember?) and the Git/GitHub combo reduces the time I spend developing,
preserving, and sharing the work I do.
– Jeff Oliver, Data Science Specialist, Tucson, Arizona
Have a favorite tool of your own? Please tell us about it!