Steven Koenig: What I've Learned
This post originally appeared on the Software Carpentry website.
I have been a PhD student at Technische Universität München, Germany since May 2012. My research interest is biopolymers: how to produce them, and what to do with them. Where do computers come into play? Literally everywhere. And what do I need Software Carpentry for? Literally everything.
Nowadays, many tasks in the lab are carried out using computers: from running chromatography systems to evaluating photometric 96-well assays, from rheometry to next-generation sequencing. The usefulness of the skill set covered by Software Carpentry was obvious to me the minute I read about the course contents.
Although I am regarded as the computer nerd at work, there is still plenty to learn. And the course contents included several things I had always wanted to be introduced to:
- Git (distributed version control software)
- make (automation)
- Python (programming language)
- Unit tests (self-tests for programmes)
Just this weekend I had to take a look at the old files of my bachelor dissertation, and what I found in one directory looked something like this:
- 20091201 - HA_PEF.ods
- 20091201 - HA_PEF.xls
- 20091210 - HA_PEF.xls
- 20100105 - HA_PEF.ods
- 20100105 - HA_PEF.xls
- 20100107 - HA_PEF.xls
- 20100108 - HA_PEF.xls
- 20100111 - HA_PEF.xls
- 20100112 - HA_PEF.xls
- 20100114 - HA_PEF.xls
- 20100115 - HA_PEF.xls
- 20100120 - HA_PEF.xls
- 20100121 - HA_PEF.xls
- 20100122 - HA_PEF.xls
- 20100125 - HA_PEF.xls
- 20100126 - HA_PEF.xls
- 20100127 - HA_PEF.ods
- 20100127 - HA_PEF.xls
- 20100205 - HA_PEF.xls
- 20100212 - HA_PEF.xls
I was quite astonished to see this; I could not even remember having done it that way back then. At that time, I could never have imagined that I could just have something like this instead:
- HA_PEF.xls
Using version control transformed the way I think about organising data. I was familiar with SVN before the SWC course, but it is way too cumbersome to set up and use locally, IMHO. Now that I know about Git, I can keep one and only one file and leave the overhead of tracking different versions to Git. No more cluttered working directories. And there are even more benefits to using version control:
- When used with text files, the differences between different versions can be viewed.
- Other scientists can pull in my stuff and continue working on it with access to the complete project history.
- Comments describing each change make it quick to find your way through the repository.
There are more advantages to using version control, and I recommend that every scientist get familiar with it as soon as possible.
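To make the point about viewing differences between text files concrete, here is a minimal Python sketch using the standard library's `difflib`. It is not how Git works internally; it only illustrates the kind of line-by-line comparison a version control system can show for plain text. The file name `HA_PEF.csv`, the column names, and the values are made up for illustration.

```python
import difflib


def diff_versions(old_text, new_text, name="HA_PEF.csv"):
    """Return a unified diff (list of lines) between two versions of a text file.

    The default file name is hypothetical and is only used to label the diff.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=name + " (old)",
            tofile=name + " (new)",
        )
    )


# Two made-up versions of a small plain-text data file.
old = "sample,absorbance\nA1,0.12\nA2,0.48\n"
new = "sample,absorbance\nA1,0.12\nA2,0.51\nA3,0.33\n"

# Lines starting with "-" were removed, lines starting with "+" were added.
print("".join(diff_versions(old, new)))
```

This kind of comparison only works well for text formats such as CSV; binary spreadsheet files would be tracked by version control, but their differences could not be displayed this way.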
While automating routine work in the lab is common, automating the evaluation of the data generated is not as common. At least in our lab, data from high-throughput screenings are evaluated using Excel sheets, which involves lots of copying of data back and forth. Using SWC skills we can tap into the enormous potential of automation:
- Increased reliability: Humans are bad at repetitive tasks; computers are good at them. So, share the work with your computer. By using unit tests, the computer can even check whether its programmes are correct.
- Increased reproducibility: Bundling input data with the evaluation instructions (as a programme) allows the output data to be reproduced from scratch.
- Time savings: Once the framework is ready, 80% or more of the time can easily be saved.
- Cost savings: While the computer is doing the evaluation, staff can focus on other problems.
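As a rough illustration of the reliability and reproducibility points above, the following Python sketch evaluates plate-reader data from a text file instead of an Excel sheet and includes a unit test for the calculation. Everything specific here is an assumption made for the example: the column names `sample` and `absorbance`, the `blank` label, and the blank-correction step are hypothetical and not taken from our actual lab workflow.

```python
import csv
import io
import statistics
import unittest


def read_rows(text):
    """Parse CSV text with 'sample' and 'absorbance' columns (hypothetical layout)."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["sample"], row["absorbance"]) for row in reader]


def blank_corrected_means(rows, blank_label="blank"):
    """Return the mean absorbance per sample after subtracting the mean blank.

    rows is an iterable of (sample_name, absorbance) pairs.
    """
    readings = {}
    for sample, absorbance in rows:
        readings.setdefault(sample, []).append(float(absorbance))
    # Remove the blank wells from the results and use their mean as the baseline.
    blank = statistics.mean(readings.pop(blank_label))
    return {
        sample: statistics.mean(values) - blank
        for sample, values in readings.items()
    }


class BlankCorrectionTest(unittest.TestCase):
    """Self-test: the computer checks the evaluation logic on known input."""

    def test_blank_is_subtracted(self):
        rows = [("blank", 0.10), ("blank", 0.10), ("s1", 0.60), ("s1", 0.80)]
        result = blank_corrected_means(rows)
        self.assertAlmostEqual(result["s1"], 0.60)
        self.assertNotIn("blank", result)
```

Keeping a script like this next to the raw data file means the results can be regenerated from scratch at any time, and a test runner such as `python -m unittest` can confirm that the calculation still behaves as expected after every change.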
Software Carpentry taught me that there are numerous other people out there who think like me on these issues. Therefore, I hope to slowly introduce these "new" technologies into the everyday life of everybody in the lab.