Surviving the Tsunami

This post originally appeared on the Software Carpentry website.

The October 2011 issue of ACM Queue features an article by Bruce Berriman and Steven Groom titled "How Will Astronomy Archives Survive the Data Tsunami?" The figures are scary: astronomers already have a petabyte of publicly available data, and are adding half a petabyte per year, a rate that will increase dramatically as new instruments come online. The only way to avoid this all becoming write-only is to bet on emerging technologies, from general-purpose GPUs to cloud computing. The problem, of course, is that "emerging" usually means "flaky", both because the tools haven't had time to mature, and because we, their users, don't have the years of experience needed to know how best to use them. (As far as I'm concerned, we're still trying to figure out how best to use object-oriented programming in science, and we've been at it for thirty years...)

But here's the good news. Instead of just the usual perfunctory nod toward education and training, Berriman and Groom put a spotlight on it:

An archive model that includes processing of data on servers local to the data will have profound implications for end users, who generally lack the skills not only to manage and maintain software, but also to develop software that is environment-agnostic and scalable to large data sets. Zeeya Merali [...] and Igor Chilingarian and Ivan Zolotukhin [...] have made compelling cases that self-teaching of software development is the root cause of this phenomenon...

Berriman and Groom go on to recommend that we "...make software engineering a mandatory part of graduate education, with a demonstration of competency as part of the formal requirements for graduation." As I've discussed before, there's little chance of this happening in the short or medium term: everyone's curriculum is already over-full, and senior professors who only know what they taught themselves a generation ago are unlikely to push aside core courses in stellar dynamics or planetary physics to make room for version control and design patterns. What we can do, I think, is make resources like Software Carpentry more usable, and implement some sort of badging system to give students recognition for having completed the training themselves, and for passing it on to others (which would in turn encourage the formation of self-help groups like the University of Wisconsin's Hacker Within). All we need is funding for a couple of people for a couple of years...
