Harder Than It Should Be
This post originally appeared on the Software Carpentry website.
Someone once said, "Chemistry is basically anything chemists will give each other awards for doing." Or something like that—Google doesn't find matches for that exact quote. Even if I've mangled it, the idea is sound: art is no more and no less than what great artists accept as being art.
So what is computer science? More particularly, what constitutes the core of computer science? What's the stuff that everyone who calls themselves a "computer scientist" should know, or at least have seen? One way to answer the question would be to look at what people are given prizes for, but that's turning out to be harder than I expected, and the reason highlights a gap in this course.
Let's start with the two biggest academic prizes open to the whole spectrum of CS: the ACM Doctoral Dissertation Award and the A. M. Turing Award, which is often called "the Nobel Prize of computing". The Dissertation Award page lists the names of the winners from 1978 to the present, but the links on those names take you to pages that have nothing more on them than the title of the prizewinning thesis (and in some cases, a press release or a photo of the winner accepting a check). There's no useful metadata anywhere to be seen: no keywords (which is what I'm after), no links to scholarly databases (so that I could write a script to harvest keywords), nothing. I could write a script to googlewhack the author's name and thesis title, but the half-dozen pages I looked at were formatted in three different ways, so that smells like a lot more effort than I'm willing to put in to do something that my local second-hand stereo parts store has supported since 2005 (or maybe even earlier).
The Turing Award site is a bit better: once you figure out that you have to select a sorting order to get the landing page to display more than the most recent winner, each page linked from it contains a few sentences explaining why that winner won. There's still no structured metadata, though, so something that I know could be done in 10 minutes if the metadata were there looks like it would take half a day, which means I'm not going to do it.
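As a rough sketch of what that ten-minute script might look like if the pages did carry keywords: the Python below assumes each award page exposes them in a `<meta name="keywords">` tag, which neither site is known to do. The class name, the function names, and the sample pages are all invented for illustration, and nothing here fetches anything from the ACM.

```python
from collections import Counter
from html.parser import HTMLParser


class KeywordParser(HTMLParser):
    """Collect keywords from <meta name="keywords" content="..."> tags.

    Assumption: keywords are published as a comma-separated list in a
    meta tag. The real award pages do not do this.
    """

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "keywords":
            return
        content = attributes.get("content") or ""
        # Normalize each keyword: trim whitespace, lowercase, skip empties.
        self.keywords.extend(
            word.strip().lower() for word in content.split(",") if word.strip()
        )


def extract_keywords(html):
    """Return the list of keywords found in one page of HTML."""
    parser = KeywordParser()
    parser.feed(html)
    return parser.keywords


def count_keywords(pages):
    """Count how many pages mention each keyword (once per page)."""
    counts = Counter()
    for html in pages:
        counts.update(set(extract_keywords(html)))
    return counts


# Made-up pages standing in for award citations (hypothetical data).
SAMPLE_PAGES = [
    '<html><head><meta name="keywords" content="Compilers, Type Systems"></head></html>',
    '<html><head><meta name="keywords" content="type systems, verification"></head></html>',
]

if __name__ == "__main__":
    for keyword, count in count_keywords(SAMPLE_PAGES).most_common():
        print(count, keyword)
```

With metadata like this, the tally is the easy part. Without it, the effort goes into scraping three differently formatted kinds of page, which is where the half day comes from.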
Software Carpentry doesn't really talk about this issue anywhere. It shows you how to use a database, and the essay on provenance nods to the value of structured metadata without going over to say hello, but that's about it. I'm constantly taken aback by how much time real scientists spend looking things up and chasing things down (journal editors are unlikely to take "or something like that" as sufficient citation for a quote like the one that started this post). We really should include something about the computational side of knowledge management and discovery in this course, but for the life of me, I don't know what—if you do, please tell me. And if you have any clout with the ACM, please point out that since they require people to specify topic keywords when submitting papers for publication, it would be only fair of them to give us back a few keywords when we need them...