How We Got Here, and Where We Are

This post originally appeared on the Software Carpentry website.

I gave a talk in Santa Fe early in 1997 describing a set of articles I'd organized for the Summer 1996 and Fall 1996 issues of IEEE Computational Science and Engineering (now Computing in Science & Engineering) on the subject, "What should computer scientists teach physical scientists and engineers?" After the talk, John Reynders (then the director of the Advanced Computing Lab at Los Alamos National Laboratory) challenged me to put my money where my mouth was and actually teach basic software development skills to working scientists.

Brent Gorda and I ran the course for the first time in July 1997. We used Perl as a programming language, and covered topics such as CVS, regular expressions, and a little bit of web client programming. Our part of the course was three days long, and was followed by a two-day consulting visit from Steve McConnell (whose books Code Complete and Rapid Development were at the top of the charts). We ran the course in various forms another five or six times in the next three years, during which time we switched to Python and expanded it to five days. All told, about 120 LANL technical staff went through the course, most of them under 35.

In 2004, after I'd taught the course for the Space Telescope Science Institute and the US Navy, the Python Software Foundation gave me a grant to reorganize, update, and expand the material. That version is the core of what's now online; when I last checked, the site was getting 10,000-12,000 distinct visitors a month, and the material was being used in whole or in part at Caltech, Indiana, several schools in the UK and Germany, Chile, South Korea, and of course here in Toronto.

Based on follow-ups with alumni, I'd guess that the course has no effect at all on 20-25% of participants, who take it because their manager or supervisor told them they had to, and get little out of it. The rest routinely describe it as game-changing: a PhD candidate in Psychology who did the course with us in July 2009 told me a few days ago that what she learned probably saved her six months on her current project, and that without it, a second project would simply not have occurred to her to try. As another data point, one of the other alumni of that offering came to me early in October to say that several of her labmates wanted to take the course, and was I planning to offer it again any time soon? I told her that I wasn't, but that I could arrange for a CS grad student to teach it. Three weeks later, 65 students from Psych and Linguistics had signed up to do it as a non-credit course, roughly 45 of whom have stuck with it so far.

While I don't have data to back this up, I believe very strongly that what most students get out of the course isn't specific knowledge about relational databases, regular expressions, or object-oriented programming, but rather a mental map of the computing landscape, so that they know what's supposed to be easy, what else is supposed to be possible, and where to go looking for more information. Another student from the July 2009 offering said that the biggest thing the course did for him was turn "unknown unknowns" into "known unknowns". I'm supposed to conduct a follow-up survey with those students later this month to see how much they're using what they learned, and what impact it has had; I hope to have results up on the web by Easter.

And as regular readers will know, I'm presently trying to raise money to update the material: this post explains the background, while this plan incorporates what I've learned from students and instructors on four continents about what material, sequence, and presentation will actually "reach" scientists. Sadly, though, funding agencies and companies mostly still seem to think that only HPC-related training is worth funding, which I feel is asking scientists to run before they can walk. This CiSE paper talks about this particular frustration, while our survey results put weight behind the claim that the overwhelming majority of scientists will benefit much more from being helped with development issues than from anything to do with big iron.
