Why Is This Hard?

This post originally appeared on the Software Carpentry website.

I've been teaching scientists to program since 1998 (or 1986, if you want to start with my first lunch-and-learn for grad students in physics at the University of Edinburgh). Technology has advanced by leaps and bounds in that time, but I don't think it's any easier than it used to be to get basic software skills into people's heads. What makes it hard?

Programming is intrinsically difficult. It's fashionable to claim otherwise, but abstract thinking is a fairly recent innovation in evolutionary terms, and our brains still find it hard. On the other hand, I don't believe that state machines and data transformations are any harder than high school algebra, and everyone we're trying to help has long since mastered that.

Today's languages and tools make it more difficult. Setup (particularly installation) is, if anything, harder than it was twenty years ago, and even the cleanest languages are full of accidental complexity (particularly in their libraries). (And if you think otherwise, try running a programming workshop for non-programmers working on half a dozen different operating systems, with two or three slightly different versions of your favorite language installed, and then post your dissenting comment.) It's heartening to see that people are finally reviving research from the 1970s and 1980s into the usability of programming languages, but as we found out the hard way, it will be a long time before computer "scientists" start accepting scientific answers to these questions, much less acting on them.

Our students' diverse backgrounds make teaching more difficult too. Our recent workshop at the University of Toronto had students from linguistics through chemistry to astronomy. Some of them had never used a command shell before; others were their labs' unofficial sys admins, and we saw similar variation in almost every other aspect of their computing knowledge. The solution, of course, would be to divide them into levels by topic, but—

We don't have resources to teach widely or deeply. Tens of thousands of people could teach scientists and engineers basic computing skills [1], but we have no way to reach them—yet. One of our goals for the next six months/five years is to increase the pool of instructors by several orders of magnitude [2]. Even on a five-year timescale, though, we'll have to continue to rely mostly on volunteers, because—

There's no room for computing in the curriculum. More precisely, faculty won't make room, because they think computing is less important than thermodynamics, phonology, or whatever other subjects make up the core of their discipline. I used to grumble about this, but I now accept that it's a rational choice: unless and until journal reviewers and grant agencies start asking hard question about how scientists produce their computational results, investing time in improving computational skills is a cost with uncertain rewards. And yes, there are a few exceptions here and there, but until we move to five- or six-year undergraduate degrees, they'll continue to be exceptions. Realistically, I think the best we can hope for in the next decade is that computing has the same standing as statistics, i.e., everybody has to know the basics because their other work depends on it, but more advanced knowledge is acquired on a discipline-specific need-to-know basis.

Follow-through is hard. OK, so you just spent a couple of days at some kind of workshop: what now? If you're lucky, you learned enough about Python or the shell to start automating a few data analysis tasks, so a positive feedback loop will kick in. But if the problem in front of you is to speed up 80,000 lines of legacy C++, those two days probably aren't going to make a big difference. Yes, there are a lot of tutorials online that are supposed to help you, but in practice, you'll probably find those more frustrating than anything else they assume a lot of background knowledge you don't have, so you're not sure which ones actually move you closer to your goal. The proposed Computational Science area at Stack Exchange might help here, if it takes off, and we're hoping that running lesson-a-week online classes after workshops will help too, but it will always be hard for people to find time for "deep" learning, which is precisely what will make the next problem they run into easier to solve.

Most of today's online teaching tools implement bad models of teaching. We've known for decades—literally, decades—that watching a video and doing some exercises is a lousy way to teach (see recent posts by Frank Noschese and Scott Gray for discussion). In programming terms, the root of the problem is that canned instruction assumes the teacher can accurately predict how learners are going to interpret and mis-interpret lessons—in software engineering terms, it's plan-driven rather than adaptive. In practice, different learners will mis-interpret lessons in different (and hard-to-predict) ways; in order to be effective, teaching needs some sort of agile feedback loop to correct for this, but that's exactly what most approaches to web-based teaching take out of the equation [3].

So, is it hopeless? Of course not: over the next six months, and (hopefully) the next five years, I believe we can make real progress on several fronts. We can certainly recruit and train more workshop organizers and instructors, and experiment with different kinds of online learning to see which will make follow-through easiest and most effective (which in turn depends on us coming up with ways to assess the impact we're having). If you'd like to help, please get in touch.

[1] I get "tens of thousands" by taking a million competent programmers, multiplying by 10% (the proportion who can teach), and then multiplying by 10% again (the proportion who might be interested). Your made-up stats may vary.

[2] The other reason this has to be a priority is that our learners' needs are as diverse as their backgrounds. Our learners want to jump straight from "what's a for loop?" to "how do I detect glottal stops in lo-fi audio?" or "how do I visualize turbulent flow of interstellar gas?" We're never going to be able to cover these with just a handful of content creators.

[3] Note that I'm using "online" to mean recorded and/or automated, i.e., things that learners can do when they want. Other approaches that deliver traditional lectures or seminars over the web synchronously and interactively are a bit better, but don't scale: no webinar system I've ever seen gives the instructor the kind of feeling for the room that s/he'd get in a regular lecture hall.