Musing About Reorganization

This post originally appeared on the Software Carpentry website.

I'm increasingly unhappy with the organization of this course. On the off chance that funding materializes and we're able to undertake a major redesign, I'd like to explain why and ask for your input.

Right now, our lectures are broken into topics along lines a computer scientist would instantly recognize: basic programming, regular expressions, databases, and so on. That is not how members of our intended audience see things when they first come to us; if it were, they probably wouldn't need this course. They start with problems like:

How do I read this data file?
How can I share my program with other people?
How should I keep track of thousands of input and output files?
How do I save the state of my program so I can restart it?
How can I use the program my supervisor wrote ten years ago to solve my current problem?

Their answers cut across traditional CS divisions: re-using a legacy program, for example, may require basic programming, the shell, systems programming (such as subprocesses and I/O redirection), and some parsing.
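To make the legacy-program case concrete, here is a minimal sketch in Python of how those pieces combine: a subprocess, redirected input and output, and a little parsing. The "legacy program" is a hypothetical stand-in (a one-line Python script that prints `name = value` lines), and `run_legacy` is an illustrative name, not part of the course material.

```python
import subprocess
import sys

# Hypothetical stand-in for a legacy program: it reads numbers from standard
# input and prints a small "name = value" report on standard output.
LEGACY_COMMAND = [
    sys.executable, "-c",
    "import sys\n"
    "values = [float(x) for x in sys.stdin.read().split()]\n"
    "print('count =', len(values))\n"
    "print('total =', sum(values))\n",
]

def run_legacy(numbers):
    """Feed numbers to the legacy program and parse its text report."""
    text_in = " ".join(str(n) for n in numbers)
    # Subprocess plus I/O redirection: send text to stdin, capture stdout.
    result = subprocess.run(
        LEGACY_COMMAND,
        input=text_in,
        capture_output=True,
        text=True,
        check=True,  # raise an error if the program exits with a failure code
    )
    # Parsing: turn each "name = value" line into a dictionary entry.
    report = {}
    for line in result.stdout.splitlines():
        name, _, value = line.partition("=")
        report[name.strip()] = float(value)
    return report

report = run_legacy([1, 2, 3])
```

With a real legacy program, only `LEGACY_COMMAND` and the parsing loop would change; the overall shape of wrapping the old tool rather than rewriting it stays the same.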

The traditional solution is to view this as a matrix of problems against topics, and to order the topics so that learners can solve real problems as early as possible. If the matrix is:

                Topic
    Problem   A   B   C   D
    X         +   .   .   .
    Y         +   .   +   .
    Z         .   +   +   +

then the "best" order for teaching is [A, C, {B, D}]: A alone is enough to solve X, adding C makes Y solvable, and B and D (in either order) are needed only for Z.

Of course, this assumes that we know the problems, and how they depend on topics. We had some vague ideas a year ago, and know a lot more now, but there's something else we ought to take into account: the big ideas of computational thinking. For example, the idea that "programs are data" crops up in many different places in this course: a version control system treats the source code of a program as data, while passing a function as a parameter or storing it in a list only makes sense if you understand that runnable code is just bits in memory.
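A small Python sketch of the "programs are data" idea, for readers who have not met it before. The names `apply_all` and `transforms` are made up for this illustration.

```python
import math

def apply_all(functions, value):
    """Call each function in the list on value and collect the results."""
    return [f(value) for f in functions]

# Functions stored in a list, exactly as numbers or strings would be.
transforms = [abs, math.sqrt, lambda x: x * 2]

# The list of functions is passed as an ordinary parameter.
results = apply_all(transforms, 16)
print(results)  # [16, 4.0, 32]
```

Nothing in `apply_all` knows or cares what the functions do; it handles them as values, which is the point being made above.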

So should we build a matrix of problems vs. principles? Or a cube of questions, CS topics, and principles? I think the answer is "no", because I believe these principles cannot be taught or applied directly. In my experience, the only way to get them across is to come back after learners have been doing things that depend on them and point out the unifying principle.

I therefore think that the next big step for this course is to:

draw up a list of representative computational problems in science and engineering;
figure out what researchers need to know in order to solve them;
build the matrix;
derive a topic order; and
figure out when each principle can be pointed out.
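The "build the matrix" and "derive a topic order" steps above can be sketched in a few lines of Python. This is only one possible approach, and an assumption on my part rather than a settled method: a greedy rule that always teaches next whatever unlocks the cheapest remaining problem. The data is the small example matrix from earlier in this post, and `teaching_order` is a hypothetical name.

```python
# The example matrix from this post: each problem maps to the topics it needs.
NEEDS = {
    "X": {"A"},
    "Y": {"A", "C"},
    "Z": {"B", "C", "D"},
}

def teaching_order(needs):
    """Greedy ordering (an assumed heuristic, not guaranteed to be optimal):
    repeatedly pick the problem that needs the fewest not-yet-taught topics,
    then teach those topics as the next group."""
    taught = set()
    remaining = dict(needs)
    order = []  # list of (topics taught together, problem they unlock)
    while remaining:
        # Ties are broken alphabetically so the result is deterministic.
        problem = min(
            sorted(remaining),
            key=lambda p: len(remaining[p] - taught),
        )
        new_topics = remaining.pop(problem) - taught
        order.append((sorted(new_topics), problem))
        taught |= new_topics
    return order

for topics, problem in teaching_order(NEEDS):
    print(topics, "->", problem)
```

On the example matrix this reproduces the order [A, C, {B, D}]. On a real matrix with dozens of problems, a greedy rule like this may not give the best possible order, but it would at least give a defensible first draft to argue about.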

The tricky bit is that when we say "representative problems", most people think in terms of traditional disciplinary boundaries and offer us one fluid flow problem, one gene sequencing problem, and so on. Our notion of representative is different: we're thinking of things like reformatting data files, improving performance, sharing or testing code, and so on.

That's why we need your help. Have another look at the list at the top of this post. What should we add? What problems are you wrestling with, and what have you needed to know to solve them? "How do I use the shell?" is the wrong kind of answer—we want to know what problem you think the shell is the solution to, and why.