Counting Things

This post originally appeared on the Software Carpentry website.

I re-read Robins et al's "Learning and Teaching Programming: A Review and Discussion" [1] and Eccles and Stacey's "Understanding the Parallel Programming" [2] this morning. One of the key takeaways in both is that novice programmers tend to classify problems according to problem domain or language features, while experienced programmers tend to classify them according to solution strategies. Where a beginner sees a problem in molecular biology or statistics, for example, an expert sees an opportunity to apply divide-and-conquer or accumulate-and-adjust.

Knowing that, it's tempting to try a strategies-first approach when teaching programming. That usually doesn't work, though, because the strategies are shaped by what computers can do, and if you don't know that, those strategies seem to appear like rabbits out of a magician's hat.

The idea is still appealing, so as part of the lecture on program design (or perhaps as a separate lecture), I'd like to explore strategies for solving variations on a simple problem as a way to sum up previous material on programming language features like loops and conditionals. The problem I have in mind can be summed up in one word: counting. The variations are:

How many things do I have? (Use len(list).)
How many things do I have that satisfy some condition? (Conditional inside loop.)
What if my things are in a file instead of in memory? (Read the file into a list, then count.)
What if I have so many things that the won't fit into memory? (Read and test one at a time, streamwise.)
What if I'm counting the number of items of each of a set of fixed types? (Use an array of accumulators.)
What if I don't know the types in advance? (Use a dictionary—which is also the better answer for #5 above.)
What if my classification scheme is likely to change frequently? (Pass the classifying function into the counting function.)
What if I only want things up to some stop sign? (Count to sentinel and break.)
What if my test depends on context, e.g., number of times A comes after B, or number of X's inside a region delimited by Y's? (Use flags to turning counting on and off.)
What if my things are in a tree rather than a list? (Recursion.)
What if my things are in a graph that might contain cycles? (Graph-traversal algorithms.)
What if I might have all of the above? (Iterator and visitor design patterns.)
What if my things are in a database? (Select, filter, and aggregate.)
What if I want to calculate an average? (Any of the above plus a count of items.)
What if I want to calculate a median? (Requires different strategies entirely because it depends on global knowledge.)

I don't think students could absorb graph traversal, design patterns, and select/filter/aggregate as asides in a single lecture. (It's already clear from feedback, by the way, that introducing big-oh as an aside in the lecture on tuning programs is asking too much—we'll fix that up soon.) What I'm wondering is, if we've introduced those topics earlier, would a single lecture that showed how changes in the problem determine solution strategy be useful as a summing up? Your feedback would be greatly appreciated.

[1] Anthony Robins, Janet Rountree, and Nathan Rountree: "Learning and Teaching Programming: A Review and Discussion". Computer Science Education, 13(2), 2003.

[2] Ryan Eccles and Deborah A. Stacey: "Understanding the Parallel Programmer". Proc. 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment (HPCS'06), 2006.