The Violas of Programming

This post originally appeared on the Software Carpentry website.

Orchestral musicians make jokes about violas and viola players. "What's a string quartet? A great violin player, a mediocre violin player, a bad violin player, and a cellist." Or, "What's the difference between a viola and a trampoline? You take off your shoes to jump on a trampoline." But violas are as essential as they are unglamorous: hardly anyone plays them as a lead or solo instrument, yet string quartets just don't sound right without that third voice.

I realized today that Python's sets [1] are sort of like the violas of programming. They come up quite naturally all over the place: just flip through any text on algorithms and count how often they're used. But they're rarely used alone, which makes it hard to come up with well-motivated examples when teaching them. Consider:

  • "What vowels are present in this string?" Sure, but show me an application where that comes up: every one I can think of wants the frequency of the vowels, not just their presence or absence.
  • "Find out whether these photos have some tags in common." Sure, but (a) intersection and union are built in, so it's a one-liner, and (b) you'd almost certainly use a dictionary with photo IDs for keys, and sets of tags as values.
  • Anything with graphs: again, the nodes reachable from X are naturally stored as a set, but the graph as a whole will be a dictionary mapping each node to its set of reachable nodes.
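Here is a minimal sketch of all three examples in plain Python 3. It bears out the point: the set operations themselves are one-liners, and the sets quickly end up as values inside a dictionary. The photo IDs, tags, and graph are made-up data, and the function names `vowels_present`, `common_tags`, and `reachable` are hypothetical, chosen only for illustration. `reachable` is an ordinary iterative depth-first traversal, not a specific algorithm from the original lecture.

```python
# Example 1: which vowels appear in a string (presence only, not frequency).
def vowels_present(text):
    return set(text.lower()) & set("aeiou")

# Example 2: photo IDs mapped to sets of tags (made-up data).
photo_tags = {
    "photo_1": {"beach", "sunset", "family"},
    "photo_2": {"sunset", "mountains"},
    "photo_3": {"city", "night"},
}

def common_tags(tags, first, second):
    # Intersection is built in, so this really is a one-liner.
    return tags[first] & tags[second]

# Example 3: a graph stored as a dict from each node to the set of nodes
# it links to directly (made-up data).
graph = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"D"},
    "D": set(),
    "E": {"A"},
}

def reachable(graph, start):
    """Return the set of nodes reachable from start.

    start itself is only included if a cycle leads back to it.
    """
    seen = set()
    pending = [start]  # stack of nodes whose neighbors still need visiting
    while pending:
        node = pending.pop()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                pending.append(neighbor)
    return seen

# Sets have no fixed order, so sort before printing for stable output.
print(sorted(vowels_present("Software Carpentry")))           # ['a', 'e', 'o']
print(sorted(common_tags(photo_tags, "photo_1", "photo_2")))  # ['sunset']
print(sorted(reachable(graph, "E")))                          # ['A', 'B', 'C', 'D']
```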

I didn't worry about this too much in the Version 3 lecture on sets and dictionaries: I used a couple of completely abstract examples (including "which vowels"), and moved quickly into a discussion of how sets are stored and why their values have to be immutable. I'd like to do better in Version 4 (I'd like every new tool or technique to be well motivated at the time of its introduction), but I'm damned if I can figure out how.
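The lecture itself isn't reproduced here, but the storage rule it covers fits in a few lines of Python 3. A set is stored as a hash table, so each value's hash is computed when the value is added; if the value could change afterwards, its hash would change too and the set could no longer find it. Strictly speaking, Python checks that elements are hashable, which for the built-in types means the immutable ones. This is a minimal sketch with made-up data, not the original lecture's example.

```python
# Python sets are hash tables, so every element must be hashable.
tags = set()
tags.add(("beach", "sunset"))  # a tuple of strings is immutable, so it is accepted

rejected = False
try:
    tags.add(["beach", "sunset"])  # a list could change after insertion, so it is refused
except TypeError:
    rejected = True

print(rejected)   # True
print(len(tags))  # 1

# Ordinary sets are mutable too, so a set of sets needs the immutable frozenset.
vowel_groups = {frozenset({"a", "e"}), frozenset({"o", "u"})}
print(frozenset({"e", "a"}) in vowel_groups)  # True
```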

[1] Disclaimer: As the author of the original Python Enhancement Proposal (PEP) on sets, I have certain biases.
