Wrong Is Useful: Lessons as Packages

This post originally appeared on the Software Carpentry website.

"What would Greg do? [pause] OK, now that we've ruled that out..."
— overheard

I wrote a post last July about using package managers like RPM, Homebrew, and Conda to track dependencies between lessons, so that a student could say something like conda install unit_testing and get a lesson on unit testing, along with the code, sample data, and other lessons it depends on. I also mused that it could help make research more reproducible: after all, a paper is just a lesson on something that's never been taught before.

This idea isn't new. Konrad Hinsen wrote about using package management for reproducibility back in 2012, and later about why he decided to go a different route. W. Trevor King has written about it as well, while Rémi Emonet and Raniere Silva built a small prototype last summer.

I'm still not sure whether this is a good idea. Since I've always done what passes for my best thinking when I've got something to fix rather than a blank sheet of paper, I've thrown together a really small demo. I'm sure it's wrongheaded in many ways, but I hope it will help focus discussion by giving people something specific to correct. If you'd like to kick its tires:

  1. Make sure you have Python 2.* installed.

  2. Clone this GitHub repository.

  3. Run make on its own to get a list of available commands.

  4. Run make create to create a distribution file dist/something-0.0.1.tar.gz.

  5. Run make install to install that package in your Python distribution. You may wish to create a virtual environment first so that the demo doesn't pollute it. In any case, make install writes a list of installed files to installed-files.txt, so you can run make uninstall to delete them all.

  6. Once the lesson is installed, run lesson view something to open it in your browser. This emulates a learner viewing the lesson locally.

  7. Run mkdir /tmp/stuff (or create some other temporary directory), then run lesson files something /tmp/stuff to copy the lesson's code and data into it. This emulates a learner getting the sample code and data files for the lesson.
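
To make the last two steps concrete, here is a minimal sketch of what a lesson command with view and files sub-commands could look like. It is not the demo's actual script: the directory layout (a lessons directory under the Python prefix, with index.html and a files sub-directory per lesson) and every function name are assumptions made for illustration, and it is written for Python 3 rather than the Python 2.* the demo asks for.

```python
"""Hypothetical sketch of a `lesson` command; not the demo's actual script."""
import argparse
import shutil
import sys
import webbrowser
from pathlib import Path

# Assumption: lessons live in a `lessons` directory under the Python prefix,
# one sub-directory per lesson, each with an index.html and a files/ directory.
LESSONS_ROOT = Path(sys.prefix) / "lessons"


def lesson_dir(name, root=LESSONS_ROOT):
    """Return the directory of an installed lesson, or raise if it is missing."""
    path = Path(root) / name
    if not path.is_dir():
        raise FileNotFoundError(f"lesson {name!r} is not installed under {root}")
    return path


def view(name, root=LESSONS_ROOT):
    """Open the lesson's index page in the default browser."""
    index = lesson_dir(name, root) / "index.html"
    webbrowser.open(index.resolve().as_uri())
    return index


def files(name, destination, root=LESSONS_ROOT):
    """Copy the lesson's code and data into the destination directory."""
    source = lesson_dir(name, root) / "files"
    destination = Path(destination)
    copied = []
    for item in sorted(source.rglob("*")):
        if item.is_file():
            target = destination / item.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)
            copied.append(target)
    return copied


def main(argv=None):
    """Parse `lesson view NAME` or `lesson files NAME DESTINATION` and run it."""
    parser = argparse.ArgumentParser(prog="lesson")
    commands = parser.add_subparsers(dest="command", required=True)
    view_cmd = commands.add_parser("view", help="open a lesson in the browser")
    view_cmd.add_argument("name")
    files_cmd = commands.add_parser("files", help="copy a lesson's files")
    files_cmd.add_argument("name")
    files_cmd.add_argument("destination")
    args = parser.parse_args(argv)
    if args.command == "view":
        view(args.name)
    else:
        files(args.name, args.destination)

# An installed script would finish by calling main().
```

Calling main(["files", "something", "/tmp/stuff"]) would then do the same job as step 7, provided a lesson called something has been installed in the assumed location.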

Behind the scenes, installation uses a standard Python setup.py script to create a lessons sub-directory in your Python distribution and then copy the lesson material under there. It also installs a script called lesson in your Python distribution's bin sub-directory. A real system would separate these: people would only install lesson once, and each particular lesson would then be packaged and installed separately.
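
For readers who haven't written one, the sketch below shows roughly how a setup.py could describe that layout. It is an illustration under stated assumptions, not the demo's real script: the source directory name (lesson), the script location (bin/lesson), and the helper names are all hypothetical. The only packaging features it relies on are the data_files and scripts arguments to setuptools' setup().

```python
"""Hypothetical setup.py for a packaged lesson; a sketch, not the demo's script."""
from pathlib import Path

LESSON = "something"   # lesson name used in the walkthrough
VERSION = "0.0.1"      # matches dist/something-0.0.1.tar.gz


def lesson_data_files(source_dir, lesson=LESSON):
    """Map every file under source_dir to a target under lessons/<lesson>/.

    Returns (target_directory, [files]) pairs, the shape that the data_files
    argument of setup() expects. Relative target directories are installed
    under the Python prefix.
    """
    source_dir = Path(source_dir)
    grouped = {}
    for path in sorted(source_dir.rglob("*")):
        if path.is_file():
            target = Path("lessons") / lesson / path.parent.relative_to(source_dir)
            grouped.setdefault(target.as_posix(), []).append(path.as_posix())
    return sorted(grouped.items())


def setup_arguments(source_dir="lesson"):
    """Collect the keyword arguments that would be passed to setup()."""
    return {
        "name": LESSON,
        "version": VERSION,
        "description": "A lesson packaged as a Python distribution",
        "data_files": lesson_data_files(source_dir),
        "scripts": ["bin/lesson"],  # assumed location of the lesson script
    }

# A real setup.py would end with:
#     from setuptools import setup
#     setup(**setup_arguments())
```

Splitting the system as described above would mean moving the scripts entry into a package of its own, leaving each lesson's setup.py with nothing but metadata and data_files.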

This little demo doesn't specify any dependencies, so it doesn't install any supporting tools or prerequisite lessons. That would be straightforward to add, but that's another way of saying, "We don't need to think about it right now." What we do need to think about is how to handle lessons for R, the shell, GitHub, and so on, and whether Python's packaging tools are the right platform for this. I'm pretty sure the answer to the second question is "no", but the alternatives are OS-specific, require more effort at first encounter than most lesson authors will be willing to invest, or both.
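
As a rough picture of what dependency tracking between lessons involves, here is a small sketch that works out an installation order from a table of prerequisites. The lesson names other than unit_testing and the table itself are invented for illustration. With Python's packaging tools the prerequisites would more likely be declared through setup()'s install_requires argument and resolved by pip; this sketch only shows the underlying idea, using graphlib from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical prerequisite table: each lesson maps to the lessons it needs.
PREREQUISITES = {
    "unit_testing": {"functions", "shell"},
    "functions": {"python_basics"},
    "python_basics": set(),
    "shell": set(),
}


def install_order(lesson, prerequisites=PREREQUISITES):
    """Return the lesson plus its transitive prerequisites, prerequisites first."""
    needed = {}
    stack = [lesson]
    while stack:
        current = stack.pop()
        if current not in needed:
            # Lessons missing from the table are treated as having no prerequisites.
            needed[current] = set(prerequisites.get(current, ()))
            stack.extend(needed[current])
    # TopologicalSorter takes a mapping from each node to its predecessors
    # and raises graphlib.CycleError if the prerequisites are circular.
    return list(TopologicalSorter(needed).static_order())
```

Calling install_order("unit_testing") returns all four lessons with unit_testing last and python_basics somewhere before functions; the relative order of unrelated lessons is not guaranteed.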

The long-term goal of this work is to create something like CRAN, CPAN, or PyPI for lessons. Like those archives, it would require people to package their lesson in a particular way. Once they'd done that, though, their work would be easier to find and use. And as I said at the outset, if we can make this work for lessons, there's no reason we can't make it work for papers. (I for one would have been grateful if I could have run pip install doi://arxiv.org/1111.1111 to get a local, runnable copy of the paper I'm supposed to be reviewing right now.)

Packaging and distribution is a headache and a nightmare and one of practical computing's greatest unsolved problems, but if we want to work through someone's lesson, or reproduce and extend a colleague's research, we have to get the raw material installed somehow. Today's packaging systems pay much less attention to docs than they do to code; I think that making the former a first-class citizen would be an interesting experiment, and I'd be grateful if you could comment on this post to tell me what you think.
