Knocking on the Future's Door

This post originally appeared on the Software Carpentry website.

Once again I feel like I'm knocking on the future's door but nobody's answering. The task we set ourselves seemed simple: produce a nicely-formatted PDF of the Version 5 lessons to give learners as a reference (and to print as a book to give instructors when they finish their training). Fifty years after the creation of the first computer typesetting systems, you'd think this would be easy. It's not, and the reasons show yet again why so many scientists would rather keep playing the kazoo than learn to play the violin.

Our starting point is this Git repository. In it, you'll find a directory called novice that contains a mixture of IPython Notebooks (.ipynb) and Markdown files (.md). We use notebooks for our lessons on Python because that's what we teach with; we use Markdown for things like our lesson on Git. (We used to use HTML, but people thought Markdown would be simpler to edit, diff, and merge.)

Our Makefile turns this all into the notes you see online by converting the notebooks to Markdown, and then converting those Markdown files, and the files actually written in Markdown, into HTML. We convert notebooks to Markdown rather than converting them directly to HTML so that we only need to maintain one template file for our website (the one describing the Markdown-to-HTML conversion) rather than two. Our hope was that we could then convert either the Markdown or the generated HTML to LaTeX, and compile that to produce our PDF.

This ought to be simple. IPython comes with a tool called nbconvert that uses another tool called pandoc to translate .ipynb files into other formats, and pandoc can be installed and used directly to translate Markdown to other formats as well. Together, those tools get us most of what we want—most, but not all.
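As a rough sketch of that two-step conversion, the commands could be assembled like this. The `nbconvert --to markdown` option and pandoc's `-f`, `-t`, and `-o` options are real, but the helper name `build_commands` and the file name in the usage example are made up for illustration. The sketch also assumes that nbconvert writes the `.md` file next to the notebook, and the entry point (`ipython nbconvert` or `jupyter nbconvert`) depends on the installed version.

```python
from pathlib import Path


def build_commands(source, entry_point="jupyter"):
    """Return the commands (as argument lists) that would turn one
    lesson file into LaTeX. Nothing is executed here."""
    source = Path(source)
    commands = []
    markdown = source
    if source.suffix == ".ipynb":
        # Step 1: notebook -> Markdown, using nbconvert.
        # Assumption: the .md output lands next to the notebook.
        commands.append(
            [entry_point, "nbconvert", "--to", "markdown", str(source)]
        )
        markdown = source.with_suffix(".md")
    # Step 2: Markdown -> LaTeX, using pandoc directly.
    latex = markdown.with_suffix(".tex")
    commands.append(
        ["pandoc", "-f", "markdown", "-t", "latex",
         "-o", str(latex), str(markdown)]
    )
    return commands


if __name__ == "__main__":
    # Hypothetical lesson path, used only to show the output shape.
    for cmd in build_commands("novice/python/lesson.ipynb"):
        print(" ".join(cmd))
```

Returning argument lists rather than running anything keeps the sketch easy to inspect; each list could be handed to `subprocess.run` once the tools are installed.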

For example, we want to clearly distinguish user input from computer output. Notebook cells have this information, and the "Markdown" generated by nbconvert helpfully retains that information as a div with an appropriate class:

<div class="in">
<pre>weight_kg = 55
print weight_kg</pre>
</div>
<div class="out">
<pre>55</pre>
</div>

We want the input and output blocks in lessons that are written in Markdown to have the same classes, but there's no syntax in standard Markdown for putting classes on pre-formatted code blocks. One hack is to use an extension in the Kramdown parser to wrap the block in a div:

<div class="in" markdown="1">
    weight_kg = 55
    print weight_kg
</div>

Another is to rely on its support for the "PHP Extra" dialect of Markdown and do this:

weight_kg = 55
print weight_kg

which is less cluttered. The problem is, these classes aren't translated into LaTeX when we convert to PDF, so all of our pre-formatted blocks come out looking the same.
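One possible workaround, sketched here under loud assumptions, is to rewrite the classed blocks into class-specific LaTeX environments ourselves. The environment names `swcinput` and `swcoutput` are invented for this example and would have to be defined in the LaTeX preamble; the function only handles well-formed `<div class="in">` and `<div class="out">` blocks that wrap a single `<pre>`, and where it would plug into the pipeline is left open.

```python
import html
import re

# Hypothetical LaTeX environment names; they are not part of any
# existing package and would need definitions in the preamble.
ENVIRONMENTS = {"in": "swcinput", "out": "swcoutput"}

# A div with class "in" or "out" wrapping exactly one <pre> element.
BLOCK = re.compile(
    r'<div class="(?P<cls>in|out)">\s*<pre>(?P<body>.*?)</pre>\s*</div>',
    re.DOTALL,
)


def classed_blocks_to_latex(text):
    """Replace classed <div><pre>...</pre></div> blocks with a verbatim
    block wrapped in a class-specific LaTeX environment."""

    def replace(match):
        env = ENVIRONMENTS[match.group("cls")]
        # Undo HTML escaping such as &lt; before emitting verbatim text.
        body = html.unescape(match.group("body")).strip("\n")
        return (
            "\\begin{%s}\n\\begin{verbatim}\n%s\n\\end{verbatim}\n\\end{%s}"
            % (env, body, env)
        )

    return BLOCK.sub(replace, text)
```

A regular expression is enough for the regular output a converter produces, but it is brittle by design: nested divs or extra attributes would need a real HTML parser.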

As another example, our notes include a glossary (as every good set of notes should). This is stored in the repository's root directory, and lessons (both notebooks and Markdown files) link to glossary entries like this:

...tell Git to make it a [repository](../../gloss.html#repository), which is...

which refers to an anchor in the glossary that looks like this:

A storage area where a [version control](#version-control) system...

These links are retained correctly in the generated HTML, but are translated into hyperlinks in the LaTeX rather than intra-document references.
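To illustrate the kind of patch this requires, here is a minimal sketch that rewrites glossary links into intra-document references before the LaTeX is compiled. `\hyperref[label]{text}` is a real command from the hyperref package, but the `gloss:` label prefix and the function name are assumptions made for this example, and each glossary entry would need a matching `\label{gloss:...}` for the references to resolve.

```python
import re

# Matches Markdown links whose target is an anchor in gloss.html,
# such as [repository](../../gloss.html#repository).
GLOSSARY_LINK = re.compile(
    r"\[(?P<text>[^\]]+)\]\((?:\.\./)*gloss\.html#(?P<anchor>[\w-]+)\)"
)


def glossary_links_to_refs(markdown):
    r"""Rewrite glossary links as \hyperref[gloss:<anchor>]{<text>}.

    The "gloss:" prefix is a hypothetical labelling convention.
    """
    return GLOSSARY_LINK.sub(
        lambda m: "\\hyperref[gloss:%s]{%s}"
        % (m.group("anchor"), m.group("text")),
        markdown,
    )
```

Links that point anywhere other than `gloss.html` are left untouched, so ordinary external hyperlinks keep working.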

We know how to fix these problems, and all the others I haven't bothered to enumerate, but we shouldn't have to. Nothing we're doing is particularly strange—we're hardly the first people in science to want to create a glossary—but we now have to spend several hours (at least) to do something that "ought" to work out of the box.

I can rhyme off half a dozen reasons why what we're trying to do is the "right" way, but most scientists would (quite rightly) respond, "Yeah, but it doesn't actually work." It comes back once again to Glass's Law and the initial productivity dip that comes with any new way of doing things.

If you'd like to help us solve this particular problem, we would appreciate your assistance. If there's a simpler way to accomplish what we want, we'd appreciate a pointer even more: after all, a problem avoided is better than a problem solved. But most of all, we'd like to see more people working to close the gap between what is and what should be.
