Pandoc and Building Pages
This post originally appeared on the Software Carpentry website.
Long-time readers of this blog and our discussion list will know that I'm unhappy with the choices we have for formatting our lessons. Thanks to a tweet from Karl Broman, I may have an answer. It's outlined below, and I'd be grateful for comments on usability and feasibility.
Here's a summary of the forces we need to balance:
- People should be able to write lessons in Markdown. We choose Markdown rather than LaTeX or HTML because it's easier to read, diff, and merge; we choose it rather than AsciiDoc or reStructuredText (reST) because it's much better known.
- People should be able to preview their lessons locally before publishing them, both to avoid embarrassment and because many people compose offline.
- Lessons should be easy to write and read. We shouldn't require people to put divs and other bits of HTML in their Markdown.
- It should be easy to add machine-comprehensible structure to lessons. We want to be able to build tools to extract lesson titles, count challenge exercises, etc., all of which requires machine-comprehensible source. This is in tension with the point above: everything we do to make lessons more readable to computers means extra work or less readability for people.
- We should use only off-the-shelf tools.
  - We don't want to have to build, document, and maintain custom plugins for formatting tools.
  - We do want to use GitHub's gh-pages magic.
- The workflow for creating and publishing lessons should be authentic, i.e., the way people write and publish lessons should be a way they might use to write and publish research papers.
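To make the "machine-comprehensible structure" point concrete, here is a minimal sketch of the kind of tool I have in mind. It rests on conventions that are my assumptions for illustration, not anything we have agreed on: the lesson title is the first level-1 heading, and every challenge is a heading whose text starts with the word "Challenge". The function name `summarize_lesson` is hypothetical too.

```python
import re

# Assumed conventions (not an agreed standard): the lesson title is the first
# level-1 ATX heading, and each challenge exercise is a heading whose text
# starts with the word "Challenge".
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
FENCE = "`" * 3  # marker that opens and closes a fenced code block


def summarize_lesson(markdown_text):
    """Return (title, challenge_count) for one lesson's Markdown source."""
    title = None
    challenges = 0
    in_fence = False
    for line in markdown_text.splitlines():
        # Skip fenced code so that shell comments are not read as headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if not match:
            continue
        level, text = len(match.group(1)), match.group(2)
        if level == 1 and title is None:
            title = text
        elif text.lower().startswith("challenge"):
            challenges += 1
    return title, challenges
```

A tool like this only works if authors stick to the heading convention, which is exactly the readability-versus-structure tension described in the list above.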
And here's the proposal:
- We stop relying on Jekyll and start using Pandoc instead.
- Every lesson is stored in a GitHub repository that has a gh-pages branch. (GitHub will automatically publish the files in that branch as a mini-website.)
- The root directory of that repository contains:
  - a README.md file with a one-liner about the lesson's content and authorship;
  - a sub-directory called src that contains the source files for the lesson;
  - the compiled versions of those files; and
  - an empty file called .nojekyll to tell GitHub that we don't want it to run Jekyll.
- The src directory contains all the source files for the lesson, and a simple Makefile that uses Pandoc instead of Jekyll to compile those files. Pandoc's output goes in the root directory, i.e., one level above the src directory, and the Makefile makes sure that other files (CSS, images, etc.) are copied up as well.
- When an author makes a change, she must build locally, then commit those files to the GitHub repository. Yes, this means that generated files are stored in version control, which is normally regarded as a bad idea. But it does mean we can use Pandoc, which supports a nicer dialect of Markdown than Jekyll on GitHub, and we don't have to worry about compiling files on one branch and committing them to another.
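The build step in the proposal is a Makefile, and the sketch below is not that Makefile. It is a rough Python stand-in for the same two jobs (run Pandoc on each Markdown file, copy everything else up one level), written only to show how little logic is involved. The function names `plan_build` and `build` are hypothetical, and I am assuming that Markdown sources end in `.md` and that pages should be standalone HTML.

```python
import shutil
import subprocess
from pathlib import Path


def plan_build(src_dir):
    """Map every file under src_dir to its destination one level up.

    Markdown files become HTML pages; everything else except the Makefile
    (CSS, images, etc.) is copied unchanged.
    """
    src_dir = Path(src_dir)
    root = src_dir.parent
    pages, assets = [], []
    for path in sorted(src_dir.rglob("*")):
        if not path.is_file() or path.name == "Makefile":
            continue
        target = root / path.relative_to(src_dir)
        if path.suffix == ".md":
            pages.append((path, target.with_suffix(".html")))
        else:
            assets.append((path, target))
    return pages, assets


def build(src_dir):
    """Run Pandoc on each page and copy the remaining files up."""
    pages, assets = plan_build(src_dir)
    for source, target in pages:
        target.parent.mkdir(parents=True, exist_ok=True)
        # -s asks Pandoc for a standalone HTML page; -o names the output file.
        subprocess.run(["pandoc", "-s", "-o", str(target), str(source)], check=True)
    for source, target in assets:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
```

Calling `build("src")` from the root of the repository would leave the generated pages next to README.md and .nojekyll, ready to be committed. Keeping the planning step separate from the step that runs Pandoc makes it possible to check the file mapping without Pandoc installed.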
I've created a proof-of-concept repository to show what this might look like in practice. It seems to work pretty well, and I think it satisfies the "authentic workflow" requirement (though I'd be grateful if others could tell me if it doesn't). The only usability hiccup I can see is that authors will have to remember to commit the generated files: my usual workflow of git add -A followed by git commit -m only adds files in or below the current working directory, so I would have to cd .. up from src to the root directory of my local copy of the repo first.
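One way around that hiccup is a small wrapper that always stages and commits from the top of the repository, whichever directory it is called from. This is only a sketch: the helper names `find_repo_root` and `commit_everything` are hypothetical, and it assumes the repository root is the nearest parent directory containing a `.git` entry. (As far as I know, Git 2.0 and later make `git add -A` with no path apply to the whole working tree, so this matters mainly for older versions.)

```python
import subprocess
from pathlib import Path


def find_repo_root(start="."):
    """Walk upward from start until a directory containing .git is found."""
    here = Path(start).resolve()
    for candidate in [here, *here.parents]:
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"no Git repository found above {here}")


def commit_everything(message, start="."):
    """Stage all changes from the repository root, then commit them."""
    root = find_repo_root(start)
    # Running both commands with cwd=root picks up the generated files in the
    # root directory even when the author is working inside src.
    subprocess.run(["git", "add", "-A"], cwd=root, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=root, check=True)
```

From inside src, `commit_everything("Rebuild lesson pages")` would then do the equivalent of cd .., git add -A, and git commit -m in one step.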
One variation on this raised by Trevor King is to keep the source files in the root directory of the master branch, and have the lesson maintainer merge changes into the src directory of the gh-pages branch and do the build. This frees authors from having to install the build tools (only the maintainers need them), but on balance, I think most people will want to preview before uploading, so the savings will be mostly theoretical.
If you have other thoughts, or can suggest other improvements, please add comments to this post. We'd particularly like to hear from people who aren't Git experts or aren't familiar with HTML templating systems, Makefiles, and the like. Does the workflow described above make sense? If not, what do you think would go wrong where, and why?