
Of Templates and Metadata

This post originally appeared on the Software Carpentry website.

As an appendix to the post on splitting the repository, Greg recently posted a straw man template for how lessons might be structured after the repo split, and later followed up with more details. There are a lot of good ideas there on how we can encourage good structure for lessons and help learners and instructors alike going forward.

First, to assist in the production of workshop websites and to better define the relationships between lessons, lesson repositories should contain some metadata. YAML is a widely adopted and reasonably flexible format for storing metadata in files; we're already using it as part of our existing GitHub/Jekyll workshop and site hosting. The file index.md is the sensible place to look for a lesson's metadata, as it's the first thing people write and should therefore be populated early.

A YAML header at the top of a lesson's index.md would look like this:

---
title: "Beginner Shell"
authors: [Gabriel A. Devenyi, Greg Wilson]
---

Next is the question of what kind of metadata we want to include. The title of the lesson is essential, since it's not explicitly the name of any of the files. The list of authors of the material could also live in a YAML header, although there has also been discussion of extracting such information directly from the Git history. (Relying on the Git history would also avoid the problem of figuring out how large a change qualifies someone for being listed as an author.)

There have recently been discussions about recording and reporting the time required to teach lessons. Including the average in the metadata would allow someone constructing a multi-lesson workshop to determine if they have time to present all the material.
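As a rough illustration of that check, here is a minimal sketch that adds up per-lesson teaching times and compares the total with the time available. It assumes durations are recorded as strings such as "2h", "45m", or "1h30m"; that format, and the names `parse_minutes` and `fits_in_workshop`, are our own inventions for the example rather than anything that has been agreed.

```python
import re

# Assumed duration format: an optional hours part followed by an optional
# minutes part, e.g. "2h", "45m", or "1h30m". Illustrative only.
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?$")


def parse_minutes(text):
    """Convert a duration string such as "1h30m" into a number of minutes."""
    match = _DURATION.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(part) if part else 0 for part in match.groups())
    return hours * 60 + minutes


def fits_in_workshop(lesson_times, available):
    """Return (fits, total_minutes) for a list of lesson durations."""
    total = sum(parse_minutes(t) for t in lesson_times)
    return total <= parse_minutes(available), total
```

For example, `fits_in_workshop(["2h", "1h30m", "45m"], "4h")` reports that the three lessons need 255 minutes and so do not fit into a four-hour slot.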

With the breakup of the lessons repository into smaller chunks, and the proliferation of intermediate and alternative lessons, it would also be useful to specify dependencies for a given lesson. The exact structure for this is tricky, since we have to strike a balance between what's useful and how much effort is required of authors. Options include:

  1. the URLs of lessons that this one depends on
  2. keywords identifying the concepts this lesson requires people to know beforehand
  3. a long-form human-readable description of what learners need to know beforehand.

The first probably won't work for us because we expect to have several lessons covering the same topic, e.g., an introduction to the shell for astronomers and physicists, another for life scientists, and a third for economists. These will probably vary primarily in the examples they present, rather than in the concepts they cover, so any of them could be used as a prerequisite for a shell-based lesson on version control. The second requires us to agree on terms in order to be truly useful; judging from the history of the Semantic Web, that's unlikely. And while the third is probably easiest, it's also the hardest for software tools to work with: we wouldn't be able to check that a particular sequence of lessons hangs together without some natural language processing, and even then it probably wouldn't be reliable.

So here's what the YAML template might look like for a lesson:

---
title: "Beginner Shell"
authors: [Gabriel A. Devenyi, Greg Wilson]
presentation-time: "2h"
preq: [http://github.com/user/repo/tree/commitid, http://github.com/anotheruser/anotherrepo]
---
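To give a feel for how a tool might read such a header, here is a minimal sketch that pulls the metadata out of the text of an index.md file using only the Python standard library. The function name `read_front_matter` is hypothetical, and the parser deliberately handles only flat `key: value` pairs and inline `[a, b]` lists, which is all the template above uses; a real tool would more likely hand the header to a proper YAML library.

```python
def read_front_matter(text):
    """Return the metadata between the leading pair of '---' lines as a dict.

    Handles only flat 'key: value' pairs and inline '[a, b]' lists;
    it is not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no header at all
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        if not line.strip():
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            metadata[key.strip()] = [i.strip().strip('"') for i in items if i.strip()]
        else:
            metadata[key.strip()] = value.strip('"')
    raise ValueError("front matter is missing its closing '---' line")


index_md = """---
title: "Beginner Shell"
authors: [Gabriel A. Devenyi, Greg Wilson]
presentation-time: "2h"
preq: [http://github.com/user/repo/tree/commitid, http://github.com/anotheruser/anotherrepo]
---
Lesson text starts here.
"""
metadata = read_front_matter(index_md)
```

After this runs, `metadata["authors"]` is a list of two names and `metadata["preq"]` is a list of two URLs, ready for whatever processing comes next.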

The dd-slug.md files could also carry YAML metadata, such as a title, a time estimate, or a list of authors. Having such data would allow further programmatic processing.

Tying this all together with the Makefile that Greg proposed, we can construct a workshop that includes lessons from a number of lesson repositories, check dependencies, and construct a nice site.
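The dependency check could be as simple as the sketch below, which walks an ordered list of lessons and reports any prerequisite that has not been taught earlier in the workshop. This is our own illustration, not part of Greg's Makefile: the function name `check_prerequisites`, the `url` key, and the example repository URLs are all hypothetical, and `preq` is taken from the template above.

```python
def check_prerequisites(lessons):
    """Return a list of problems found in an ordered list of lessons.

    Each lesson is a dict with a hypothetical 'url' key (where the lesson
    lives) and an optional 'preq' list of prerequisite lesson URLs.
    """
    seen = set()      # URLs of lessons taught so far
    problems = []
    for lesson in lessons:
        for url in lesson.get("preq", []):
            if url not in seen:
                problems.append(
                    f"{lesson['url']} needs {url}, which is not taught earlier"
                )
        seen.add(lesson["url"])
    return problems


# Placeholder URLs, made up for the example.
workshop = [
    {"url": "http://github.com/user/shell", "preq": []},
    {"url": "http://github.com/user/git", "preq": ["http://github.com/user/shell"]},
]
in_order = check_prerequisites(workshop)
out_of_order = check_prerequisites(list(reversed(workshop)))
```

Here `in_order` is empty, while `out_of_order` contains one complaint about the version control lesson. Note that matching on exact URLs has the weakness discussed earlier: a different shell lesson covering the same concepts would not satisfy the check.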

Finally, what about the glossary.md and reference.md files mentioned in the template? The terms defined in the glossary could be used as a specification of what this lesson talks about in place of keywords in index.md, but it's redundant to have both. The reference guide is similarly redundant—we can point people at any number of online references written by other people—but we do need something, since learners tell us after almost every workshop that they want a cheat sheet of some kind.