Never Mind the Content, What About the Format?

This post originally appeared on the Software Carpentry website.

I'm still gnawing on the problem of how to construct content for 21st Century learning—or, more prosaically, what I should use to build the next version of Software Carpentry. My starting point is the need to serve several different kinds of users [1], whose descriptions I have moved to a separate post on learners and their needs.

Textbook: big blocks of prose in some narrative order, with pictures, either printed or electronic, read at the learner's pace, alone.
- Zuzel likes this.
- Yeleina doesn't.
- Xanthe uses content out of order via the index or search bar.
- Wafiya remixes content from several textbooks to create lessons (by photocopying, merging PDFs, or whatever). Like Zuzel, she has read content in order, but like Xanthe, she mainly uses the index now.
- Veronique has thought about writing one, but (a) doesn't think she has that much to say about any single topic, and (b) is put off by the effort that would be required.
- Note: the comments below about the difficulty of copying, pasting, and altering also apply to electronic textbooks, as do the proposed remedies.
Static slideshow: a page-by-page dump of a PowerPoint deck, possibly accompanied by a transcript of what the lecturer would say when delivering it.
- Zuzel uses this as if it were a badly-written textbook, with the transcript as the prose and the slides as diagrams.
- Yeleina finds it distracting to switch attention back and forth from slides to transcript.
- Xanthe searches the transcript to find what she wants, then curses because her search engine can't "see" the text in the slides. She also hates the fact that she can't copy and paste the code in the slides (since they're PNGs embedded in a web page).
- Wafiya remixes this content like any other. She's too polite to curse, but she finds it tedious to re-type the code that's shown in the slides (but isn't duplicated as text in the accompanying transcript). She also finds it wearying to have to re-do diagrams: since the slides are PNGs, it's difficult for her to copy part of a slide, move its elements around, and add a few of her own.
- Veronique doesn't create material in this format because she thinks it's old-fashioned and not useful.
- Note: source code can be made available as copy-and-pasteable text directly in the page, or for download; diagrams can similarly be made available as SVGs to facilitate remixing. Doing either currently requires considerable extra work on the part of content creators.
Voice-over slideshow screencast: a video recording of the slides (as they would appear on screen in a lecture) with someone speaking over them, and subtitles.
- Zuzel ignores the video and reads the transcript as if it were a static slideshow. If a transcript isn't available, she (reluctantly) watches the video.
- Yeleina prefers this to a static slideshow, but prefers the doodling screencast described below even more.
- Xanthe hits the "back" button as soon as she realizes it's a video (unless there's a transcript, in which case she curses because she can't copy and paste code out of a video).
- Wafiya directs students like Yeleina to these, but finds them harder to remix than other formats.
- Veronique thinks this format is also old-fashioned and not useful.
- Note: I'm assuming the subtitles are duplicated as a transcript, or available in some other searchable form. I'm also assuming that code is available for download or duplicated in the page for copying and pasting, though all of this requires extra work.
Voice-over doodling screencast: a Khan Academy-style recording of someone doodling on a tablet or coding live.
- Zuzel treats this like a slideshow screencast.
- Yeleina likes this format a lot, particularly if she can add comments at specific points and see her peers' comments.
- Xanthe has mixed feelings: she dislikes explanations delivered this way, but frequently watches "how to" videos, since they're more likely to be accurate and complete than written descriptions.
- Wafiya treats these like slideshow screencasts.
- Veronique creates these fairly regularly: they're easy to do, and easy to re-do when systems change or she discovers a mistake.
- Note: I'm making the same assumptions about transcripts, code, and diagrams as above.
Recorded whiteboard lecture: someone with a camera has recorded someone giving a lecture in a lecture hall, and spliced that with whatever was on the lecturer's screen.
- Zuzel treats this like any other screencast.
- Yeleina prefers this to doodling screencasts because she can see the speaker's body language.
- Xanthe treats these like any other screencast, i.e., she'll use it if there's a searchable transcript and things to copy and paste, or if it's a recording of a live "how to" coding session.
- Wafiya treats these like slideshow screencasts.
- Veronique doesn't create these, partly because of the setup required, but also because she doesn't think seeing her adds value—the lesson's supposed to be about the stuff.
- Note: I'm assuming an electronic whiteboard, since video of someone writing on an actual whiteboard is usually illegible.
Radio drama: a voice-only podcast-style presentation.
- Zuzel ignores the audio and reads the text transcript.
- Ditto for Yeleina.
- Ditto for Xanthe.
- Ditto for Wafiya.
- Veronique doesn't create these.
- Note: for Ursula, who is blind, this is the only format; all the others fold into it. She doesn't need code samples as text for copying and pasting: she needs them so that her screen reader can tell her what that code contains.
Star Wars: high-quality video with custom animations, cut scenes, and other special effects.
- Zuzel watches these sometimes, but doesn't learn any more from them than she would from a slideshow.
- Yeleina enjoys these, which means she pays more attention to them, which means she learns more (but no more than she'd learn from an engaging lecturer).
- Xanthe doesn't see the point. Unless something blows up.
- Wafiya likes their high production values, and remixes the special effects segments frequently.
- Veronique can't afford to produce this kind of material.
- Note: again, I'm making assumptions about transcripts, copy-and-pasting, etc.
Write your own adventure exploration: typically a set of connected ideas or challenges with explicit dependency information (i.e., you should/must learn A and B before tackling C).
- Zuzel finds the lack of narrative difficult.
- Yeleina enjoys these if each node in the graph is in one of her preferred formats. She enjoys them even more if she is exploring with peers.
- Xanthe ignores the ordering and searches for what (she thinks) she needs. If content is locked down—i.e., if the system won't let her see or search C until she's "completed" A and B—she writes an angry tweet and moves on.
- Wafiya likes this format for several reasons, but only if everything is always visible. First, it tells her how other teachers think ideas connect (something that is missing or out-of-band for other delivery formats). Second, it's easy to remix: again, providing it's open, she can reorder things as she thinks best for particular learners.
- Veronique would like to do this, but has discovered that creating the metadata about dependencies and recommended paths is as hard as writing a textbook.
Wander around exploration: lots of little snippets, but no explicit dependency information.
- Zuzel finds this even more difficult.
- Yeleina likes this less than the "write your own adventure" format: she thinks it's no different than just using Giggle to find things.
- Xanthe likes this because it's just like using Giggle searches. In fact, she uses every other format as if it were this one.
- Wafiya feels the same way as Yeleina: she likes having stuff to remix, but she has to do that remixing before this material is useful to those of her students who aren't as independent as Xanthe.
- Veronique creates content like this almost without realizing it by answering questions at Stuck Underflow.
Jam session: a bunch of learners in a room working through material simultaneously.
- Zuzel doesn't like it: it's too noisy for her to concentrate, and she can't go back at her own pace to review.
- Yeleina thinks this is the best... thing... ever.
- Xanthe is Giggling for information as soon as the presenter tells people what the topic is, but will stop and watch carefully when the presenter is typing live on screen.
- Wafiya can only book space to do this occasionally, and even then, she doesn't enjoy improv teaching.
- Veronique enjoys doing this (she volunteers with a local free-range learning group), but can only find time once every couple of months.
- Note: in theory this can be combined with any of the formats above. In practice, it's almost always short, live lectures interspersed with hands-on practical work.
Personal tutoring: one-to-one instruction, a.k.a. "pair learning".
- Zuzel doesn't mind this, but she really does prefer books...
- Yeleina actually prefers jam sessions, since they tend to be more lively.
- Xanthe likes having someone available to answer questions on demand, but is happy Giggling on her own for most of what she needs.
- Wafiya wishes she could do this with every one of her learners, but there simply aren't enough hours in the day. The personalized lesson plans she draws up are the closest approximation she can manage.
- Veronique does this a lot, but is frustrated that it doesn't scale—she really wants to help more than one person at a time.

I'm sure some of the above is inconsistent or just plain wrong, but here are my takeaways:

Different people want content in different formats. Yeah, OK, we knew that already, but:
Everybody needs first-class content, in the programming sense of the term. In practice, it means that every kind of content can be copied and pasted without losing its meaning. A bunch of colored pixels in an image that look like letters aren't actually letters; if you copy a region of an image and paste it into a text editor, you don't get the text [2]. Similarly, search engines like Giggle can't "see" code evolving line-by-line in a video, so you can't search for that. Together, I think these first two points imply that:
We need model-view separation in learning content. I apologize for the computerese, but I don't know any other way to say it. A model (more fully, data model) is how information is stored, while a view is how people interact with it. Models should be designed to be easy for computers to work with; views should be designed to meet human needs, and the plural there is important: different people want to interact with information in different ways, and even a single person may want to use different ways at different times. Search engines want the information that's in the model, such as the captions on the boxes in a diagram, not some arbitrary view of it (like a bunch of pixels in a PNG). People usually want that as well when they're remixing, since their goals are to combine that information with information from other sources, and/or to present that information in different ways (i.e., views).
We also need first-class metadata. I haven't been able to find a standard format for summarizing and exchanging lesson objectives, learning dependencies, and everything else needed to stitch individual facts together. The closest thing seems to be SCORM, but I'd rather stick a fork in my eye [3]: it's bloated, it mixes data models with meta-models with presentation layers with everything else its authoring committee could think of, and did I mention the fork? I could provide metadata as data, e.g., put a point-form list at the top of a lesson saying, "Here's what you need to know before tackling this," but that mixes model and view: since it's just a convention, computers will have a hard time stitching things together accurately.
Finally, we need social learning. Even the Zuzels of this world learn best in collaboration with other people: peer learners are often better at understanding and clearing up misconceptions than instructors, and having a "running partner" helps people stay focused and motivated. This isn't really a matter of format, though, but of the tooling used to deliver content, so I'll skip over it below.
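
To make the model-view idea concrete, here's a minimal sketch in Python. Everything in it is invented for illustration: the LESSON dictionary, its keys, and the two render functions are assumptions, not part of any existing lesson format. The point is only that one stored model can feed several views, and that search works on the model rather than on whatever a view happens to draw.

```python
from html import escape

# Hypothetical model: plain data describing one lesson fragment.
LESSON = {
    "title": "Loops",
    "points": ["A for loop repeats a block", "range(3) yields 0, 1, 2"],
    "code": "for i in range(3):\n    print(i)",
}


def render_text(lesson):
    """View 1: plain text, e.g. for a terminal or a screen reader."""
    lines = [lesson["title"], "=" * len(lesson["title"])]
    lines += ["- " + point for point in lesson["points"]]
    lines += ["", lesson["code"]]
    return "\n".join(lines)


def render_html(lesson):
    """View 2: semantic HTML; the code stays real, copyable text."""
    items = "".join(f"<li>{escape(point)}</li>" for point in lesson["points"])
    return (
        f"<section><h1>{escape(lesson['title'])}</h1>"
        f"<ul>{items}</ul>"
        f"<pre><code>{escape(lesson['code'])}</code></pre></section>"
    )


def search(lesson, term):
    """Search runs against the model, not against any one view."""
    haystack = [lesson["title"], lesson["code"], *lesson["points"]]
    return any(term.lower() in text.lower() for text in haystack)
```

A remixer in Wafiya's position would work with LESSON directly, and a third view (slides, say) could be added without touching the stored content.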

OK, so how well do today's tools and/or formats do by these measures? The fact that "PowerPoint" is both a tool and a format is one indication that the answer is going to be, "Not well."

Plain text is highly searchable and remixable, plays nicely with accessibility aids (i.e., screen readers), and runs everywhere, but:
- it doesn't do diagrams (unless you count ASCII art);
- it doesn't directly support metadata (except by convention); and
- it doesn't separate models from views.
HTML is also highly searchable and remixable—until you start doing dynamic updates with Javascript, at which point today's search tools (and accessibility aids) can't keep track of what's going on. Unlike text, it provides standard ways to include other media, so we'll delay discussion of images and video. And while it doesn't offer a standard way of providing metadata out of the box, HTML5's custom data attributes were designed with exactly this kind of use in mind. And modern HTML partially separates models from views: I can use CSS to tell the rendering engine (e.g., a web browser) to display things differently for different use cases.
DocBook, LaTeX, and wiki text separate models from views even more than HTML does. What's in the file is a description of content, information about the content, and just enough formatting to make things pretty when viewed in specific ways, e.g., "Break the page here to avoid an orphaned line." Diagrams and metadata can be handled the same way as they are for HTML; in fact, I can't see any advantage these formats have over modern HTML any longer [4], so I'm going to take them off the table.
PNG and other raster formats: fail the searchability and copy-and-paste tests.
SVG and other vector formats: do better. Since (some of) the content and relationships are explicit, search engines can find things in SVGs, and you can actually select and copy a box or an arrow, rather than a region of pixels. It only goes so far—Visio-style information about "this arrow connects the box labeled A to the box labeled B" is mostly implicit—but it's better than raster. I've seen people do entire lessons as a series of SVGs, or as one large SVG with progressive reveal; I'll talk about this more below.
PowerPoint and its kin: model, view, and authoring tool are inextricable from one another. You can copy and paste things, and modern search engines understand the format well enough to index textual content, but metadata is just a convention, and remixing takes a lot of work (even if the version you have is the original, rather than an exported ZIP file containing an HTML page that references PNG representations of the slides). That said, authoring rich presentations is easier than it is with HTML+SVG:
1. You use the same tool to create textual and graphical content, rather than having to switch between tools and stitch content together.
2. You can connect textual and graphical content, i.e., you can draw a circle around a word in one of your bullet points, then connect it with an arrow to a particular box in a diagram, just as you would when writing freehand on a whiteboard. This is what HTML-based slideshow packages lack: right now, they force authors to segregate text and graphics, which I view as a throwback to the era of hot metal typesetting.
The fact is, most presenters continue to use PowerPoint (or something similar) because it makes it easy to create a reasonably good presentation in a reasonable amount of time [5]. HTML slideshow packages fail this test: authors must sacrifice the quality of the presentation (e.g., skip graphics, or embed segregated graphical files), and do a lot of non-content typing (tags, page IDs, and so on).
Video: fails all the "first-class content" tests [6], and isn't effective [7] unless:
- authors have the resources to produce Star Wars-quality content [8], or
- they're showing learners how to do something, like dissecting a frog or using a debugger.
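
The difference between raster and vector formats noted above is easy to demonstrate. The sketch below uses only Python's standard library to pull the labels out of a small SVG; the diagram itself is a made-up example, and the helper names svg_labels and svg_contains are assumptions for illustration. Nothing comparable is possible with a PNG short of running character recognition over the pixels.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical two-box diagram, written by hand for this example.
DIAGRAM = """<svg xmlns="http://www.w3.org/2000/svg" width="220" height="60">
  <rect id="box-a" x="10" y="10" width="80" height="40"/>
  <text x="20" y="35">parser</text>
  <rect id="box-b" x="130" y="10" width="80" height="40"/>
  <text x="140" y="35">renderer</text>
  <line x1="90" y1="30" x2="130" y2="30"/>
</svg>"""


def svg_labels(svg_source):
    """Return the text of every <text> element, in document order."""
    root = ET.fromstring(svg_source)
    return [
        "".join(node.itertext()).strip()
        for node in root.iter(SVG_NS + "text")
    ]


def svg_contains(svg_source, term):
    """Case-insensitive search over the diagram's labels."""
    return any(term.lower() in label.lower() for label in svg_labels(svg_source))
```

Note what the sketch cannot recover: nothing in the file says that the line joins the "parser" box to the "renderer" box. That relationship lives only in the coordinates, which is the limitation described in the SVG entry above.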

So after all of this, what do I actually want?

I want content stored in HTML5 with purely semantic markup, so that it can be searched, copied and pasted, and styled for presentation in a variety of ways [9].
I want an agreed-upon meta and data-* vocabulary for educational metadata, like dependencies, introduction of key terms, questions and answers, and so on. I want a similar vocabulary for commenting and other social interactions that plays nicely with things like the Salmon protocol.
I want an authoring tool (note the singular there) that lets me:
1. write and draw WYSIWYG instead of typing in tags and IDs;
2. freely mix drawings and text; and
3. manage parallel streams (or channels), so that I can keep slide content, presenter's notes, prose, and translations of all three into other languages together.
I want to be able to animate my drawings and text, which is emphatically not the same as "embed video" (though I may want to do that too). Instead of recording the pixels drawn on the screen as I type Python into an editor, I want to record and play back the text that's being created, so that learners can pause the animation, copy the text, and paste it somewhere else. Equally, instead of painting pixels to fool your eyes into believing that a box just moved off the screen, I want to move the damn box; once again, if you pause the animation, you should be able to click on the box, attach a comment to it, paste it into your own drawing, etc.
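
Recording text instead of pixels is less exotic than it may sound. Here's a rough sketch, assuming a made-up event format (the Edit class and its fields are hypothetical, not taken from any real editor or screencasting tool): each event records when and where characters were removed and added, and playback rebuilds the text as it stood at any moment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One recorded editing step (a hypothetical event format)."""
    time: float      # seconds since the start of the recording
    position: int    # character offset where the edit happens
    delete: int      # number of characters removed at that offset
    insert: str      # text added at that offset


def text_at(edits, moment):
    """Rebuild the editor's contents as they stood at `moment` seconds."""
    text = ""
    for edit in sorted(edits, key=lambda e: e.time):
        if edit.time > moment:
            break
        text = (
            text[:edit.position]
            + edit.insert
            + text[edit.position + edit.delete:]
        )
    return text


# A tiny recording: type a line, fix a typo, then add a second line.
RECORDING = [
    Edit(0.0, 0, 0, "for i in rnage(3):"),
    Edit(2.5, 9, 5, "range"),
    Edit(4.0, 18, 0, "\n    print(i)"),
]
```

Pausing at three seconds means calling text_at(RECORDING, 3.0), and what comes back is a string the learner can copy, search, or hand to a screen reader. A real tool would also need cursor movement, selections, and timing for the spoken track, none of which is attempted here.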

Freely mixing drawings and text feels like it ought to be doable today: we could either put the text in blocks inside a canvas element, or layer a transparent canvas over the page and dynamically resize it. Anchoring drawings to the underlying text (e.g., keeping the arrow from a term to the corresponding bit of the diagram in the right place) is "just" Javascript (for some value of "just"). Making it all WYSIWYG is just more Javascript [10].

But animation... Ah, that's a big one. It's an intrinsically hard problem, but canned effects can do a lot to put simple things within reach [11]. The big question is, how far do we push it? If I want to show you how to use a debugger, or how to draw something with a painting program, I can't re-create the whole UI—I'm going to have to record pixels off a screen.

Or am I? I know this is never going to happen—we're not that organized a species—but just imagine what the world would be like if every interface was built using HTML5 and CSS. Any tool at all could export widget descriptions and a semantic trace of what they did (i.e., "the file menu was pulled down" rather than "the cursor moved to pixel (132,172) and the user clicked"), and any other tool could consume it and play it back. The consuming tool might draw the widgets differently, or display the interactions in its own way, but that would be exactly the same as applying a different skin to the original tool [12].

Returning to this universe for a moment, we can store things as HTML5 right now—I'm already using it for Version 5 of Software Carpentry. I could create a vocabulary for instructional metadata, but I'm not an information architect. WYSIWYG authoring tools for HTML5 abound, though the HTML5 they produce can be idiosyncratic (and doesn't play nicely with version control, but that's fixable). I haven't seen a WYSIWYG tool that supports freehand drawing mixed freely with text, or one that supports parallel content streams, but I think half a dozen people working could deliver something substantial in half a dozen months [13].
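
For a sense of what a metadata vocabulary buys, here's a small sketch of dependency information carried in HTML5 custom data attributes. The attribute name data-requires and the section IDs are invented for this example; they are not drawn from any published vocabulary, and a real one would need far more than prerequisites. The code uses only Python's standard library (graphlib requires Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from html.parser import HTMLParser

# Hypothetical lesson page. The attribute name data-requires is invented
# for this sketch; it is not part of any published vocabulary.
PAGE = """
<section id="shell" data-requires=""><h2>The Shell</h2></section>
<section id="git" data-requires="shell"><h2>Version Control</h2></section>
<section id="python" data-requires="shell"><h2>Python</h2></section>
<section id="testing" data-requires="python git"><h2>Testing</h2></section>
"""


class DependencyReader(HTMLParser):
    """Collect {section id: set of prerequisite ids} from data-requires."""

    def __init__(self):
        super().__init__()
        self.requires = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "section" and "id" in attributes:
            needed = (attributes.get("data-requires") or "").split()
            self.requires[attributes["id"]] = set(needed)


def lesson_order(page):
    """Return one ordering of sections that respects every prerequisite."""
    reader = DependencyReader()
    reader.feed(page)
    return list(TopologicalSorter(reader.requires).static_order())
```

Because the prerequisites are data rather than a point-form list in the prose, a tool can compute a suggested path for someone like Zuzel while still leaving every section visible and searchable for someone like Xanthe. The ordering is a recommendation, not a lock.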

As for animation, I think we're stuck with video for now: prototyping an HTML5/SVG/Javascript animation framework for use in a learning tool would be a great research project, but we really do need to build a couple to throw away to find out if it's workable. If you'd like to tackle it, please let us know; I'd be happy to be your alpha tester.

Notes

[1] There was a lot of talk in the 1980s and 1990s about different people having different learning styles, inspired in part by Gardner's theory of multiple intelligences. The idea has mostly been discredited, but like many memes, it lives on in popular culture.

[2] Although I bet someone's working on an Emacs mode to do that...

[3] I've actually done this, so I know whereof I speak.

[4] Except that LaTeX and wiki text require slightly less typing than HTML, but if you're using a smart editor, even that advantage goes away.

[5] Please don't quote Tufte's complaints about PowerPoint at me—I don't think it encourages bad presentations any more than the tangled rules of English spelling and grammar encourage bad writing.

[6] In particular, almost all video content makes life harder for the visually impaired: a screencast in which someone talks over themselves typing in an editor or sketching on a tablet is tantalizing but useless to someone who can't see the pixels. I committed this sin when I created Version 4 of Software Carpentry; I'd like to do better in Version 5, and would like to see high-profile online learning sites make some kind of effort as well.

[7] But wait a second: if video isn't effective, why do MIT Open Courseware and the Khan Academy work so well? The short answer is, they mostly don't: if you take out the 15% of people who can learn almost anything, no matter how it's presented, watching videos and doing drill exercises works less well than other options. The longer answer is, watching a good teacher (and Khan is a great teacher) work through a problem, instead of just presenting the answer, moves the content into the "how to" category that video is well suited to.

[8] Research dating back to the early 1990s shows that higher-quality material improves student retention. I don't know whether it improves it enough to justify its higher production costs, though.

[9] HTML5 will also help with version control, since I expect HTML5-aware diff-and-merge tools to start appearing Real Soon Now. Of course, I've been saying that for almost ten years...

[10] These days, you can wave away almost any technical objection with "it's just more Javascript".

[11] In my mind, the animation interface looks more like Scratch than it does like PowerPoint's menus and dialogs. It definitely doesn't require people to type in code, unless they want to create and share an entirely new kind of animation effect.

[12] We could even call that format XUL...

[13] "6×6" is as big a team/timescale as I'm able to contemplate these days.