What to Teach Researchers About the Web
This post originally appeared on the Software Carpentry website.
One reason I'm reflecting on what I've learned in the last two years is a question that is back on the top of my work pile: what should we teach researchers about the web? Partly, it's a priority because I'm currently embedded in Mozilla; their mandate is to defend and extend the open web, and their educational efforts are all aimed at that, so I ought to be doing something too. The real reason, though, is that a lot of things have brought this into sharper focus recently:
- Audrey Watters' investigation of what and how to teach people about webmaking (summarized in this short talk and the Audrey Test).
- Mark Guzdial's commentary on getting the level right (and everything else he's been writing for the last year).
- Jon Udell's "Awakened Grains of Sand" and "Tags for Democracy" posts (and everything else he has been writing for the last year too).
- Michelle Levesque's thoughts on what Mozilla should teach.
Here's what (I think) I've figured out so far:
- People want to solve real problems with real tools.
- Styling HTML5 pages with CSS and making them interactive with JavaScript aren't core needs for researchers.
- All we can teach people about server-side programming in a few hours is how to create security holes, even if we use modern frameworks.
- People must be able to debug what they build. If they can't, they won't be able to apply their knowledge to similar problems on their own.
Jon Udell has summed up the big ideas researchers ought to know. In concrete terms, we want them to understand:
- how to construct (and deconstruct) URLs;
- how an HTTP request/response is processed;
- pass by value vs. pass by reference, push vs. pull, structured vs. unstructured data; and
- how a few common security problems arise.
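As a minimal sketch of the first item in that list, the following Python uses only the standard library's urllib.parse to build a query URL and then take it apart again. The host data.example.org, the path, and the parameter names are hypothetical placeholders rather than a real service, and the helper names build_url and take_apart are mine.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qs


def build_url(host, path, params):
    """Assemble an https URL from a host, a path, and a dict of query parameters."""
    query = urlencode(params)  # percent-encodes keys and values
    return urlunsplit(("https", host, path, query, ""))


def take_apart(url):
    """Split a URL into the pieces a learner should be able to name."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        "params": parse_qs(parts.query),  # each value comes back as a list
    }


# Hypothetical host and parameters, for illustration only.
url = build_url("data.example.org", "/temperature", {"station": "YVR 01", "year": 2012})
print(url)
print(take_apart(url))
```

Seeing the space in the station name come back encoded, and the year come back as a string inside a list, is a small way into the structured vs. unstructured data discussion.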
So what can we teach people that meets these goals, and respects our constraints?
- Visualize this: plug an interactive JavaScript visualization engine into a web page, show them how to put their data somewhere accessible, and voilà: interactive data exploration on the web. This would be fun, but it would fail our debuggability/reproducibility requirement.
- OpenDAP is a framework for sharing the kind of grid-based data that's common in the earth sciences. Setting up a server would be out of reach, but formatting query URLs to pull data from public servers would be within reach, and we could easily run such a server on our site to provide a stable target. My concerns are (a) it's only showing learners half of the equation, and (b) it's not directly relevant to people in genomics and other fields.
- Kynetx (as described in Phil Windley's book The Live Web) is a framework for handling event streams. It's very cool, but it's still very young, and I don't know any scientists who are using it.
- Read dynamic, write static: download data from several sites, merge it, and produce some static HTML pages that other people can then download and merge. This is a common pattern in real life (especially when run periodically by cron), and with a little bit more work, we can show people that they only need to download things that have changed. On the downside, it's not really dynamic or interactive, and I want people to see that the web is more than just a bunch of pipes that deliver documents.
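To make the "read dynamic, write static" option concrete, here is a rough sketch in standard-library Python. It assumes the remote servers send ETag headers, which is one common way (not the only one) to download only what has changed; the function names, the (key, value) record shape, and the page layout are all illustrative choices of mine, not anything prescribed above.

```python
import html
import urllib.error
import urllib.request


def conditional_request(url, etag=None):
    """Build a GET request that asks the server to skip unchanged data."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, etag=None):
    """Return (body, new_etag), or (None, etag) if the server answers 304."""
    try:
        with urllib.request.urlopen(conditional_request(url, etag)) as response:
            return response.read().decode("utf-8"), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None, etag
        raise


def merge(*sources):
    """Merge lists of (key, value) pairs; later sources win on duplicate keys."""
    merged = {}
    for source in sources:
        merged.update(source)
    return sorted(merged.items())


def render_page(title, rows):
    """Produce a static HTML page containing one table row per record."""
    body = "\n".join(
        f"<tr><td>{html.escape(str(k))}</td><td>{html.escape(str(v))}</td></tr>"
        for k, v in rows
    )
    return (
        f"<html><body><h1>{html.escape(title)}</h1>\n"
        f"<table>\n{body}\n</table></body></html>"
    )
```

A script run by cron could call fetch_if_changed for each source with the ETag it saved last time, parse whatever came back into (key, value) pairs, and write the result of render_page(title, merge(...)) to a file that a plain web server then delivers.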