Teaching Librarians in Montreal

This post originally appeared on the Software Carpentry website.

Preston Holmes, Jessica Hamrick, Luke Lee, and I helped deliver a Software Carpentry bootcamp during the PyCon sprints in Montreal in April 2014. The audience consisted of roughly 35 librarians coming mostly from the Montreal area.

Planning for this bootcamp was daunting. I had some experience teaching at Software Carpentry bootcamps (as did Preston and Jessica) but our material was almost exclusively directed at graduate students in science, not librarians. On top of that, the instructors were all scientists, so choosing appropriate motivating metaphors was difficult for us. We each spent some time prior to the bootcamp struggling to figure out appropriate materials we could use for an audience of librarians. As always, it was difficult to prepare to teach without a strong sense of what the students know already. We considered constructing examples using Open Access bibliographic data sets and using pymarc to process MARC records. We also considered scraping HTML or XML files as an example use case that librarians would find motivating.

We taught the shell on the morning of the first day. We went fairly slowly, discussing the basic model of interaction with a computer through the shell, standard file and directory commands, and working with text editors, and closing with a little material on pipes, redirection, and combining tools into scripts. We did not get all that far: we touched only briefly on pipes and redirection, and our attempt to tie a few commands together into a script file was largely lost on the audience.

The librarians, for the most part, had little experience working with command-line user interfaces and programming (although they were very comfortable with boolean operators and search queries). In fact, the feedback we received seemed to indicate that helping the participants set up a notional model of how files and directories work and what the shell actually does was one of the best features of the bootcamp for many of them.

In the afternoon of the first day, we started going through the basics of Python. The pace was quite fast, starting from basic data types, lists, and for loops, and going on to using modules and to writing and running scripts in Python as opposed to interacting with the IPython shell. We avoided the IPython notebook because of setup issues and the risk of confusing learners with its model of execution. To close the day, we gave the learners an exercise to construct a Python script using command-line arguments.
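The exercise itself is not reproduced in this post. As a rough idea of its scale, here is a minimal sketch of a script that reads its command-line arguments; the function name `describe_arguments` and the script name `show_args.py` are made up for illustration and are not what we used in the room.

```python
import sys


def describe_arguments(argv):
    """Return one line of text per argument that follows the script name."""
    arguments = argv[1:]  # argv[0] is the name of the script itself
    if not arguments:
        # Hypothetical usage message; the script name is an assumption.
        return ["usage: python show_args.py WORD [WORD ...]"]
    return ["argument %d: %s" % (number, value)
            for number, value in enumerate(arguments, start=1)]


# Print whatever was passed on the command line when the script is run.
for line in describe_arguments(sys.argv):
    print(line)
```

Saved as `show_args.py`, it could be run from the bash shell as `python show_args.py apple pear`, which keeps the distinction between the shell and Python in view.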

We asked for feedback at the end of the first day. There was an overwhelming consensus that we needed to slow down and to allocate more time for hands-on work. There was confusion about what happens when one is using the bash shell versus the IPython shell or the generic Python shell. In switching between these, we were losing some of the people. In retrospect, our expectations of how quickly the audience could internalise and apply programming concepts were far too ambitious.

In response to the feedback from day 1, we recapitulated most of the ideas on the morning of day 2 (pointing to the Software Carpentry website for more resources). Refreshing the material on the Unix shell went quickly because the participants seemed comfortable with most of that. We did spend some time describing our own mental processes when running distinct shells concurrently. In revisiting Python, we discussed lists again, along with methods and for loops, much more slowly and in more detail (using slides from the V4 lessons to illustrate). We initially intended to spend only half an hour doing a recap; instead, we spent most of the morning, going until just 45 minutes before the lunch break.

The rest of day 2 was spent on a single collaborative exercise. The participants had asked for more time for hands-on work, so this seemed like a good approach. Together, we built a Python script to address a brilliantly simple use case that Jessica dreamed up during the morning. Jessica had manually transcribed data from an image of a library circulation card into a text file. The text file had a two-line header (the Title and Author) followed by rows of dates on which the book was due back. The dates were given in Month-Day-Year order, separated by spaces, with the months all expressed as three-character abbreviations. The dates were inconsistent, but only in three ways: the year was either four digits (e.g., 1962), two digits (e.g., 62), or two digits preceded by an apostrophe (e.g., '62). The dates also fell only in the 1950s and 1960s (so no Y2K issues).
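The script we built in the room is not reproduced here, but the core of the cleaning step can be sketched in a few lines. This is a minimal reconstruction under assumptions: the function names are invented, the output format (the same Month Day Year layout with a four-digit year) is a guess, and prefixing "19" is only safe because every date on the card falls in the 1950s or 1960s. It deliberately sticks to plain string methods rather than regular expressions.

```python
def normalize_year(year_text):
    """Return a four-digit year for inputs such as 1962, 62, or '62."""
    digits = year_text.lstrip("'")  # drop a leading apostrophe, if any
    if len(digits) == 2:
        # Assumption: all dates are in the 1950s or 1960s, so "19" is safe.
        digits = "19" + digits
    return digits


def normalize_date(line):
    """Rewrite one 'Mon Day Year' line so the year has four digits."""
    month, day, year = line.split()
    return " ".join([month, day, normalize_year(year)])


def clean_card(lines):
    """Keep the two header lines as they are and normalize every date row."""
    header = lines[:2]
    dates = [normalize_date(line) for line in lines[2:] if line.strip()]
    return header + dates
```

With this sketch, `normalize_date("Feb 3 '62")` returns `"Feb 3 1962"`, and a row that already has a four-digit year passes through unchanged.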

In hindsight, Jessica's reduction to a data set corrupted in limited ways was the smartest choice. We had been making matters too complicated for novices by playing with MARC files and other involved tasks. By reducing the problem to a feasible use case, turning a meaningful dirty data set into a cleaner one, we were able to construct a lengthy script incrementally. Logical questions arose about more complicated corruptions (e.g., YYYY-MM-DD vs. MM-DD-YY vs. DD-MM-YY, etc.), but the audience was satisfied to hear that handling those is more advanced (i.e., it requires regular expressions) and that the script could be extended to deal with them later.

Just before lunch on day 2, we started developing the script, explaining file I/O as we went. This dovetailed well with the earlier description of files in the Unix shell and how to navigate directories. By lunch, we had a working script that opened the file, loaded its contents into a list, closed it, and printed out the list.
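That first version amounted to something like the following sketch. The function name, the file name `card.txt`, and the sample contents are placeholders rather than the actual transcription; the sample file is written to a temporary directory only so that the example runs on its own.

```python
import os
import tempfile


def load_lines(path):
    """Open a text file, read its lines into a list, and close the file."""
    handle = open(path, "r")
    lines = [line.rstrip("\n") for line in handle]
    handle.close()
    return lines


# Hypothetical stand-in for the transcribed circulation card.
sample = "An Example Title\nAn Example Author\nJan 12 1962\nFeb 3 62\nMar 21 '62\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "card.txt")
    with open(path, "w") as out:
        out.write(sample)
    card_lines = load_lines(path)

print(card_lines)
```

The explicit `open` and `close` calls mirror the steps described above; a `with` block would close the file automatically, but spelling the steps out keeps the link to the shell's view of files visible.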

At this point, we had lost Luke and Preston, leaving Jessica and me to cover for the rest of the afternoon. Over lunch, Jessica and I discussed strategy. We had the idea of using this script to motivate version control with git coupled with incremental development. This worked really well because, rather than introducing git in the abstract, we had a concrete problem that the audience had already engaged with.

After lunch, we made sure everyone had git installed before returning to the script. There were some installation headaches (the latest git binaries for Mac didn't work on all hardware). I tried to troubleshoot this but was not much help. In fact, one of the librarians, being persistent, figured out which git binary was appropriate and posted a link on the etherpad. Before long, most of those who had struggled to get git installed on their Macs had it running (this was independent of my fumbled attempts).

With git running, Jessica built the script at the front of the room and we jointly guided the development with frequent commits, explaining the process as we went. There was the usual headache of explaining the syntax of git, but having spent enough time on the shell beforehand, the audience could cope. With each change, we kept the entire group in sync. Occasionally, I would step away to help someone who couldn't get it right (usually because of a miscopied line).

At one point, we had a teachable moment: two of the participants accidentally overwrote the data file with an empty file. They had both made the same copy-and-paste error in using the same file for input and output. Fortunately, we had already introduced version control with git! We got everyone to repeat the same mistake so that they overwrote their input file with an empty file. Once we all verified that we had erased our data, we recovered the backup from the repository using git checkout. This actually reinforced the value of version control for backup as well as incremental development.
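For readers wondering how a file ends up empty this way, here is a small sketch of one plausible version of the slip. It is an assumption about the mechanism, not a copy of what the participants typed: in Python, opening a path in mode "w" truncates the file at once, so if the same path is then read as input there is nothing left to copy. The function name and file name are made up.

```python
import os
import tempfile


def overwrite_by_mistake(path):
    """Reproduce the slip: use the same path for output and for input."""
    out = open(path, "w")     # mode "w" empties the file immediately
    source = open(path, "r")  # so there is nothing left to read here
    for line in source:
        out.write(line)
    source.close()
    out.close()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "card.txt")
    with open(path, "w") as handle:
        handle.write("Jan 12 1962\nFeb 3 62\n")
    size_before = os.path.getsize(path)
    overwrite_by_mistake(path)
    size_after = os.path.getsize(path)

print(size_before, size_after)
```

After the call, the file is zero bytes long. In a repository where the data file had already been committed and no later change to it had been staged, running `git checkout -- card.txt` from the shell brings the committed copy back.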

We went straight to the end of the day working on this single script (which was about 80 lines long at the end, including comments). The audience was incredibly engaged and every single person left in the room got it working! This was a new experience for me (with almost 20 years' experience teaching at the post-secondary level) and it felt fantastic. As an academic instructor, it is embarrassingly easy to fall into the trap of trying to cover too much content. What happened at this bootcamp is that we didn't actually cover much content. My feeling, however, is that the participants collectively got enough of a meaningful learning experience that they could manage on their own from then on. Librarians in general are pretty good at working in the gaps between disciplines and are pretty determined to figure things out; what I learned from this experience is how to use their strengths constructively.