User Stories
This post originally appeared on the Software Carpentry website.
One of the tricks I teach my undergraduates is to create fictional personas to describe the intended users of a system—or in this case, a course. Here are three of the "people" I've had in mind while developing Software Carpentry to date; my goal is to update these stories to better reflect how scientists work today.
Bhargan Basepair
Bhargan Basepair received a B.Sc. in biochemistry five years ago. He has been working since then for Genes'R'Us, a biotech firm with labs in four countries. He did a Java programming course as a freshman, and a bioinformatics course using Perl as a senior.
Bhargan and his colleagues are developing fuzzy pattern-matching algorithms for finding similarities between DNA records in standard databases. To help other Genes'R'Us researchers, and to test his group's heuristics, Bhargan runs an overnight sequence query service. Researchers email sequences in a variety of formats (in-line, attachments, URLs to pages behind the company firewall, etc.). Bhargan saves them in files called search/a.in, search/b.in, and so on, then edits them to add query directives. He is very conscientious, and almost never accidentally overwrites one query with another.
Before leaving at night, he runs a Perl script that processes these inputs to create output files with matching names like search/a.out. When Bhargan comes in the next morning, he pages through his mail again, sending .out files to the appropriate people. (He almost never sends the wrong file to the wrong person.) He then uses another Perl script to copy all the input and output files to a directory with a name corresponding to the date, such as 2009-07-23. He and his colleagues would like to do statistics on these saved queries and results to see how well their algorithms are doing, but have never found the time.
This course will teach Bhargan how to automate his overnight service by writing simple scripts to retrieve, process, and reply to email queries. Those scripts will automatically record queries, results, and other data, and produce a daily summary of the performance of the pattern-matching algorithms.
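As a rough illustration of the record-keeping part of that automation, here is a minimal Python sketch. It is not Bhargan's Perl script. It assumes the layout described above, where each query such as search/a.in has a result called search/a.out, and it copies both into a directory named after the date. The function name archive_queries and the manifest.csv file are hypothetical, invented for this example.

```python
import csv
import shutil
from datetime import date
from pathlib import Path


def archive_queries(search_dir, archive_root, day=None):
    """Copy each query and its result into a dated directory and log them."""
    search_dir = Path(search_dir)
    day = day or date.today()
    target = Path(archive_root) / day.isoformat()  # for example 2009-07-23
    target.mkdir(parents=True, exist_ok=True)

    rows = []
    for query in sorted(search_dir.glob("*.in")):
        result = query.with_suffix(".out")  # a.in is paired with a.out
        shutil.copy2(query, target / query.name)
        if result.exists():
            shutil.copy2(result, target / result.name)
        rows.append({"query": query.name, "has_result": result.exists()})

    # manifest.csv is a hypothetical name: one line per query, so that
    # later statistics do not depend on anyone's memory of what was run.
    with open(target / "manifest.csv", "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["query", "has_result"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling archive_queries("search", "archive") at the end of each run would leave one dated directory per day, each with a manifest that a later summary script could read. How to score the pattern-matching algorithms themselves is left out, since that depends on details the story does not give.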
Helen Helmet
Helen Helmet, a Ph.D. student in mechanical engineering, is currently doing a six-month internship at an engineering firm designing carbon-fiber helmets for firefighters and other emergency service personnel. Her undergraduate courses included an introduction to scientific computing using MATLAB, a robotics course using C, and a numerical methods course that also used MATLAB. She taught herself Fortran during a co-op placement between her junior and senior years, and used it again in a graduate course on finite elements.
Helen's task is to model the non-combustive thermal degradation (otherwise known as "melting") of candidate materials. Her starting point is a 14,000-line program her supervisor wrote a decade ago. After deciding that there isn't time to re-write it in C++ (which she would like to learn), she comments out the calls to the mesh deformation routine in the main loop and begins to write a replacement. She sometimes deletes what she has written and starts over three or four times before she is satisfied.
Helen tests her program by writing the total heat content of the mesh at each time step to a file. She then loads this data into MATLAB to graph the percentage differences between these values and the ones produced by the original program for six sample problems. In one case, the difference grew as large as 30% by the end of the simulation. Helen added write statements to her program to display values until she managed to convince herself that the difference was due to a bug in the original subroutines.
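Helen does this comparison in MATLAB; the sketch below shows the same idea in Python, purely as an illustration. It assumes the two runs have already been read into lists with one total-heat value per time step, and that no reference value is zero. The function names and the tolerance parameter are hypothetical.

```python
def percent_differences(reference, candidate):
    """Percentage difference of each candidate value from its reference value."""
    if len(reference) != len(candidate):
        raise ValueError("both runs must cover the same time steps")
    # Assumes every reference value is non-zero.
    return [100.0 * (new - old) / old for old, new in zip(reference, candidate)]


def first_step_over(reference, candidate, tolerance_percent):
    """Index of the first time step whose difference exceeds the tolerance, or None."""
    diffs = percent_differences(reference, candidate)
    for step, diff in enumerate(diffs):
        if abs(diff) > tolerance_percent:
            return step
    return None
```

Knowing the first time step at which the two programs drift apart gives a starting point for debugging that is more precise than a graph of the whole run.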
Helen keeps a to-do list on her home page. Every two or three days, she updates this list to show the progress she has made. She keeps completed tasks on the page until the end of the month, when she writes a short status report for her supervisor.
This course will teach Helen to design software before she starts typing, and that there are better ways to manage code evolution than commenting out one section and replacing it with another. She will also learn more effective testing and debugging procedures, and how to use a version control system to ensure that she can roll back to an old version of her code when she needs to. Finally, she will be shown how to use an issue-tracking system to manage her to-do list, and how to write a small script to generate her monthly progress report automatically.
Stefan Synthesis
Stefan Synthesis is a graduate student in chemistry who is working as a lab technician to help cover his costs. His only programming experience is a general first-year introduction to computational science using Python.
Stefan's supervisor is studying the production of fullerenes (also known as "buckyballs"). Each set of experiments involves 100 different reactant mixtures, 20 different temperature regimes, and 5 different pressures. Using a machine built by a collaborating lab, Stefan can run all the mixture and temperature combinations at once, so that the output of each experiment is five files containing 2000 lines of data each.
The controller for the experimental machine writes these files to Stefan's workstation approximately an hour after the experiment begins. To analyze them, Stefan opens them with Excel, copies and pastes to merge the data into one spreadsheet, then creates a chart using the chart wizard. He saves the chart as a PNG file on the group's web site, along with the original data file.
Two or three times a week, Stefan receives results from his supervisor's collaborators. He creates charts for each, which he uploads to the web site, then merges summary statistics into a master spreadsheet.
This course will teach Stefan how to automate the process described above. More importantly, it will teach him how to track the provenance of the data he is working with, so that scientists in his group and others can trace backward from the final charts to the raw data they represent.
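One simple way to start tracking provenance is to tag every merged row with the file it came from and to record a checksum of each source file. The Python sketch below illustrates that idea under stated assumptions: the story does not say what format the machine's output files use, so the code assumes comma-separated text with no header row. The function name merge_with_provenance is hypothetical.

```python
import csv
import hashlib
from pathlib import Path


def merge_with_provenance(data_files, merged_path):
    """Merge CSV files into one, tagging every row with its source file.

    Returns a dict mapping each source file name to the SHA-256 digest of
    its contents, so a later reader can check the raw data is unchanged.
    """
    provenance = {}
    with open(merged_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in map(Path, data_files):
            provenance[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
            with open(path, newline="") as handle:
                for row in csv.reader(handle):
                    # The first column records where this row came from.
                    writer.writerow([path.name] + row)
    return provenance
```

Saving the returned checksums next to the chart would let someone in the group trace a plotted point back to a specific, unmodified raw file.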