Next Lecture?
This post originally appeared on the Software Carpentry website.
The Software Carpentry course currently contains the following lectures:
- Introduction
- The Unix Shell (2 lectures)
- Version Control
- Automated Builds
- Basic Scripting (bool/int/float, for/while/if)
- Strings, Lists, and Files
- Functions and Libraries
- Programming Style
- Quality Assurance (basic testing)
- Sets, Dictionaries, and Complexity
- Debugging
- Object-Oriented Programming (2 lectures)
- Unit Testing (unittest — should switch this to nose)
- Regular Expressions
- Binary Data
- XML
- Relational Databases
- Spreadsheets
- Numerical Programming (the basics of NumPy)
- Integration (subprocess+pipes and wrapping C functions)
- Web Client Programming (HTTP request/response, URL encoding)
- Web Server Programming (basic CGI processing)
- Security (the weakest lecture of the bunch)
- The Development Process (a mish-mash of sturdy and agile)
- Teamware (introduces portals like DrProject)
- Conclusion (various "where to look next" suggestions)
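Since the list above mentions switching the Unit Testing lecture from unittest to nose, here's a minimal sketch of the difference the switch would make for students (the `mean` function is just a made-up example):

```python
# unittest style: tests are methods on a TestCase subclass,
# and comparisons go through assert* helper methods.
import unittest

def mean(values):
    return sum(values) / len(values)

class TestMean(unittest.TestCase):
    def test_mean_of_two(self):
        self.assertEqual(mean([2, 4]), 3)

# nose style: any plain function whose name starts with "test_"
# is discovered and run automatically, and a bare assert is enough.
def test_mean_of_two():
    assert mean([2, 4]) == 3
```

The nose version has less boilerplate for beginners to absorb, which is presumably the point of switching.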
Between now and Christmas, I want to tidy them up, duplicate the examples in MATLAB, and add some of the content I wrote for "CSC301: Introduction to Software Engineering". Since I won't have time to do everything, I'd like your help prioritizing. Which of the following topics do you think is most important to add? And what have I forgotten entirely?
- Lifecycle: should I split the existing "Development Process" lecture in two, covering agile methods (focusing on Scrum) in one and sturdy methods (i.e., longer release cycles, more up-front planning, legacy code) in the other? Neither exactly fits scientists' "exploratory programming" paradigm, but they're all we've got...
- Quality: this would expand the "Programming Style" lecture with material from Spinellis's Code Reading and Code Quality to describe what makes good software good.
- Deployment
- Currently based on the patterns in Nygard's Release It!, which focus on designing scalable fault-tolerant applications.
- Should I instead cover the creation and distribution of packages (e.g., RPMs, Distutils, Ruby Gems, etc.)?
- Refactoring: a combination of Fowler's original Refactoring and Feathers' Working Effectively with Legacy Code.
- UML: I devote three lectures to this in CSC301; I don't see any reason to inflict it on scientists.
- Reproducible Research: it's already important, and likely to become more so; it also ties in with "open science", though I'm not sure what I could say about either that wouldn't just be rah-rah and handwaving. Tools like Sweave are interesting, but I don't think people would be willing to learn R just to use it, and there don't seem to be equivalents (yet) in other languages. The same goes for data lineage: it's an important idea, and there are plenty of research prototypes, but nothing has reached the "used by default" level of (for example) Subversion.
- GUI Construction: people still use desktop GUIs, and it's worth learning how to build them, if only because doing so forces you to come to grips with MVC and event-driven programming. What everyone really wants these days, though, is a rich browser-based interface, and I don't think it'd be possible to fit that into this course.
- High Performance Blah Blah Blah: this one keeps coming up, but (a) one of the motivations for Software Carpentry is the belief that there's too much emphasis on this in scientific computing anyway, and (b) what would it include? GPU programming? MPI? Grid computing? Some other flavor-of-the-week distraction from the hard grind of creating trustable code and reproducible results without heroic effort? Oh, wait, are my biases showing?