Systematic Curriculum Design

This post originally appeared on the Software Carpentry website.

Executive summary: we'd appreciate your help organizing and motivating our material better.

One of the good things about traveling is that it gives me time to think. One of the bad things about thinking is that every time I do, I wind up with more work than I had when I started. For example, to organize and motivate our content, I'm using eight questions that scientists frequently ask:

How can I manage this data?
How can I process it?
How can I tell if I've processed it correctly?
How can I find and fix bugs when I haven't?
How can I keep track of what I've done?
How can I find and use other people's work?
How can other people find and use mine?
How can I do all these things faster?

On the other side of the equation I have a syllabus for the core Software Carpentry material, which includes:

the command-line shell (e.g., Bash)
version control
basic programming (variables, lists, loops, conditionals, and simple file I/O)
functions and libraries
databases (i.e., basic SQL queries)
matrix programming (e.g., MATLAB or NumPy)
quality assurance (defensive programming, testing, etc.)
dictionaries (or hashes, if you're a Perl programmer)
the development process (stepwise refinement, red-green-refactor, performance profiling)
web programming (by which we mean using web APIs, not providing services yourself)

In order to figure out how well we're helping scientists, we need to map their needs onto our content. Here's what I've come up with:

Question	Subject	Answer
How can I manage this data?	The Shell	Use directories and sub-directories with meaningful names.
		Use filenames that can easily be matched with wildcards.
		Use filename extensions that indicate the type of data in the file.
		Use text unless there's a powerful reason to use something else.
	Version Control	If it's megabytes or less, put it under version control.
	Basic Programming	Create and use data formats that are easy for programs to parse.
	Functions and Libraries	—
	Databases	Store it in a relational database.
		Store each atom of information in its own field.
		Make sure each record has a unique key.
		Make sure that information is never duplicated.
		Use foreign keys and joins to combine information from different tables.
	Number Crunching	Represent it as a matrix, because that's easy to process.
	Quality	—
	Sets and Dictionaries	Store it in a set or dictionary so that elements can be looked up by value rather than by position.
	Development	—
	Web Programming	Format it as HTML (or XML, or some other widely-used format).
		Separate content from presentation (e.g., use CSS for styling).
Question	Subject	Answer
How can I process it?	The Shell	Use Unix commands that manipulate lines of text.
		Combine those commands using pipes and redirection.
		Use loops to perform the same operations on many files.
	Version Control	—
	Basic Programming	Write programs that use loops, file I/O, and string splitting to read data.
		Use floating-point numbers unless you are sure all values (including calculated values) will always be integers.
	Functions and Libraries	Define functions to do simple operations, then combine those for more complicated effects.
		Equivalently, describe what you would do in a language customized to your problem, then fill in the missing bits of code by creating functions.
	Databases	Write SQL queries to select, filter, aggregate, and sort data.
		Use a general-purpose programming language for everything else.
	Number Crunching	Use a linear algebra package like NumPy.
	Quality	—
	Sets and Dictionaries	Use algorithms that don't depend on the order of items.
	Development	Use the right data structures.
	Web Programming	Use an HTTP library to fetch it.
		Use an XML or JSON library to parse it.
Question	Subject	Answer
How can I tell if I've processed it correctly?	The Shell	—
	Version Control	—
	Basic Programming	Test your programs with small data sets whose results can be checked by hand.
	Functions and Libraries	—
	Databases	Build queries in small steps.
		Run queries against small data sets whose output can be checked manually.
	Number Crunching	Compare a program's output to analytic results, experimental results, simplified test cases, and previous programs.
		Use tolerances when comparing results.
	Quality	Create simple data sets for which the right answer can be calculated by hand.
		Compare the results produced by the new program to results produced by older programs.
	Sets and Dictionaries	—
	Development	Make code testable by dividing it into functions, and then replacing some functions with others for testing purposes.
	Web Programming	—
Question	Subject	Answer
How can I find and fix bugs when I haven't?	The Shell	—
	Version Control	—
	Basic Programming	—
	Functions and Libraries	—
	Databases	—
	Number Crunching	—
	Quality	Write test cases that fail when the bug is present, but pass when the bug is fixed.
		Add assertions to programs to check their internal consistency.
		Use a debugger.
	Sets and Dictionaries	—
	Development	Write tests.
	Web Programming	—
Question	Subject	Answer
How can I keep track of what I've done?	The Shell	—
	Version Control	Keep your work under version control.
		Check in whenever you've completed a significant change.
		Write meaningful check-in comments.
	Basic Programming	Put version control IDs in programs (and data files), and copy them forward to results.
	Functions and Libraries	Give functions meaningful names.
		Group related functions and related definitions into modules.
		Write docstrings to explain what functions and modules do and how to use them.
	Databases	Store queries in files (just like programs).
	Number Crunching	—
	Quality	Turn bug fixes into assertions and test cases.
		Use a coverage analyzer to see what code is and isn't being tested.
	Sets and Dictionaries	—
	Development	—
	Web Programming	Use `meta` headers in your HTML/XML data files.
Question	Subject	Answer
How can I find and use other people's work?	The Shell	—
	Version Control	Get it from their version control repositories.
	Basic Programming	—
	Functions and Libraries	Use the `help` function to read their documentation.
	Databases	—
	Number Crunching	—
	Quality	—
	Sets and Dictionaries	—
	Development	—
	Web Programming	Ask them to use well-formed URLs.
		And to format it according to well-defined machine-readable standards (e.g., XML or JSON).
Question	Subject	Answer
How can other people find and use mine?	The Shell	—
	Version Control	Put your work in a publicly-accessible version control repository.
	Basic Programming	—
	Functions and Libraries	Write docstrings to explain what functions and modules do and how to use them.
	Databases	Raise exceptions to signal errors so that other people can handle them as they think best.
	Number Crunching	—
	Quality	—
	Sets and Dictionaries	—
	Development	—
	Web Programming	Put it on the web at a stable URL.
		Format it according to well-defined machine-readable standards (e.g., XML or JSON).
		Include meta-data.
Question	Subject	Answer
How can I do all these things faster?	The Shell	Put commands in shell scripts so that they can be re-used.
	Version Control	—
	Basic Programming	Use appropriate variable names so that people will waste less time trying to read programs.
	Functions and Libraries	Learn to recognize and use common design patterns.
	Databases	—
	Number Crunching	Use a linear algebra package like NumPy.
	Quality	Design code for testing.
		Write test cases before writing new code.
	Sets and Dictionaries	Use sets and dictionaries for sparse, irregular, or unordered data.
	Development	Use a profiler to figure out why code is slow before trying to optimize it.
		Build code so that parts can be replaced easily.
	Web Programming	—
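
To make a couple of these answers concrete, here is a minimal sketch of "create simple data sets for which the right answer can be calculated by hand" combined with "use tolerances when comparing results". The `mean` function, the sample values, and the tolerance of `1e-9` are all made up for illustration; they are not taken from our lessons.

```python
import math


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


# A data set small enough to check by hand: (0.1 + 0.2 + 0.3) / 3 = 0.2
small_data = [0.1, 0.2, 0.3]
expected = 0.2
result = mean(small_data)

# Exact equality is fragile for floating-point results, because rounding
# error can make two "equal" values differ in their last few bits.
exactly_equal = (result == expected)

# Comparing within a relative tolerance is more robust.
# The tolerance here is an illustrative choice, not a recommendation.
close_enough = math.isclose(result, expected, rel_tol=1e-9)

print("exactly equal:", exactly_equal)
print("close enough: ", close_enough)
```

The right tolerance depends on the calculation, so picking one is part of understanding the problem rather than a detail to copy from somewhere else.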

In parallel with this, a group of us have been working on a paper describing best practices for computational science. The list we've converged on is:

Write programs for people, not computers.
- Programs should not require their readers to hold more than a handful of facts in memory at once.
- Names should be consistent, distinctive, and meaningful.
- Code style and formatting should be consistent.
- All aspects of software development should be broken down into tasks roughly an hour long.
Automate repetitive tasks.
- Rely on the computer to repeat tasks.
- Save recent commands in a file for re-use.
- Use a build tool to automate scientific workflows.
Use the computer to record history.
- Software tools should be used to track computational work automatically.
Make incremental changes.
- Work in small steps with frequent feedback and course correction.
Use version control.
- Use a version control system.
- Everything that has been created manually should be put in version control.
Don't repeat yourself (or others).
- Every piece of data must have a single authoritative representation in the system.
- Code should be modularized rather than copied and pasted.
- Re-use code instead of rewriting it.
Plan for mistakes.
- Add assertions to programs to check their operation.
- Use an off-the-shelf unit testing library.
- Turn bugs into test cases.
- Use a symbolic debugger.
Optimize software only after it works correctly.
- Use a profiler to identify bottlenecks.
- Write code in the highest-level language possible.
Document the design and purpose of code rather than its mechanics.
- Document interfaces and reasons, not implementations.
- Refactor code instead of explaining how it works.
- Embed the documentation for a piece of software in that software.
Conduct code reviews.
- Use code review and pair programming when bringing someone new up to speed and when tackling particularly tricky design, coding, and debugging problems.
- Use an issue tracking tool.
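
As one possible illustration of the "Plan for mistakes" practices in this list, the sketch below uses an assertion to check a function's internal consistency, Python's standard `unittest` library as the off-the-shelf testing tool, and a test case that pins down a bug once it has been fixed. The `normalize` function and the all-zero-input bug are hypothetical examples, not something from the paper.

```python
import unittest


def normalize(values):
    """Scale values so that they sum to 1.0."""
    total = sum(values)
    # Signal bad input with an exception so callers can decide how to react.
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    result = [v / total for v in values]
    # Assertion: check the function's own work before handing it back.
    assert abs(sum(result) - 1.0) < 1e-9, "normalized values must sum to 1"
    return result


class TestNormalize(unittest.TestCase):
    def test_small_case_checked_by_hand(self):
        # 1/4, 1/4, and 2/4 are exactly representable, so equality is safe.
        self.assertEqual(normalize([1, 1, 2]), [0.25, 0.25, 0.5])

    def test_all_zero_input_is_rejected(self):
        # Hypothetical bug turned into a test: an all-zero input once
        # caused a ZeroDivisionError deep inside the calculation.
        with self.assertRaises(ValueError):
            normalize([0, 0])


# Run the tests programmatically so this also works in an interactive session.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalize)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test fails if the zero check is removed and passes once it is restored, which is what lets it stand guard against the bug coming back.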

As you can see, this list only partially overlaps the "Answers" column in the table above. That makes me nervous: when two independent attacks on a problem yield two different answers, the odds are good that neither of them is right. I trust the "best practices" list more than I do the breakdown of our existing material, which leaves me with some awkward choices. Changing the motivating questions would feel like moving the goalposts so that I can declare victory with the content I have, but on the other hand, maybe there is a better way to carve up the space of things scientists want to do that will give a better mapping. Or are there connections between our content and those motivating questions that I'm just missing? Or do we really have the wrong content, i.e., are we teaching what we know, rather than what would actually be most useful to scientists?

Stepping back for a moment, the real point of this exercise is to ensure that:

we're teaching what's most useful to our learners;
everything we teach makes sense, and is seen as useful, when it first appears; and
learners see the connections among ideas, and between those ideas and their application.

What we should really do is go one step further and figure out how to tell whether our learners can actually do the things embodied in our eight questions. We should then work backward from that assessment to figure out what demonstrable skills they need to acquire, then what understanding they need in order to become proficient with those skills, and then see how that maps onto our best practices. We've made a start toward this with the "driver's license" exam described in an earlier post; if you'd like to help us follow through, please get in touch.

Greg Wilson 2012-09-16