Performance and Parallelism
This post originally appeared on the Software Carpentry website.
Some topics for a lecture on parallel programming:
- how to measure/compare performance (raw speed, weak scaling, strong scaling, Amdahl's Law, response time vs. throughput)
- the register/cache/RAM/virtual memory/local disk/remote storage hierarchy and the relative performance of each (order of magnitude)
- in-processor pipelining (or, why branches reduce performance, and why vectorized operations are a good thing)
- how the data-parallel model behind those vectorized operations extends to distributed-memory systems, and what its limits are
- the shared-memory (threads and locks) model, its performance limitations, deadlock, and race conditions
- the pure task farm model, its map/reduce cousin, and their limitations
- the actors model (processes with their own state communicating only through messages, as in MPI)
It's too much (each point could fill an hour-long lecture in its own right, rather than the 10-12 minutes it would get as part of a larger one); what do we cut, and what's in there that doesn't need to be there at all?
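To keep a few of these points concrete, here are some minimal sketches, all in Python; the functions, sizes, and numbers in them are made up for illustration rather than taken from any particular lecture. First, Amdahl's Law: a fixed serial fraction puts a hard ceiling on speedup no matter how many workers you add.

```python
def amdahl_speedup(serial_fraction, n_workers):
    """Best-case speedup when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# A 10% serial fraction (an illustrative number) caps speedup at 10x,
# no matter how many workers are available.
for n in (1, 2, 4, 8, 16, 1024):
    print(f"{n:5d} workers: {amdahl_speedup(0.1, n):.2f}x")
```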
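The pipelining and vectorization point can be made with a side-by-side comparison: an element-at-a-time loop full of branches versus the equivalent vectorized NumPy call (the array size and threshold are arbitrary). Timing the two is usually enough to make the point on its own.

```python
import numpy as np

values = np.random.rand(1_000_000)

def clip_loop(data, threshold=0.5):
    # Branchy, element-at-a-time version: every element takes a separate
    # trip through the interpreter and a separate branch.
    out = []
    for v in data:
        out.append(threshold if v > threshold else v)
    return out

def clip_vector(data, threshold=0.5):
    # Vectorized version: one call, no per-element Python-level branching,
    # and the work happens in a tight compiled loop.
    return np.minimum(data, threshold)
```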
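For the shared-memory model, a race condition this small is usually enough to show why locks exist (the thread count and iteration count are arbitrary):

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        tmp = counter        # read ...
        counter = tmp + 1    # ... then write; another thread can run in between

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:           # the lock makes the read-modify-write atomic
            counter += 1

threads = [threading.Thread(target=unsafe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # often less than 400000: some increments were lost
```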
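The pure task farm and its map/reduce cousin need nothing beyond the standard library; `simulate` here is a stand-in for any independent, CPU-heavy job:

```python
from multiprocessing import Pool

def simulate(task):
    # Placeholder for an independent piece of work; tasks never talk to each other.
    return task * task

if __name__ == "__main__":
    tasks = range(100)
    with Pool(processes=4) as pool:
        partial = pool.map(simulate, tasks)   # "map": farm the tasks out to workers
    print(sum(partial))                       # "reduce": combine the partial results
```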
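And a sketch of the actors idea, using multiprocessing queues in place of MPI: each process owns its state and sees the rest of the world only as messages.

```python
from multiprocessing import Process, Queue

def accumulator(inbox, outbox):
    # The actor's state is private; everything arrives and leaves as a message.
    total = 0
    while True:
        msg = inbox.get()
        if msg is None:          # a conventional "stop" message
            outbox.put(total)
            return
        total += msg

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    actor = Process(target=accumulator, args=(inbox, outbox))
    actor.start()
    for value in range(10):
        inbox.put(value)         # send work as messages
    inbox.put(None)              # ask the actor to report and shut down
    print(outbox.get())          # 45: the actor's private running total
    actor.join()
```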