Performance and Parallelism
This post originally appeared on the Software Carpentry website.
Some topics for a lecture on parallel programming:
- how to measure/compare performance (raw speed, weak scaling, strong scaling, Amdahl's Law, response time vs. throughput)
- the register/cache/RAM/virtual memory/local disk/remote storage hierarchy and the relative performance of each (order of magnitude)
- in-processor pipelining (or, why branches reduce performance, and why vectorized operations are a good thing)
- how the data-parallel model behind those vectorized operations extends to distributed-memory systems, and what its limits are
- the shared-memory (threads and locks) model, its performance limitations, deadlock, and race conditions
- the pure task farm model, its map/reduce cousin, and their limitations
- the actors model (processes with their own state communicating only through messages, as in MPI)
It's too much (each point could fill an hour-long lecture in its own right, rather than the 10-12 minutes it would get as part of a larger one); what do we cut, and what's in there that doesn't need to be there at all?
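To keep a few of these points concrete, here are some minimal sketches, all in Python; the functions, sizes, and numbers in them are made up for illustration rather than taken from any particular lecture. First, Amdahl's Law: a fixed serial fraction puts a hard ceiling on speedup no matter how many workers you add.

```python
def amdahl_speedup(serial_fraction, n_workers):
    """Best-case speedup when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# A 10% serial fraction (an illustrative number) caps speedup at 10x,
# no matter how many workers are available.
for n in (1, 2, 4, 8, 16, 1024):
    print(f"{n:5d} workers: {amdahl_speedup(0.1, n):.2f}x")
```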
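The pipelining and vectorization point can be made with a side-by-side comparison: an element-at-a-time loop full of branches versus the equivalent vectorized NumPy call (the array size and threshold are arbitrary). Timing the two is usually enough to make the point on its own.

```python
import numpy as np

values = np.random.rand(1_000_000)

def clip_loop(data, threshold=0.5):
    # Branchy, element-at-a-time version: every element takes a separate
    # trip through the interpreter and a separate branch.
    out = []
    for v in data:
        out.append(threshold if v > threshold else v)
    return out

def clip_vector(data, threshold=0.5):
    # Vectorized version: one call, no per-element Python-level branching,
    # and the work happens in a tight compiled loop.
    return np.minimum(data, threshold)
```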
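For the shared-memory model, a race condition this small is usually enough to show why locks exist (the thread count and iteration count are arbitrary):

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        tmp = counter        # read ...
        counter = tmp + 1    # ... then write; another thread can run in between

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:           # the lock makes the read-modify-write atomic
            counter += 1

threads = [threading.Thread(target=unsafe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # often less than 400000: some increments were lost
```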
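The pure task farm and its map/reduce cousin need nothing beyond the standard library; `simulate` here is a stand-in for any independent, CPU-heavy job:

```python
from multiprocessing import Pool

def simulate(task):
    # Placeholder for an independent piece of work; tasks never talk to each other.
    return task * task

if __name__ == "__main__":
    tasks = range(100)
    with Pool(processes=4) as pool:
        partial = pool.map(simulate, tasks)   # "map": farm the tasks out to workers
    print(sum(partial))                       # "reduce": combine the partial results
```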
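And a sketch of the actors idea, using multiprocessing queues in place of MPI: each process owns its state and sees the rest of the world only as messages.

```python
from multiprocessing import Process, Queue

def accumulator(inbox, outbox):
    # The actor's state is private; everything arrives and leaves as a message.
    total = 0
    while True:
        msg = inbox.get()
        if msg is None:          # a conventional "stop" message
            outbox.put(total)
            return
        total += msg

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    actor = Process(target=accumulator, args=(inbox, outbox))
    actor.start()
    for value in range(10):
        inbox.put(value)         # send work as messages
    inbox.put(None)              # ask the actor to report and shut down
    print(outbox.get())          # 45: the actor's private running total
    actor.join()
```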