Doing the Math
This post originally appeared on the Software Carpentry website.
Let's do some math. Suppose that working through the Software Carpentry course takes the average scientist five full-time weeks. It doesn't matter whether that's one five-week marathon, or whether the time is spread out over several months; the cost is still roughly 10% of the scientist's annual salary (if you're thinking like an administrator) or 10% of her annual published output (if you're thinking like the scientist herself). How big a difference does it have to make to her productivity to be worthwhile?
Well, the net present value of n annual payments of an amount C at an interest rate i is P = C(1 - (1 + i)^(-n)) / i. If we assume our scientist only keeps doing research for another 10 years after taking the course (which I hope is pessimistic), and use a discount rate of 20% (which I also hope is pessimistic), then the present value works out to 4.2 times the annual savings. Doing a little long division, that means this training only has to improve the scientist's productivity by 2.4% in order to pay for itself. That works out to just under an hour per week during those ten years; anything above that is money (or time) in the bank.
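Here's that arithmetic as a short Python snippet, just so the numbers are reproducible. The 40-hour work week used to turn 2.4% into "an hour a week" is my assumption; everything else comes from the figures above.

```python
# Break-even calculation for the training (a sketch of the arithmetic above).
n = 10                                  # years of research after the course (pessimistic)
i = 0.20                                # annual discount rate (also pessimistic)

# Present value of 1 unit of annual savings over n years:
# P = C * (1 - (1 + i)**-n) / i, with C = 1.
annuity_factor = (1 - (1 + i) ** -n) / i
print(round(annuity_factor, 1))         # 4.2

course_cost = 0.10                      # five weeks, roughly 10% of a year's output
break_even_gain = course_cost / annuity_factor
print(round(break_even_gain * 100, 1))  # 2.4 (percent per year)

hours_per_week = 40                     # assumed work week
print(round(break_even_gain * hours_per_week, 2))  # 0.95, i.e. just under an hour a week
```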
Now suppose the feedback we get from former students is right, and that this training saves them a day per week or more. Let's assume the average scientist (whatever that means) costs $75,000 a year. (That's a lot more than a graduate student, but a lot less than the fully-loaded cost of someone in an industrial lab.) 20% of their time over the same ten years, at the same 20% discount rate, works out to roughly $63,000; at a more realistic discount rate of 10%, it's roughly $93,000. That's roughly a ten-fold return on $7500 (five weeks of their time right now at the same annual salary).
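And here is the same formula applied to the "day per week" claim; the salary, the ten-year horizon, and the two discount rates are the ones quoted above, so this is just a check that the dollar figures add up.

```python
# Present value of n annual payments of C at discount rate i: P = C * (1 - (1 + i)**-n) / i
def present_value(annual_saving, rate, years):
    return annual_saving * (1 - (1 + rate) ** -years) / rate

salary = 75_000
annual_saving = 0.20 * salary           # one day out of a five-day week

print(round(present_value(annual_saving, 0.20, 10)))  # 62887, roughly $63,000
print(round(present_value(annual_saving, 0.10, 10)))  # 92169, the "roughly $93,000" above
print(round(0.10 * salary))                           # 7500, the up-front cost of five weeks
```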
So my question is, why do scientists—who are certainly supposed to be able to do basic math—ignore this? More to the point, why do the people who organize conferences on "e-science" persist in ignoring two facts:
- The biggest bottleneck for the overwhelming majority of scientists (90% or more if you believe our 2008-09 survey) is development time, not CPU cycles. Faster machines can improve turnaround times a bit, but mastering a few basic skills will make a much bigger difference.
- Even those scientists who really need supercomputers to do their work would get more done faster if they were wasting less time copying files around, repeating tasks manually, and reinventing sundry wheels. They are trying to solve two open problems at once: whatever is intrinsic to their science, and high-performance parallel programming. Tackling the latter without a solid foundation is like trying to drive an F1 race car on the highway before you've learned to change lanes in a family car. I know from personal experience that the crash and burn rate is comparable...
I will believe that computational science is finally outgrowing its "toys for boys" mentality when I see an e-science conference that focuses on process and skills: on how scientists develop software at the moment-by-moment, week-by-week, and year-by-year scales. I will believe that people really care about advancing science, rather than about the bragging rights that come from having the world's biggest X or its fastest Y, when supercomputer centers start requiring courses on software design, version control, and testing as prerequisites to courses on GPUs and MPI. I'll believe it when journals like Nature and Computing in Science & Engineering require every paper they publish to devote a section to how (and how well) the code used in the paper was tested.
And I'll believe in Santa Claus when I see him up on my roof saying, "Ho ho ho." What I won't do is take bets on which will happen first.