Tutorial: NumPy, SciPy, and matplotlib
This post originally appeared on the Software Carpentry website.
Today I walked through a toy analysis of annual temperature data from Australia and Canada covering roughly the last 100 years. The goal of the exercise was to demonstrate loading data, inspecting it, and fitting trends. My last tutorial didn't involve any real data, so this week we wanted to change that.
As in my previous tutorial, I presented using the IPython HTML notebook. I hadn't planned to use pylab mode or inline plots, but some issues with my matplotlib setup forced me in that direction. This is a little awkward, because I actually recommend that people not use the pylab interface to matplotlib: its behind-the-scenes magic can cause problems that are difficult to debug. For demos, though, inline plots are really the way to go. The obvious upside is that the plots I made during the tutorial are embedded in the notebook for you to see now.
The data was stored in a well-behaved CSV format, so it was simple to load with numpy.loadtxt. I used the matplotlib plot function for all the figures, even the one where I probably should have used scatter.
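For readers who want to try the loading and plotting steps, here is a minimal sketch. It is not the notebook's actual code: the CSV contents, the column layout (year, temperature), and the variable names are all assumptions made up for illustration, and the numbers are not the real Australian or Canadian data.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-in for the tutorial's CSV file: year, mean temperature (deg C).
# These values are illustrative only, not the real data used in the tutorial.
csv_text = """# year,temperature
1910,21.1
1940,21.3
1970,21.6
2000,22.0
"""

# loadtxt skips lines starting with '#' by default; delimiter="," handles CSV.
# With a real file you would pass its path instead of the StringIO object.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",")
years = data[:, 0]
temps = data[:, 1]

fig, ax = plt.subplots()
ax.plot(years, temps, "o")  # the "o" format draws markers only, much like scatter
ax.set_xlabel("Year")
ax.set_ylabel("Temperature (deg C)")
```

In a notebook with inline plotting the figure shows up under the cell. In a plain script you would call plt.show() or fig.savefig to see it.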
I demonstrated fitting with scipy.stats.linregress, scipy.optimize.curve_fit, and scipy.interpolate.UnivariateSpline. The linregress function is useful when you just want a quick linear fit, while curve_fit lets you fit arbitrary functions to the data, because you pass it a model function that you define yourself.
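A rough sketch of the three fitting tools side by side might look like the following. The data here is synthetic (a straight line plus random noise), and the quadratic model passed to curve_fit is a hypothetical example of a user-defined function, not the one from the tutorial.

```python
import numpy as np
from scipy import interpolate, optimize, stats

# Synthetic stand-in data (not the tutorial's): a linear trend plus noise.
rng = np.random.default_rng(0)
years = np.arange(1910, 2011, 5, dtype=float)
temps = 21.0 + 0.01 * (years - 1910) + rng.normal(0.0, 0.1, years.size)

# 1. Quick linear fit: returns slope, intercept, r-value, p-value, std error.
result = stats.linregress(years, temps)
print("linregress slope:", result.slope, "intercept:", result.intercept)


# 2. Fit a model function of your own. The first argument is the independent
#    variable; the remaining arguments are the parameters to be fitted.
def quadratic(x, a, b, c):
    t = x - 1910.0  # shift the origin to keep the numbers well scaled
    return a + b * t + c * t**2


params, covariance = optimize.curve_fit(quadratic, years, temps)
print("curve_fit parameters:", params)

# 3. Smoothing spline through the data. The smoothing factor s is left at
#    its default here; smaller values make the spline track the data more closely.
spline = interpolate.UnivariateSpline(years, temps, k=3)
smoothed = spline(years)
```

The spline object is callable, so it can be evaluated on a finer grid of years when plotting a smooth curve over the raw points.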
We just scratched the surface of three modules in SciPy today. Skimming the docs, you can see there is a vast array of tools in there. And for a quick look at what matplotlib can do, take a look at the thumbnail gallery.