Interview with Cameron Neylon

This post originally appeared on the Software Carpentry website.

Today's interview is with Cameron Neylon, a noted advocate of open science.

Tell us a bit about your organization and its goals.

I work for the UK Science and Technology Facilities Council. We are a research funder, but although we provide some direct funding, our main role is to build and run, or subscribe to, large-scale research infrastructure on behalf of UK scientists. For instance, we run telescopes and pay the UK subscription to CERN, as well as supporting and running synchrotrons, neutron sources, high-powered lasers, microfabrication facilities, and large-scale computing infrastructure.

I work at the ISIS Neutron Scattering Facility, which hosts several thousand scientists a year doing hundreds of experiments on around 20 different instruments. We help to select which experiments get done, support sample preparation, and assist with the planning and running of experiments as well as with data analysis, sometimes all the way to publication. My group focusses on support and development of new techniques for biological scientists.

Tell us a bit about the software your group uses.

We use a big mix of things. Like most experimental scientists, we rely a lot on Word and Excel for basic analysis and record keeping. We use a blog-based laboratory notebook system (biolab.isis.rl.ac.uk) developed in collaboration with the University of Southampton. The instruments are highly specialised and are run with software developed in house, and first-stage analysis is moving to a new framework called Mantid (mantidproject.org).

After the first stage we move to all sorts of tools, depending on what we need and on the scientific problem. Specialist analysis software, usually built by individuals or groups and often requiring some sort of proprietary framework (MATLAB is common, and Igor from Wavemetrics is quite often used), is put together in ad hoc pipelines to attack a problem from several different directions. This is often quite haphazard.

Some examples include RaSCAL (MATLAB: http://sourceforge.net/projects/rscl/), the ATSAS suite (a closed-source, mostly command-line-driven suite for scattering analysis: http://www.embl-hamburg.de/ExternalInfo/Research/Sax/software.html), and the NIST SANS analysis tools (Igor Pro: http://www.ncnr.nist.gov/programs/sans/data/red_anal.html).

Tell us a bit about what software your group develops.

The Mantid project has provided us with a Python scripting and GUI environment, which has made it possible to provide some simple tools for ourselves and some users and to integrate these with our blog-based record keeping system. Most of what I do is based on the immediate needs of our group, but with an eye to making it more useful to a wider community. Often it involves trying to make those disparate data analysis pipelines easier to use and more consistent, and enabling easier and better record keeping of the analysis process. We use our experience of problems to try to build things that are useful for our wider community.

What's the typical background of your scientists, developers, and/or users?

Most of the scientists we deal with have no specific experience of programming. In rare cases they have a little experience of scripting or command-line work. They are focussed on outcomes and getting results rather than on tools. This leads to ad hoc procedures and pipelines that are usually inefficient and badly recorded. Most could look at simple scripts and adapt them to their needs. However, the lack of experience in programming "properly" and the lack of knowledge of best practice lead to messy, incomprehensible, and often unusable tools. An understanding of test-driven software design and of versioning for safe development is rare.

Those scientists who do build software and are comfortable with programming rarely have any skill or experience in user interface design, which leads to difficult-to-use interfaces and GUIs that confuse users.

How do you hope Software Carpentry will help them?

Good practice, good testing, good documentation, and availability of code for checking. On top of this, a good understanding of how to think about the design of a specific piece of software, and some knowledge of common design patterns to aid in the more rapid development of good and re-usable software.

How will you tell what impact the course has had (if any)?

I'll see some comments in people's code and I'll be able to get at it in an appropriate repository. When I get this code and read the comments, I'll be able to understand how I might re-use it for my own purposes. If the course can achieve that, or steps towards that, I'll be very happy!