We teach foundational coding and data science skills to researchers worldwide.

Best recipe: just add statistics and science

This post originally appeared on the Data Carpentry website.

We’d like to talk about our experiences working together as a domain scientist and a statistician, and to encourage you to do this too!

When we say working together, we mean that we collaborate in the best sense – we each bring our strengths and those strengths are complementary. We have some suggestions for you as you get started. First, the domain scientist should look at how statistics is organized at their institution. Does your university have consulting statisticians or research-active faculty or both? Consulting can also be done as part of statistics students’ training. Most larger universities have both consulting and research. Some scientific questions naturally flow into research on new statistics methods and concepts, and other science questions are a better fit for an analysis method that has already been published. Talk to the consulting center first if you have one.

Collaboration between scientists and statisticians is essential, and it is best to start before gathering any data, ideally before the experimental design is finished. Susan has met with a number of faculty and students after their data were collected and has had to be the bearer of bad news: the data could NOT answer the question at hand. This can happen for many reasons: a sample that is too small, the wrong experimental design, a sample that is not random, the wrong information collected, and so on. Don’t let this happen to you!
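One way to see why a small sample can sink a study is a known-truth simulation: generate data in which the true effect is fixed in advance, run the planned test many times, and count how often it detects the effect (the test's power). The sketch below is a minimal illustration under stated assumptions, not a substitute for a power analysis done with your statistician. It assumes two normally distributed groups with a known standard deviation of 1, a two-sided z-test at alpha = 0.05, and a true difference of half a standard deviation; the function names `simulated_power` and `analytic_power` are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def simulated_power(n_per_group, effect_size, n_sims=4000, alpha=0.05, seed=1):
    """Estimate the power of a two-sided two-sample z-test by known-truth simulation.

    Assumption (for illustration): both groups are normal with known SD = 1,
    and the true difference in means is `effect_size` (in SD units).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # SE of the difference in means when SD = 1
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z = (statistics.fmean(treated) - statistics.fmean(control)) / se
        if abs(z) > z_crit:  # the test "detects" the effect we built in
            rejections += 1
    return rejections / n_sims


def analytic_power(n_per_group, effect_size, alpha=0.05):
    """Closed-form power of the same z-test, used to check the simulation."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


for n in (5, 10, 20, 50):
    print(n, round(simulated_power(n, 0.5), 2), round(analytic_power(n, 0.5), 2))
```

The simulated and closed-form values should agree closely, and both show power climbing with group size. With real data the variance is unknown, the test would usually be a t-test or a model chosen with your statistician, and the assumed effect size has to come from pilot data or the literature, which is exactly the conversation to have before collecting data.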

What has our path been like?

Susan: I have been very fortunate to have had the opportunity to work with many great scientists. The problems I’ve been able to work on have been extremely fun and exciting. Not only did I have a chance to answer scientific questions, but in doing so, I’ve had the opportunity to expand my statistics knowledge. One such example was developing a model for plant quantitative trait loci. We developed a great hierarchical Bayesian model that used a Markov chain Monte Carlo model composition (MC3) approach to identify important markers. This was the analysis that started me down the path into Bayesian statistics. Before this analysis, most of my work focused on frequentist approaches.

Ann: I first appreciated statisticians when I brought a difficult experimental result to a university statistics consulting department, and the statistician both solved the problem and taught me to do bootstrapping in SAS (back then it was a new tool that biologists had not heard of). I then convinced my advisor that this was the right analysis. Gail, I salute you! Then Susan Simmons and the other statistics faculty at UNCW continued to educate me. I especially remember learning about Bayes’ theorem from Ed and Susan, and Susan explaining known-truth simulations and validation to me (which has led to an ongoing cyberinfrastructure project with many great students from computer science and statistics). I am now working toward understanding causal calculus, tensors, and U-statistics with patient tutelage from Yishi Wang and Cuixian Chen. There is always more fun stuff to learn. I have no intention of becoming an expert, and I don’t have enough formal math, but I can bring lots of disparate and interesting data and methods to the table.

What should you expect as you start down this path?

When a scientist meets with a statistician, the statistician asks many questions! To get a good grasp of the problem, the statistician will really probe the scientist about exactly what question is being asked. The statistician will also try to assess what limitations the problem has and how best to overcome them. Because of all this probing, statisticians can sometimes help scientists think of questions about their research that they had not even considered.

The goals of the statistician and the scientist are the same: both want the best research conducted, with appropriate answers and solutions. That said, the statistician must understand the data, the problem, and the question in order to develop the correct methodology for analyzing the data.

What to do

Scientists should provide as much background information on the topic as possible. Keep in mind that the statistician might not have a background in your area, so be prepared to explain the basics, and try not to use too much jargon specific to your field. Be patient; this is true for both the statistician and the scientist. In most cases, the two fields will use different terminology. The scientist needs to clearly relay all pertinent information about the problem, and the statistician needs to explain the correct methodology for analyzing the data. This may take some time, but it is important for both parties to reach this shared understanding. Communicate! Just as the scientist needs to communicate the problem well, the statistician needs to communicate the analysis well. For example, if the analysis requires a certain assumption (such as normality), the statistician should explain it and make sure, together with the scientist, that the assumption is reasonable for the data. To get the best results, the scientist and statistician need to communicate throughout the entire process.
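As a rough illustration of what checking a normality assumption can look like, the sketch below computes a Q-Q correlation: the correlation between the sorted observations and the quantiles a normal distribution would predict at the same positions. Values near 1 suggest the data look roughly normal; noticeably lower values point to skew or heavy tails. This is a simplified check written for illustration (the name `qq_correlation` and the simulated samples are hypothetical); in practice a statistician would more likely look at a Q-Q plot of model residuals or use a formal test such as Shapiro-Wilk, and would judge whether the assumption actually matters for the chosen analysis.

```python
import random
from statistics import NormalDist, fmean


def qq_correlation(data):
    """Correlation between sorted data and standard normal quantiles.

    Uses plotting positions (i - 0.5) / n, one common convention (an assumption
    here; other conventions give slightly different values).
    """
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    xs = sorted(data)
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = fmean(xs), fmean(qs)
    sxy = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    if sxx == 0:
        raise ValueError("all observations are identical")
    return sxy / (sxx * sqq) ** 0.5


# Hypothetical samples: one drawn from a normal distribution, one strongly skewed.
rng = random.Random(42)
normal_sample = [rng.gauss(10.0, 2.0) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]
print(round(qq_correlation(normal_sample), 3))
print(round(qq_correlation(skewed_sample), 3))
```

Because the correlation ignores location and scale, it asks only whether the shape of the data matches a normal curve, which is the part of the assumption a scientist and statistician usually need to discuss.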

Many biology papers use outdated or just plain wrong statistical methods and visualizations. You can do better, but you may get pushback (sad, but common). Your statistics colleague can teach you how to explain the better analysis methods you are using, and can serve as the expert who convinces reviewers.

The same topic from the statistics side: http://simplystatistics.org/2013/10/09/the-care-and-feeding-of-your-scientist-collaborator/

Scroll down for some excellent comments at http://stats.stackexchange.com/questions/5597/statistics-collaboration

How to share data with a statistician: https://github.com/jtleek/datasharing


Everyone has a professional association; check out these statistics conferences: http://ww2.amstat.org/meetings/csp/2017/conferenceinfo.cfm and http://www.amstat.org/ASA/Meetings/Joint-Statistical-Meetings.aspx?hkey=bc3bc257-950f-44f8-aed6-b37736571bfc

Ann’s opinion article about this kind of collaboration – http://journal.frontiersin.org/article/10.3389/fpls.2014.00250/full

Author Bios:

Ann’s highlights: published with statistician and computer science collaborators; funded by USDA and NSF; chair of the Gordon Research Conference on Quantitative Genetics; UNCW mentor award

Susan’s highlights: published with various scientists and computer scientists; Associate Editor (AE) of Environmetrics; Council of Sections representative for the Risk Analysis Section of the ASA; elected member of the International Statistical Institute
