Building Genomics Data Analysis Capacity at NWU
This post originally appeared on the Data Carpentry website
The North-West University in South Africa boasts two next-generation sequencing (NGS) platforms and additionally receives terabytes of NGS data annually from local and international service providers. Research projects with NGS components exist in the areas of Microbiology, Zoology, Botany, Nutrition, Agriculture, and more.
The three biggest challenges experienced by researchers and postgraduate students in terms of data analysis are as follows:
- many of the students entering NGS projects have limited prior exposure to genetics concepts and to molecular techniques such as Sanger sequencing and PCR;
- there is limited access to sustained bioinformatics support and training (although short interventions such as 1- or 2-day workshops with no follow-up are widely available);
- and researchers and students are often unaware of the range of research computing infrastructure available to them.
In September 2015, the NWU eResearch Initiative helped to establish a Genomics Hacky Hour (GHH) Study Group to support postgraduate students and researchers using NGS technologies. The original intention of the GHH was to bring researchers together to work on their current projects. However, a limited shared NGS vocabulary hampered constructive communication amongst researchers, so it was decided that specific topics would be discussed during the first few sessions, led by a study group leader.
The GHH members participated in a locally run Software Carpentry Workshop in November 2015, where they were introduced to the basic concepts of reproducible research and to tools such as the Unix shell, Git, GitHub, and either Python or R. The GHH Study Group sessions provided a safe, informal post-workshop environment in which participants could continue their learning.
In January 2016, several students and supervisors enrolled in the Coursera Genomics Data Science Massive Open Online Course (MOOC). The GHH sessions were used to discuss challenges and solutions specific to the Coursera course, and the hope was that, with a better support structure, participants would be able to stay the course and complete the 7-module specialisation over the next 9 to 12 months. The learning curve was very steep for several of the modules, and we realised we needed additional learning opportunities even to complete the MOOC.
In April 2016, two PhD students with NGS projects participated in a locally hosted Software/Data Carpentry instructor training workshop with the aim of hosting a Genomics Data Carpentry workshop soon after. The NWU hosted its first Genomics Data Carpentry workshop from 26 to 29 September 2016, led by the two newly qualified instructors, Bianca Peterson and Maryke Schoonen, alongside Jason Williams, Assistant Director, DNA Learning Center, Cold Spring Harbor Laboratory.
The workshop was run on AWS instances courtesy of Data Carpentry. One of our concerns was that, unlike in other Carpentry workshops, researchers wouldn’t have access to the software environment after the workshop to continue practicing their newly acquired skills and experiment with their own data.
Luckily, NWU is one of the founding members of the African Research Cloud (ARC) and we were able to get access to enough instances on this infrastructure after the Data Carpentry workshop. Tim Carr from UCT eResearch worked with Jason Williams to build a replica of the AWS Genomics Data Carpentry instance on the ARC, and shortly after the workshop our participants were able to continue their learning in a familiar environment.
In the past few weeks the GHH folks have been working through the Data Carpentry genomics lessons at their own pace to reinforce what was learned during the workshop and complete some of the exercises that weren’t covered. These exercises have strengthened individual knowledge, built trust amongst participants and made them more aware of available information, tools and resources. We are already planning additional exercises to augment what is covered in the Data Carpentry Genomics lessons.
Take-home message: genomics capacity building initiatives cannot be limited to workshop participation, but require long-term continuous learning (i.e. post-workshop participation) and support. It is important to focus efforts on ‘what works’ at the level of the individual, department and organization, whether that means running workshops, doing MOOCs or getting involved in study groups. Some words from one of our GHH folks: “You will feel stupid and want to give up a thousand times, but if you stick with it and work through the material and exercises, you will get to a level where you can analyze your own data.”