16S Metagenomics Workshop based on Data Carpentry: successes and challenges

This post originally appeared on the Data Carpentry website.

We ran a 16S metagenomics workshop based on Data Carpentry materials at North-West University, South Africa, from 24 to 27 October. We combined lessons from Data Carpentry with specific workflows for 16S data analysis. We also adapted most of the lessons to use the same dataset as our main (critical) lesson, the 16S analysis. The following topics were covered: spreadsheet organization, the shell, 16S analysis on HPC (using Shi7, NINJA and QIIME), R genomics, and specific plots of 16S data in R. We extended the workshop to three full days to fit in all the lessons, and added a half-day in which attendees could work on their own data. Twenty-three participants attended the whole workshop.


Here is a breakdown of the lessons, with successes and challenges mentioned for each:

  • Bianca Peterson made an R presentation of Data Carpentry's Data Organization in Spreadsheets Ecology lesson to guide Leani Bothma, a newly trained instructor who was teaching for the first time. The presentation saved some time, which we then gave to the shell lesson. Total teaching time: 1 hour.

  • We used Data Carpentry's Shell Genomics lesson as is, modifying only the example output to match our HPC. Tomasz Sanko, also a newly trained instructor, taught this lesson at a very reasonable pace but couldn't get through all of it. Feedback shows that learners need more time on the Unix lesson, which is crucial for the next lesson, where they analyse 16S data on the HPC. At least 90 additional minutes are needed, bringing the total teaching time for this lesson to 4.5 hours.

  • One full day for the 16S analysis was perfect: we finished on time even with some troubleshooting along the way. This 16S workflow was written by Tonya Ward (Knights Lab, University of Minnesota), and all the required software was installed on our HPC before the workshop. The support from IT was amazing: they made sure that an Ethernet cable was available for each participant to ensure a stable connection while working on the HPC. Total teaching time: 6 hours.

  • The R lesson (Data Carpentry's R genomics lesson, modified to use the metadata/mapping file from the 16S analysis) was taught by qualified instructor Caroline Ajilogba Fadeke. This was her first time teaching. Not all the data in this mapping file is real: some variables were made up to allow for plotting (adding more variables would certainly improve this lesson). The time allocation seemed perfect, the pace was not fast at all, and almost everybody kept up with the instructor. Total teaching time: 4.5 hours.

  • In the following R session, learners used output files generated by QIIME to make a variety of visualizations in R. Andries van der Walt, also a newly trained instructor, taught this lesson on microbial community analysis in R, which he wrote. Participants got stuck on typos, even though everything on the screen was correct and they had the lesson open in a browser. They seemed to fall behind and looked quite tired, since it was the third full day of the workshop. Andries recapped everything the next day, and people said that they understood. There were many interruptions during his lesson on day 4 (sorry again, Andries!): HPC talk, finishing the last section of the Unix lesson (piping and script writing), taking a group photo, and a quick thank-you from Professor Carlos Bezuidenhout (who contributed funds towards this workshop). Our HPC staff talked to participants about registering for an HPC user account and showed them how the scheduler works. Participants appreciated this, even though it wasn't part of our formal schedule. Luckily, Tonya had a workflow and script ready for drawing diversity plots in R for this specific dataset, so participants ran the script while she explained all the commands and arguments. They didn't seem to mind, since they were familiar with R syntax by now, and they appreciated the time taken to interpret the results. Unfortunately, we didn't have enough time to cover the other plot types in Andries's workflow, but participants said that they would look at those after the workshop.


Building the modified lessons around a single example dataset seemed to work much better. The cognitive load was greatly reduced, and participants could really get to know the data, since they had been working with it from the first day. Participants didn't mind the extended workshop; they actually appreciated it. After the workshop, several participants applied what they had learned to analyse their own data.

Word of thanks

We would like to thank Boeta Pretorius (NWU IT Director), Adelle Lotter (Acting Director: NWU Academic and Office Solutions) and Professor Carlos Bezuidenhout (Microbiology Department) for their support and for sponsoring the workshop. We would also like to thank the NWU IT team, Riaan Stavast, Thabo Molambo, Hannes Kriel, Martin Dreyer, and especially Ciellie Jansen van Vuuren, for going the extra mile to ensure that all the software was installed and working on the HPC, and for providing additional support with internet access throughout the workshop.
