We teach foundational coding and data science skills to researchers worldwide.

Why we teach what we teach

The R ecology lessons

This post originally appeared on the Data Carpentry website

The Data Carpentry lessons are aimed at learners who have never programmed before. Learning something new, especially coding, can feel intimidating. Yet learners attending our workshops are motivated as they realize it is a skill they will eventually need to master to be able to manage and make sense of the data they are generating in their research. When learning a programming language, you eventually need to master two things: the syntax of the language (e.g., where the parentheses and the commas go), and the intricacies of the language that will make you write less code and faster code (e.g., taking advantage of the vectorized operations). The first can be learned relatively quickly, the second takes years of practice. One of the advantages of learning the basics of programming during a workshop is that we can teach learners good practices from the start, rather than having to go through the painful experiences that typically accompany self-learning experiences.

We teach good practices but not necessarily best practices. When you first learn subtraction, your teacher taught you that you can’t give away six marbles if you only have four. And when you first learned about square roots, your teacher told you that you can only calculate the square root of a positive number. Teachers don’t cover negative numbers and complex numbers until you master other skills that are needed to understand these concepts.

We are working on a tight schedule during a workshop. In two days, we need to empower learners to demonstrate that coding isn’t scary, and that with a little knowledge and good practices, you can achieve a lot. During this time, we need to tackle problems that are realistic enough that learners can project the skills we teach them to their own datasets/problems, but not so difficult that learners are demotivated by the amount of knowledge they will need to master to be able to be productive on their own with their data.

One way we do that is by building the lesson around a dataset learners can easily relate to. The dataset for the ecology-themed lesson is from a real long-term experiment that uses variables anyone working in ecology will be familiar with: species names, measurements (lengths and weights), date of observation. The dataset is large enough (35,000+ rows) that manipulating it in a spreadsheet program would be difficult, but small enough that working on it with a programming language is almost instantaneous.

The other way is by putting a lot of thought into selecting what we cover during these two half days. We focus on how to organize the code, the data, and the files that make up a typical research project. There are now many resources to learn R online (some of which can be useful to the learners after a workshop), but one way Data Carpentry stands out is by demonstrating how good practices for data formatting and organization can facilitate data analysis. We reinforce this idea by using the same dataset throughout the workshop.

The main skills we focus on in the R lesson are how to prepare datasets for analysis and visualization. In Data Carpentry, we teach packages from the tidyverse (formerly known as the Hadleyverse) which are sophisticated and elegant additions to the R language to work with data. Most functions are verbs (e.g., filter()), and use a limited vocabulary that distills operations needed to manipulate data. For instance, the six functions we introduce in the lesson on dplyr are enough to cover most cases of subseting data and extracting relevant information from it. We use ggplot2 for data visualization because it allows learners to rapidly produce high-quality graphics. In addition, these packages encourage best and consistent data formatting practices by relying on the tidy data concept (one row for each observation, one column per variable, one table per observational unit). We first introduce the tidy data concept in the spreadsheet lesson and emphasize its utility in each of the lessons.

Because of the duration of the workshop, we have to leave a lot of things out of the lesson. We want to limit the information overload. For instance, we don’t cover lists. While they are essential to programming in R, a lot of data analysis can be done without knowing about them. On the other hand, we cover factors because learners will encounter them when importing data in R, and their behavior is often misunderstood. When factors are covered during a workshop, it will be easier for learners to know how to deal with them in the future.

During workshops, “Challenges” are a core component of the learning experience. After each concept the instructor covers, the challenges are the opportunity for learners to practice what they just learned. It is a form of formative assessment that brings interactivity. The learners can witness for themselves what they are now capable of doing with their newly acquired skills (for instance, a beautiful plot), and it allows instructors to assess whether the learners have assimilated the concepts taught. Compared with the passive lecture format learners typically experience, they are not used to this level of interactivity, and often praise these challenges in the workshop evaluations.

When beginning learning a new programming language, it can be frustrating to not be able to generate the desired output or to be faced with cryptic error messages because a comma or a quotation mark is missing. To limit this frustration, we provide learners with a handout that contains a lot of the code already typed. They can fill in the blanks, add their own comments, and bits of code to it. It allows learners to focus on learning concepts and how they relate to each others rather than obsessing over where the commas go.

I think it is also important to be realistic about the expectations one can have after a 2-day workshop. Learning how to code takes time. Because it is a new skill, learners will need to change the way they approach the analysis of their data. Being confronted with something new can feel uncomfortable, and facing the limits of one’s knowledge can be frustrating. It is important to celebrate your successes along the way. It will help you go through frustrating times. Having people who can help you in your learning experience is also important: the person sitting next to you at a workshop might be the best person to take on this job. Finally, strive for best practices but not before you master the “good enough” practices.

Dialogue & Discussion

Comments must follow our Code of Conduct.

Edit this page on Github