What the Data Says About Novice Programming Mistakes

This post originally appeared on the Software Carpentry website.

I recently had a chance to catch up with this paper from 2014:

Neil C. C. Brown and Amjad Altadmri: “Investigating Novice Programming Mistakes: Educator Beliefs vs Student Data”. ICER'14, http://dx.doi.org/10.1145/2632320.2632343.

Its abstract says:

Educators often form opinions on which programming mistakes novices make most often – for example, in Java: “they always confuse equality with assignment”, or “they always call methods with the wrong types”. These opinions are generally based solely on personal experience. We report a study to determine if programming educators form a consensus about which Java programming mistakes are the most common. We used the Blackbox data set to check whether the educators’ opinions matched data from over 100,000 students – and checked whether this agreement was mediated by educators’ experience. We found that educators formed only a weak consensus about which mistakes are most frequent, that their rankings bore only a moderate correspondence to the students in the Blackbox data, and that educators’ experience had no effect on this level of agreement. These results raise questions about claims educators make regarding which errors students are most likely to commit.

There’s lots to admire in both the data they collected and the analyses they did, but the biggest takeaway is that even very experienced teachers agree only weakly about which errors students make most often, and that their rankings match the student data only moderately, regardless of how much experience they have. It would be wonderful to have equally rich, grounded insight into where people actually stumble with Git, Python, R, and the shell.
