Looking for a Model

This post originally appeared on the Software Carpentry website.

Updated: this CSV file records who taught when. Its three columns are the person's unique identifier, the date on which they first qualified, and the date on which they taught. (If someone has taught multiple times, there is one record for each teaching event.) People who haven't taught at all are at the bottom, with an empty value in the third column. Erin Becker's analysis of this data is posted on the Data Carpentry blog and discussed here.
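As a rough illustration of that layout, here is a minimal Python sketch that reduces the file to one entry per person: the date they qualified and the earliest date they taught (or nothing if they haven't taught yet). The sample rows, the identifiers, the ISO date format, and the absence of a header row are all assumptions made for the example, not facts about the real file.

```python
import csv
import io
from datetime import date

# Hypothetical sample rows: identifier, date qualified, date taught.
# The real file's identifiers and date format may differ.
SAMPLE = """\
p001,2015-11-20,2016-01-14
p001,2015-11-20,2016-03-02
p002,2016-01-25,2016-02-18
p003,2016-02-01,
"""

def first_teaching_dates(handle):
    """Map each person ID to (date qualified, first date taught or None)."""
    people = {}
    for person, qualified, taught in csv.reader(handle):
        qualified_on = date.fromisoformat(qualified)
        # An empty third column means this person has not taught yet.
        taught_on = date.fromisoformat(taught) if taught else None
        _, earliest = people.get(person, (qualified_on, None))
        if taught_on is not None and (earliest is None or taught_on < earliest):
            earliest = taught_on
        people[person] = (qualified_on, earliest)
    return people

people = first_teaching_dates(io.StringIO(SAMPLE))
for person, (qualified_on, taught_on) in sorted(people.items()):
    print(person, qualified_on, taught_on)
```

Keeping people who haven't taught as entries with no teaching date, rather than dropping them, matters for any later analysis of how long it takes people to start teaching.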

We rebooted instructor training in October 2015, and things have been going pretty well since then. If we average over all 23 new-style classes, it looks like two thirds of people who take part actually qualify as instructors within four months of finishing the class:

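| Date       | Site(s)      | Days Since | Participants | Completed | Percentage | Cum. Participants | Cum. Completed | Cum. %age |
|------------|--------------|------------|--------------|-----------|------------|-------------------|----------------|-----------|
| 2015-10-15 | online       | 170        | 48           | 30        | 62.5%      | 48                | 30             | 62.5%     |
| 2015-12-07 | Paris        | 162        | 7            | 7         | 100.0%     | 55                | 37             | 67.2%     |
| 2015-12-07 | Potsdam      | 162        | 5            | 5         | 100.0%     | 60                | 42             | 70.0%     |
| 2015-12-07 | Thessaloniki | 162        | 4            | 4         | 100.0%     | 64                | 46             | 71.8%     |
| 2015-12-07 | Arlington    | 162        | 10           | 4         | 40.0%      | 74                | 50             | 67.5%     |
| 2015-12-07 | Vancouver    | 162        | 5            | 4         | 80.0%      | 79                | 54             | 68.3%     |
| 2015-12-07 | Wisconsin    | 162        | 7            | 5         | 71.4%      | 86                | 59             | 68.6%     |
| 2015-12-07 | Australia    | 162        | 3            | 2         | 66.6%      | 89                | 61             | 68.5%     |
| 2015-12-07 | Curitiba     | 162        | 3            | 3         | 100.0%     | 92                | 64             | 69.5%     |
| 2015-12-07 | Toronto      | 162        | 14           | 12        | 85.7%      | 106               | 76             | 71.7%     |
| 2016-01-05 | Oklahoma     | 133        | 19           | 5         | 26.3%      | 125               | 81             | 64.8%     |
| 2016-01-13 | Lausanne     | 125        | 20           | 16        | 80.0%      | 145               | 97             | 66.9%     |
| 2016-01-18 | Brisbane     | 120        | 20           | 14        | 70.0%      | 165               | 111            | 67.2%     |
| 2016-01-21 | Melbourne    | 117        | 27           | 6         | 22.2%      | 192               | 117            | 60.9%     |
| 2016-01-21 | Florida      | 117        | 25           | 8         | 32.0%      | 217               | 125            | 57.6%     |
| 2016-01-28 | Auckland     | 111        | 20           | 7         | 35.0%      | 237               | 132            | 55.7%     |
| 2016-02-16 | online       | 91         | 26           | 8         | 30.7%      | 263               | 140            | 53.2%     |
| 2016-02-22 | UC Davis     | 85         | 23           | 9         | 39.1%      | 286               | 149            | 52.1%     |
| 2016-03-09 | U Washington | 69         | 14           | 2         | 14.2%      | 300               | 151            | 50.3%     |
| 2016-04-13 | online       | 34         | 33           | 1         | 3.0%       | 333               | 152            | 45.6%     |
| 2016-04-17 | North West U | 31         | 23           | 0         | 0.0%       | 356               | 152            | 42.7%     |
| 2016-05-04 | Edinburgh    | 13         | 15           | 0         | 0.0%       | 371               | 152            | 40.9%     |
| 2016-05-11 | Toronto      | 6          | 27           | 0         | 0.0%       | 398               | 152            | 38.1%     |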
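For anyone who wants to check or extend the cumulative columns, here is a small Python sketch of the arithmetic, using the participant and completion counts from the first four rows of the table. The function names are made up for this example, and truncating to one decimal place is an inference from the published figures (46 of 64 is shown as 71.8% rather than 71.9%), not something stated in the post.

```python
# (participants, completed) for the first four classes in the table
classes = [(48, 30), (7, 7), (5, 5), (4, 4)]

def truncated_percent(completed, participants):
    """Percentage truncated (not rounded) to one decimal place.

    Integer arithmetic avoids floating-point surprises near the cut-off.
    """
    return (completed * 1000 // participants) / 10

def cumulative_rates(classes):
    """Running completion percentage after each class, in table order."""
    total_participants = 0
    total_completed = 0
    rates = []
    for participants, completed in classes:
        total_participants += participants
        total_completed += completed
        rates.append(truncated_percent(total_completed, total_participants))
    return rates

rates = cumulative_rates(classes)
print(rates)  # [62.5, 67.2, 70.0, 71.8]
```

Note that the cumulative figure mixes classes of very different ages: the most recent classes have had only days or weeks to complete, which pulls the running percentage down.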

One of our goals for this year is to lower the majority completion time from four months to three; another is to increase the throughput from two thirds to three quarters. What I'd really like, though, is some help figuring out what statistical model to use for the other important aspect of our training and mentoring: how many of the people we train go on to actually teach workshops, and how quickly.

The data we have includes the following for each person:

  • unique personal identifier (we can easily anonymize individuals)
  • date(s) of the instructor training courses they took (someone may enroll, drop out, enroll again, and so on)
  • date(s) on which they were certified (they may have qualified for Software Carpentry and Data Carpentry at different times)
  • the date on which they taught their first workshop (if any)

"Mean time to teach first workshop" isn't a good metric, since roughly 1/3 of the people we've trained haven't taught yet. Should we use an inverted half-life measure, i.e., how long until the odds of someone having taught hit 50%? Or would something else give us more insight? Whatever we choose needs to be robust in the face of a big spike in our data in January 2016, when we retroactively certified a big batch of Data Carpentry instructors. If you have suggestions, comments on this post would be very welcome.

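One candidate for the "how long until half have taught" measure is the median of a Kaplan-Meier survival curve, which treats people who haven't taught yet as censored observations instead of dropping them or pretending they never will. The Python sketch below is only an illustration of that idea, not a recommendation that it is the right model: the durations are invented, and measuring time in days from certification to first workshop is an assumption.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time t where at least one person first taught.

    durations: days from certification to first workshop, or to the cutoff
               date for people who have not taught yet.
    observed:  True if the person has taught, False if they are still
               waiting (right-censored).
    S(t) is the estimated fraction who have NOT yet taught by day t.
    """
    event_times = sorted({d for d, o in zip(durations, observed) if o})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone whose clock ran at least t days is still "at risk" at t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, o in zip(durations, observed) if o and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

def median_time(curve):
    """First day on which the estimated fraction not yet teaching is <= 50%."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # fewer than half are estimated to have taught so far

# Hypothetical data: six people, two of whom have not taught yet.
durations = [30, 60, 90, 120, 45, 100]
observed = [True, True, True, False, False, True]

curve = kaplan_meier(durations, observed)
print(median_time(curve))  # 90
```

Because each person's clock starts at their own certification date, a large batch certified on one day does not by itself distort the curve, although a retroactively certified group may behave differently enough that it is worth fitting that cohort separately and comparing.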