This post originally appeared on the Software Carpentry website.
For the last three years,
I've been storing information about instructors, workshops, and other things
in a small SQLite database
so that I can look things up and generate statistics when I need to.
I can't publish it,
since it contains personal identifying information,
but since I had to write a script to migrate the data to
the tool we're building to manage workshops,
it only took another few minutes to create a partly-redacted version of the data.
("Partly" because someone who was really keen could work backward workshop URLs to instructors' names,
cross-reference,
and recover the names of some fraction of our instructors.
Since that information is all public anyway,
though,
I don't think I've introduced any new risks.)
The SQL source for the database is here;
with it,
you can regenerate the database using:
$ sqlite3 swc.db < swc-db-2014-12-14.sql
You can then ask lots of questions—some examples are included below.
If you'd like a little end-of-year procrastination,
what else can you find in this data that's interesting?
-- How many sites have had how many events?
select count(*), c
from (select count(*) as c
from site join event
on site.id=event.site_id
group by site.id)
group by c
order by c desc;
number | count |
---|
2 | 8 |
2 | 7 |
2 | 6 |
3 | 5 |
4 | 4 |
8 | 3 |
36 | 2 |
117 | 1 |
|
-- How many people have taught?
select count(distinct person_id)
from person join task join role
on person.id=task.person_id and task.role_id=role.id
where role.name='instructor';
-- How many people started as learners or helpers and became instructors?
-- How often have people taught?
from (select count(*) as c
from person join task join role
on person.id=task.person_id and task.role_id=role.id
where role.name='instructor'
group by person_id)
group by c
order by c desc;
# instructors | # workshops |
---|
1 | 43 |
1 | 14 |
3 | 12 |
2 | 11 |
2 | 10 |
5 | 9 |
5 | 8 |
4 | 7 |
5 | 6 |
16 | 5 |
17 | 4 |
32 | 3 |
50 | 2 |
138 | 1 |
-- How has each training cohort done?
select cohort.name, count(*), round((100.0 * sum(trainee.complete)) / count(*), 1)
from trainee join cohort
on trainee.cohort_id=cohort.id
where cohort.qualifies
group by cohort_id;
name | enrolled | completion %age |
---|
2012-08-26-online | 20 | 55.0 |
2012-10-11-online | 25 | 44.0 |
2013-01-06-online | 12 | 16.7 |
2013-03-12-online | 27 | 48.1 |
2013-05-12-online | 45 | 26.7 |
2013-08-12-online | 41 | 43.9 |
2013-09-30-online | 57 | 31.6 |
2014-01-16-online | 67 | 22.4 |
2014-04-24-online | 58 | 31.0 |
2014-04-28-mozilla | 43 | 65.1 |
2014-06-05-online | 29 | 10.3 |
2014-06-11-online | 59 | 27.1 |
2014-09-10-online | 81 | 29.6 |
2014-09-22-uva | 31 | 22.6 |
2014-10-22-tgac | 41 | 26.8 |
2014-11-12-washington | 20 | |
2015-01-01-online | 135 | |
2015-01-06-ucdavis | 4 | |
2015-05-01-online | 113 | |