Workshop at the University of Virginia
This post originally appeared on the Software Carpentry website.
We pulled off our day-long data analysis bootcamp with hardly a hiccup yesterday. The schedule looked something like this:
- AM part 1: intro to AWS and intro to the Unix shell
- AM part 2: data analysis in Unix: alignment and quantitation of RNA-seq data
- PM part 1: intro to R
- PM part 2: data analysis in R: QC, differential expression, etc.
I got a grant from Amazon to give every student in the class a $100 voucher for AWS. We used only about $1 during the course. Luckily, everyone registered and tested their AWS credentials before class, and we got everyone connected to their own AWS instance using an image I created specifically for this course. I had nightmares that half the class would show up unprepared and we'd lose everyone in the first half hour, so it was a huge relief that this worked so well.
AM part 1: Jessica did an excellent job with the intro to the Unix shell. The session used material adapted from SWC and Data Carpentry lessons and tailored to the data we would actually be analyzing in AM part 2. Adapted material was attributed as appropriate.
AM part 2: This was a little rushed, but I know a few things to fix next time.
PM part 1: I used material I created before SWC started working on R lessons. I've used this material in the past, and I think it works reasonably well, at least judging from past feedback. Personally, though, I'm becoming less and less happy with it for various reasons, and I may consider adapting some of the SWC or Data Carpentry R material once these communities decide where they're going with it.
PM part 2: This was also mostly material I had used in the past, and it has worked well. I started developing it about a year ago, and during last spring's Mozilla sprint, Rob Beagrie worked with me to whip some of it into shape for an SWC capstone project in bioinformatics. (I think the PR is still outstanding.) I intend to submit a PR to Data Carpentry with this material, and perhaps back to SWC once the repos are split and we start talking about domain-specific capstone examples again.
One part of this lesson generated some confusion: the step where the column headers of one dataset had to match the row headers of another, so that the data and metadata could be linked into a single object. When I run this lesson again, I'd like to find a way to do this kind of analysis that avoids this step. It's an important concept, but I couldn't do it justice in the time I had.
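For readers who haven't run into this before: in an RNA-seq analysis, the count matrix typically has one column per sample, while the sample metadata table has one row per sample, and the two must describe the same samples in the same order before they can be combined. The workshop did this in R; what follows is only a language-neutral sketch in plain Python, and the sample names, counts, and the helper `align_metadata` are all hypothetical, chosen for illustration.

```python
# Hypothetical example data: a count matrix whose columns are samples...
count_columns = ["ctl1", "ctl2", "trt1", "trt2"]
counts = {
    "geneA": [10, 12, 30, 28],
    "geneB": [5, 7, 6, 4],
}
# ...and a sample-metadata table whose rows are the same samples,
# but listed in a different order.
sample_metadata = {
    "trt1": {"condition": "treated"},
    "ctl1": {"condition": "control"},
    "trt2": {"condition": "treated"},
    "ctl2": {"condition": "control"},
}


def align_metadata(columns, metadata):
    """Hypothetical helper: return metadata rows reordered to match the
    count-matrix columns, raising ValueError if the sample names differ."""
    missing = set(columns) - set(metadata)  # columns with no metadata row
    extra = set(metadata) - set(columns)    # metadata rows with no column
    if missing or extra:
        raise ValueError(
            f"sample mismatch: missing={sorted(missing)}, extra={sorted(extra)}"
        )
    # Pair each column name with its metadata, in column order.
    return [(name, metadata[name]) for name in columns]


aligned = align_metadata(count_columns, sample_metadata)
for name, meta in aligned:
    print(name, meta["condition"])
```

Checking that the two sets of names agree before reordering is a deliberate choice: a typo in one sample name then raises an error instead of silently pairing counts with the wrong metadata.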
A few lessons learned/reinforced:
- We had a class of more than 20 (with a pretty good gender balance, too). It would have been impossible to do this effectively without two instructors. The pacing in the morning intro to Unix was as good as could be expected, but several people would have been left behind right from the beginning without someone walking around helping them individually. Likewise in the afternoon: Jessica saved me a few times when someone would otherwise have been completely lost from the start and I didn't have time to help them myself. To run a course like this effectively in the future, we really need two instructors plus a TA who is familiar with all of the material.
- Cough drops: I started helping people set up at 7:30am, and didn't finish until 5:30pm. Again, Jessica saved the day with Ricola lozenges at about 2pm.
- I think things went well, but next time I might limit the scope of a course like this. We had R experts in the room, others who were showing me things at the Unix shell that even I didn't know, and still others who had never typed anything into a terminal before. It's difficult to teach to such a broad range of experience in a single classroom.
- As much as I would have liked to have everyone set up software on their own machines, this was impossible for AM part 2. We needed to run software that only runs on Linux, and it needed more than 8 GB of RAM. Most people don't have that available on their laptops. Thankfully, we got AWS working for everyone, but it took lots of e-mail reminders and careful handholding before the course to make sure this worked. I'm relatively new to AWS myself, and would benefit from attending some kind of AWS training, or even better, shadowing a more experienced instructor who regularly uses AWS for things like this.
Links to the compiled course materials are listed here; the sources are accessible on GitHub via the usual routine:
- http://bioconnector.github.io/workshops/lessons/rnaseq-1day/
- http://bioconnector.github.io/workshops/lessons/shell/01-intro-unix-shell/
- http://bioconnector.github.io/workshops/lessons/rnaseq-1day/01-alignment-counting/
- http://bioconnector.github.io/workshops/lessons/intro-r-lifesci/01-intro-r/
- http://bioconnector.github.io/workshops/lessons/rnaseq-1day/03-differential-expression/