Reproducible Data-Driven Discovery
This post originally appeared on the Data Carpentry website
I spent two weeks in January hanging out with some awesome scientists who are all passionate about the future of science. I was participating in two professional development events with support from Data Carpentry, and I’d like to share some of the highlights.
A Curriculum Development Hackathon for Reproducible Research using Jupyter Notebooks
On January 9–11, 2017, I attended my first hackathon at the Berkeley Institute for Data Science! The event was organized jointly by Data Carpentry and the Jupyter Notebook project. The goal of the hackathon was to develop a two-day workshop curriculum to teach reproducible research using the Jupyter Notebook. The attendees were a group of 25 scientists from the US, Canada, and the UK with diverse backgrounds and a unique set of skills and expertise. I was one of a handful of attendees who use R Markdown more than IPython or Jupyter Notebooks; however, after seeing the notebook’s power and utility, I’m really excited about adding this to my reproducible workflow.
On the first day of the hackathon, we all sketched out the general workshop overview and learning objectives. Then, we broke out into small groups to design the specific lessons. I worked closely with Erin Becker, Elizabeth Wickes, Daniel Soto, and Mike Pacer to develop the lesson on publication and sharing. This particular lesson focuses on exporting reports for sharing, best practices for documenting your workflow, best practices for using metadata, and using DOIs and ORCiD to get credit for your scholarly work. Even though the workshop curriculum is not completely polished and ready to teach, we are all very proud of the significant progress we collectively made. You can view the workshop website here.
This curriculum is still being developed and revised on an ongoing basis. Want to contribute? If you are interested in helping with the development, have a look at this list of GitHub issues to see what is happening and what needs to be done. We’d appreciate your contributions.
Data-Driven Discovery Postdoc and Early Career Researcher Symposium
On January 17–21, 2017, the Gordon and Betty Moore Foundation hosted the Data-Driven Discovery Postdoc and Early Career Researcher Symposium. Over 50 young investigators from 14 different time zones gathered at Waikoloa Beach, Hawaii, to network and discuss challenges and opportunities for research and careers in data science. The symposium followed an “un-conference” style that promoted group discussions among like-minded attendees and deemphasized traditional panels and speakers.
Each day the participants engaged in ice-breaker activities that gave us a chance to meet and get to know nearly every one of the attendees. You might think that it’s a little cheesy to introduce yourself and also say your favorite comfort food or which famous person you share a birthday with, but I was pleasantly surprised at how often those bits of information helped the participants get to know each other better. Another favorite icebreaker was the living poster session, where we spent about an hour illustrating our research or teaching and then another two hours learning more about everyone’s interests.
All participants played a major role in crafting the agenda by pitching and then attending “birds of a feather” breakout sessions. You can see the diversity of suggested topics by viewing the open and closed GitHub issues or the session notes. One day I participated in a breakout session about science communication. It was awesome to hear how everyone struggled with and/or managed the tricky balance of doing science and communicating science. To report back to the group, we listed some challenges and resources for science communication on big pieces of white board paper, which you can view here. The next breakout session I attended was about science activism. It was a little unfortunate that the symposium conflicted with the presidential inauguration and women’s marches, but some of us stayed very engaged in what was happening five time zones away. The 15 or so of us in the activism group (for lack of a better word) are committed to staying in touch to share news and opportunities for promoting science awareness and literacy in our local and global communities.
Overall, the symposium was #MooreUseful and #MooreInspiring than I anticipated. One of the more useful things (in my opinion) was an around-the-room discussion of each person’s favorite new tool; take a look at this list to see the kind of tools and methods we shared. It was so inspiring to learn what the other grad students, postdocs, and research scientists were working on and to hear their career struggles and successes. I was able to synthesize tons of ideas for my future research and career, and my eyes have been opened to more of the challenges and opportunities that data-driven researchers are facing.
Reproducible Data-Driven Discovery
I’m not sure if anyone has already coined the phrase “Reproducible Data-Driven Discovery”, but I think it’s an awesome way to summarize these two events and the communities that made them happen. The Moore Foundation funds researchers who do science with lots and lots of data, and Data Carpentry and Project Jupyter are two of the Moore-funded organizations that are helping make sure that data-driven research is freely available, open access, and reproducible. I can’t wait to see all the new awesome things that these communities create and build!
Thanks!
I especially want to thank Tracy for the opportunity to attend both of these events. I thank Hilmar Lapp, François Michonneau, Jasmine Nirody, Kellie Ottoboni, Tracy Teal, and Jamie Whitacre for organizing the Hackathon and Chris Mentzel, Carly Strasser, and Natalie Caulk for organizing the symposium. I thank everyone who participated and helped make these events awesome! I thank Laura Noren for feedback on an earlier version of this post.