The major issue I have is that my methods are probably not as reproducible or transparent as they should be: essentially, it’s a bit messy for other people to figure out exactly what I was up to when doing something new. It’s not in one place, nor is it clearly documented. It also hurts my own process, in that a lot of the mucking about I do gets lost or takes time to find. I see this as a particular problem as I do more web science research, where gathering, cleaning and reanalyzing data is a critical part of the endeavor.
With that in mind, I’ve decided to get my act together, follow in the footsteps of the likes of Titus Brown and Carl Boettiger, and do more of my science in a reproducible and open fashion.
To do this, I’ve decided to adopt IPython Notebooks as my new note-taking environment. They let me try different things out while keeping all the parts of a project together in one place. They also let me “narrate my work”, that is, mix commentary with my code, which is pretty cool.
My notebook is on GitHub and also records how my system is set up, including the versions of the libraries I’m relying on.
Carole Goble responded with two good pointers:
- To her keynote on reproducibility at ISMB/ECCB, entitled “Results may vary”, which, as usual with her talks, is full of interesting references and pithy remarks. Paul Groth commented on the talk:
One of the great things about your keynote, was that it made the case that we weren't all horrible people for not doing perfect science but that we should try to be better and there are ways to do it.
- To a blog post by Mike Jackson about their workshop “What makes good code good” at INTECOL13, one of the premier conferences for ecologists. About the workshop, Carole commented:
about 120 people showed up for a lunchtime workshop that meant missing lunch with about 8 competing workshops in parallel. about 100 outed themselves as coders (mainly R scripts) - about 80% not only hadn't heard of github but hadn't even thought of version control and most hadn't thought of publishing their R scripts despite continually conflating their models and statistics (key to the papers) with the R code itself.
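One low-effort habit that speaks to both pointers is making the notebook record how its system is set up, including library versions, as the notebook described at the start does. A minimal sketch of such a cell follows, assuming Python 3.8+ for `importlib.metadata`; the function name `environment_report` and the package list are hypothetical examples, not what the notebook actually uses.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Return a dict describing the interpreter, OS and installed package versions.

    Hypothetical helper: the package list is supplied by the caller.
    """
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly rather than silently skipping them.
            report["packages"][name] = None
    return report


if __name__ == "__main__":
    # Example package list only; replace with whatever the notebook imports.
    info = environment_report(["numpy", "pandas", "ipython"])
    for key in ("python", "implementation", "platform"):
        print(f"{key}: {info[key]}")
    for pkg, ver in sorted(info["packages"].items()):
        print(f"{pkg}: {ver or 'not installed'}")
```

Running a cell like this at the top of a notebook, and committing the output along with the notebook, leaves a record of the environment next to the analysis itself.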
Carole also pointed to Dexy, which seems to be a similar tool for combining code and documents but is agnostic about the language it works with. That seems like an advantage. However, IPython Notebooks are internally represented as JSON, which gives them a certain level of preservability; I'm not clear how Dexy would provide this.
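To make the JSON point concrete, here is a minimal sketch that builds a tiny notebook, serializes it, and parses it back using only the standard `json` module, splitting the commentary (markdown cells) from the code. It assumes the nbformat 4 layout used by current Jupyter notebooks; notebooks from the IPython era of this post used nbformat 3, where cells sat inside a `worksheets` list. The helper names and cell contents are hypothetical.

```python
import json


def make_notebook(cells):
    """Build a minimal nbformat-4 notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also carry an execution counter and captured outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def _text(source):
    """Cell source may be stored as one string or as a list of lines."""
    return "".join(source) if isinstance(source, list) else source


def narrative_and_code(nb_json):
    """Parse notebook JSON and split it into commentary and code."""
    nb = json.loads(nb_json)
    prose = [_text(c["source"]) for c in nb["cells"] if c["cell_type"] == "markdown"]
    code = [_text(c["source"]) for c in nb["cells"] if c["cell_type"] == "code"]
    return prose, code


if __name__ == "__main__":
    nb = make_notebook([
        ("markdown", "Load the crawl and drop duplicate URLs."),
        ("code", "urls = sorted(set(raw_urls))"),
    ])
    serialized = json.dumps(nb, indent=1)  # what would be written to a .ipynb file
    prose, code = narrative_and_code(serialized)
    print(prose)  # ['Load the crawl and drop duplicate URLs.']
    print(code)   # ['urls = sorted(set(raw_urls))']
```

Because the format is plain JSON, any language with a JSON parser can read the narrative and the code back out without IPython installed, which is part of what makes it preservable.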