The dotAstronomy 4 conference included a number of what are called “unconference” sessions, which are organized on the spot at the conference rather than planned in advance as part of the agenda. I organized one such session on how to provide software engineering training for scientists. Our goal was to pool our ideas and agree on what we thought were best practices for software training. I have written several posts on the topic on this blog: http://wp.me/pU7SH-p9, http://wp.me/pU7SH-nU, http://wp.me/pU7SH-gl and http://wp.me/pU7SH-bd
The attendees included:
- Bruce Berriman – Coordinator
- Chris Beaumont
- Tom Robitaille
- Adrian Price-Whelan
- Demitri Muna
- Gabe Perez-Giz
We agreed that formal training is needed because:
- Scientists spend more time coding than ever before
- There is growing pressure to release the code used to perform the analysis in a publication, so that code needs to be robust and easy to understand
- The “data tsunami” means that distributed processing and high performance computing skills are needed
- Most students are not taught programming in a formal fashion – many scientists’ first experience of programming is in departments that are not equipped to teach such topics as object-oriented formalism, database design and usage, etc.
Education is needed in two areas:
- Effective software engineering techniques.
- Effective techniques for processing big data sets.
We agreed that intensive immersion was the best approach, preferably as a “boot camp” lasting several weeks before a semester begins (adding classes during the semester is too much for students). One of the major goals of these boot camps is to give students the confidence to keep building advanced knowledge after the training ends.
Two very good examples of “boot camps” are:
- Software Carpentry (started by Greg Wilson, University of Toronto). It is very thorough, with many useful videos; the material on configuration management (CM) and testing appears particularly well done. The project offers traveling boot camps across the U.S. and abroad. These are tailored to local needs, but are usually only two days long and can feel crammed.
- SciCoder (run by Demitri Muna) offers five days of intensive teaching at New York University (but is open to invitations and funding to host it elsewhere). Demitri is planning to write a book on the class, and is considering turning SciCoder into a Stanford-style iTunes University course. The obvious benefit would be that the material becomes available to the whole astronomy community at once.
Locally organized classes may, of course, also prove useful. Tom pointed out that there was a lot of enthusiasm for a Python class he wished to hold at the University of Heidelberg.
We did not have time to discuss high performance computing in detail, but we agreed that techniques such as parallelization in Python and the power of workflow managers need to be brought to the attention of researchers.
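As a rough illustration of what “parallelization in Python” can mean for a typical analysis task, here is a minimal sketch using the standard-library multiprocessing module to spread independent per-item work across several worker processes. The function measure_flux and its squaring step are hypothetical placeholders for a real per-image analysis, and the choice of four workers is an arbitrary assumption; this is not a recipe the session agreed on.

```python
from multiprocessing import Pool


def measure_flux(image_id):
    # Hypothetical stand-in for an expensive, independent per-image analysis.
    # Squaring the id just gives a deterministic result to check.
    return image_id * image_id


def process_all(image_ids, workers=4):
    # Distribute the items across a pool of worker processes.
    # Pool.map returns results in the same order as the inputs.
    with Pool(processes=workers) as pool:
        return pool.map(measure_flux, image_ids)


if __name__ == "__main__":
    # The __main__ guard matters: on platforms that start workers by
    # re-importing this module, it stops each worker from creating its own pool.
    print(process_all(range(8)))
```

Because each item is processed independently, the same pattern scales from a laptop's cores toward the cluster and workflow-manager tools mentioned above, which is where the “data tsunami” skills come in.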
There is also a need to provide funding for training efforts of the kind we have described. The splendid efforts of volunteers who also have careers as scientists are simply unsustainable in the long term, and perhaps even in the short term.