Software Carpentry Lessons Learned – A Talk at SciPy 2014

This is a video of a highly entertaining presentation by Greg Wilson, the founder of Software Carpentry. Software Carpentry teaches lab skills for scientific computing through intensive “boot camps” and on-line resources. The story of how it came about makes the video worth watching on its own (I won’t spoil it; it’s near the beginning). The talk is based, albeit loosely, on an article called “Software Carpentry: Lessons Learned” in F1000Research, an Open Science journal aimed primarily at researchers in the life sciences. Both the paper and the presentation are refreshingly honest about what has worked and what has not.


Scientists are generally not taught how to design, write, test, debug, install and maintain software. They often have no more than a class or two in programming (I know that’s all I had). Software Carpentry has been trying to remedy this, first through on-line tutorials and videos, and now through a series of short, intensive Boot Camps: 91 in all since 2013, attended by 3,500 people.

The content and organization of these Boot Camps have been honed through some hard-earned lessons. The on-line resources attracted up to 2,000 visitors a month. However, they were considered too easy to earn credit in computer science classes, were seen as unhelpful for getting science done, and did not induce people to contribute or update material. Early week-long classes were considered too long, and they presented textbook software engineering that scientists generally don’t find useful. Attempts at Massive Open On-line Courses (MOOCs, before the acronym took hold) were not very successful either. Few people contributed material, and only 10% completed the class, which later studies showed to be typical of MOOCs.

The big change came in 2012 with the introduction of short, intensive courses held on-site, and the new format brought new content. Out went the textbook material, and in came tools that would help scientists and, along the way, teach them many of the methods of software engineering. These tools are intended to develop computational competence rather than to make people experts. They are disarmingly simple. To quote from the lessons-learned paper:

  • The Unix shell. We only show participants a dozen basic commands; the real aim is to introduce them to the idea of combining single-purpose tools (via pipes and filters) to achieve desired effects, and to getting the computer to repeat things (via command completion, history, and loops) so that people don’t have to.

  • Programming in Python (or sometimes R). The real goal is to show them when, why, and how to grow programs step-by-step as a set of comprehensible, reusable, and testable functions.

  • Version control. We begin by emphasizing how this is a better way to back up files than creating directories with names like “final”, “really_final”, “really_final_revised”, and so on, then show them that it’s also a better way to collaborate than FTP or Dropbox.

  • Using databases and SQL. The real goal is to show them what structured data actually is (in particular, why atomic values and keys are important) so that they will understand why it’s important to store information this way.
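As a rough illustration of two of these ideas, the following is a minimal Python sketch of small, testable functions grown alongside a test, working on structured data with atomic values and a key. It uses the standard-library `sqlite3` module with an in-memory database. The table layout, sample readings and function names (`make_db`, `mean_by_site`, `test_mean_by_site`) are hypothetical and invented for illustration. They are not taken from the Software Carpentry lessons.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database (hypothetical schema for illustration).

    Each column holds one atomic value, and each row has a unique key.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reading ("
        " id INTEGER PRIMARY KEY,"   # key: uniquely identifies each row
        " site TEXT NOT NULL,"       # atomic: one site name, not a list of sites
        " value REAL NOT NULL)"      # atomic: one measurement per row
    )
    # Let SQLite assign the ids; we only supply (site, value) pairs.
    conn.executemany("INSERT INTO reading (site, value) VALUES (?, ?)", rows)
    return conn


def mean_by_site(conn):
    """Small, single-purpose function: average value for each site."""
    query = "SELECT site, AVG(value) FROM reading GROUP BY site ORDER BY site"
    return {site: avg for site, avg in conn.execute(query)}


def test_mean_by_site():
    """A tiny test written next to the function, so the two grow together."""
    conn = make_db([("north", 1.0), ("north", 3.0), ("south", 5.0)])
    assert mean_by_site(conn) == {"north": 2.0, "south": 5.0}


if __name__ == "__main__":
    test_mean_by_site()
    print("all tests passed")
```

Because the data are stored as one value per column with a key per row, a question like “what is the mean at each site?” becomes a one-line query instead of string parsing. Keeping the query inside a small function makes it easy to test in isolation.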


This entry was posted in astroinformatics, computer videos, Computing, computing videos, cyberinfrastructure, informatics, information sharing, programming, R, Scientific computing, SciPy2014, software engineering, software maintenance, software sustainability, user communities, Web 2.0.
