In the age of data-intensive astronomy, software has come to be regarded as an instrument, and the question of when and whether code should be shared in astronomy (and indeed across all scientific disciplines) is a topic of debate and discussion. A splinter session at the 221st American Astronomical Society meeting in Long Beach, CA, organized by the Astrophysics Source Code Library (ASCL) and attended by 43 people, discussed this topic. The session took the form of short presentations by a panel consisting of Omar Laurino (CfA), Peter Teuben (Univ. of Maryland), Alice Allen (ASCL), Bob Hanisch (STScI), and Bruce Berriman (IPAC, Caltech), followed by a vigorous discussion of the issues raised by the presentations. A summary of the meeting has been posted on the ASCL blog, and you may download a PDF of the slides here: Astrophysics Code Sharing? AAS Splinter Mtg, Jan. 2013
After an introduction by Peter Teuben on the history of the ASCL, Laurino gave four reasons why code should be shared:
- Reproducibility of Results (requires only that codes are published as “black boxes”)
- Software Robustness (addressed by Open Source licensing)
- Software Reusability (also addressed by Open Source licensing)
- Transparency (addressed simply by having code available, regardless of license).
Bob Hanisch described how software is an essential part of scholarly publication, emphasizing that “Research results should be transparent, supported by the data and methods, and reproducible.”
Alice Allen described the Astrophysics Source Code Library itself, an on-line reference library for source codes that have been used to generate results published in or submitted to a refereed journal. The ASCL is a volunteer effort. She described some sociological changes that she is starting to see within the ASCL:
- Coders requesting their code be included in ASCL
- Papers citing codes explicitly
- ASCL entries showing up on CVs, publication lists, and in Google Scholar
- ASCL listed in code documentation
Bruce Berriman described how funded efforts in other fields have been able to make greater advances than is the case in astronomy. He gave examples from neuroimaging and biostatistics, and described one repository that goes beyond code sharing and allows users to run the code: RunMyCode.org.
Here is a summary of the lively discussion:
- Code can be shared, but code is also dynamic – it may change, and will change, from what is published. Newer versions may offer big improvements. But that can come with a price: defects can be introduced by others editing the code, who then blame the original coder if the code fails. One example was cited where the coder now only supports changes made by special request.
- There are institutional barriers and intellectual property rights issues that seriously impede coding. Are there ways to pressure institutions to allow release of code?
- Code often needs to be cleaned before being made public – this is work for the coder, who does not have the time to make the changes. It is tedious to make code production quality and to explain all the caveats.
- We need to make sure we understand the distinction between validation of results, which can just require releasing the executable, and delivering code that can be extended by others.
- The issue of users blaming the original coder for changes they didn’t make was brought up. The CRAPL license was denigrated – any license can state that the original coder is not responsible for changes made by others.
- There was a suggestion that referees should say whether the code supporting a paper should be released. This is one part of the broader issue of how to bring about sociological change. Journals and funding agencies may eventually develop policies and rules for code release, especially for code developed with public funding.
- The topic of software engineering skills was brought up, and it was agreed these need to be improved. Several attendees pointed out that it is not that hard to do. Intensive classes such as those offered by Software Carpentry are a big step in the right direction. The paper by Wilson et al. on “Best Practices for Scientific Computing” is an excellent starting point – it emphasizes the need for, e.g., versioning and unit tests, and these were discussed explicitly in the session.
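As a minimal illustration of the unit-testing practice advocated by Wilson et al. (the function and values here are my own hypothetical example, not from the session), a scientific Python code might pair each small function with a test like this:

```python
import unittest

def redshift_to_velocity(z):
    """Convert a small redshift z to a recession velocity in km/s,
    using the low-redshift approximation v = c * z."""
    c_km_s = 299792.458  # speed of light in km/s
    return c_km_s * z

class TestRedshiftToVelocity(unittest.TestCase):
    def test_zero_redshift(self):
        # An object at rest should have zero recession velocity.
        self.assertEqual(redshift_to_velocity(0.0), 0.0)

    def test_small_redshift(self):
        # z = 0.01 corresponds to c * 0.01 km/s in this approximation.
        self.assertAlmostEqual(redshift_to_velocity(0.01), 2997.92458, places=5)
```

Tests like these can be run with `python -m unittest`, and keeping them alongside the code under version control is exactly the kind of low-overhead habit the discussion encouraged.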
Quote of the session: “No matter how bad your code is, I have seen worse.”
Disclosure: I am a member of the ASCL Advisory Board.