Nick Barnes wrote a fascinating article in Nature News entitled Publish your computer code: it is good enough. He makes a simple and cogent argument that scientists publishing their research should also publish the code used to generate the results. Even though the code may be messy and poorly documented, publishing it creates a fuller record of the scientific process and the provenance of the data sets. Others may improve the code for the benefit of the wider community; Barnes cites the example of his own involvement in rewriting somewhat messy climate change code.
As an astronomer and a computer scientist, I have been on the receiving end of releasing code, so I thought I would express my opinions here. I tend to regard the release of code as part of a wider issue of preserving records of the scientific process so that it can be reproduced and improved. Data products generated in research and published in journals should be curated digitally and made accessible to astronomers along with the code that generated them.
I am inclined to think that, in the first instance, preserving the data products is more important than releasing all the pieces of code used to derive them. Data sets whose attributes are properly documented are immediately useful to other researchers. By attributes I mean quantities such as telescope, wavelength, position and so on.
Unfortunately, we haven’t made a lot of progress in preserving data. The trouble is, astronomers get credit for publishing papers and not for releasing data. And, astronomers often don’t have the tools to create and record the attributes needed to make their data useful. Why do all the extra work for no credit? The newly funded Virtual Astronomical Observatory (VAO) is, however, carrying out a pilot project on data preservation that will develop web-based tools to make it easy for astronomers to create fully-described data sets that comply with common standards.
But back to the code. Would releasing all the code used to create products be useful? Many astronomers write custom scripts that call libraries that are often public anyway – IDL users do this all the time; I know I do. I am unconvinced that releasing code such as this is truly a great benefit to the community, beyond the goal of exposing the full “path” taken to release data.
Analysis tools are a different kettle of fish. Let’s say someone develops new code to fit baselines to time series data and extract the signal of a transiting planet. Given the high rate of false positives in planet hunting, I think it is important that such code be released so that the community can better assess the results and improve the code. Just as important, the algorithms used in these codes should be fully documented, preferably in papers, and the sources of the methods used in these algorithms should be made clear (e.g. we made use of the KS test described in “Statistics for Geeks”, by Smith and Jones, 3rd edition, 2009, page 65).
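To make the point concrete, here is a minimal sketch of the kind of analysis step I have in mind: fitting and subtracting a baseline trend from a light curve, then flagging points that dip well below the residual scatter as a candidate transit. This is a hypothetical illustration, not any published pipeline; real transit searches use far more sophisticated detrending and matched filtering.

```python
def linear_baseline(t, flux):
    """Closed-form least-squares fit of flux = a + b*t."""
    n = len(t)
    st, sf = sum(t), sum(flux)
    stt = sum(x * x for x in t)
    stf = sum(x * y for x, y in zip(t, flux))
    b = (n * stf - st * sf) / (n * stt - st * st)
    a = (sf - b * st) / n
    return a, b

def transit_candidates(t, flux, n_sigma=3.0):
    """Subtract a linear baseline and return the indices of points
    lying more than n_sigma below the mean residual -- a crude
    stand-in for a transit-detection step."""
    a, b = linear_baseline(t, flux)
    resid = [f - (a + b * x) for x, f in zip(t, flux)]
    mean = sum(resid) / len(resid)
    var = sum((r - mean) ** 2 for r in resid) / (len(resid) - 1)
    sigma = var ** 0.5
    return [i for i, r in enumerate(resid) if r < mean - n_sigma * sigma]
```

Even a toy like this carries choices worth documenting in a paper: why a linear baseline, why a 3-sigma cut, how the scatter is estimated. Releasing the code lets others audit exactly those choices.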
But will astronomers do this? Well, they would have to see that releasing the code would benefit them more than keeping it private and retaining what they see as a competitive advantage. Astronomy is, after all, a small, intensely competitive field. Barnes’ article has many good points, but he does not address this issue of competitive advantage vs public good. My view? Yes, I think science would gain by releasing software, but it will only happen if it is mandated by funding agencies, or there is a change in the reward system in astronomy. The desire to advance science is a powerful driver for scientists, but it isn’t half as powerful as the force of human nature.