Astrophysics Source Code Library: Where do we go from here?

This week I am featuring a guest post by Ms. Alice Allen, the Editor of the Astrophysics Source Code Library, an online index of codes that are used in astronomical research and have been referenced in peer-reviewed journal articles. The post is essentially a talk given by Ms. Allen at the recent ADASS XXIII meeting. The impact of the ASCL is growing: a poster by Associate Editor Kim DuPrie at ADASS XXIII showed that more than 700 codes are now indexed and that quarterly page views have quadrupled since Q1 2011, reaching 24,000. Researchers are explicitly citing codes in papers that use the software, the ADS is linking software to papers about the code, and the ASCL is sponsoring workshops and discussion forums to identify obstacles to code sharing and propose solutions. And now, over to you, Alice:

Good afternoon! I’m going to give you some brief information about the Astrophysics Source Code Library itself, then talk about why we do things the way we do and where we hope to go, including what changes we’d like to see in the resource down the road.

The Astrophysics Source Code Library, or ASCL, is a free online registry for source codes used in research that are of interest to astrophysicists and astronomers. It exists to make these codes more discoverable so that they can be examined.

Transparency of research is the guiding principle. Other good things come out of making codes discoverable; we also hope to foster more recognition of the contributions codes and their authors make to the community, and greater efficiency in science through code reuse. But the guiding principle is increasing the integrity of the science by increasing the transparency of computational research methods.

The ASCL was started in 1999 by Robert Nemiroff and John Wallin as a repository of codes. In 2010, it was decided that the resource should move to the phpBB installation that serves as the APOD discussion forum, and that it should become a registry rather than a repository. Nemiroff enlisted my help; after moving the existing codes to the new site, I started adding new ones. The ASCL retains the ability to house codes but does not require that they be deposited.

In early 2011, Peter Teuben and I formed the Advisory Committee for the ASCL; you likely recognize many of the names on the committee. Later that same year, Kimberly DuPrie came on board as the associate editor; she has a great poster here at ADASS called Incite to Cite! that provides additional information about the resource. Please stop by to see it if you haven’t already!

In January 2012, ADS started indexing ASCL entries. This makes codes more discoverable, and because codes have stand-alone entries, the ADS records for papers that use a code can link directly to that code's entry. Indexing by the ASCL and ADS also gives codes that have no paper describing them a way to be cited and to have those citations tracked.

Success for us is this: you read a paper, want to see the code, click a link or two, and can look at the code for its underlying assumptions, methods, and computations. Alternatively, if you want to investigate an unfamiliar domain, you can peruse the ASCL to see what codes have been written in that area.

To that end, we are active in the community in fostering code sharing; yesterday’s BoF was one example, and with the AAS’s Working Group on Astronomical Software, we are co-sponsoring a Special Session at January’s AAS meeting.

So let’s look quickly at a code entry in the ASCL; here’s the one for AAOGlimpse. The code entry consists of five fields:
The code name
A description of the code; this was originally a copy of the abstract for the code paper, but more recent entries are brief descriptions of the code instead of abstracts.
The name of the code author
The URL for the code’s website or download location
And a unique identifier assigned by the ASCL, the ascl ID. This number encodes the year and month in which the code was assigned its ID and is used to build the permalink for the code, just as astro-ph does for preprints.
We also usually include a link to a paper describing or using the code. This is not intended to serve as the citation for the code; I originally started including it for my own benefit, so I would know where I had gotten information on the code. It now serves as a source of more information about the code or its use in research.
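To make the ascl ID description concrete, here is a minimal sketch of the five-field entry and of reading the assignment date and permalink out of the ID. It assumes IDs take the form `ascl:YYMM.NNN` (two-digit year, two-digit month, three-digit sequence number) and that permalinks have the form `https://ascl.net/YYMM.NNN`; the class name, field names, base URL, and the "99 means 1999" rule are illustrative assumptions, not the ASCL's actual implementation, and the sample entry is hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed ID pattern: "ascl:" + YY + MM + "." + three-digit sequence number.
ASCL_ID_PATTERN = re.compile(r"^ascl:(\d{2})(\d{2})\.(\d{3})$")


@dataclass
class CodeEntry:
    """The five fields of a code entry (field names are hypothetical)."""
    name: str
    description: str
    author: str
    url: str
    ascl_id: str

    def _parts(self) -> tuple[str, str, str]:
        m = ASCL_ID_PATTERN.match(self.ascl_id)
        if m is None:
            raise ValueError(f"not a well-formed ascl ID: {self.ascl_id!r}")
        return m.group(1), m.group(2), m.group(3)

    def assigned(self) -> tuple[int, int]:
        """Return the (year, month) in which the ID was assigned."""
        yy, mm, _ = self._parts()
        month = int(mm)
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month in ascl ID: {self.ascl_id!r}")
        # Assumption: the library began in 1999, so "99" means 1999
        # and every other two-digit year falls in the 2000s.
        year = 1999 if yy == "99" else 2000 + int(yy)
        return year, month

    def permalink(self, base: str = "https://ascl.net/") -> str:
        """Build a permalink from the numeric part of the ID (base URL assumed)."""
        yy, mm, seq = self._parts()
        return f"{base}{yy}{mm}.{seq}"


# Hypothetical entry, for illustration only.
entry = CodeEntry("ExampleCode", "A hypothetical code.", "A. Author",
                  "https://example.org/examplecode", "ascl:1301.000")
print(entry.assigned(), entry.permalink())
```

Keeping the date inside the identifier means no separate "date assigned" field has to be maintained, which fits the light-metadata approach described here.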

We register codes rather than serve as a repository because code authors generally like to keep their codes close to them (for many reasons); we can act as a repository, however, and welcome code deposits. We keep the metadata light because we point to other places where the metadata exist and are more likely to be updated.

Previous efforts similar to the ASCL have demonstrated that coders will not maintain metadata on an external site. They have also demonstrated that coders generally will not add their codes to such a service, for many reasons, including not knowing the resource exists and not wanting to put effort into something that may offer little payback. We hope that over time more coders will add their codes, but we realize there has to be good reason for them to make the effort. We are working hard to give them those good reasons!

We don’t have DOIs at this time because of the cost, but we do have a unique identifier (the ascl ID) for codes that have been vetted by an editor and indexed by ADS. Because of this identifier, ADS can link to the code entry and can pick up citations that use it, provided the citations are properly formatted.
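Why "properly formatted" matters can be illustrated with a small sketch that scans reference strings for ascl IDs and tallies citations per code. This is not how ADS actually matches citations; it assumes only that a usable citation contains the ID in the form `ascl:YYMM.NNN`, and the reference strings are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumption: a properly formatted citation contains "ascl:YYMM.NNN".
CITED_ID = re.compile(r"\bascl:(\d{4}\.\d{3})\b")


def count_ascl_citations(references: list[str]) -> Counter:
    """Tally how many reference strings cite each ascl ID."""
    counts: Counter = Counter()
    for ref in references:
        # Count each ID at most once per reference string.
        for ascl_id in set(CITED_ID.findall(ref)):
            counts[ascl_id] += 1
    return counts


# Hypothetical reference strings.
refs = [
    "Author, A. 2013, ExampleCode, Astrophysics Source Code Library, ascl:1301.000",
    "Author, B. 2013, ApJ, 1, 1",
    "See ascl:1301.000 and ascl:1302.000.",
    "ExampleCode (Author 2013), available from the author",
]
print(count_ascl_citations(refs))
```

The last reference names the code but carries no identifier, so a matcher like this one cannot credit it; that is the sense in which citation format determines whether a citation is tracked.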

Codes do not need to be open source to be registered in the ASCL; any licensing is okay. We have GPL, Creative Commons, copyrighted, and open-source codes, and everything in between. We don’t care if codes are messy; if a code generated results that were used in research, we want it. And we don’t care how a code is cited.

We do care whether codes are discoverable, however. We also care that they are cited, as that helps a code author demonstrate that his work is valued/valuable. Nemiroff recently wrote to me that “codes take longer to write, typically, and are more scientifically important, arguably, than individual papers.” We’ve seen codes presented here at ADASS that have enabled a tremendous number of discoveries, such as the WISE Moving Object Pipeline Sub-System presented by John Dailey.

We want codes to be citable, cited, and discoverable.

The ASCL is facing some challenges. The current infrastructure is usable and easily maintained, but it is a mix of what was available at the time and whatever I, in my ignorance and/or hubris, could pull together to add to it. The code entries themselves are on a phpBB forum, we use WordPress for a front end and blog, and I use Excel for tracking and for building the ADS upload files. The infrastructure needs improvement.

We actively look for codes, and finding them is less of a problem than finding all the elements necessary for inclusion in the ASCL, such as a research paper that uses the code or a download site for it. We have dozens of code entries written up that each need only one of these elements to be moved into production.

And we lack both time and money. The ASCL is a volunteer effort, and those of us working on it do so around our full-time jobs and other obligations.

We have to strike a balance between what we might want to do and what we can do. We need to keep the ASCL lean enough to be managed and usable with volunteer effort to ensure sustainability over the long term. We recognize the deficits in the current model, but the current model beats no model. (Perfect is the enemy of good.)

We need enough metadata to identify a code accurately and unambiguously, to find it, and to link to it. Because the ADS indexes ASCL entries, the information ADS uses needs to remain static. Data for a code that may change over time, such as language and version, are not currently tracked in the ASCL; the curation effort needed to keep such metadata accurate is prohibitive. Complex searches that depend on richer metadata cannot be done either, since we don’t have that metadata.

We weigh everything we do against the goal of code discoverability (answering "where is the code that made these results possible?"), and anything that detracts from that goal often comes at too high a price for us.

Though we cannot do everything, we can keep the ASCL functional and usable by increasing the number of codes and maintaining the links to code sites. Kim DuPrie has written a nice link checker in Python and runs it weekly; links that fail are verified and, if necessary, replaced.

We want to coordinate with journals so that the computational methods published research depends on are readily available to those who want to examine them, and we want researchers to know about the ASCL. In late 2011, we started an awareness campaign that includes speaking at conferences, emailing code authors, and linking from APOD to the ASCL, and the ASCL has been advocating code release by participating in workshops and writing articles.

We want code authors to receive recognition for their work; the ASCL provides a way to cite codes without requiring that a paper be written about them. We can also archive codes: the ASCL houses a handful of codes now, and we want to serve as a repository for codes whose authors do not wish to maintain a website.
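The weekly link check described above can be sketched with nothing but the Python standard library. Kim DuPrie's actual script is not shown in this post; the function names, the HEAD request, the timeout, and the "2xx or 3xx counts as alive" rule below are all assumptions. The fetch function is passed in as a parameter so the filtering logic can be exercised without network access.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if it could not be reached."""
    try:
        # Building the Request inside the try also catches malformed URLs.
        req = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server answered with an error code
        return err.code
    except (urllib.error.URLError, OSError, ValueError):  # DNS, timeout, bad URL
        return 0


def failing_links(entries: dict[str, str],
                  fetch: Callable[[str], int] = fetch_status) -> dict[str, int]:
    """Map each entry name to its status when the URL is not 2xx or 3xx."""
    failures: dict[str, int] = {}
    for name, url in entries.items():
        status = fetch(url)
        if not 200 <= status < 400:
            failures[name] = status
    return failures
```

A result such as `{"CodeB": 404}` is a list of candidates for a person to verify, not a list of links to replace automatically: some servers reject HEAD requests or are only briefly down, which is why a flagged link is checked by hand before anything is changed.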

And we have to remain manageable with little or no funding. The ASCL is fortunate that Bob Nemiroff’s institution, Michigan Technological University, houses the ASCL at no charge. The ASCL received some support last year from the AAS for the awareness campaign, but other than that, it has been unfunded.

Eventually, we would like a new infrastructure for code entries, one with additional features such as an API and better search capability, and one that looks more professional (because this really matters to some folks). We’d also like to automate some of the things Kim and I do by hand.

Our current set-up has some features I really like and would like to retain in a new infrastructure. We have a full-text iterative search, people can post comments by replying to a code entry thread, and the forum allows people to subscribe to a thread or the ASCL forum itself and be notified if there are additions to either. In addition to that, the service will stay at MTU. It’s been a good home for the ASCL and I don’t anticipate that changing.

Kim’s poster has charts showing the growth of the ASCL over time and the steady growth in usage of the resource. Here is a chart I made showing usage by country for about a three-week period this past March. Hits on the ASCL came from eighty-three countries; as you can see, 60% of the hits came from outside the US, so the resource appears to be broadly used, or at least explored!
