Software Citation Implementation in Astronomy

This week’s post is a repost of a summary of a meeting I attended, held at the 231st AAS meeting in National Harbor, MD, in January 2018. The original summary, prepared by Daina Bouquin and Arfon Smith, is at https://github.com/CfA-Library/Cite_Astro_Software/blob/master/whitepaper.md, and is reproduced here with permission.

Software Citation Implementation in Astronomy

A White Paper Summarizing Perceptions from the 2018 AAS Splinter Meeting on Implementing the FORCE11 Software Citation Principles

Daina Bouquin, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA

Daniel S. Katz, University of Illinois Urbana-Champaign, Urbana, IL

Edwin Henneken, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA

August (Gus) Muench, American Astronomical Society, Washington, DC

Katie Frey, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA

Arfon M. Smith, Space Telescope Science Institute, Baltimore, MD

Contact: daina.bouquin@cfa.harvard.edu

Introduction

Software citation is foundationally important to the future of astronomy. Deep intellectual contributions are being made by people creating software to enable scientific research, and it is essential that software creators are encouraged to create these valuable resources. Efforts to help authors receive proper academic credit will allow them to prioritize writing software that is valuable to the astronomy community, whether within their current profession or as the focus of their whole career. With these facts in mind, on January 11, 2018 a “splinter meeting” was held at the 231st Meeting of the AAS that focused on implementing the FORCE11 Software Citation Principles (Appendix A) in Astronomy. The goal of the splinter meeting was to gather feedback from astronomers, developers, and others whose expertise could inform the creation of resources to support the implementation of those principles. The meeting organizers aimed to identify some of the social and technical hurdles that affect software citation practices in astronomy and to understand factors that distinguish it from other fields where software citation is increasingly essential.

The purpose of this document is to summarize the perceptions expressed during the splinter meeting, and to define a few concrete next steps that can be taken to lower perceived barriers. This is not a comprehensive list of all topics discussed by all attendees at the meeting, but will hopefully serve to instigate further discussion and invite disruptive ideas about how to best share increasingly complex, continually changing, digital research artifacts like software.

Structure of the Meeting

Attendees were organized into five groups of five to seven people before the session began to ensure that each discussion group was made up of people representing a mix of relevant perspectives. The goal was to have representation by software authors, developers, journal editors, indexers, and librarians in each group to the extent that this was possible.

The splinter meeting at AAS was 90 minutes long, beginning with three brief presentations: an update about the current FORCE11 Software Citation Principles Implementation Working Group [1]; the diversity of software being produced by the astronomy community [2]; and the on-going Asclepias project [3] from the AAS. A grant from the Alfred P. Sloan Foundation to the AAS for the Asclepias project was used to fund the splinter meeting.

Attendees were given prompts (Appendix B) after the presentations and asked to discuss the issues outlined within the prompts. A scribe at each table took notes about the ensuing conversation and presented highlights from the discussion at the end of the meeting.

Recurrent Themes from Discussion Groups

Motivations for Software Citation in Astronomy

The FORCE11 Software Citation Working Group identified four primary motivations for developing the software citation principles in 2016: 1) the need for a complete record of the research process; 2) the need to enable software discoverability; 3) the importance of research reproducibility; and 4) the need to give credit to academic researchers of all levels for the software that they develop [4]. Throughout the splinter meeting, the discussion groups eventually all came to focus predominantly on what was perceived as the primary motivating factor for implementing the principles in astronomy: giving credit for research software. The groups framed their conversations and focused the majority of their discussion around perceived social and technical barriers that impact people’s ability to give and receive credit for their code. Therefore, credit and the lack of credit academics receive for their software became the driving force behind subsequent discussions. This is not to say that the other motivations were not seen as important to attendees, rather that less discussion in this meeting was linked to discoverability, research reproducibility, or completeness of the scientific record than to credit.

Credit Leaves Many Questions Unanswered

By focusing on credit for academic researchers, issues surrounding authorship, native software citation, and subsequently the complexity authorship takes on in the context of software versioning came to the front. When authorship changes from one version of software to the next, the community highlighted how opaque attempts to implement the principles become. Specifically, the following questions remained unanswered:

  1. Who is a software “author”? The astronomy community doesn’t agree on how much someone should contribute to a code before that person is considered an author. This is true independent of version, but becomes an increasingly complicated discussion when multiple versions are being cited as contributors change over time.
    • Authorship questions highlighted a limitation that software papers present, as it was noted repeatedly that when a software paper exists, astronomers will overwhelmingly cite the paper rather than natively citing code regardless of whether or not the code they want to cite is the same version as the code discussed in the software paper. This may contribute to software paper authors receiving disproportionate amounts of credit and current contributors not receiving any. This preference for papers also runs counter to the FORCE11 Principle aiming for native software citation.
    • There was also discussion about how to give acknowledgement for contributions that might not fit a definition of “authorship” and how giving all contributors equal credit as authors may serve to dilute the perceived importance of authoring software from external perspectives (e.g. tenure committees, funders, etc.).
      • The meeting attendees saw this issue (and also item 1.i.) as something that could be addressed through education and outreach to external groups, but that a concerted effort would be required from allies within those groups.
  2. What should be cited to properly give author(s) credit? The ways in which astronomers are able to share software are varied (e.g. Zenodo record, AAS software paper, JOSS paper, ASCL record, etc.) and one piece of software may be represented as multiple digital objects (i.e. both as a paper and with a natively citable DOI from Zenodo), each with a different authorship list. More specifically, the following questions came up regularly in this context:
    • When more than one digital object exists, what is the best way to give credit?
      • How should we determine this?
      • Knowing native software citation is the goal, should one also cite a software paper if one exists?
      • How do we deal with multiple forks?
    • Where should researchers look to find preferred citations?
      • How can indexes support researchers looking for preferred citations?
    • How should authors determine the preferred citation for their code?
      • Where should an author make this information apparent (i.e. CITATION.md file [5])?
    • How should citations be calculated across different types of digital objects and versions of those objects?
      • Should citations be aggregated?
      • Should transitive credit be used in this context [6] or approaches described in the “Metrics Toolkit” [7]?
  3. What metadata do people need to make available to create a proper citation?
    • Should metadata associated with the use of the code be put in the same place as the metadata required to document the creation of the code? Where should usage metadata be documented?
      • Facilitating code reuse, although not as essential as credit from the meeting attendees’ perspective, should be encouraged. Metadata needed for reuse may be far beyond what is needed for a citation though.
    • Should software papers require a structured metadata file to improve machine-readability (i.e. CodeMeta.json [8])?
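
As an illustration of the structured metadata raised in item 3 above, the following Python sketch writes a minimal CodeMeta-style record. It is a sketch under stated assumptions, not a validated CodeMeta document: the package name, authors, repository, and DOI are hypothetical placeholders, the helper `build_codemeta` is invented for this example, and the selection of fields is only one plausible minimum. The CodeMeta project documentation remains the authority on the vocabulary.

```python
import json


def build_codemeta(name, version, authors, identifier, repository, license_url):
    """Return a dict shaped like a minimal CodeMeta record.

    The choice of fields is illustrative (an assumption of this sketch),
    not a complete or validated CodeMeta document.
    """
    return {
        # Context URL assumed to be the CodeMeta 2.0 JSON-LD context.
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "identifier": identifier,
        "codeRepository": repository,
        "license": license_url,
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


# Hypothetical example values; this is not a real package or DOI.
record = build_codemeta(
    name="examplecode",
    version="1.2.0",
    authors=[("Ada", "Example"), ("Grace", "Sample")],
    identifier="https://doi.org/10.0000/example.12345",
    repository="https://example.org/examplecode",
    license_url="https://spdx.org/licenses/MIT",
)

# Serialize to the text that would be saved as codemeta.json.
codemeta_text = json.dumps(record, indent=2)
print(codemeta_text)
```

Keeping the version and identifier in the same file as the author list means a single machine-readable record could, in principle, answer both "who gets credit" and "which release was used".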

Limited Situational Awareness

Outreach was another perceived barrier that came up throughout the discussions. Specifically, attendees expressed that outside of the group gathered for the splinter meeting and a few other select groups within the broader astronomy community, there was very little situational awareness about the importance of software citation. For the purposes of this summary, a person with a high degree of situational awareness in the context of research software citation is a person capable of understanding how implementation of the FORCE11 Software Citation Principles will impact them, both immediately and in the future. Situational awareness of this sort came up in two contexts:

  1. A lack of situational awareness pertaining to software citation by the community at large. This type of awareness was discussed as primarily a social issue in that this lack of awareness could be at least partially addressed through an awareness campaign.
    1. Attendees at the splinter meeting expressed that the Astronomy community as a whole may not possess an intuitive understanding of the importance of thinking critically about how and when to cite software and may not fully understand what they should be doing if they want people to cite their own software. As a result, it is not obvious to the community the ways in which research software deviates from a digital object like a research article or image.
      • This lack of awareness may lead to apathy among members of the community who are not seeking credit for their software.
        • Apathy may also stem from the misperception that software presents unresolvable problems for scholarly communication rather than being representative of issues that need to be addressed for all digital objects.
      • Limited awareness about the importance of software citation may exacerbate the problem of software not being seen as a scholarly contribution by external groups (e.g. tenure committees, funders, etc.).
        • Power dynamics can also prevent software authors from pushing back against these perceptions or pushing for software citation to be part of any formal curriculum or policy discussion.
      • A noted lack of awareness was also discussed when splinter meeting attendees touched on publishers’ editorial practices.
        • Meeting attendees discussed the need for more targeted education on software citation best practices for reviewers and the need for more qualified referees.
    2. Uncertainty about expectations from both the community’s and publishers’ perspectives. The questions below about expectations could also be clarified through an awareness campaign and specific guidelines:
      • What level of support is expected from software authors who share their code? (e.g. Code reuse in new contexts?; Feature requests?; Documentation?)
      • What is the expectation of “code quality”? (i.e. Is “hacky” code something that should be shared?)
      • Should scripts be shared or only packages that are meant to be reused?
      • If the primary motivation of sharing/citing code is giving credit, where does Open Science fit? (e.g. Is one expected to ever cite individual scripts?; Can you cite software that doesn’t have a “title”?; How do you share and point people to code that is meant for individual use?)
        • Without clarifying this point in particular, it is easy for people to become fixed on reproducibility issues (i.e. “Perfect becomes the enemy of the good.”)
      • What are publishers supposed to do to educate referees?
      • To what degree can automation support referees? (e.g. What can be done to flag papers that should have software citations?)

Tools and Other Resources

Throughout the splinter meeting, attendees also took time to brainstorm about possible tools that could support software citation implementation. Ideas for tools were diverse, but primarily focused on how automated tools could be used to initiate human action by authors and reviewers. The motivation behind developing tools in this context was to find ways to make software citation straightforward and to make giving credit for research software as much a part of writing a paper as citing an article or data source. Tool ideas were not vetted for feasibility, nor was there any consideration of who should create or fund these types of tools.

Ideas for tools included (in no particular order):

  • “Figure Tags” – A way to automatically embed a software citation in figures and/or generate the current preferred software citation when the figure is generated (e.g. figures created using corner.py).
  • “Flag for review” – Upon submission, a script could cross reference an article’s text against a list of keywords or phrases that would indicate that a software citation is expected and trigger an alert for the reviewer.
  • “Look up service for preferred citations” – Could cross-reference known sources of preferred citations and GitHub for CITATION.md files.
  • Wizard to help code authors create CITATION.md files and/or CodeMeta.json files.
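
The “Flag for review” idea above could be sketched roughly as follows. This is a minimal Python illustration, assuming plain-text access to a manuscript’s body and reference list; the keyword list and the function names `find_software_mentions` and `flag_for_review` are hypothetical, and the simple substring matching used here would need to be replaced by something more robust in practice.

```python
# Hypothetical keyword list; a real one would need community curation.
SOFTWARE_NAMES = ["astropy", "emcee", "corner.py"]


def find_software_mentions(text, names=SOFTWARE_NAMES):
    """Return the names that occur in the text (case-insensitive), in list order."""
    lowered = text.lower()
    return [name for name in names if name.lower() in lowered]


def flag_for_review(body_text, reference_text, names=SOFTWARE_NAMES):
    """Return names mentioned in the body that never appear in the references.

    A non-empty result is only a prompt for a human reviewer to check
    whether a software citation is missing; it is not a verdict.
    """
    mentioned = find_software_mentions(body_text, names)
    cited = set(find_software_mentions(reference_text, names))
    return [name for name in mentioned if name not in cited]


# Made-up manuscript fragments for illustration.
body = "We sampled the posterior with emcee and plotted it with corner.py."
references = "Foreman-Mackey et al. (2013), emcee"

flags = flag_for_review(body, references)
print(flags)
```

In this made-up example the body mentions two packages but the reference list names only one, so the other is surfaced for the reviewer. The design keeps the tool advisory, in line with the attendees’ emphasis on using automation to initiate human action.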

Splinter meeting attendees also pointed out a number of other resources that would be useful that were not geared toward automating publication processes. These included:

  • Clear, brief, best practice guidelines for software citation addressing the spectrum of research software including example citations.
    • These could be general enough to be helpful for reviewers and authors and answer many of the above noted questions,
    • These guidelines would need to refer out directly to relevant publisher guidelines or be augmented by publishers as needed.
      • Could act as a template for publishers who do not currently have guidelines in place for software citation.
      • Could be used to recommend changes to current citation style guides and reference management tools.
    • Guidelines could include recommended practices that would support software reuse and give general direction to resources on software licensing.
  • LaTeX templates that include links to software citation guidelines for specific publishers.
  • The splinter meeting attendees also brought up resources that are not specific to software citation implementation in Astronomy, but do touch on issues that impact the ability for academics to make a career writing software, which is fundamentally why giving credit for software is so essential. These included a lack of designated funding mechanisms/opportunities to support software development in astronomy. Attendees also highlighted their perception that funders do not require or fund software “preservation” and archiving in the same way that they may require for the preservation of data. It should be noted though that these perceptions may stem from limited awareness about existing policies from funders like the NSF, which does require software management in the same way it requires data management as it is considered to be a type of data.

Editorial Policies

Discussions during the splinter meeting often moved toward publisher editorial policies and how the astronomy community could influence them. Although it was noted repeatedly that the AAS has reasonable guidelines for publishing software papers and citing software, many other journals do not and might inadvertently create disincentives for citing software by limiting the number of citations authors can include in their papers. Moreover, meeting attendees lamented the limited attention that referees tend to give to software citations as they pertain to software version and use of preferred citations. Some attendees pointed out instances where articles that cited a software paper, or version of software, which predated their contributions would sometimes include them in the acknowledgement section and sometimes not. Editorial policies may not give specific enough guidance to publishing staff or assume that best practices are obvious. Increased dialog about editorial policies is needed between members of the astronomy community who are knowledgeable about software citation issues and journal editors who can influence policies for their respective journals.

Limitations

The information presented in this document was gathered by using an unsystematic and unstructured approach and will miss aspects of the conversations had during the AAS splinter meeting. Aside from this reality though, there are other important limitations to highlight from this process:

The group selected to discuss barriers to implementing the FORCE11 Principles at the splinter meeting were people already involved in conversations about software citation in astronomy. They therefore likely do not represent the perspectives of the broader community who may have much more limited awareness about these issues. Anecdotes made up the majority of examples given during conversations and those examples may be outliers despite efforts to focus on issues that impact the community more holistically.

Another issue that will impact any next steps recommended within this document is the fact that no “solutions” will be comprehensive, seeing as the entire premise of responding to community needs sets up an inverse problem wherein we are attempting to determine causes based on a limited set of observations. This fact should not prevent steps from being taken. Scholarly communication ecosystems are multifaceted and “solutions” will be iterative.

Finally, this document did not focus on intellectual property issues surrounding software and software citation. Discussions during the AAS splinter meeting only peripherally touched on intellectual property, licensing, and software reuse.

Next Steps

It would be advantageous to develop and promote brief guidelines to inform software authors in astronomy of what they can do to help ensure they are given credit for their work within the current scholarly communication landscape. These guidelines could point out simple steps to make their code more citable [9] and clarify some of the confusion about author expectations. The guidelines could also explain common methods for publishing software and address citation issues for each case. The guidelines could then be iteratively updated as questions posed by the splinter meeting attendees are addressed by the community. A separate set of short guidelines could be created on how to cite software and where to look for citations. In order for any of these guidelines to be impactful, they would need to include real examples and be promoted by institutions and individuals in as many settings as possible. Any guidelines created for the community would be a foundation on which an awareness campaign could be built, serving as a focal point for future debate and discussion.

In addition to concrete guidelines specific to astronomy, it would be valuable to develop a survey of publishers’ policies and editorial practices that impact software citation in the field. A survey of this sort could then be used to frame conversations with different publishers and indexers about which tools would be most helpful in enabling implementation of the FORCE11 Principles and inform updates to guidelines. The survey would also give publishers and stakeholders data that could support changes to editorial practices.

Moreover, it would be useful to design a study on self-citation practices in astronomy in order to better understand how software authors currently share their own work with the broader scientific community. A study of this kind could act as a standard for comparison to better understand how citation practices change over time.

Future Discussion Needed

Future conversations about software in astronomy are needed to address issues surrounding intellectual property, licensing, and copyright in particular. Research software preservation and resources to support preservation should also be discussed further in the context of software reuse.

Appendix A

The FORCE11 Software Citation Principles

https://doi.org/10.7717/peerj-cs.86

  • Importance: Software should be considered a legitimate and citable product of research. Software citations should be accorded the same importance in the scholarly record as citations of other research products, such as publications and data; they should be included in the metadata of the citing work, for example in the reference list of a journal article, and should not be omitted or separated. Software should be cited on the same basis as any other research product such as a paper or a book, that is, authors should cite the appropriate set of software products just as they cite the appropriate set of papers.
  • Credit and attribution: Software citations should facilitate giving scholarly credit and normative, legal attribution to all contributors to the software, recognizing that a single style or mechanism of attribution may not be applicable to all software.
  • Unique identification: A software citation should include a method for identification that is machine actionable, globally unique, interoperable, and recognized by at least a community of the corresponding domain experts, and preferably by general public researchers.
  • Persistence: Unique identifiers and metadata describing the software and its disposition should persist—even beyond the lifespan of the software they describe.
  • Accessibility: Software citations should facilitate access to the software itself and to its associated metadata, documentation, data, and other materials necessary for both humans and machines to make informed use of the referenced software.
  • Specificity: Software citations should facilitate identification of, and access to, the specific version of software that was used. Software identification should be as specific as necessary, such as using version numbers, revision numbers, or variants such as platforms.

Note: Bolding is not part of the formatting in the original publication of the principles.
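
To make the principles above more concrete, here is a small Python sketch of a citation record that checks for the fields the principles call for (contributors, a unique identifier, a specific version) and renders a plain-text citation. The class `SoftwareCitation`, the helper names, the example values, and the citation layout are all assumptions made for illustration; they do not represent any journal’s style or an official FORCE11 format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoftwareCitation:
    authors: List[str]
    title: str
    version: Optional[str] = None
    identifier: Optional[str] = None  # e.g. a DOI URL
    year: Optional[int] = None


def missing_fields(citation):
    """List the fields whose absence conflicts with the principles above."""
    problems = []
    if not citation.authors:
        problems.append("authors")      # credit and attribution
    if not citation.identifier:
        problems.append("identifier")   # unique identification, persistence
    if not citation.version:
        problems.append("version")      # specificity
    return problems


def format_citation(citation):
    """Render a plain-text citation; the layout is illustrative only."""
    parts = [", ".join(citation.authors)]
    if citation.year:
        parts.append(f"({citation.year})")
    title = citation.title
    if citation.version:
        title += f" (version {citation.version})"
    parts.append(title + ".")
    if citation.identifier:
        parts.append(citation.identifier)
    return " ".join(parts)


# Hypothetical record; this is not a real package or DOI.
complete = SoftwareCitation(
    authors=["Example, A.", "Sample, G."],
    title="examplecode",
    version="1.2.0",
    identifier="https://doi.org/10.0000/example.12345",
    year=2018,
)
print(missing_fields(complete))
print(format_citation(complete))
```

A record lacking a version or identifier would be reported by `missing_fields`, which is one way a citation “wizard” could prompt an author before a reference is finalized.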

Appendix B

Discussion Prompts

Given the Software Citation Principles as a Goal:

  • What are the most substantial barriers you experience when citing software?
  • What resources have you used to help you to cite software? (e.g. use cases, blogs, people, reference management tools, etc.)
  • What resources are still needed in astronomy?
  • What specific tools would be helpful when authoring papers and writing new codes? (e.g. LaTeX macros, CITATION file template on GitHub, etc.)
  • What do you think would be most impactful in raising awareness about software citation issues in astronomy?

References

  1. Katz, D. S. (2017). Software Citation: A Solution with a Problem. Retrieved from https://www.slideshare.net/danielskatz/software-citation/danielskatz/software-citation
  2. Smith, A. (2018). Spectrum of Research Software. Retrieved from https://stsci.app.box.com/s/ooamf3ds1fxcqq7nnhgpnbjgi0sra031
  3. Muench, A. (2018). Implementations and provocations from the asclepias project. Retrieved from https://drive.google.com/file/d/1LB1arge6mMSHz1Lgxu6wn8Nr-Rb1LYNw/view?usp=embed_facebook
  4. Smith, A. M., Katz, D. S., & Niemeyer, K. E. (2016). Software citation principles. PeerJ Computer Science, 2, e86. https://doi.org/10.7717/peerj-cs.86
  5. Druskat, S., Bast, R., Chue Hong, N., Konovalov, K., & Rowley, A. (2017). A standard format for CITATION files. Software Sustainability Institute: Software and Research Blog. Retrieved from https://www.software.ac.uk/index.php/blog/2017-12-12-standard-format-citation-files
  6. Katz, D. S., & Smith, A. M. (2014). Implementing Transitive Credit with JSON-LD. ArXiv:1407.5117 [Cs]. Retrieved from http://arxiv.org/abs/1407.5117
  7. Citations, software. (2017). Retrieved from http://www.metrics-toolkit.org/citations-software/
  8. Chue Hong, N., Druskat, S., & Slaughter, P. (2018). codemeta: Minimal metadata schemas for science software and code, in JSON-LD. CodeMeta. Retrieved from https://github.com/codemeta/codemeta/blob/master/codemeta.json
  9. Smith, A. (n.d.). Making Your Code Citable · GitHub Guides. Retrieved from https://guides.github.com/activities/citable-code/

The Virtual Observatory Is Very Much Real!

 

This is the title of a talk given at the Harvard-Smithsonian Center for Astrophysics by my colleague Giuseppina (Pepi) Fabbiano. Pepi has been involved in the VO since its earliest days, and she currently serves as Chair of the Executive Committee of the International Virtual Observatory Alliance (IVOA), the international body that specifies protocols for uniform access to worldwide data. You can watch the full video below. It explains how the VO is embedded into the infrastructure of many projects, and how astronomers use it day-to-day without realizing it.

 

Here is Pepi’s abstract:

“While the VO is now embedded in the fabric of data astronomy, astronomers are still largely unaware of it, and often think of it as a past, perhaps failed, experiment. Instead, chances are that if you work with data, you are already using the VO. In this talk, I will discuss VO perception and reality, and demonstrate recent VO-enabled software and interfaces. Observations of the sky by means of increasingly powerful ground-based and space telescopes (and simulations) produce a rich and ever larger volume of digital data. They constitute a tremendous Virtual Observatory for astronomers to investigate the properties and evolution of the Universe. The realization that a new infrastructure was needed to fully and easily exploit these increasingly complex, diverse and large data sets, led to the constitution in 2002 of the International Virtual Observatory Alliance (IVOA), the standards organization for digital astronomy data access and interoperability. The IVOA has so far been joined by 21 national and international VO projects worldwide, and has produced standards for finding, accessing, selecting, extracting, analyzing and visualizing data. IVOA standards are increasingly implemented in all the major archives worldwide. Data centers are beginning to be built upon these standards, and new telescope projects are planning to use them. IVOA standards are used in a growing set of popular tools and interfaces in astronomy, as well as in tools used for Education and Outreach world-wide.”


Virtual and Augmented Reality for Space Science and Exploration

This is the title of a Symposium held at the Keck Institute for Space Studies on January 30, 2018 in Pasadena. According to the website, the goals of the meeting were: “The emerging technologies of Virtual and Augmented Reality (VR/AR) are promising to change dramatically the ways in which we perceive information and interact with it. VR/AR offer a natural, intuitive, and highly effective way for a collaborative data visualization and visual exploration. We will focus on two areas of space science and exploration where these technologies can have a major impact: mission planning and operations, and data visualization of high-dimensionality abstract data spaces.”

There were nine presentations in all, covering topics such as Visual Analytics by Demonstration for Interactive and Immersive Data Analysis (Alex Endert), Virtual Mars (P. Abercrombie), and Immersive Analytics Beyond Visualization (D. Bowman). There were also three panel discussions: Data Visualization in VR/AR, VR/AR for Mission Design and Ops, and A View From the Industry and Tech Transfer.

All the presentations and panels have been posted on the symposium web site and on YouTube; all are worth checking out. Here is the panel discussion on VR/AR for Mission Design and Ops:

Best Practices for a Future Open Code Policy

This week, I am posting a copy of a white paper prepared by Shamir et al. in response to a call from the National Academies, who are sponsoring a Task Force on “Best Practices for a Future Open Code Policy for NASA Space Science.” Read more about the Task Force at http://sites.nationalacademies.org/SSB/CurrentProjects/SSB_178892, where you can click the link “Submitted White Papers” to see a list of the 47 papers submitted.

Our white paper can be downloaded here: ShamirLior

Best Practices for a Future Open Code Policy: Experiences and Vision of the Astrophysics Source Code Library

A white paper submitted to the National Academies of Sciences, Engineering, and Medicine’s Best Practices for a Future Open Code Policy for NASA Space Science Project Committee

Lior Shamir,1 Bruce Berriman,2 Peter Teuben,3 Robert Nemiroff,4 and Alice Allen3,5
1 Lawrence Technological University
2 Caltech/IPAC-NExScI
3 University of Maryland
4 Michigan Technological University
5 Astrophysics Source Code Library

Introduction

We are members of the Astrophysics Source Code Library’s Advisory Committee and its editor-in-chief. The Astrophysics Source Code Library (ASCL, ascl.net) is a successful initiative that advocates for open research software and provides an infrastructure for registering, discovering, sharing, and citing this software. Started in 1999, the ASCL has been expanding in recent years, with an average of over 200 codes added each year, and now houses over 1,600 code entries.

Codes registered by the ASCL become discoverable not just through ASCL but also by commonly used services such as NASA’s Astrophysics Data System (ADS) and Google Scholar. ASCL entries are citable and citations to them are tracked by indexers, including ADS and Web of Science (WoS), thus providing one metric for the impact of research software and accruing credit for these codes to the scientists who write them. The number of citations by ASCL ID has increased an average of 90% every year since 2013, and nearly 70 publications indexed by ADS have citations to the ASCL. Figure 1 shows the number of citations to ASCL entries by year.

The ASCL has worked to formalize source code sharing and citation practices in astrophysics (Teuben et al., 2014) and has consulted with other disciplines seeking to do the same. Our work on and experience with the ASCL, as and with software authors, and with open and closed software inform our recommendations to NASA as it contemplates establishing an open code policy.

Figure 1. Citations to ASCL entries by year.

 

Open Source code in astrophysics

Due to the integration of information technology in astronomy research, computing has become increasingly important, making software a pivotal part of the scientific process. Source code has therefore become an integral part of the research, and is critical for the understanding, replication, and reuse of research results. Unlike “traditional” methods of scientific communication such as peer-reviewed papers, and despite the clear need for common practices for making source code accessible to the scientific community, practices, guidelines, and requirements for making source code publicly available have not fully crystallized.

Although releasing source code has not traditionally been a formal expectation in the field, numerous software tools have been released under Open Source Initiative1 (OSI) licenses and have had a substantial impact on the field. Notable examples include SExtractor (Bertin & Arnouts, 1996), Montage (Berriman et al., 2015), and emcee (Foreman-Mackey et al., 2013), which have been widely used by multiple unrelated projects.

Open development, in which the source is open by design, has also become more common in astrophysics; examples include the data management software stack of the Large Synoptic Survey Telescope (Jurić, 2015) and the popular AstroPy package (Astropy Collaboration et al., 2013).

Releasing source code is important for transparency of the research, as the code is an integral part of the study and can reveal important information about the process that was carried out. Full details of the code are often very difficult to describe in a standard paper. More importantly, the availability of the code allows for replication of the results and reusability of the software, and possible discovery of unintended side effects.

Despite these important advantages, researchers are still often reluctant to release their source code. We believe the NSF Committee members, our readers, are already familiar with the reasons for this reluctance, which have been well documented (Weiner et al. 2009, Barnes 2010, Stodden 2010, Ince et al. 2012, Katz et al. 2014, Goble et al. 2016).

Software development is a demanding task that requires a substantial amount of time, is often insufficiently funded, and is often given little weight in career development decisions such as promotion or tenure. The gap between the time required to develop a good software package and the little reward that comes with it encourages researchers to invest more time in peer-reviewed publications than in releasing clean, documented, (re)usable code. Funding software development, documentation, testing, and release would provide a financial incentive to researchers for these activities.

1 https://opensource.org/

Practices of sharing source code

Because software is an integral part of research, the release of source code should become part of the standard scientific communication process. Currently, the primary method of scientific communication is the peer-reviewed publication; the publication process can be leveraged to standardize the process of sharing source code (Shamir et al., 2013). For example, Astronomy and Computing recommends that authors register their research source codes with the ASCL when submitting a research paper.2 Linking the publication of a scientific paper to the source code related to it helps to clarify requirements such as when and how source code should become available. It also makes the use of the specific source code clearer to the reader of the paper, with easier access to the specific relevant code. As peer-reviewed publications are the “building blocks” of any scientific discipline, associating the release of the source code with the publication of the paper also standardizes the guidelines.

Since the maintenance of software after its initial goals have been accomplished is a time-consuming task yet offers little reward to the author, NASA should provide funding for maintenance of useful existing software. Standard regression tests written in the code development phase can help in software maintenance.
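As an illustration of the kind of regression test mentioned above, here is a minimal sketch in Python. The routine `flux_to_magnitude`, its default zero point, and the reference values are hypothetical and chosen only for this example; they are not taken from any particular astronomy package. The idea is simply to record outputs from a trusted run during development and check that later changes still reproduce them.

```python
import math


def flux_to_magnitude(flux: float, zero_point: float = 25.0) -> float:
    """Hypothetical research routine: convert a flux to a magnitude."""
    if flux <= 0:
        raise ValueError("flux must be positive")
    return zero_point - 2.5 * math.log10(flux)


# Reference (input, expected output) pairs recorded from a trusted run.
# These values are illustrative only.
REFERENCE_CASES = [
    (1.0, 25.0),
    (100.0, 20.0),
    (0.01, 30.0),
]


def test_flux_to_magnitude_regression() -> None:
    """Fail if a later code change alters previously verified results."""
    for flux, expected in REFERENCE_CASES:
        assert math.isclose(flux_to_magnitude(flux), expected, rel_tol=1e-12)


if __name__ == "__main__":
    test_flux_to_magnitude_regression()
    print("regression checks passed")
```

A test written this way can be run by hand or picked up by a test runner, so whoever maintains the code later can confirm that a fix or refactoring has not changed published results.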

To provide a stronger incentive for researchers to release their source code, published source codes should be indexed and become citable documents (Allen et al., 2015); this allows the software to be found on indexing engines, and also allows citations for these programs to accrue to their authors. This is also consistent with the Force11 Software Citation Principles (Smith et al., 2016) and the Center for Open Science Transparency and Openness Promotion Guidelines.3

Funding agencies should support and encourage journals dedicated to publishing software, such as the Journal of Open Source Software or the Journal of Open Research Software. These journals allow source code and scientific software authors to receive credit equivalent to “traditional” journal publication. Clearly, these papers are also discoverable, indexed, and citable.

Research source code should be released under a license that allows the reuse of the code, at least for non-commercial uses. Acceptable Open Source Initiative licenses, such as GPL (General Public License) and BSD (Berkeley Software Distribution), should be used to allow legal replication and use of the code with no restriction for scientific or other non-commercial purposes.

2 https://www.elsevier.com/journals/astronomy-and-computing/2213-1337/guide-for-authors

3 https://cos.io/our-services/top-guidelines/

Recommendations

We recommend the following:

Source code that enables research results be open for examination (released to the public) upon submission to a journal of the first research paper that uses the source code, absent any truly compelling reasons, such as ITAR restrictions, that prohibit public release.

Further, we recommend that metadata about these source codes be shared with the ASCL for indexing upon submission to increase the discoverability of the software.

The version of the software used in a paper that is accepted for publication be archived in a repository before publication.

All source code developed for research be explicitly licensed with an Open Source Initiative license that permits legal replication, use, and modification of the software with no restriction for scientific or other non-commercial use.

NASA award funds explicitly for

  • development of research source code, documentation, and testing procedures
  • maintenance of source code developed for research that is of continuing importance to the discipline.
  • creation of new open codes that do the same tasks as closed codes.

    Further, we recommend undertaking a study to determine a hierarchy of codes to rewrite.

NASA require compliance with code release requirements when evaluating proposals. Further, we recommend forming a task force to 1) develop standardized methods of reporting compliance with these requirements in new proposals where applicable, and 2) develop instructions to referees of funding proposals to ensure these requirements are not overlooked in the evaluation process.

NASA sponsor one or more journals that publish code and software.

References

Allen, A., et al. 2015, Improving software citation and credit. arXiv preprint arXiv:1512.07919

Astropy Collaboration et al. 2013, Astropy: A community Python package for astronomy. A&A, 558, A33

Barnes, N. 2010, Publish your computer code: it is good enough. Nature, 467, p. 753

Berriman, G. B. and Good, J. C. 2017, The Application of the Montage Image Mosaic Engine to the Visualization of Astronomical Images. PASP, 129, 058006

Bertin, E. and Arnouts, S. 1996, SExtractor: Software for source extraction. Astronomy & Astrophysics Supplement, 317, 393

Foreman-Mackey, D., et al. 2013, emcee: The MCMC Hammer. PASP, 125, 306

Goble, C., et al. 2016, Dagstuhl Reports, 6, 62. http://drops.dagstuhl.de/opus/volltexte/2016/6755

Ince, D., Hatton, L., Graham-Cumming, J. 2012, The case for open computer programs. Nature, 482, p. 485

Jurić, M., et al. 2015, The LSST data management system. arXiv preprint arXiv:1512.07914

Katz, D. S., et al. 2014, Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1), arXiv preprint arXiv:1404.7414

Shamir, L., et al. 2013, Practices in source code sharing in astrophysics. Astronomy and Computing, 1, 54-58

Smith, A. M., et al. 2016, Software Citation Principles, PeerJ Computer Science, 2:e86, doi:10.7717/peerj-cs.86

Stodden, V. 2010, The Scientific Method in Practice: Reproducibility in the Computational Sciences, Tech. Rep. MIT Sloan Research Paper No. 4773-10, doi:10.2139/ssrn.1550193

Teuben, P., et al. 2014, Ideas for Advancing Code Sharing: A Different Kind of Hack Day, ASP Conference Series, Vol. 485, ADASS XXIII, ed. N. Manset & P. Forshay, 3

Weiner, B., et al. 2009, Astronomical Software Wants To Be Free: A Manifesto, Vol. 2010, astro2010: The Astronomy and Astrophysics Decadal Survey, 61P


A Study of the Efficiency of Spatial Indexing Methods Applied to Large Astronomical Databases

This is the title of a poster presented by Tom Donaldson, Bernie Shiao (both of STScI), John Good (Caltech/IPAC-NExScI), and myself at the 231st AAS meeting in Washington DC (January 8-12). I am attaching a copy of the poster below, and linking a copy of the paper we prepared for the proceedings of ADASS XXVII (Santiago, Chile).

Briefly, we studied the comparative performance of databases as follows:

  • Indexing depth (cell size) of Hierarchical Triangular Mesh (HTM) vs. HEALPix
  • PostgreSQL vs. SQL servers
  • Linux vs. Solaris vs. Windows

and we did this for two catalogs: the 2MASS All Sky Catalog (which covers the complete sky; 470,000,000 sources) and the unmerged Hubble Source Catalog (which has sparse sky coverage; 384,000,000 sources).

The main results are:

  • Query time is dominated by I/O.
  • Indexing depth—and not choice of index—has the greatest impact on performance: trade-off between too many sources and too many cells.
  • Optimum index depth depends on query radius distribution. (We used a log scale from 1 arcsec to 1 degree).
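To make the depth trade-off concrete, here is a rough sketch in Python, not the code used in the study. It relies on the standard cell counts for the two schemes (8 × 4^depth trixels for HTM, counting the eight base triangles as depth 0, and 12 × 4^order pixels for HEALPix), and it assumes sources are spread uniformly over the whole sky. That assumption is a simplification: it is reasonable for an all-sky catalog like 2MASS but not for a catalog with sparse coverage. The function names are illustrative only.

```python
import math

# Whole sky in square degrees (about 41,253).
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2


def htm_cell_count(depth: int) -> int:
    """Number of HTM trixels, counting the 8 base triangles as depth 0."""
    return 8 * 4 ** depth


def healpix_cell_count(order: int) -> int:
    """Number of HEALPix pixels: 12 * nside**2, with nside = 2**order."""
    return 12 * 4 ** order


def mean_cell_side_arcsec(n_cells: int) -> float:
    """Side of a square with the mean cell area, as a rough size scale."""
    return math.sqrt(FULL_SKY_SQ_DEG / n_cells) * 3600


def mean_sources_per_cell(n_sources: int, n_cells: int) -> float:
    """Average occupancy under the uniform full-sky assumption."""
    return n_sources / n_cells


if __name__ == "__main__":
    n_sources = 470_000_000  # 2MASS source count quoted above
    for level in (4, 8, 12, 16):
        for name, count in (("HTM", htm_cell_count(level)),
                            ("HEALPix", healpix_cell_count(level))):
            print(f"{name:8s} level {level:2d}: {count:>15,d} cells, "
                  f"~{mean_cell_side_arcsec(count):10.1f} arcsec across, "
                  f"~{mean_sources_per_cell(n_sources, count):14.2f} sources/cell")
```

At shallow depths each cell holds many sources, so a small-radius query must filter a long candidate list; at deep levels a large-radius query touches a very large number of cells. This is the "too many sources versus too many cells" tension noted in the results, and it is why the best depth depends on the distribution of query radii.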

See the poster for the figures showing these results.

 


Link to ADASS paper (PDF)

 


The Montage Image Mosaic Toolkit As A Visualization Engine

It’s January, and that means it’s the annual winter meeting of the American Astronomical Society, this year in Washington DC. I presented a poster on how the Montage Image Mosaic Toolkit (http://montage.ipac.caltech.edu) is increasingly being used as a visualization engine. The poster was designed by Angela Lerias (U. California, Riverside), who is interning with the Montage project. It gives three examples of visualization:

  • Creating large multi-color images for visualization;
  • Sky-coverage maps of wide area time-series photometric surveys; and
  • Integration of Montage into the JS9 web-based image display tool (http://js9.si.edu).

The poster is based on the Open Access paper “The Application of Montage to the Visualization of Astronomical Images,” by Bruce Berriman and John Good (PASP, 129, 058006); you can read a summary of the paper here.


The Exoplanet Content of the Keck Observatory Archive (KOA)

This week, I have been attending the “Exoclipse 2017: Exploring New Worlds In the Shade” conference at Boise State University, splendidly organized by Brian Jackson. From the project website, the meeting’s goals are to discuss “…  discovery strategies, nascent population statistics, formation mechanisms, the planetary initial mass function, and connections to other populations of planets in short and moderate periods.”

I presented this poster on the exoplanet content in the Keck Observatory Archive (KOA)’s data holdings. The archive curates data from all Keck instruments, over the operating lifetime of the Observatory. The data discussed in the poster are freely available for download and analysis. Discover your own planet!


 
