Results from a US survey about Research Software Engineers

This week’s post is a reblog of a post by Daniel S. Katz, Sandra Gesing, Olivier Philippe, and Simon Hettrick on the URSSI blog at http://urssi.us/blog/2018/06/21/results-from-a-us-survey-about-research-software-engineers/, and reproduced here with the permission of Dan Katz.

First, some summary stats from the survey tweeted by Dan Katz:

  • 90% write software that contributes to published research
  • 82% don’t have hand-over plans for their important software
  • 78% have trained others
  • 75% have experience that is in demand in the labor market

----

In 2016, the UK Software Sustainability Institute (SSI) ran a first survey of Research Software Engineers (RSEs): the people who write code in academia. This produced the first insight into the demographics, job satisfaction, and practices of RSEs. To support and broaden this work, the Institute planned to run the survey every year in the UK and an ever-expanding number of countries so that insight and comparison can be made across the globe. Ultimately, the SSI hopes that these results, the anonymized version of which are open licensed, will act as a valuable resource to understand and improve the working conditions for RSEs.

In 2017, led by Olivier Philippe and Simon Hettrick from the SSI, a set of such surveys were run across the UK, Canada, Germany, the Netherlands, South Africa, and the US. One or more people from each non-UK country “translated” the questions so that they made sense in the local language and culture. The UK team ran the surveys, with the collaborators from the other countries—Scott Henwood (Canada), Stephan Janosch and Martin Hammitzsch (Germany), Ben van Werkhoven and Tom Bakker (Netherlands), Anelda van der Walt (South Africa), and Daniel S. Katz and Sandra Gesing (USA)—helping to publicize the survey in their countries. When the surveys were complete, the UK team analyzed the data, and published it for all countries. Building the survey and doing the analysis were done on GitHub, and the output data has been published on Zenodo.

This blog post is a short summary of the results for this first USA survey, based on the Jupyter notebook that analyses the data. Given that Dan and Sandra are part of the team conceptualizing a US Research Software Sustainability Institute (URSSI), and because this survey brings up issues that URSSI might address, we are publishing this summary on the URSSI blog.
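
Since the anonymized data are openly licensed, readers can recompute summary figures like these themselves. Below is a minimal sketch of the kind of percentage calculation the analysis notebook performs; the column name is a hypothetical stand-in, and inline sample rows replace the actual Zenodo CSV:

```python
import csv
from io import StringIO

def pct_yes(rows, column):
    """Percentage of non-empty answers in `column` equal to 'Yes'."""
    answers = [r[column] for r in rows if r.get(column)]
    return 100 * sum(a == "Yes" for a in answers) / len(answers)

# Hypothetical column name and sample rows standing in for the
# anonymized survey CSV published on Zenodo.
sample = StringIO(
    "contributed_to_published_research\n"
    "Yes\nYes\nNo\nYes\n"
)
rows = list(csv.DictReader(sample))
print(pct_yes(rows, "contributed_to_published_research"))  # 75.0
```

The same helper can be pointed at any yes/no column in the real data once it is downloaded.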

Possibly the most important insight provided by the survey is that 90% of respondents have written software that has directly contributed to published research. This confirms the central role of RSEs (and anyone who performs this role regardless of their job title) in the generation of research results.

Demographics

164 people completed the survey past the first page, and listed themselves as working in the USA. 115 selected male to represent their gender, while 19 selected female, 6 selected “prefer not to say”, and 24 did not answer. 77% of those who provided an ethnic origin described themselves as white, 11% as Asian, 8% as Hispanic, Latino, or Spanish, 6% as other, and 2% as Black or African American. Of those who provided their age, 38% were between 25 and 34, 31% between 35 and 44, 20% between 45 and 54, and 9% between 55 and 64.

According to the 2010 US Census and the Bureau of Labor Statistics, the overall US population is about 50% male and 50% female, 72% white, 5% Asian, 16% Hispanic or Latino American, 6% other, and 13% Black or African American, and the demographics of those employed are roughly similar to those of the overall population. 40% of the labor force is aged 16 to 36, 33% is aged 37 to 52, and 25% is aged 53 to 71.

It is interesting to note that the gender imbalance reported here is shared across the other countries that ran RSE surveys. It ranged from a maximum of 92% male in South Africa to a minimum of 63% male in the Netherlands. The reasons for this require closer investigation, but it is likely to be caused by the gender imbalance in physics and computer science – the main subjects from which RSEs are drawn. It is clear that the RSE community must work to address this imbalance.

Of those who entered their salary, 3% reported less than $30,000, 7% from $30,000 to $50,000, 18% from $50,000 to $70,000, 20% from $70,000 to $90,000, 21% from $90,000 to $110,000, 11% from $110,000 to $130,000, 6% from $130,000 to $150,000, and 7% more than $150,000. 3% reported having a condition defined as a disability by the Americans with Disabilities Act (ADA). 60% of those who reported their highest degree reported a PhD, 25% a Masters, and 12% an undergraduate degree. The high proportion of RSEs reporting a PhD supports the assumption that RSEs understand not just software engineering but also research, and are therefore better placed to work with researchers. Respondents reported a wide variety of subjects for their highest degree, including 25% in computer science, 15% in biological sciences, 15% in physics and astronomy, 11% in engineering, 9% in chemistry, and 5% in mathematics.

Coding

163 respondents write code, and about 75% of them write code that is at least half for others to use. 61% of the respondents consider themselves professional software developers, and the average development experience of the respondents is 14 years. This implies that there are people with a relatively large amount of development experience who do not consider themselves professional software developers, perhaps because they see themselves as “researchers who write software” rather than software developers, or it may be that the lack of recognition for the contribution of RSEs leads them to undervalue their position. While all respondents said they spend time coding each month, on average they spend less time on research, less still on management, and the least on teaching. On the education side, it is worth mentioning that 78% of them have provided training to other researchers in computational techniques. 52% conduct testing on their software, while 35% have users do testing, and 7% do no formal testing. Studies suggest that around 70% of research relies on software [DOI:10.5281/zenodo.1183562], and if almost half of that software is untested, this is a huge risk to the reliability of research results.

82% do not have a technical hand-over plan for their most important software project, and the bus factor for the same project is 1 for 39% of respondents, 2 for 36%, and 3 or more for the remaining 25%. 74% of respondents use Python, 45% C, 43% C++, 40% R, and 31% use Fortran.
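
The bus factor mentioned above (the number of developers whose departure would leave a project effectively unmaintainable) can be estimated in several ways; the sketch below uses one simple heuristic, namely the smallest set of top committers accounting for a majority of commits, with hypothetical author lists as input:

```python
from collections import Counter

def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of top contributors accounting for more than
    `threshold` of all commits -- one simple heuristic among several."""
    counts = Counter(commit_authors).most_common()
    total = sum(n for _, n in counts)
    covered, factor = 0, 0
    for _, n in counts:
        covered += n
        factor += 1
        if covered / total > threshold:
            break
    return factor

# One author dominates the history, so the bus factor is 1.
print(bus_factor(["alice"] * 8 + ["bob"] * 2))  # 1
```

In practice the author list could come from `git shortlog -s`, though real projects may also want to weight by lines changed or recency.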

Employment and Job Satisfaction

75% work for universities, 11% for national labs, and 5% for industry. 96% work full-time. 42% are employed permanently as long as funding is available, 40% permanently on institutional core funds, and 15% fixed term. They have been at their current institution a mean of 6.4 years. 51% have their largest project funded by grants, 29% by the institution, 8% by industry, and 6% are volunteering their time on their largest project. The survey also asked the reason for the respondents accepting their current position, with answers that included the desire to advance research, the desire to work in a research environment, freedom to choose own working practices, long-term prospects for continued employment, learning new skills, the opportunity to develop software, flexible working hours, the ability to work across disciplines, salary, and the opportunity for career advancement. The results are summarized below, and they may explain why RSEs choose to work in academia rather than industry, where they can command significantly higher salaries: ultimately, they care about making a contribution to research and are willing to take a lower salary for that opportunity.

The mean response on job satisfaction was about 8 of 10 (where 0 is not at all satisfied and 10 is completely satisfied). Respondents’ jobs generally satisfied them, and they were not eager to move to another job, at least at the same level of compensation, as shown below:

While the data is difficult to summarize, about 45% of respondents felt it would not be difficult for them to get an equivalent job in a different organization while 25% felt it would be difficult, about 75% said they could think of a number of organizations that would probably offer them a job while about 10% couldn’t, and about 75% said that their experience is in demand in the labor market while about 10% didn’t. They also generally positively scored the recognition they receive from their management, and reported that they find enjoyment in and are enthusiastic about their job. While only a minority felt it was likely they would gain a promotion within their current group, a majority felt that they had many options, and that their current position is an integral part of their career plan.

Collaboration

52% of respondents work with different researchers and regularly change who they work with, possibly as a generalist across fields or within a field, while 48% consistently work with the same researcher(s), being embedded in a single research group. 43% work in an RSE group at their institution, while 57% do not. The mean number of projects a respondent works on is just under four. 78% of respondents have trained others, and they do so a mean of about six times per year. There are no courses that train a person to conduct the work of an RSE, so the only way to acquire these skills is through self-learning or working with another RSE.

Publications

90% of respondents have written software that has contributed to published research. 80% are generally named as co-authors in this case, 71% are acknowledged in the paper, and 29% are a main author of the paper. 65% have presented their software at a conference. 81% have released open source software, and 48% always release their code as open source. 32% have used a DOI to identify their software.

Continuing discussion

There are a number of questions that this survey brings up, and we would be happy to have them discussed in the URSSI discussion forum, in part to see what the potential role of URSSI could be in addressing the questions. In addition to the questions we have suggested below, which we have created as topics in the discussion forum, we also encourage readers to start discussions around other questions.

  • Should there be a US RSE organization, similar to the UK one? (more discussion here)
  • How do the salaries for RSEs compare with those of other researchers? How could this be studied? If there is a difference, could URSSI do anything to address it? (more discussion here)
  • Given that 42% of responding RSEs are employed permanently as long as funding is available, 40% are employed permanently on institutional core funds, and 15% are employed fixed term, is this reasonable? How does it compare to other researchers and other research staff? Is there a role for URSSI to play here? (more discussion here)
  • What can be done to address the gender imbalance in the RSE community? (more discussion here)
  • How can we teach RSE skills to people who wish to pursue a career in research software engineering? (more discussion here)

Software Citation Implementation in Astronomy

This week’s post is a repost of a summary of a meeting I attended, held at the 231st AAS meeting in National Harbor, MD, January 2018. The original summary, prepared by Daina Bouquin and Arfon Smith, is at https://github.com/CfA-Library/Cite_Astro_Software/blob/master/whitepaper.md, and is reproduced here with permission.

Software Citation Implementation in Astronomy

A White Paper Summarizing Perceptions from the 2018 AAS Splinter Meeting on Implementing the FORCE11 Software Citation Principles

Daina Bouquin, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA

Daniel S. Katz, University of Illinois Urbana-Champaign, Urbana, IL

Edwin Henneken, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA

August (Gus) Muench, American Astronomical Society, Washington, DC

Katie Frey, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA

Arfon M. Smith, Space Telescope Science Institute, Baltimore, MD

Contact: daina.bouquin@cfa.harvard.edu

Introduction

Software citation is foundationally important to the future of astronomy. Deep intellectual contributions are being made by people creating software to enable scientific research, and it is essential that software creators are encouraged to create these valuable resources. Efforts to help authors receive proper academic credit will allow them to prioritize writing software that is valuable to the astronomy community within their current position, or even to focus their whole career on it. With these facts in mind, on January 11, 2018 a “splinter meeting” was held at the 231st Meeting of the AAS that focused on implementing the FORCE11 Software Citation Principles (Appendix A) in astronomy. The goal of the splinter meeting was to gather feedback from astronomers, developers, and others whose expertise could inform the creation of resources to support the implementation of those principles. The meeting organizers aimed to identify some of the social and technical hurdles that affect software citation practices in astronomy and to understand factors that distinguish it from other fields where software citation is increasingly essential.

The purpose of this document is to summarize the perceptions expressed during the splinter meeting, and to define a few concrete next steps that can be taken to lower perceived barriers. This is not a comprehensive list of all topics discussed by all attendees at the meeting, but will hopefully serve to instigate further discussion and invite disruptive ideas about how to best share increasingly complex, continually changing, digital research artifacts like software.

Structure of the Meeting

Attendees were organized into five groups of five to seven people before the session began to ensure that each discussion group was made up of people representing a mix of relevant perspectives. The goal was to have representation by software authors, developers, journal editors, indexers, and librarians in each group to the extent that this was possible.

The splinter meeting at AAS was 90 minutes long, beginning with three brief presentations: an update about the current FORCE11 Software Citation Principles Implementation Working Group [1]; the diversity of software being produced by the astronomy community [2]; and the on-going Asclepias project [3] from the AAS. A grant from the Alfred P. Sloan Foundation to the AAS for the Asclepias project was used to fund the splinter meeting.

Attendees were given prompts (Appendix B) after the presentations and asked to discuss the issues outlined within the prompts. A scribe at each table took notes about the ensuing conversation and presented highlights from the discussion at the end of the meeting.

Recurrent Themes from Discussion Groups

Motivations for Software Citation in Astronomy

The FORCE11 Software Citation Working Group identified four primary motivations for developing the software citation principles in 2016: 1) the need for a complete record of the research process; 2) the need to enable software discoverability; 3) the importance of research reproducibility; and 4) the need to give credit to academic researchers of all levels for the software that they develop [4]. Throughout the splinter meeting, the discussion groups eventually all came to focus predominantly on what was perceived as the primary motivating factor for implementing the principles in astronomy: giving credit for research software. The groups framed their conversations and focused the majority of their discussion around perceived social and technical barriers that impact people’s ability to give and receive credit for their code. Therefore, credit and the lack of credit academics receive for their software became the driving force behind subsequent discussions. This is not to say that the other motivations were not seen as important to attendees, rather that less discussion in this meeting was linked to discoverability, research reproducibility, or completeness of the scientific record than to credit.

Credit Leaves Many Questions Unanswered

By focusing on credit for academic researchers, issues surrounding authorship, native software citation, and subsequently the complexity authorship takes on in the context of software versioning came to the fore. When authorship changes from one version of software to the next, the community highlighted how opaque attempts to implement the principles become. Specifically, the following questions remained unanswered:

  1. Who is a software “author”? The astronomy community doesn’t agree on how much someone should contribute to a code before that person is considered an author. This is true independent of version, but becomes an increasingly complicated discussion when multiple versions are being cited as contributors change over time.
    • Authorship questions highlighted a limitation that software papers present, as it was noted repeatedly that when a software paper exists, astronomers will overwhelmingly cite the paper rather than natively citing code regardless of whether or not the code they want to cite is the same version as the code discussed in the software paper. This may contribute to software paper authors receiving disproportionate amounts of credit and current contributors not receiving any. This preference for papers also runs counter to the FORCE11 Principle aiming for native software citation.
    • There was also discussion about how to give acknowledgement for contributions that might not fit a definition of “authorship” and how giving all contributors equal credit as authors may serve to dilute the perceived importance of authoring software from external perspectives (e.g. tenure committees, funders, etc.).
      • The meeting attendees saw this issue (and also item 1.i.) as something that could be addressed through education and outreach to external groups, but that a concerted effort would be required from allies within those groups.
  2. What should be cited to properly give author(s) credit? The ways in which astronomers are able to share software are varied (e.g. Zenodo record, AAS software paper, JOSS paper, ASCL record, etc.) and one piece of software may be represented as multiple digital objects (i.e. both as a paper and with a natively citable DOI from Zenodo) each with a different authorship list. More specifically, the following questions came up regularly in this context:
    • When more than one digital object exists, what is the best way to give credit?
      • How should we determine this?
      • Knowing native software citation is the goal, should one also cite a software paper if one exists?
      • How do we deal with multiple forks?
    • Where should researchers look to find preferred citations?
      • How can indexes support researchers looking for preferred citations?
    • How should authors determine the preferred citation for their code?
      • Where should an author make this information apparent (i.e. a CITATION.md file [5])?
    • How should citations be calculated across different types of digital objects and versions of those objects?
      • Should citations be aggregated?
      • Should transitive credit be used in this context [6], or approaches described in the “Metrics Toolkit” [7]?
  3. What metadata do people need to make available to create a proper citation?
    • Should metadata associated with the use of the code be put in the same place as the metadata required to document the creation of the code? Where should usage metadata be documented?
      • Facilitating code reuse, although not as essential as credit from the meeting attendees’ perspective, should be encouraged. Metadata needed for reuse may be far beyond what is needed for a citation, though.
    • Should software papers require a structured metadata file to improve machine-readability (i.e. CodeMeta.json [8])?
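
As an illustration of the machine-readable metadata discussed above, the sketch below assembles a minimal codemeta.json skeleton. The field names follow the CodeMeta schema, but every value here (project name, author, repository, DOI) is a placeholder:

```python
import json

# A minimal, hypothetical codemeta.json skeleton; the property names
# come from the CodeMeta schema, while all values are placeholders.
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-astro-tool",
    "version": "1.2.0",
    "author": [{"@type": "Person",
                "givenName": "Jane", "familyName": "Doe"}],
    "codeRepository": "https://github.com/example/example-astro-tool",
    "identifier": "https://doi.org/10.5281/zenodo.0000000",
}
print(json.dumps(codemeta, indent=2))
```

A file like this, committed at the root of a repository, gives indexers and citation tools a single structured place to find authorship and version information.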

Limited Situational Awareness

Outreach was another perceived barrier that came up throughout the discussions. Specifically, attendees expressed that outside of the group gathered for the splinter meeting and a few other select groups within the broader astronomy community, there was very little situational awareness about the importance of software citation. For the purposes of this summary, a person with a high degree of situational awareness in the context of research software citation is a person capable of understanding how implementation of the FORCE11 Software Citation Principles will impact them, both immediately and in the future. Situational awareness of this sort came up in two contexts:

  1. A lack of situational awareness pertaining to software citation by the community at large. This type of awareness was discussed as primarily a social issue in that this lack of awareness could be at least partially addressed through an awareness campaign.
    1. Attendees at the splinter meeting expressed that the astronomy community as a whole may not possess an intuitive understanding of the importance of thinking critically about how and when to cite software and may not fully understand what they should be doing if they want people to cite their own software. As a result, the ways in which research software deviates from digital objects like research articles or images are not obvious to the community.
      • This lack of awareness may lead to apathy among members of the community who are not seeking credit for their software.
        • Apathy may also stem from the misperception that software presents unresolvable problems for scholarly communication rather than being representative of issues that need to be addressed for all digital objects.
      • Limited awareness about the importance of software citation may exacerbate the problem of software not being seen as a scholarly contribution by external groups (e.g. tenure committees, funders, etc.).
        • Power dynamics can also prevent software authors from pushing back against these perceptions or pushing for software citation to be part of any formal curriculum or policy discussion.
      • A noted lack of awareness was also discussed when splinter meeting attendees touched on publishers’ editorial practices.
        • Meeting attendees discussed the need for more targeted education on software citation best practices for reviewers and the need for more qualified referees.
    2. Uncertainty about expectations from both the community’s and publishers’ perspectives. The questions below about expectations could also be clarified through an awareness campaign and specific guidelines:
      • What level of support is expected from software authors who share their code? (e.g. code reuse in new contexts? feature requests? documentation?)
      • What is the expectation of “code quality”? (i.e. is “hacky” code something that should be shared?)
      • Should scripts be shared, or only packages that are meant to be reused?
      • If the primary motivation of sharing/citing code is giving credit, where does Open Science fit? (e.g. is one expected to ever cite individual scripts? Can you cite software that doesn’t have a “title”? How do you share and point people to code that is meant for individual use?)
        • Without clarifying this point in particular, it is easy for people to become fixated on reproducibility issues (i.e. “the perfect becomes the enemy of the good”).
      • What are publishers supposed to do to educate referees?
      • To what degree can automation support referees? (e.g. what can be done to flag papers that should have software citations?)

Tools and Other Resources

Throughout the splinter meeting, attendees also took time to brainstorm about possible tools that could support software citation implementation. Ideas for tools were diverse, but primarily focused on how automated tools could be used to initiate human action by authors and reviewers. The motivation behind developing tools in this context was to find ways to make software citation straightforward and to make giving credit for research software as much a part of writing a paper as citing an article or data source. Tool ideas were not vetted for feasibility, nor was there any consideration of who should create or fund these types of tools.

Ideas for tools included (in no particular order):

  • “Figure Tags” – A way to automatically embed a software citation in figures and/or generate the current preferred software citation when the figure is generated (e.g. figures created using corner.py).
  • “Flag for review” – Upon submission, a script could cross reference an article’s text against a list of keywords or phrases that would indicate that a software citation is expected and trigger an alert for the reviewer.
  • “Look up service for preferred citations” – Could cross-reference known sources of preferred citations and GitHub for CITATION.md files.
  • Wizard to help code authors create CITATION.md files and/or CodeMeta.json files.
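
The “flag for review” idea above could be prototyped as a simple keyword scan over a submitted manuscript; the keyword list here is a hypothetical stand-in for a curated, regularly updated list of software names and phrases:

```python
# Hypothetical keyword list; a production service would maintain a
# curated list of software names and telltale phrases.
SOFTWARE_KEYWORDS = ["astropy", "corner.py", "our pipeline", "custom code"]

def needs_citation_review(manuscript_text):
    """Return the keywords found in the text that suggest a software
    citation is expected, so an alert can be raised for the reviewer."""
    lowered = manuscript_text.lower()
    return [kw for kw in SOFTWARE_KEYWORDS if kw in lowered]

text = "We reduced the data with Astropy and plotted posteriors with corner.py."
print(needs_citation_review(text))  # ['astropy', 'corner.py']
```

A real service would also need to cross-reference the matches against the paper’s reference list to decide whether a citation is actually missing.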

Splinter meeting attendees also pointed out a number of other resources that would be useful that were not geared toward automating publication processes. These included:

  • Clear, brief, best practice guidelines for software citation addressing the spectrum of research software including example citations.
    • These could be general enough to be helpful for reviewers and authors and answer many of the above noted questions,
    • These guidelines would need to refer out directly to relevant publisher guidelines or be augmented by publishers as needed.
      • Could act as a template for publishers who do not currently have guidelines in place for software citation.
      • Could be used to recommend changes to current citation style guides and reference management tools.
    • Guidelines could include recommended practices that would support software reuse and give general direction to resources on software licensing.
  • LaTeX templates that include links to software citation guidelines for specific publishers.
  • The splinter meeting attendees also brought up resources that are not specific to software citation implementation in astronomy, but do touch on issues that impact the ability of academics to make a career writing software, which is fundamentally why giving credit for software is so essential. These included a lack of designated funding mechanisms/opportunities to support software development in astronomy. Attendees also highlighted their perception that funders do not require or fund software “preservation” and archiving in the same way that they may require for the preservation of data. It should be noted, though, that these perceptions may stem from limited awareness about existing policies from funders like the NSF, which does require software management in the same way it requires data management, as software is considered a type of data.

Editorial Policies

Discussions during the splinter meeting often moved toward publisher editorial policies and how the astronomy community could influence them. Although it was noted repeatedly that the AAS has reasonable guidelines for publishing software papers and citing software, many other journals do not and might inadvertently create disincentives for citing software by limiting the number of citations authors can include in their papers. Moreover, meeting attendees lamented the limited attention that referees tend to give to software citations as they pertain to software version and use of preferred citations. Some attendees pointed out instances where articles citing a software paper, or a version of the software, that predated their contributions would sometimes include them in the acknowledgements section and sometimes not. Editorial policies may not give specific enough guidance to publishing staff, or may assume that best practices are obvious. Increased dialog about editorial policies is needed between members of the astronomy community who are knowledgeable about software citation issues and journal editors who can influence policies for their respective journals.

Limitations

The information presented in this document was gathered using an unsystematic and unstructured approach and will inevitably miss aspects of the conversations that took place during the AAS splinter meeting. Aside from this, there are other important limitations to highlight from this process:

The group selected to discuss barriers to implementing the FORCE11 Principles at the splinter meeting consisted of people already involved in conversations about software citation in astronomy. They therefore likely do not represent the perspectives of the broader community, who may have much more limited awareness about these issues. Anecdotes made up the majority of examples given during conversations, and those examples may be outliers despite efforts to focus on issues that impact the community more holistically.

Another issue that will impact any next steps recommended within this document is the fact that no “solutions” will be comprehensive, seeing as the entire premise of responding to community needs sets up an inverse problem wherein we are attempting to determine causes based on a limited set of observations. This fact should not prevent steps from being taken. Scholarly communication ecosystems are multifaceted and “solutions” will be iterative.

Finally, this document did not focus on intellectual property issues surrounding software and software citation. Discussions during the AAS splinter meeting only peripherally touched on intellectual property, licensing, and software reuse.

Next Steps

It would be advantageous to develop and promote brief guidelines informing software authors in astronomy of what they can do to help ensure they are given credit for their work within the current scholarly communication landscape. These guidelines could point out simple steps for making code more citable [9] and clarify some of the confusion about author expectations. They could also explain common methods for publishing software and address citation issues for each case, and could be iteratively updated as questions posed by the splinter meeting attendees are addressed by the community. A separate set of short guidelines could be created on how to cite software and where to look for citations. For any of these guidelines to be impactful, they would need to include real examples and be promoted by institutions and individuals in as many settings as possible. Any guidelines created for the community would be a foundation on which an awareness campaign could be built, serving as a focal point for future debate and discussion.
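To make such guidelines concrete, a "real example" might show how a tagged software release with an archival DOI can be cited. The entry below is purely illustrative (the author names, title, version, DOI, and URL are invented for this sketch, not taken from the meeting), using the BibLaTeX @software entry type:

```bibtex
@software{spectool_1_2_3,
  author  = {Doe, Jane and Roe, Richard},
  title   = {SpecTool: A spectral analysis toolkit},
  version = {1.2.3},
  date    = {2018-01-15},
  doi     = {10.5281/zenodo.0000000},
  url     = {https://github.com/example/spectool},
  note    = {Hypothetical example entry}
}
```

An entry of this shape addresses the specificity and unique identification principles by recording both the exact version used and a persistent identifier for it.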

In addition to concrete guidelines specific to astronomy, it would be valuable to develop a survey of publishers’ policies and editorial practices that affect software citation in the field. A survey of this sort could then be used to frame conversations with publishers and indexers about which tools would be most helpful in enabling implementation of the FORCE11 Principles, and to inform updates to the guidelines. The survey would also give publishers and stakeholders data that could support changes to editorial practices.

Moreover, it would be useful to design a study on self-citation practices in astronomy in order to better understand how software authors currently share their own work with the broader scientific community. A study of this kind could act as a standard for comparison to better understand how citation practices change over time.

Future Discussion Needed

Future conversations about software in astronomy are needed to address issues surrounding intellectual property, licensing, and copyright in particular. Research software preservation and resources to support preservation should also be discussed further in the context of software reuse.

Appendix A

The FORCE11 Software Citation Principles

https://doi.org/10.7717/peerj-cs.86

  • Importance: Software should be considered a legitimate and citable product of research. Software citations should be accorded the same importance in the scholarly record as citations of other research products, such as publications and data; they should be included in the metadata of the citing work, for example in the reference list of a journal article, and should not be omitted or separated. Software should be cited on the same basis as any other research product such as a paper or a book, that is, authors should cite the appropriate set of software products just as they cite the appropriate set of papers.
  • Credit and attribution: Software citations should facilitate giving scholarly credit and normative, legal attribution to all contributors to the software, recognizing that a single style or mechanism of attribution may not be applicable to all software.
  • Unique identification: A software citation should include a method for identification that is machine actionable, globally unique, interoperable, and recognized by at least a community of the corresponding domain experts, and preferably by general public researchers.
  • Persistence: Unique identifiers and metadata describing the software and its disposition should persist—even beyond the lifespan of the software they describe.
  • Accessibility: Software citations should facilitate access to the software itself and to its associated metadata, documentation, data, and other materials necessary for both humans and machines to make informed use of the referenced software.
  • Specificity: Software citations should facilitate identification of, and access to, the specific version of software that was used. Software identification should be as specific as necessary, such as using version numbers, revision numbers, or variants such as platforms.

Note: Bolding is not part of the formatting in the original publication of the principles.

Appendix B

Discussion Prompts

Given the Software Citation Principles as a Goal:

  • What are the most substantial barriers you experience when citing software?
  • What resources have you used to help you cite software? (e.g., use cases, blogs, people, reference management tools, etc.)
  • What resources are still needed in astronomy?
  • What specific tools would be helpful when authoring papers and writing new codes? (e.g., LaTeX macros, CITATION file template on GitHub, etc.)
  • What do you think would be most impactful in raising awareness about software citation issues in astronomy?

References

  1. Katz, D. S. (2017). Software Citation: A Solution with a Problem. Retrieved from https://www.slideshare.net/danielskatz/software-citation/danielskatz/software-citation
  2. Smith, A. (2018). Spectrum of Research Software. Retrieved from https://stsci.app.box.com/s/ooamf3ds1fxcqq7nnhgpnbjgi0sra031
  3. Muench, A. (2018). Implementations and provocations from the asclepias project. Retrieved from https://drive.google.com/file/d/1LB1arge6mMSHz1Lgxu6wn8Nr-Rb1LYNw/view?usp=embed\_facebook
  4. Smith, A. M., Katz, D. S., & Niemeyer, K. E. (2016). Software citation principles. PeerJ Computer Science, 2, e86. https://doi.org/10.7717/peerj-cs.86
  5. Druskat, S., Bast, R., Chue Hong, N., Konovalov, K., & Rowley, A. (2017). A standard format for CITATION files. Software Sustainability Institute: Software and Research Blog. Retrieved from https://www.software.ac.uk/index.php/blog/2017-12-12-standard-format-citation-files
  6. Katz, D. S., & Smith, A. M. (2014). Implementing Transitive Credit with JSON-LD. ArXiv:1407.5117 [Cs]. Retrieved from http://arxiv.org/abs/1407.5117
  7. Citations, software. (2017). Retrieved from http://www.metrics-toolkit.org/citations-software/
  8. Chue Hong, N., Druskat, S., & Slaughter, P. (2018). codemeta: Minimal metadata schemas for science software and code, in JSON-LD. CodeMeta. Retrieved from https://github.com/codemeta/codemeta/blob/master/codemeta.json
  9. Smith, A. (n.d.). Making Your Code Citable · GitHub Guides. Retrieved from https://guides.github.com/activities/citable-code/
Posted in astroinformatics, Astronomy, informatics, information sharing, Scientific computing, Software citation, software engineering, software maintenance, software sustainability

The Virtual Observatory Is Very Much Real!

 

This is the title of a talk given at the Harvard-Smithsonian Center for Astrophysics by my colleague Giuseppina (Pepi) Fabbiano. Pepi has been involved in the VO since its earliest days, and she currently serves as Chair of the Executive Committee of the International Virtual Observatory Alliance (IVOA), the international body that specifies protocols for uniform access to worldwide data. You can watch the full video below. It explains how the VO is embedded into the infrastructure of many projects, and how astronomers use it day-to-day without realizing it.

 

Here is Pepi’s abstract:

“While the VO is now embedded in the fabric of data astronomy, astronomers are still largely unaware of it, and often think of it as a past, perhaps failed, experiment. Instead, chances are that if you work with data, you are already using the VO. In this talk, I will discuss VO perception and reality, and demonstrate recent VO-enabled software and interfaces. Observations of the sky by means of increasingly powerful ground-based and space telescopes (and simulations) produce a rich and ever larger volume of digital data. They constitute a tremendous Virtual Observatory for astronomers to investigate the properties and evolution of the Universe. The realization that a new infrastructure was needed to fully and easily exploit these increasingly complex, diverse and large data sets, led to the constitution in 2002 of the International Virtual Observatory Alliance (IVOA), the standards organization for digital astronomy data access and interoperability. The IVOA has so far been joined by 21 national and international VO projects worldwide, and has produced standards for finding, accessing, selecting, extracting, analyzing and visualizing data. IVOA standards are increasingly implemented in all the major archives worldwide. Data centers are beginning to be built upon these standards, and new telescope projects are planning to use them. IVOA standards are used in a growing set of popular tools and interfaces in astronomy, as well as in tools used for Education and Outreach world-wide.”

Posted in archives, astroinformatics, Astronomy, computer videos, Computing, computing videos, cyberinfrastructure, data archives, Data Management, information sharing, software engineering, software sustainability, Virtual Observatory

Virtual and Augmented Reality for Space Science and Exploration

This is the title of a symposium held at the Keck Institute for Space Studies on January 30, 2018, in Pasadena. According to the website, the goals of the meeting were: “The emerging technologies of Virtual and Augmented Reality (VR/AR) are promising to change dramatically the ways in which we perceive information and interact with it. VR/AR offer a natural, intuitive, and highly effective way for a collaborative data visualization and visual exploration. We will focus on two areas of space science and exploration where these technologies can have a major impact: mission planning and operations, and data visualization of high-dimensionality abstract data spaces.”

There were nine presentations in all, covering topics such as Visual Analytics by Demonstration for Interactive and Immersive Data Analysis (Alex Endert), Virtual Mars (P. Abercrombie), and Immersive Analytics Beyond Visualization (D. Bowman). There were also three panel discussions: Data Visualization in VR/AR, VR/AR for Mission Design and Ops, and A View From the Industry and Tech Transfer.

All the presentations and panels have been posted on the symposium website and on YouTube; all are worth checking out. Here is the panel discussion on VR/AR for Mission Design and Ops:

Posted in astroinformatics, Astronomy, astronomy surveys, computer videos, Computing, computing videos, cyberinfrastructure, High performance computing, History of Computing!, programming, Scientific computing, software engineering, visualization

Best Practices for a Future Open Code Policy

This week, I am posting a copy of a white paper prepared by Shamir et al. in response to a call from the National Academies, which is sponsoring a Task Force on “Best Practices for a Future Open Code Policy for NASA Space Science.” Read more about the Task Force at http://sites.nationalacademies.org/SSB/CurrentProjects/SSB_178892, where you can click the link “Submitted White Papers” to see a list of the 47 papers submitted.

Our white paper can be downloaded here: ShamirLior

Best Practices for a Future Open Code Policy: Experiences and Vision of the Astrophysics Source Code Library

A white paper submitted to the National Academies of Sciences, Engineering, and Medicine’s Best Practices for a Future Open Code Policy for NASA Space Science Project Committee

Lior Shamir (1), Bruce Berriman (2), Peter Teuben (3), Robert Nemiroff (4), and Alice Allen (3,5)

(1) Lawrence Technological University; (2) Caltech/IPAC-NExScI; (3) University of Maryland; (4) Michigan Technological University; (5) Astrophysics Source Code Library

Introduction

We are members of the Astrophysics Source Code Library’s Advisory Committee and its editor-in-chief. The Astrophysics Source Code Library (ASCL, ascl.net) is a successful initiative that advocates for open research software and provides an infrastructure for registering, discovering, sharing, and citing this software. Started in 1999, the ASCL has been expanding in recent years, with an average of over 200 codes added each year, and now houses over 1,600 code entries.

Codes registered by the ASCL become discoverable not just through ASCL but also by commonly used services such as NASA’s Astrophysics Data System (ADS) and Google Scholar. ASCL entries are citable and citations to them are tracked by indexers, including ADS and Web of Science (WoS), thus providing one metric for the impact of research software and accruing credit for these codes to the scientists who write them. The number of citations by ASCL ID has increased an average of 90% every year since 2013, and nearly 70 publications indexed by ADS have citations to the ASCL. Figure 1 shows the number of citations to ASCL entries by year.

The ASCL has worked to formalize source code sharing and citation practices in astrophysics (Teuben et al., 2014) and has consulted with other disciplines seeking to do the same. Our work on and experience with the ASCL, with software authors, and with open and closed software informs our recommendations to NASA as it contemplates establishing an open code policy.

Figure 1. Citations to ASCL entries by year.

 

Open Source code in astrophysics

With the integration of information technology into astronomy research, computing has become increasingly important, making software a pivotal part of the scientific process. Source code has therefore become an integral part of research, and is critical for understanding, replicating, and reusing research results. Yet unlike “traditional” methods of scientific communication such as peer-reviewed papers, and despite the clear need for common practices for making source code accessible to the scientific community, the practices, guidelines, and requirements for making source code publicly available have not fully crystallized.

Although making the source code open has not traditionally been a formal expectation in the field, numerous software tools have been released under Open Source Initiative (OSI)1 licenses, making a substantial impact on the field. Some notable examples include SExtractor (Bertin & Arnouts, 1996), Montage (Berriman et al., 2015), and emcee (Foreman-Mackey et al., 2013), which has been widely used by multiple unrelated projects.

Open development, in which the source is open by design, has also become more common in astrophysics; examples include the data management software stack of the Large Synoptic Survey Telescope (Jurić et al., 2015) and the popular Astropy package (Astropy Collaboration et al., 2013).

Releasing source code is important for transparency of the research, as the code is an integral part of the study and can reveal important information about the process that was carried out. Full details of the code are often very difficult to describe in a standard paper. More importantly, the availability of the code allows for replication of the results and reusability of the software, and possible discovery of unintended side effects.

Despite these important advantages, researchers are still often reluctant to release their source code. We believe the Committee members, our readers, are already familiar with the reasons for this reluctance, which have been well documented (Weiner et al. 2009, Barnes 2010, Stodden 2010, Ince et al. 2012, Katz et al. 2014, Goble et al. 2016).

Software development is a demanding task that requires a substantial amount of time, is often insufficiently funded, and is often regarded as not very important for career development decisions, such as promotion or tenure. The gap between the time required for developing a good software package and the little reward that comes with it encourages researchers to invest more time on peer-reviewed publications rather than on releasing clean documented (re)usable code. Funding software development, documentation, testing, and release would provide a financial incentive to researchers for these activities.

1 https://opensource.org/

Practices of sharing source code

Because software is an integral part of research, the release of source code should become part of the standard scientific communication process. Currently, the primary method of scientific communication is the peer-reviewed publication; the publication process can be leveraged to standardize the process of sharing source code (Shamir et al., 2013). For example, Astronomy and Computing recommends that authors register their research source codes with the ASCL when submitting a research paper.2 The link between the publication process of a scientific paper and the source code related to it helps clarify requirements such as when and how source code should become available. It also makes the use of the specific source code clearer to the reader of the paper, with easier access to the specific relevant code. Because peer-reviewed publications are the “building blocks” of any scientific discipline, the association between the release of the source code and the publication of the paper also standardizes the guidelines.

Since the maintenance of software after its initial goals have been accomplished is a time-consuming task yet offers little reward to the author, NASA should provide funding for maintenance of useful existing software. Standard regression tests written in the code development phase can help in software maintenance.
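As an illustration of that last point (a toy sketch, not code from any project discussed here: the routine and its reference values are invented), a regression test simply pins a routine's output against a result recorded when the code was first validated, so that later maintenance changes are checked automatically:

```python
# Minimal regression-test sketch (hypothetical routine and reference values).
# A real research code would compare against archived outputs of a known run.

def rebin_sum(pixels, factor):
    """Toy stand-in for a research routine: block-sum a 1-D pixel list."""
    return [sum(pixels[i:i + factor]) for i in range(0, len(pixels), factor)]

def test_rebin_sum_regression():
    # Reference output recorded when the routine was first validated.
    reference = [3, 7, 11]
    assert rebin_sum([1, 2, 3, 4, 5, 6], 2) == reference

if __name__ == "__main__":
    test_rebin_sum_regression()
    print("regression test passed")
```

Run on every change (e.g., in continuous integration), such a test catches behavioral drift with essentially no extra effort after the development phase.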

To provide a stronger incentive for researchers to release their source code, published source codes should be indexed and become citable documents (Allen et al., 2015); this allows the software to be found by indexing engines, and citations to these programs to accrue to their authors. This is also consistent with the FORCE11 Software Citation Principles (Smith et al., 2016) and the Center for Open Science Transparency and Openness Promotion Guidelines.3

Funding agencies should support and encourage journals dedicated to publishing software, such as the Journal of Open Source Software or the Journal of Open Research Software. These journals allow source code and scientific software authors to receive credit equivalent to a “traditional” journal publication. Clearly, these papers are also discoverable, indexed, and citable.

Research source code should be released under a license that allows reuse of the code, at least for non-commercial uses. Acceptable Open Source Initiative licenses, such as the GPL (General Public License) and BSD (Berkeley Software Distribution) licenses, should be used to allow legal replication and use of the code with no restriction for scientific or other non-commercial purposes.

2 https://www.elsevier.com/journals/astronomy-and-computing/2213-1337/guide-for-authors
3 https://cos.io/our-services/top-guidelines/

Recommendations

We recommend the following:

  • Source code that enables research results be open for examination (released to the public), absent any truly compelling reasons that prohibit public release (such as ITAR restrictions), upon submission to a journal of the first research paper that uses the source code. Further, we recommend that metadata about these source codes be shared with the ASCL for indexing upon submission, to increase the discoverability of the software.
  • The version of the software used in a paper that is accepted for publication be archived in a repository before publication.
  • All source code developed for research be explicitly licensed with an Open Source Initiative license that permits legal replication, use, and modification of the software with no restriction for scientific or other non-commercial use.
  • NASA award funds explicitly for development of research source code, documentation, and testing procedures; maintenance of source code developed for research that is of continuing importance to the discipline; and creation of new open codes that do the same tasks as closed codes. Further, we recommend undertaking a study to determine a hierarchy of codes to rewrite.
  • NASA require compliance with code release requirements when evaluating proposals. Further, we recommend forming a task force to (1) develop standardized methods of reporting compliance with these requirements in new proposals where applicable, and (2) develop instructions to referees of funding proposals to ensure these requirements are not overlooked in the evaluation process.
  • NASA sponsor one or more journals that publish code and software.

References

Allen, A., et al. 2015, Improving software citation and credit. arXiv preprint arXiv:1512.07919

Astropy Collaboration, et al. 2013, Astropy: A community Python package for astronomy. A&A, 558, A33

Barnes, N. 2010, Publish your computer code: it is good enough. Nature, 467, 753

Berriman, G. B., & Good, J. C. 2017, The Application of the Montage Image Mosaic Engine to the Visualization of Astronomical Images. PASP, 129, 058006

Bertin, E., & Arnouts, S. 1996, SExtractor: Software for source extraction. Astronomy & Astrophysics Supplement, 117, 393

Foreman-Mackey, D., et al. 2013, emcee: The MCMC Hammer. PASP, 125, 306

Goble, C., et al. 2016, Dagstuhl Reports, 6, 62. http://drops.dagstuhl.de/opus/volltexte/2016/6755

Ince, D., Hatton, L., & Graham-Cumming, J. 2012, The case for open computer programs. Nature, 482, 485

Jurić, M., et al. 2015, The LSST data management system. arXiv preprint arXiv:1512.07914

Katz, D. S., et al. 2014, Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1). arXiv preprint arXiv:1404.7414

Shamir, L., et al. 2013, Practices in source code sharing in astrophysics. Astronomy and Computing, 1, 54-58

Smith, A. M., et al. 2016, Software Citation Principles. PeerJ Computer Science, 2:e86, doi:10.7717/peerj-cs.86

Stodden, V. 2010, The Scientific Method in Practice: Reproducibility in the Computational Sciences. Tech. Rep. MIT Sloan Research Paper No. 4773-10, doi:10.2139/ssrn.1550193

Teuben, P., et al. 2014, Ideas for Advancing Code Sharing: A Different Kind of Hack Day. ASP Conference Series, Vol. 485, ADASS XXIII, ed. N. Manset & P. Forshay, 3

Weiner, B., et al. 2009, Astronomical Software Wants To Be Free: A Manifesto. astro2010: The Astronomy and Astrophysics Decadal Survey, Vol. 2010, 61P

Posted in astroinformatics, Astronomy, BSD, Computing, GPL, High performance computing, informatics, information sharing, Licenses, Open Access, Open Source, programming, publishing, Scientific computing, social networking, software engineering, software maintenance, software sustainability

A Study of the Efficiency of Spatial Indexing Methods Applied to Large Astronomical Databases

This is the title of a poster presented by Tom Donaldson, Bernie Shiao (both of STScI), John Good (Caltech/IPAC-NExScI), and myself at the 231st AAS meeting in Washington DC (January 8-12). I am attaching a copy of the poster below, and linking to a copy of the paper we prepared for the proceedings of ADASS XXVII (Santiago, Chile).

Briefly, we studied the comparative performance of the databases as follows:

  • Indexing depth (cell size) of Hierarchical Triangular Mesh (HTM) vs. HEALPix
  • PostgreSQL vs. SQL Server
  • Linux vs. Solaris vs. Windows

and we did this for two catalogs: the 2MASS All Sky Catalog (complete sky coverage; 470,000,000 sources) and the unmerged Hubble Source Catalog (sparse sky coverage; 384,000,000 sources).

The main results are:

  • Query time is dominated by I/O.
  • Indexing depth—and not choice of index—has the greatest impact on performance: trade-off between too many sources and too many cells.
  • Optimum index depth depends on query radius distribution. (We used a log scale from 1 arcsec to 1 degree).

See the poster for the figures showing these results.
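The indexing-depth trade-off can be illustrated with a rough back-of-the-envelope model (a simplification for illustration only: it assumes uniformly distributed sources and ignores partial cell overlap, and it is not the cost model from the poster). Deepening a HEALPix index shrinks each cell, so fewer sources must be scanned per cell, but a cone search of fixed radius then overlaps more cells:

```python
import math

def healpix_tradeoff(nside, n_sources, radius_deg):
    """Estimate, for a HEALPix index of resolution nside, how many cells a
    cone search of the given radius touches and how many sources sit in
    each cell, assuming sources are spread uniformly over the sky."""
    npix = 12 * nside ** 2                      # total HEALPix cells
    cell_area = 4.0 * math.pi / npix            # steradians per cell
    # Solid angle of a cone of half-angle radius_deg.
    cone_area = 2.0 * math.pi * (1.0 - math.cos(math.radians(radius_deg)))
    # Rough count of overlapped cells: cone area over cell area, at least 1,
    # ignoring partial-overlap boundary effects.
    cells_touched = max(1.0, cone_area / cell_area)
    sources_per_cell = n_sources / npix
    return cells_touched, sources_per_cell

# Catalog size chosen to match the 2MASS scale quoted above.
shallow = healpix_tradeoff(nside=64, n_sources=470_000_000, radius_deg=0.5)
deep = healpix_tradeoff(nside=1024, n_sources=470_000_000, radius_deg=0.5)
```

In this toy model the shallow index scans thousands of sources in a single cell, while the deep index scans tens of sources per cell across a few hundred cells; the best depth depends on the distribution of query radii, as the poster concludes.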

 

Slide1

Link to ADASS paper (PDF)

 

Posted in Uncategorized

The Montage Image Mosaic Toolkit As A Visualization Engine

It’s January, and that means it’s the annual winter meeting of the American Astronomical Society, this year in Washington DC. I presented a poster on how the Montage Image Mosaic Toolkit (http://montage.ipac.caltech.edu) is increasingly being used as a visualization engine. The poster was designed by Angela Lerias (U. California, Riverside), who is interning with the Montage project. It gives three examples of visualization:

  • Creating large multi-color images for visualization;
  • Sky-coverage maps of wide area time-series photometric surveys; and
  • Integration of Montage into the JS9 web-based image display tool (http://js9.si.edu).

Updated_AAS_Poster

The poster is based on the Open Access paper “The Application of Montage to the Visualization of Astronomical Images,” by Bruce Berriman and John Good (PASP, 129, 058006); you can read a summary of the paper here.

Posted in Uncategorized