Many of us in the astronomical software business have been debating the best way to evaluate the quality of software and its success within its user community. Here is one proposal submitted by a group of us to the Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP) Workshop, October 15-16, 2015, Washington, DC, USA.
Briefly, we are proposing the creation of a software “peer-review group,” comprised of grant recipients funded to develop sustainable software, who would meet periodically to evaluate each others’ software, developing and refining success metrics along the way. What do others in the field think of this approach?
Toward a Framework for Evaluating Software Success: A Proposed First Step
Stan Ahalt (firstname.lastname@example.org), Bruce Berriman, Maxine Brown, Jeffrey Carver, Neil Chue Hong, Allison Fish, Ray Idaszak, Greg Newman, Dhabaleswar Panda, Abani Patra, Elbridge Gerry Puckett, Chris Roland, Douglas Thain, Selcuk Uluagac, Bo Zhang.
Software is a particularly critical technology in many computational science and engineering (CSE) sectors. Consequently, software is increasingly becoming an important component in the evaluation of competitive grants and the execution of research projects. As a result, software can be viewed as a scholarly contribution and has been proposed as a new factor to consider in tenure and promotion processes. However, existing metrics for evaluating the capability, use, reusability, or success of software are sorely lacking. This lack of software metrics permits the development of software based on poor development practices, which in turn allows poorly written software to “fly under the radar” in the scientific community and persist undetected. The absence of evaluation by knowledgeable peers often leads to the establishment and adoption of tools based on aggressive promotion by developers, ease-of-use, and other peripheral factors, hindering the sustainability, usefulness, and uptake of software and even leading to unreliable scientific findings. All of these factors mean that addressing the current lack of software evaluation metrics and methods is not just a question of increasing scientific productivity, but also a matter of preventing poor science.
As a first step toward creating a methodology and framework for developing and evolving software success metrics for the CSE community, we propose the creation of a software “peer-review group.” This group, comprised of grant recipients funded to develop sustainable software, would meet periodically to evaluate their own and each others’ software, developing and refining success metrics along the way. We envision the group as a pilot test for a potential larger-scale effort to establish a more formal framework for software success metrics and evaluation.
Framing Success Metrics
Our perspective on framing software success metrics arose from a breakout session held at a recent NSF-funded workshop attended by more than 75 Software Infrastructure for Sustained Innovation (SI2) principal investigators. The breakout team identified the need to create a methodology and framework for academic software success metrics, brainstormed factors to consider in developing such a framework, and outlined the actionable steps needed to advance this effort. The idea of a software review group was introduced in these discussions, and possible outcomes—presented briefly here—were discussed. We believe further discussion by Computational Science and Engineering Software Sustainability and Productivity Challenges (CSESSP) workshop attendees will help to further develop these ideas and emphasize the importance of framing software success metrics as an integral part of developing a sustainable software ecosystem.
The Need to Evaluate Software Success
On the whole, the development of research software in academia, government, and national labs trails the rigor of industry-developed software. Incentives and measurements of what constitutes successful software differ among and within these sectors, yet all are ultimately part of the same software ecosystem. Generally speaking, successful software must be reliable, sustainable, have value to the target user community and beyond, and provide outcomes that are meaningful to societal stakeholders. Sound software development and engineering practices lead to sustainable software. Stakeholder adoption, use, and reuse of software create feedback loops that further enhance software success. To improve the productivity and sustainability of research software and the research communities it supports, we should be able to objectively measure what makes software successful—or not.
Factors to Consider
There are multiple dimensions to consider in developing an effective methodology and framework for evaluating software success. One dimension relates to the factors that contribute to software success, such as criticality, usability, performance, functionality, availability, and scientific impact. These terms may have different meanings in different fields; for example, usability may mean something different for networking software than it does for security software. Another dimension relates to the types of outcomes we might want to measure, such as the value of the scientific contributions of a grant or project, the value of the products of a grant or project (i.e., the value of the software), or the nature of the team’s “community conduct” (e.g., its value to the software ecosystem). Another relates to defining needs: for example, what is it that funders, researchers, or the broader community need to know in order to inform better decisions and improve sustainability? Finally, we must develop robust metrics to address these dimensions, inform project goals, and empower software creators, researchers, funders, and others to effectively evaluate software.
To begin to develop and evolve a software evaluation framework, we propose establishing a peer review group —an organization of representative stakeholders who will self-review software works created by their respective communities. This group would effectively constitute a pilot program to inform the feasibility, scope, and approach of a future, larger effort to establish and refine a framework for sustainable software metrics. At a minimum, this group would give its members an opportunity for regular review and enhance their own self- improvement processes. If successful more broadly, the group would help to characterize key challenges in software evaluation, define and refine evaluation criteria, and lead to a more informed approach to software development and evaluation for the CSE community as a whole.
We believe further discussion of this idea at the CSESSP workshop would refine and inform our approach and help to generate momentum toward achieving better software evaluation approaches. Examples of questions that warrant further exploration include:
- How should we determine who should be included in the review group? o What attributes make someone an expert software reviewer?
- How should we manage the process for submitting software for evaluation?
- Should we require all group members to regularly submit their own software
- How can others opt in to have their software reviewed?
- How will the process provide adequate protections against conflicts of interest, address reviewers’ knowledge limitations, and address the possibility that some software creators may be competing with each other or with reviewers?
- How should this activity be structured to continually advance the ultimate aim of establishing anobjective set of review criteria that can be applied to different types of software?
- What evaluation criteria or mechanisms are needed to ensure the group works effectively toward its goals?
- What types of documentation or outcomes would be useful toward developing a larger-scale metrics framework
Submitted to: Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP) October 15-16, 2015, Washington, DC, USA
Report from the National Science Foundation-funded workshop held February 17-18, 2015, at the Westin Arlington Gateway in Arlington, Virginia for Software Infrastructure for Sustained Innovation (SI2) Principal Investigators, http://dl.acm.org/citation.cfm?id=2764957.