Toward a Framework for Evaluating Software Success

Many of us in the astronomical software business have been debating the best way to evaluate the quality of software and its success within its user community. Here is one proposal submitted by a group of us to the Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP) Workshop, October 15-16, 2015, Washington, DC, USA.

Briefly, we are proposing the creation of a software “peer-review group,” comprised of grant recipients funded to develop sustainable software, who would meet periodically to evaluate each others’ software, developing and refining success metrics along the way. What do others in the field think of this approach?

Toward a Framework for Evaluating Software Success: A Proposed First Step

Stan Ahalt, Bruce Berriman, Maxine Brown, Jeffrey Carver, Neil Chue Hong, Allison Fish, Ray Idaszak, Greg Newman, Dhabaleswar Panda, Abani Patra, Elbridge Gerry Puckett, Chris Roland, Douglas Thain, Selcuk Uluagac, Bo Zhang.

Software is a particularly critical technology in many computational science and engineering (CSE) sectors. Consequently, software is increasingly becoming an important component in the evaluation of competitive grants and the execution of research projects. As a result, software can be viewed as a scholarly contribution and has been proposed as a new factor to consider in tenure and promotion processes. However, existing metrics for evaluating the capability, use, reusability, or success of software are sorely lacking. This lack of software metrics permits the development of software based on poor development practices, which in turn allows poorly written software to “fly under the radar” in the scientific community and persist undetected. The absence of evaluation by knowledgeable peers often leads to the establishment and adoption of tools based on aggressive promotion by developers, ease-of-use, and other peripheral factors, hindering the sustainability, usefulness, and uptake of software and even leading to unreliable scientific findings. All of these factors mean that addressing the current lack of software evaluation metrics and methods is not just a question of increasing scientific productivity, but also a matter of preventing poor science.

As a first step toward creating a methodology and framework for developing and evolving software success metrics for the CSE community, we propose the creation of a software “peer-review group.” This group, comprised of grant recipients funded to develop sustainable software, would meet periodically to evaluate their own and each others’ software, developing and refining success metrics along the way. We envision the group as a pilot test for a potential larger-scale effort to establish a more formal framework for software success metrics and evaluation.

Framing Success Metrics

Our perspective on framing software success metrics arose from a breakout session held at a recent NSF-funded workshop attended by more than 75 Software Infrastructure for Sustained Innovation (SI2) principal investigators.  The breakout team identified the need to create a methodology and framework for academic software success metrics, brainstormed factors to consider in developing such a framework, and outlined the actionable steps needed to advance this effort. The idea of a software review group was introduced in these discussions, and possible outcomes—presented briefly here—were discussed. We believe further discussion by Computational Science and Engineering Software Sustainability and Productivity Challenges (CSESSP) workshop attendees will help to further develop these ideas and emphasize the importance of framing software success metrics as an integral part of developing a sustainable software ecosystem.

The Need to Evaluate Software Success

On the whole, the development of research software in academia, government, and national labs trails the rigor of industry-developed software. Incentives and measurements of what constitutes successful software differ among and within these sectors, yet all are ultimately part of the same software ecosystem. Generally speaking, successful software must be reliable, sustainable, have value to the target user community and beyond, and provide outcomes that are meaningful to societal stakeholders. Sound software development and engineering practices lead to sustainable software. Stakeholder adoption, use, and reuse of software create feedback loops that further enhance software success. To improve the productivity and sustainability of research software and the research communities it supports, we should be able to objectively measure what makes software successful—or not.

Factors to Consider

There are multiple dimensions to consider in developing an effective methodology and framework for evaluating software success. One dimension relates to the factors that contribute to software success, such as criticality, usability, performance, functionality, availability, and scientific impact. These terms may have different meanings in different fields; for example, usability may mean something different for networking software than it does for security software. Another dimension relates to the types of outcomes we might want to measure, such as the value of the scientific contributions of a grant or project, the value of the products of a grant or project (i.e., the value of the software), or the nature of the team’s “community conduct” (e.g., its value to the software ecosystem). Another relates to defining needs: for example, what is it that funders, researchers, or the broader community need to know in order to inform better decisions and improve sustainability? Finally, we must develop robust metrics to address these dimensions, inform project goals, and empower software creators, researchers, funders, and others to effectively evaluate software.
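As a purely illustrative sketch (the field names, scores, and equal-weight default below are our own assumptions, not part of any proposed standard), the first dimension above could be captured in a simple structured record that a review group might iterate on:

```python
from dataclasses import dataclass

@dataclass
class SoftwareEvaluation:
    """Hypothetical per-dimension scores for one software package (0-1 each)."""
    criticality: float        # how essential the software is to its community
    usability: float          # field-dependent, e.g. networking vs. security
    performance: float
    functionality: float
    availability: float
    scientific_impact: float

    def overall(self, weights=None):
        """Combine dimension scores into one number (equal weights by default)."""
        scores = [self.criticality, self.usability, self.performance,
                  self.functionality, self.availability, self.scientific_impact]
        if weights is None:
            weights = [1.0] * len(scores)
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Example: a package strong on impact but weak on availability
pkg = SoftwareEvaluation(0.9, 0.6, 0.7, 0.8, 0.4, 0.95)
print(round(pkg.overall(), 3))  # → 0.725
```

The point of such a record is not the particular numbers but that different communities could plug in their own weights, making field-dependent meanings of "usability" explicit rather than implicit.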

Next Steps

To begin to develop and evolve a software evaluation framework, we propose establishing a peer-review group: an organization of representative stakeholders who will self-review software created by their respective communities. This group would effectively constitute a pilot program to inform the feasibility, scope, and approach of a future, larger effort to establish and refine a framework for sustainable software metrics. At a minimum, this group would give its members an opportunity for regular review and enhance their own self-improvement processes. If successful, the group would help to characterize key challenges in software evaluation, define and refine evaluation criteria, and lead to a more informed approach to software development and evaluation for the CSE community as a whole.

We believe further discussion of this idea at the CSESSP workshop would refine and inform our approach and help to generate momentum toward achieving better software evaluation approaches. Examples of questions that warrant further exploration include:

  • How should we determine who should be included in the review group?
      ◦ What attributes make someone an expert software reviewer?
  • How should we manage the process for submitting software for evaluation?
  • Should we require all group members to regularly submit their own software?
  • How can others opt in to have their software reviewed?
  • How will the process provide adequate protections against conflicts of interest, address reviewers’ knowledge limitations, and address the possibility that some software creators may be competing with each other or with reviewers?
  • How should this activity be structured to continually advance the ultimate aim of establishing an objective set of review criteria that can be applied to different types of software?
  • What evaluation criteria or mechanisms are needed to ensure the group works effectively toward its goals?
  • What types of documentation or outcomes would be useful toward developing a larger-scale metrics framework?

Submitted to: Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP) Workshop, October 15-16, 2015, Washington, DC, USA

Report from the National Science Foundation-funded workshop held February 17-18, 2015, at the Westin Arlington Gateway in Arlington, Virginia, for Software Infrastructure for Sustained Innovation (SI2) Principal Investigators.


Machine Learning with Scikit-Learn (I) – PyCon 2015

An excellent introduction to machine learning, by Jake VanderPlas at PyCon 2015. Long, but full of useful information.


What’s the Difference Between Cluster, Grid and Cloud Computing?

After my last post on introductory videos on cloud computing, I was asked if there were videos that explain the difference between cluster, grid, and cloud computing. Here is a very good one by Prof. Ajit Pal, Department of Computer Science and Engineering, IIT Kharagpur. He explains the architectural differences between these platforms, as well as the implications for maintenance, deployment, and cost. Although long at 55 minutes, this video is worthwhile for its technical approach, both for computing professionals and for scientists wishing to exploit these approaches to computing.



Videos on Getting Started With Cloud Computing

A number of people have asked if I would post links to videos that provide useful introductions to Cloud Computing. Believe it or not, there are many, many videos on this topic posted on YouTube, and I will post here links to those that I think might be useful if you are getting started with cloud computing.

Cloud Computing Explained

What is the Cloud? (now with pictures!)

A bit more technical: The Three Ways to Cloud Compute

A longer but very good introduction:  Introduction to Cloud Computing

This last video is by Eli the Computer Guy. His channel has a lot of interesting computer videos and is well worth a look.


The Palomar Transient Factory: High Quality Realtime Data Processing in a Cost-Constrained Environment

This is the title of a paper by Surace et al. (2015) currently available on astro-ph and presented as a paper at ADASS XXIV in October 2014. The Palomar Transient Factory (PTF) is an example of a cost-constrained project that is now common in astronomy. It produces a high volume of data, which need near real-time processing for maximum science return, and yet must achieve all of this on a shoestring budget. I will focus in this post on how the cost constraints were managed, rather than give a technical description of the project and its infrastructure. The decisions made exploited many years of expertise at IPAC, developed in managing science operations centers for NASA missions.

The PTF itself is a generic term for several projects with various observing cadences, aimed at discovering supernovae, gamma-ray bursters, and other transient objects. The original PTF was succeeded by the "intermediate" Palomar Transient Factory (iPTF), which concentrates on specific focused science campaigns, rotated on a quarterly basis. The iPTF will itself be succeeded by the Zwicky Transient Facility, which will operate with a new camera with a field of view of nearly 50 square degrees, composed of inexpensive "wafer-scale" CCDs.

The cost constraints were managed across all parts of the project, from the hardware on the telescope all the way through to the archive system. First of all, the project took advantage of existing hardware in the data acquisition system:

  • It re-used the CFHT 12k Mosaic Camera, replacing the liquid-nitrogen dewar with a mechanical cryo-cooler.
  • The system primarily surveyed the sky in one filter, the R-band, which maximizes survey volume.
  • It took advantage of the 1.2-m Oschin-Schmidt telescope, rather than build a new one.
  • Telescopic operations are largely robotic.
  • Transients discovered by PTF can be  followed up in near real-time by other telescopes at Palomar.

All data acquired at the telescope are required for science analysis and are transmitted to IPAC via a microwave link through the San Diego Supercomputer Center. At IPAC, the data are processed on twenty-four dual-CPU compute drones. The processing itself is embarrassingly parallel, with the data for each CCD processed on a single drone. Mass storage is managed with a ZFS file system, with data compression. The long-term storage is dual-homed, connected to both the operations system and the archive system: disk is too expensive to maintain separate operations and archive copies, at the cost of added complexity in controlling file ownership between operations and the archive. See the figure below for a schematic of the processing flow:


The data processing system was developed under an agile management process, with only a handful of core staff and with heavy involvement of scientists throughout; this is a key feature of IPAC's institutional strategy and ensures alignment of software development with science goals. The system draws heavily on existing community software, with individual modules in various languages carrying out specific tasks. Utility was valued over elegance.
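As an illustrative sketch only (the function name and the twelve-CCD count are our own stand-ins; the real pipeline modules are far more involved), the embarrassingly parallel per-CCD step described above maps naturally onto a process pool:

```python
from multiprocessing import Pool

def process_ccd(ccd_id):
    """Stand-in for the per-CCD reduction (calibration, astrometry,
    photometry, ...). Each CCD's data is independent of every other's,
    which is what makes this step embarrassingly parallel."""
    # ... the real pipeline modules would run here ...
    return ccd_id, "processed"

if __name__ == "__main__":
    ccd_ids = range(12)  # e.g. the CCDs of one mosaic exposure
    # Each worker plays the role of one compute drone: one CCD per task,
    # with no communication between tasks.
    with Pool(processes=4) as pool:
        results = dict(pool.map(process_ccd, ccd_ids))
    print(len(results))  # → 12
```

Because no CCD depends on any other, scaling is simply a matter of adding drones; the only shared resource is the mass storage the results are written to.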

The archive is managed within the Infrared Science Archive (IRSA) at IPAC. The PTF archive interface is essentially a thin layer built atop a reusable and portable science information system that has supported the archives of many missions and projects at IPAC for the past decade and a half.

Finally, a critical component of the PTF is the set of "science marshals," organized around particular topics; they organize and present results for those topics and, among other things, allow scientists to interact with the results and form collaborations.


Montage Mosaics The Pillars of Creation!

I posted this on the Montage blog the other day. A number of readers suggested that, because the images looked spectacular, I should repost it on Astronomy Computing Today.

The Pillars of Creation in the Eagle Nebula (M16) remain one of the iconic images of the Hubble Space Telescope. Three pillars rise from a molecular cloud into an enormous HII region, powered by the massive young cluster NGC 6611. Such pillars are common in regions of massive star formation, where they form as a result of ionization and stellar winds.

In a paper that will shortly be published in MNRAS, entitled "The Pillars of Creation revisited with MUSE: gas kinematics and high-mass stellar feedback traced by optical spectroscopy," McLeod et al. (2015) analyze new data acquired with the Multi Unit Spectroscopic Explorer (MUSE) instrument on the VLT. They used Montage to create integrated line maps of the single pointings obtained at the telescope. The figure below shows an example of these maps:

The authors confirmed that the pillar tips are being ionized and photo-evaporated by the massive members of NGC 6611. They found a new bipolar outflow at the tip of the middle pillar and proposed that an embedded protostar is driving it. From the physical parameters and ionic abundances derived from the spectroscopic study, they estimated a mass-loss rate due to photo-evaporation of 70 M⊙/Myr, which implies an expected lifetime for these structures of about 3 Myr.


Interactive Visualization of Billion-Point Data Sets: Nanocubes

One of the most interesting talks I heard at the "Tools for Astronomical Big Data" meeting I attended in March was by Carlos Scheidegger (University of Arizona) on "How do you look at a billion data points? Exploratory Visualization for Big Data." It was my first detailed introduction to creating and querying nanocubes, a data structure that enables interactive visualization of data sources with billions of elements. The idea behind nanocubes is that they are constructed to fit in a modern laptop's main memory, even for hundreds of millions of data entries. In this way, large and complex data sets can be visualized interactively, without the latency involved in reading from disk and without the need for high-performance computers to do the processing.

Scheidegger and his team of Lauro Lins, Horace Ip, and James Klosowski have a web page that explains their project, and their open-source code is available on GitHub. If you want a detailed technical description, read their paper: Lauro Lins, James T. Klosowski, and Carlos Scheidegger, "Nanocubes for Real-Time Exploration of Spatiotemporal Datasets," IEEE Transactions on Visualization and Computer Graphics 19, no. 12 (2013): 2456-2465 (free PDF). I found that their article "Into Main Memory: Nanocubes for Interactively Visualizing Billion-Point Data Sets" provides an excellent introduction to the topic.

There is also a YouTube video explaining the principles and showing some fascinating visualizations:

So how does it all work? Compared with a data cube that stores the original data records, a nanocube has a reduced memory footprint because it stores aggregate counts rather than the original records (e.g., the total number of tweets in Chicago per hour), and it uses a tree structure made up of cells for only those subsets that exist in the data set. But there is more to it than that. Scheidegger takes up the story: "… we have merged a dwarf cube's reduced memory footprint and fast querying with the ability to handle spatial and temporal dimensions. A nanocube retains the hierarchical tree structure of a dwarf cube but introduces a layering concept that includes three types of dimensions: spatial, categorical, and temporal, always traversed in that order. Each dimension has a start node and is subdivided into levels, the number of which depends on the data set. A query is a path through this tree.

The first dimension to be traversed is the spatial one, which in a nanocube can have up to 25 levels. The start point is the entire-world level, followed by successively more granular levels all the way down to city blocks. (This is similar to Google Maps' 17 spatial layers, with the highest-resolution level translating, depending on the imagery, to about 10-30 meters.)

The nanocube uses the quadtree algorithm, a well-known spatial indexing technique in visualization, to enable users to smoothly zoom in or out from one location to another. A quadtree divides the world into four quadrants, and each quadrant has four children, each of them likewise having four children. In this way, the world is successively divided up into finer tiles, allowing users to visualize spatial data at arbitrary zoom levels."
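The quadrant subdivision just described can be sketched in a few lines; this is our own illustrative version (the function name and the simple longitude/latitude split are assumptions — the actual nanocube and map-tile implementations use their own encodings):

```python
def quadkey(lon, lat, zoom):
    """Return the quadtree path (one digit 0-3 per level) to the tile
    containing (lon, lat), down to the requested zoom level."""
    x0, x1 = -180.0, 180.0   # longitude bounds of the current tile
    y0, y1 = -90.0, 90.0     # latitude bounds of the current tile
    digits = []
    for _ in range(zoom):
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        q = 0
        if lon >= xm:        # point lies in the eastern half
            q += 1
            x0 = xm
        else:
            x1 = xm
        if lat < ym:         # point lies in the southern half
            q += 2
            y1 = ym
        else:
            y0 = ym
        digits.append(str(q))
    return "".join(digits)

# Deeper zoom levels simply extend the path: the key at zoom n is a prefix
# of the key at zoom n+1, which is what makes smooth zooming cheap.
print(quadkey(-87.6, 41.9, 4))   # a Chicago-area tile
```

Each digit names one of the four children at that level, so a query for a map region becomes a walk down the tree along a shared prefix.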

Here are some examples; see the web page for some interactive demos (from: Lauro Lins, James T. Klosowski, and Carlos Scheidegger, "Nanocubes for Real-Time Exploration of Spatiotemporal Datasets," IEEE Transactions on Visualization and Computer Graphics 19, no. 12 (2013): 2456-2465).
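To make the counts-not-records idea concrete, here is a toy illustration (our own, and far simpler than the real layered tree): aggregate counts keyed by a (spatial tile, category, hour) combination, where the tile strings and event data are made up for the example.

```python
from collections import Counter

# Toy event stream: (spatial tile, device, hour) per record — e.g. geo-tagged
# tweets. The tile strings here are invented spatial keys for illustration.
records = [
    ("0231", "android", 10), ("0231", "android", 10),
    ("0231", "iphone", 10), ("1302", "android", 11),
    ("1302", "android", 11), ("1302", "iphone", 12),
]

# A data cube over the original records keeps all six rows. The nanocube idea
# is to keep only aggregate counts, and only for key combinations that occur
# in the data — absent combinations cost no memory at all.
counts = Counter(records)

print(counts[("0231", "android", 10)])  # → 2
print(len(counts))                       # → 4 distinct keys, not 6 records
```

In the real structure these counts live at the leaves of the layered spatial/categorical/temporal tree, so a visualization query never touches the raw records at all.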

