Last week, I said I would start making some posts on software sustainability. This week I will talk about an approach to sustainable software architecture that has been successfully applied over a period of 10 years to software services at the NASA/Infrared Processing and Analysis Center (IPAC), Caltech. The approach was developed in response to the need to build and maintain, at modest cost, the NASA/IPAC Infrared Science Archive (IRSA), NASA’s archive node for its infrared astronomy data sets. The architectural approach was designed and implemented by my colleague John Good, who served as the IRSA architect for many years.
When the archive opened for business in 1999 serving only two data sets, it was understood that the holdings would grow rapidly in size and diversity, and consequently in the number of queries and volume of data downloads. It was also understood that platforms and browsers would be modernized, that user interfaces would need to be replaced, and that new functionality outside the scope of the original specifications would be needed. Today, the archive serves 40 TB of data from 30 missions and projects, and since 1999 has met 35 million electronic data requests (averaging 3 TB/month of end user data downloads), and yet remains underpinned by the same basic architecture.
Data requests to IRSA are generally CGI queries on the attributes of the data. They return subsets of data sets, along with the capability to browse, visualize and download the results. Each data set has a web form that supports complex queries on the attributes unique to that data set, and there are services that reach across data sets to return inventories of data. Many of the services share common functions, such as sky coordinate transformations and filtering of tabulated data. Thus it was recognized at the outset that a generic component-based architecture represents a sound design that would offer portability, sustainability and extensibility, and avoid later problems caused by excessively tight integration with a particular vendor or environment.
The components are generally written in Open Source GNU ANSI-compliant C, with no shared memory and minimal dependence on system calls. This choice enabled us to migrate from Solaris 10 to Dell Linux servers running Red Hat Enterprise 4.3 with the minimum of bother. Each component within the archive is a module with a standard interface that communicates with other components and fulfills one general function. User applications, in our case web forms, are built by plugging together components. Communication between components is carried out with an executive library. It starts components as child services and parses return values. In this model, user interfaces are thin front ends that present and return information needed by the end user. New components are developed as needed.
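To make the executive idea concrete, here is a minimal sketch of how a library might start a component as a child process and parse what it returns. It is written in Python for brevity, whereas the real components are in C. The reply format (a line of `key="value"` pairs with a `stat` field), the function names, and the stand-in coordinate component are all assumptions made for illustration, not IRSA's actual interface.

```python
import re
import subprocess
import sys

# Hypothetical reply format: one line of key="value" pairs on stdout.
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_reply(text):
    """Turn a reply such as 'stat="OK", count="3"' into a dict of strings."""
    return dict(_PAIR.findall(text))


def run_component(argv, timeout=30):
    """Start a component as a child process and parse its structured reply."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    reply = parse_reply(proc.stdout)
    if proc.returncode != 0 or reply.get("stat") != "OK":
        detail = reply.get("msg", proc.stderr.strip())
        raise RuntimeError(f"component failed: {detail}")
    return reply


# Stand-in for a real component, which would be a separate executable.
# Here a tiny inline script plays the role of a coordinate-conversion module.
FAKE_COORD_COMPONENT = [
    sys.executable,
    "-c",
    "print('stat=\"OK\", lon=\"266.4\", lat=\"-28.9\"')",
]

if __name__ == "__main__":
    print(run_component(FAKE_COORD_COMPONENT))
```

Because the caller sees only a command line going in and a parsed reply coming out, a component can be rewritten or ported without the front end noticing, as long as it keeps the same reply format.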
Technological choices are always made for the “long-haul” following pilot studies and cost-benefit analyses. The need for sustainability is paramount to end-users and to the sponsor, and our preference is to see how a technology develops rather than rush to use it. Since 1999, the archive has been well served by the Informix database engine. To reduce costs, we are replacing it with Oracle. All database queries are made with a single module that reformulates HTML queries into SQL. This module is being updated to replace all embedded SQL with ODBC, and this change is abstracted from all other components.
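The value of routing every query through one module can be sketched as follows. This is an illustration only: the column names, the `_min`/`_max` form-field convention, and the function names are hypothetical, and SQLite stands in for the production database engine so that the example is self-contained.

```python
import sqlite3

# Hypothetical catalog columns and form-field suffixes, for illustration only.
ALLOWED_COLUMNS = {"ra", "decl", "mag"}
OPERATORS = {"min": ">=", "max": "<="}


def form_to_sql(table, form):
    """Translate form fields such as {'mag_max': '12.5'} into SQL plus parameters.

    `table` is assumed to come from trusted configuration, never from the user.
    """
    clauses, params = [], []
    for field, value in sorted(form.items()):
        column, _, bound = field.rpartition("_")
        if column not in ALLOWED_COLUMNS or bound not in OPERATORS:
            raise ValueError(f"unsupported form field: {field}")
        clauses.append(f"{column} {OPERATORS[bound]} ?")
        params.append(float(value))
    where = " AND ".join(clauses) or "1=1"
    return f"SELECT * FROM {table} WHERE {where}", params


def run_query(conn, table, form):
    """The only place that talks to the database; callers never see SQL."""
    sql, params = form_to_sql(table, form)
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sources (ra REAL, decl REAL, mag REAL)")
    conn.executemany(
        "INSERT INTO sources VALUES (?, ?, ?)",
        [(10.0, -35.0, 11.0), (20.0, -20.0, 12.0), (30.0, -10.0, 14.0)],
    )
    print(run_query(conn, "sources", {"mag_max": "12.5", "decl_min": "-30"}))
```

Swapping the database engine or the access layer then means changing `run_query` and nothing else, which is the kind of isolation the paragraph above describes.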
The component design has allowed the architecture to be extended to support new archive projects, and it now underpins the Keck Observatory archive, the NASA Exoplanet Database and a Science Information System for the Kepler Mission. The architecture does, however, remain relatively compact with a size of 400 K lines of code.
A price that has to be paid for the benefits described above is the need for disciplined software engineering practices to provide well-structured code, clean interfaces and rigorous configuration management procedures.
Our recommended best practices for sustainability are:
- Design for sustainability, extensibility, re-use and portability
- Use modular or component-based designs
- Make source code and test and validation data available (if permitted)
- Be careful about new technologies – do a cost-benefit analysis before adopting them
- Build a user community that encourages users to contribute to sustainability
- Use rigorous software engineering practices to ensure well-organized and well-documented code
- Control and manage your interfaces