This session began with a call to action from Bob Hanisch:
- FITS is 35 years old; how do we go forward?
- Should we use VO-agreed data models as the high-level abstraction, and use HDF5 as the processing and transport layer? Note that FITS is an archival format (it is defined by documents), while HDF5 is not (it is defined by its API; the storage format can be designed underneath it).
- Should we retain FITS as an archive layer – simply leave it alone and go forward?
- Time scale – note reorganization of IAU Commission and Working Groups.
Jessica Mink spoke about the history of FITS. While it began as a transport format, it has evolved into a recording, processing and archiving format. This largely came about because of its many advantages; Jessica and Rob Seaman pointed out a lot of them:
- Human and machine readable
- Self-documenting
- It has a well-defined structure for the header unit and the header records.
- Extensible
- Backwards-compatible (“Once FITS, always FITS.”)
- There is a process for introducing changes that is vetted by the user community
… and more, which together make FITS a lingua franca for astronomy.
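To make the "human and machine readable" and "well-defined structure" points concrete, here is a minimal Python sketch of the header layout: 80-character ASCII records with the keyword in the first 8 columns, padded into 2880-byte blocks and terminated by an END record. The helper names (`make_card`, `parse_card`, `build_header`, `parse_header`) are hypothetical, and the parser is deliberately simplified (no continuation cards, no escaped quotes, no complex values); a real FITS library should be used for actual data.

```python
CARD_LEN = 80      # each FITS header record is 80 ASCII characters
BLOCK_LEN = 2880   # header units are padded to a multiple of 2880 bytes


def make_card(keyword, value, comment=""):
    """Format one simplified header card (hypothetical helper)."""
    if isinstance(value, bool):          # check bool before int: bool is an int
        text = ("T" if value else "F").rjust(20)
    elif isinstance(value, str):
        text = ("'" + value.ljust(8) + "'").ljust(20)
    else:
        text = str(value).rjust(20)
    card = f"{keyword:<8}= {text}"
    if comment:
        card += f" / {comment}"
    return card[:CARD_LEN].ljust(CARD_LEN)


def parse_card(card):
    """Split one card into (keyword, value, comment)."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":               # no value indicator (e.g. END)
        return keyword, None, card[8:].strip()
    body = card[10:]
    if body.lstrip().startswith("'"):    # quoted string value
        start = body.index("'")
        end = body.index("'", start + 1)
        value = body[start + 1:end].rstrip()
        _, _, comment = body[end + 1:].partition("/")
    else:                                # logical or numeric value
        field, _, comment = body.partition("/")
        token = field.strip()
        if token == "T":
            value = True
        elif token == "F":
            value = False
        else:
            try:
                value = int(token)
            except ValueError:
                value = float(token)
    return keyword, value, comment.strip()


def build_header(cards):
    """Join cards, append END, and pad with spaces to a full block."""
    text = "".join(cards) + "END".ljust(CARD_LEN)
    text += " " * (-len(text) % BLOCK_LEN)
    return text.encode("ascii")


def parse_header(raw):
    """Read cards until END; return {keyword: (value, comment)}."""
    text = raw.decode("ascii")
    result = {}
    for i in range(0, len(text), CARD_LEN):
        keyword, value, comment = parse_card(text[i:i + CARD_LEN])
        if keyword == "END":
            break
        result[keyword] = (value, comment)
    return result
```

Because every record is plain ASCII at a fixed width, the same bytes can be read in a text editor or parsed with a few lines of code, which is a large part of why the format has lasted.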
FITS has been extended to include WCS spatial and frequency information. Rots et al. have a paper on astro-ph that proposes “Representations of Time Coordinates in FITS.”
One of the issues is that the syntax is defined but the semantics are not, and this has led to custom versions for various communities (e.g. high energy).
Rob Seaman quoted numbers to demonstrate why FITS will remain important: he estimates that there may be over 1 billion FITS files “in the wild,” and conversion to a new format would not be possible. Rob pointed out that FITS can be extended to support new use cases. It can support Multi-Extension FITS (MEF) files, and images can be written as binary tables with tile compression (though the compression scheme, Rice compression, is not part of the standard, and parsers may fail).
What of alternate data formats? Bob Mann introduced a Special Issue of Astronomy and Computing on the “Future of Astronomical Data Formats.” There are three papers under review: Jenness et al. on lessons learned from NDF; Kitaeff et al. on the use of JPEG2000; and Thomas et al. on a critique of FITS.
Tim Jenness described converting NDF to HDF, and suggested that we might consider going back to it. NDF is an N-dimensional hierarchical data format, introduced by Starlink UK in 1988, but bad marketing led to no take-up outside Starlink. Twenty-six years later, no-one knows how the Hierarchical Data System (HDS) works internally; it has no support for an efficient table format, has a 2GPixel image limit, and does not support external links. And it is written in Fortran. So Tim has been trying to reimplement HDS in HDF5, and has been able to do it aside from a few compatibility issues. The biggest issue seems to be that HDF5 does not support memory mapping: it prefers to manage data in small chunks (though someone said they had overcome this problem).
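The difference between memory-mapped and chunked access can be illustrated with the Python standard library alone. This sketch uses neither HDS nor HDF5; the flat file of little-endian 32-bit floats, the chunk size of 256 pixels, and all function names are assumptions chosen for the example.

```python
import mmap
import os
import struct
import tempfile

PIXEL = struct.Struct("<f")   # assumed layout: little-endian 32-bit floats


def write_image(path, n_pixels):
    """Write a flat 1-D 'image' whose pixel i holds the value float(i)."""
    with open(path, "wb") as fh:
        for i in range(n_pixels):
            fh.write(PIXEL.pack(float(i)))


def read_pixel_mmap(path, index):
    """Map the whole file; the OS pages in only the part that is touched."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return PIXEL.unpack_from(view, index * PIXEL.size)[0]


def read_pixel_chunked(path, index, chunk_pixels=256):
    """Read the fixed-size chunk holding the pixel, then index inside it."""
    chunk_bytes = chunk_pixels * PIXEL.size
    chunk_no, offset = divmod(index, chunk_pixels)
    with open(path, "rb") as fh:
        fh.seek(chunk_no * chunk_bytes)
        chunk = fh.read(chunk_bytes)
    return PIXEL.unpack_from(chunk, offset * PIXEL.size)[0]


def demo(n_pixels=1000, index=700):
    """Fetch the same pixel both ways from a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image.dat")
        write_image(path, n_pixels)
        return read_pixel_mmap(path, index), read_pixel_chunked(path, index)
```

Both routes return the same pixel. With mapping, software written around direct array access needs no changes; with chunking, the library decides how much data to bring in for each request, which is the mismatch described above.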
Brian Thomas is the lead author on the paper that strives to use FITS to understand astronomical data format needs. This magnum opus, which has 37 co-authors and is under review, examined the limitations of FITS and set out to derive requirements for the future. The paper can be downloaded at tinyurl.com/acfits-draft-pdf. Brian also set up an astrodataformat Google group to collect opinions and use cases to drive future requirements. Among the limitations identified that require attention in future formats are:
- Metadata expression needs to be extended – metadata are becoming complex.
- Better support for data models.
- Improved serialization (choice of endianness, support for missing values)
- Support for large and distributed data sets
- Support for virtualization.
- Support for versioning and schema.
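As an illustration of the serialization item in the list above, the sketch below packs the same values in either byte order and uses NaN as a stand-in for a missing value. FITS itself fixes the byte order as big-endian, so a selectable order is exactly the kind of choice the list asks for. The function names and the use of `None` for "missing" are assumptions made for this example, not part of any proposed format.

```python
import math
import struct

_PREFIX = {"big": ">", "little": "<"}   # struct byte-order markers


def pack_floats(values, byte_order="big"):
    """Serialize 64-bit floats; None (missing) is stored as NaN."""
    if byte_order not in _PREFIX:
        raise ValueError("byte_order must be 'big' or 'little'")
    data = [math.nan if v is None else float(v) for v in values]
    return struct.pack(f"{_PREFIX[byte_order]}{len(data)}d", *data)


def unpack_floats(raw, byte_order="big"):
    """Inverse of pack_floats; NaN is mapped back to None."""
    if byte_order not in _PREFIX:
        raise ValueError("byte_order must be 'big' or 'little'")
    count = len(raw) // 8               # 8 bytes per 64-bit float
    values = struct.unpack(f"{_PREFIX[byte_order]}{count}d", raw)
    return [None if math.isnan(v) else v for v in values]
```

The reader has to know which byte order was used, which is why a format that offers the choice must also record it in the metadata. Using NaN for missing values only works for floating-point data; integer data needs a separate sentinel or mask.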
The Jenness et al paper is “Lessons learned from NDF”, not FITS.
So noted and corrected. Cheers.
Thanks for this report, Bruce, since I couldn’t make it.
A PDF of the entire BoF PowerPoint presentation is online at http://tdc-www.harvard.edu/mink/adass2014/DataFormats.pdf