The astronomical community has begun to discuss whether the Flexible Image Transport System (FITS) will continue to be the de facto standard for the storage and exchange of data in astronomy. FITS has been the standard since the 1980s, in part because of its backwards compatibility and in part because of the large ecosystem of software that has been developed for processing FITS files.
Price, Barsdell and Greenhill (2014) recently showed, in their paper “Is HDF5 a good format to replace UVFITS?”, that the Hierarchical Data Format 5 (HDF5) (strictly an API rather than a data format) has advantages over FITS for radio visibility data. There are two registered FITS conventions for the storage of visibility data and associated metadata: FITS-IDI and UVFITS. Price et al. explain the differences between them: “In UVFITS, the visibility data are stored in a random group HDU (header data unit), whereas in FITS-IDI data are stored in a binary table HDU. In both formats, each row of the table contains columns for the timestamp and a baseline identifier, along with the multidimensional visibility array for that timestamp and baseline.” An alternative format, quite different from FITS, is the CASA MeasurementSet (MS), which is essentially a directory of files nested in child directories.
Price et al. point out that FITS files can be readily mapped to the structure of an HDF5 file, as shown below:
A Python utility, fits2hdf, uses this mapping to convert FITS files into HDF5 and vice versa. It in turn uses the PyFITS (ascl:1207.009) and h5py libraries for file I/O. While fits2hdf was aimed at porting UVFITS/FITS-IDI data into HDF5, it operates on any valid FITS file.
Further, they argue that the HDF5 storage model has a number of advantages over FITS, and over MS too, for supporting large data sets: “… HDF5 provides parallel and network I/O, data chunking methods, external (i.e. distributed) object storage, and a filter pipeline for data compression. Of specific interest for visibility data is bitshuffle, an HDF5 filter designed for fast compression of visibility data. Using bitshuffle on a 1.2 GB test dataset of data from the LEDA correlator (Kocz et al. 2014), we achieved lossless compression ratio of 1.65x, with total file compression and write time of 7.5 s; in comparison the data compressed by 1.40x in 53.0 s using standard gzip.”
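The figures in that quotation are easier to compare once converted into compressed sizes and effective throughput. The short calculation below does only that arithmetic on the quoted numbers. The helper name summarize is hypothetical, and it assumes 1 GB = 1000 MB. Since the bitshuffle time is quoted as total compression and write time, the throughput is an end-to-end figure rather than a pure compression speed.

```python
def summarize(raw_gb, ratio, seconds):
    """Return (compressed size in GB, effective throughput in MB/s)."""
    compressed_gb = raw_gb / ratio
    throughput_mb_s = raw_gb * 1000.0 / seconds  # assumes 1 GB = 1000 MB
    return compressed_gb, throughput_mb_s


RAW_GB = 1.2  # size of the test dataset quoted by Price et al.

bitshuffle = summarize(RAW_GB, ratio=1.65, seconds=7.5)
gzip_result = summarize(RAW_GB, ratio=1.40, seconds=53.0)
speedup = 53.0 / 7.5  # how many times faster bitshuffle finished

for label, (size_gb, rate) in (("bitshuffle", bitshuffle), ("gzip", gzip_result)):
    print(f"{label}: {size_gb:.2f} GB on disk, {rate:.1f} MB/s")
print(f"bitshuffle finished {speedup:.1f}x faster")
```

On these numbers bitshuffle produces a file of roughly 0.73 GB at about 160 MB/s, against roughly 0.86 GB at about 23 MB/s for gzip, so it is both smaller and about seven times faster.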