My colleague Anastasia Alexov thinks so, and I am inclined to agree with her. As data sets grow to petabyte (PB) size and the data themselves become more complex, current file formats such as FITS, which have served astronomy so well, start to break down. She gave a talk on this topic at the Innovations in Data Intensive Astronomy Workshop and described how the Low Frequency Array (LOFAR) project has chosen Hierarchical Data Format 5 (HDF5) as its primary data format.
LOFAR is an array of low-frequency antennas, 36 of them currently, designed to operate at frequencies below 250 MHz. It produces complex multidimensional data sets, as summarized below:
Anastasia pointed out the advantages of HDF5 for recording these types of data; the list that follows is taken from her talk:
- “HDF5 is a data model, library, and file format for storing and managing large and complex scientific data (images, N-D arrays, tables, metadata).
- It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.
- Self-describing and portable to a diversity of computational environments
- No inherent size limitations
- C, C++, Java, Fortran 90 interfaces
- Can be run on single node or massively parallel/distributed systems
- Built-in compression (GNU zlib, but can be replaced with others)
- Parallel reading and writing (via MPI-I/O)
- Partial I/O: “Chunked” (tiled) data for faster access
- Free and in use for 20+ years by NASA and other projects
- Inspection and visualization tools exist (HDFView + command line tools, VisIt + plugin, PyTables, h5py, MATLAB, IDL).”
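To make a few of these points concrete (self-describing metadata, built-in compression, and chunked partial I/O), here is a minimal sketch using h5py, one of the tools named in the list. The group name, dataset name, attribute names, array shape, and chunk size are all made up for illustration; they are not LOFAR's actual file layout.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example data: a small (time, frequency) array.
# The shape and names below are illustrative assumptions only.
data = np.arange(1000 * 256, dtype=np.float32).reshape(1000, 256)
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Write: a group holds the dataset, and attributes carry the metadata,
# so the file describes itself.
with h5py.File(path, "w") as f:
    grp = f.create_group("observation")
    grp.attrs["telescope"] = "LOFAR"
    dset = grp.create_dataset(
        "dynamic_spectrum",
        data=data,
        chunks=(100, 64),      # stored on disk as 100 x 64 tiles
        compression="gzip",    # zlib-based filter built into HDF5
        compression_opts=4,    # gzip compression level
    )
    dset.attrs["axes"] = "time,frequency"

# Read: slicing the dataset pulls in only the chunks that overlap the
# requested region, not the whole array.
with h5py.File(path, "r") as f:
    dset = f["observation/dynamic_spectrum"]
    tile = dset[200:300, 64:128]
    telescope = f["observation"].attrs["telescope"]
    chunk_shape = dset.chunks
    compression = dset.compression
```

The slice in the read step happens to line up with exactly one stored chunk, which is the best case for partial I/O. In practice the chunk shape would be tuned to the access pattern of the science code.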
Astronomers have been slow to warm to this standard. The NASA Earth Observing System (EOS) adopted HDF as its data standard as long ago as 1990. I think astronomy as a field should make investigating HDF5 as its next-generation data format a top priority.