This is the title of a paper by Kitaeff, Wicenec, Wu and Taubman recently posted on astro-ph. The paper addresses the issues of accessing and interacting with the very large data-cube images that will be produced by the next generation of radio telescopes, such as the Square Kilometer Array (SKA), the Low Frequency Array for Radio Astronomy (LOFAR) and others. Individual images may be TB-sized, and one SKA Reference Mission Project, “Galaxy Evolution in the Nearby Universe: HI Observations,” will generate individual images of 70-90 TB each.
Data sets this large cannot reside on local disks, even with anticipated advances in storage and network technology. Nor will any new lossless compression techniques that preserve the low S/N of the data save the day, for the act of decompression will impose excessive computational demands on servers and clients.
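To give a feel for the scale, here is a rough back-of-the-envelope sketch of how long it would take just to move one such image over a network. The link speeds are my own illustrative assumptions, not figures from the paper, and the calculation ignores protocol overhead and assumes the link is fully saturated.

```python
def transfer_time_days(size_tb: float, link_gbps: float) -> float:
    """Idealized time, in days, to move size_tb terabytes over a link_gbps link.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gbit/s = 1e9 bits/s),
    no protocol overhead and a fully saturated link.
    """
    size_bits = size_tb * 1e12 * 8
    seconds = size_bits / (link_gbps * 1e9)
    return seconds / 86400.0


if __name__ == "__main__":
    # 70 and 90 TB are the image sizes quoted above; the link speeds are assumed.
    for size_tb in (70, 90):
        for link_gbps in (1, 10):
            days = transfer_time_days(size_tb, link_gbps)
            print(f"{size_tb} TB at {link_gbps} Gbit/s: {days:.1f} days")
```

Even under these optimistic assumptions, a single cube takes on the order of days at 1 Gbit/s, before any of it has been looked at.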
Radio astronomy data are in general stored in self-defining formats such as the Flexible Image Transport System (FITS) or the Hierarchical Data Format 5 (HDF5), or as a hierarchical structure of directories and files such as CASA Image Tables. Kitaeff et al. propose a new approach: using the ISO/IEC 15444 standard, commonly known as JPEG2000, to store data and to provide access by streaming rather than by download. The JPEG2000 format is gaining support within the science community, as it offers features beneficial to science such as support for multi-component transforms (e.g. wavelet and linear transforms), outstanding compression performance, and lossless and lossy compression in a single compression architecture; the paper provides a complete list of features. Altogether, JPEG2000 offers data providers a flexibility that is not present in existing common formats.
Kitaeff et al.’s idea is that astronomers do not need complete data cubes put on their desktops, but instead need “cutouts” of the parts of data cubes they wish to study or analyze. Moreover, the Region of Interest (ROI) may need to be at the highest resolution for analysis, but the surrounding area may be needed for visualization only and can be served at degraded resolution. The data can be stored as a master image on the server, and the JPEG2000 codestream supports the generation of cutouts of varying degrees of granularity. Part 9 of the JPEG2000 standard defines the JPEG 2000 Interactive Protocol (JPIP), a client/server communication protocol that enables a server to transmit only those parts of a JPEG2000 image that satisfy a query from a client. Using an HTTP-based query syntax, together with TCP- or UDP-based transport protocols, JPIP therefore enables streaming of the needed content, both data and metadata, from the master image file. The figure below illustrates how an image streamed via JPIP may be progressively rendered (from https://wiki.ucar.edu/display/NNEWD/JPIP):
Approaches such as those described in the paper, which stream parts of images at variable resolution, will, I think, be essential in the future, and the authors point out that providers of data access standards should start contemplating the requirements imposed by very large data cubes now rather than later.
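To make the JPIP query syntax mentioned above a little more concrete, here is a minimal sketch that builds a request URL for a cutout. The field names (`target`, `fsiz`, `roff`, `rsiz`, `layers`, `len`, `type`) are request fields from the JPIP standard as I understand it; the server address, file name and the helper function itself are hypothetical, and the paper does not prescribe this code.

```python
from urllib.parse import urlencode


def jpip_request_url(server, target, frame_size, region_offset, region_size,
                     layers=None, max_bytes=None):
    """Build a JPIP-style HTTP query for a window of interest.

    frame_size    -- (width, height) of the full image at the resolution wanted
    region_offset -- (x, y) of the cutout's top-left corner within that frame
    region_size   -- (width, height) of the cutout within that frame
    layers        -- optional cap on the number of quality layers
    max_bytes     -- optional cap on the size of the response
    """
    params = {
        "target": target,
        "fsiz": f"{frame_size[0]},{frame_size[1]}",
        "roff": f"{region_offset[0]},{region_offset[1]}",
        "rsiz": f"{region_size[0]},{region_size[1]}",
        "type": "jpp-stream",  # ask for a precinct-based JPIP stream
    }
    if layers is not None:
        params["layers"] = str(layers)
    if max_bytes is not None:
        params["len"] = str(max_bytes)
    # Keep commas literal so the coordinate pairs stay readable.
    return f"{server}?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    # Hypothetical server and file name, for illustration only.
    print(jpip_request_url("http://jpip.example.org/jpip", "cube.jp2",
                           frame_size=(1024, 1024),
                           region_offset=(256, 256),
                           region_size=(512, 512),
                           layers=4))
```

A client could ask for the surrounding field with a small `fsiz` (low resolution) and then issue a second request with a larger `fsiz` restricted to the region of interest, which is the variable-resolution pattern described above.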