Yesterday evening there was a BOF on FITS, and much of the discussion centered on whether FITS is nearing the end of its useful lifetime. Some participants asked whether, e.g., the IAU Commission on FITS should evolve into a commission on data formats. Ideas were presented on how to upgrade FITS. A show of hands at the end favored ditching FITS over either alternative: layering upgrades onto the current FITS library, or taking components from FITS and incorporating them into a new library.
On to Day Two proper, and the morning's festivities. The morning session focused on User Interfaces. Matthew Schneps emphasized the need to design software for usability from the ground up, and used historical algorithms for multiplication as an example. Arvind Gopu and Michael Young presented two complementary tools, IMAGE BROWSER and SOURCE EXPLORER, which use web browsers to support image and source analysis while leaving the data on a remote server.
For me, the most interesting talk to date was by Benoît Pirenne on "Data Management in Big Science: The challenges are not only technical!" The talk centered on how the ocean sciences, which now receive data 24 x 7 x 365, are adapting to the era of big data. He outlined what data management (DM) is, and pointed out that a DM system should account for about 10% of hardware costs and 10% of operational costs. Some important points:
- Technology usually catches up with the needs of a new project.
- Data need to be fully self-describing, and this should be built in from Day One.
- Need buy-in from PIs – a battle nearly won, since the NSF requires a DM plan (I don’t agree completely here; the NSF does not provide infrastructure).
- Need buy-in regarding sharing of data.
He described the Canadian Ocean Networks Observatory and how this system brings ocean science into the big data era. It serves the needs of many subdisciplines within the ocean sciences, and enables real-time response to weather events.
The data rate is 70 TB/year, and the system has an expected lifetime of 25 years. It must handle multi-Gbit throughput from distributed observing stations, and end users are not used to handling such large volumes. He pointed out that it took a lot of work to get end users to change their habits and to use Web 2.0 and crowdsourcing.
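To put the quoted figures in perspective, here is a back-of-the-envelope sketch. It rests on assumptions of my own, not the speaker's: decimal units (1 TB = 10^12 bytes), a 365-day year, and a constant data rate with no growth over the system's lifetime.

```python
# Back-of-the-envelope numbers for the figures quoted in the talk.
# Assumptions (mine, not the speaker's): decimal units (1 TB = 10**12 bytes),
# a 365-day year, and a constant data rate with no growth over the lifetime.

TB = 10**12                          # bytes per terabyte (decimal)
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000 s


def archive_size_tb(rate_tb_per_year: float, lifetime_years: float) -> float:
    """Total volume accumulated over the system lifetime, in TB."""
    return rate_tb_per_year * lifetime_years


def mean_rate_mbit_s(rate_tb_per_year: float) -> float:
    """Average sustained ingest rate, in megabits per second."""
    bytes_per_s = rate_tb_per_year * TB / SECONDS_PER_YEAR
    return bytes_per_s * 8 / 1e6


if __name__ == "__main__":
    total = archive_size_tb(70, 25)
    avg = mean_rate_mbit_s(70)
    print(f"Total over lifetime: {total:.0f} TB (~{total / 1000:.2f} PB)")  # 1750 TB (~1.75 PB)
    print(f"Mean ingest rate: {avg:.1f} Mbit/s")                          # 17.8 Mbit/s
```

The archive would grow to roughly 1.75 PB over 25 years. The averaged ingest rate works out to under 20 Mbit/s. That is far below the multi-Gbit figure, so that requirement presumably reflects peak or burst throughput from the observing stations rather than the long-term average.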
He disliked bulk data downloads and the strain they place on resources; he much preferred moving code to the data. He used a software model in which the software team performs all the tasks needed to support the system (my favored model too!).