While attending the ADASS XXV meeting in Sydney, I heard this excellent presentation by Steve Berukoff on data management at the National Solar Observatory’s Daniel K. Inouye Solar Telescope (DKIST), now under construction on Haleakala, Maui, Hawaii. The telescope is due for completion in Q4 2019, and will then be the world’s largest-aperture national facility solar telescope. It has a planned lifetime of 44 years, will award time based on competitive proposals, and will maintain a long-term archive and implement an Open Data policy. The DKIST project has identified an ambitious set of 26 topical areas, along with a suite of high-resolution visible and near-IR imaging and spectropolarimetry instrumentation.
This ambitious program gives rise to some serious challenges in data management. This chart summarizes the scale of the storage and curation challenges:
The data management team are preparing themselves to handle the data and the metadata when the telescope sees first light. This includes the development of adaptive metadata handling algorithms that will lead to the semi-autonomous production of calibrated data sets.
The part of the talk I particularly enjoyed was the discussion of scalable storage approaches. The traditional RAID approach offers bit-loss protection and fault tolerance, but at the expense of slow rebuilds and poor scalability per dollar. The team has instead started looking at object-based and block-based storage. This chart compares them side by side:
Object storage looks very promising. It is widely used in cloud storage, OpenStack Swift and other systems. In particular, it supports Erasure Coding and “Information Dispersal,” whereby data are divided into m fragments, then recoded into n (n>m) fragments. Once every fragment has been assigned an object ID, the fragments can be dispersed across a storage system, and the system can recover the data from any m of them. The technique offers:
- Better storage costs at scale than RAID
- Extreme scalability
- Excellent aggregate bandwidth (tens of Gbps)
- “Self-healing,” in which the system detects bit loss and replaces or re-replicates the affected data correctly.
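To make the "any m of n" idea concrete, here is a minimal sketch of an erasure code in Python. It is my own toy illustration, not the DKIST team's implementation or the scheme used by any particular object store. It assumes a Reed-Solomon-style construction over the prime field of size 257, and the names `encode`, `decode` and `_interpolate` are hypothetical. Each block of m bytes defines a polynomial; the first m fragments hold the data itself and the remaining n - m fragments hold extra evaluations of that polynomial. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x (mod P) the unique polynomial of degree < len(points)
    that passes through the given (x_i, y_i) points (Lagrange form)."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, m: int, n: int) -> dict:
    """Split data into blocks of m bytes and return n fragments,
    keyed by fragment ID 1..n. Fragments 1..m carry the data itself;
    fragments m+1..n carry redundant symbols in the range 0..256."""
    if not 0 < m < n < P:
        raise ValueError("need 0 < m < n < 257")
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    fragments = {i: [] for i in range(1, n + 1)}
    for start in range(0, len(padded), m):
        block = padded[start:start + m]
        points = list(zip(range(1, m + 1), block))
        for i in range(1, n + 1):
            symbol = block[i - 1] if i <= m else _interpolate(points, i)
            fragments[i].append(symbol)
    return fragments


def decode(fragments: dict, m: int, length: int) -> bytes:
    """Rebuild the original bytes from any m surviving fragments."""
    ids = sorted(fragments)[:m]
    if len(ids) < m:
        raise ValueError("need at least m fragments to recover the data")
    out = bytearray()
    for b in range(len(fragments[ids[0]])):
        points = [(i, fragments[i][b]) for i in ids]
        # Re-evaluate the block's polynomial at the data positions 1..m
        out.extend(_interpolate(points, x) for x in range(1, m + 1))
    return bytes(out[:length])


if __name__ == "__main__":
    message = b"DKIST first light"
    frags = encode(message, m=4, n=7)
    # Pretend fragments 1, 3 and 4 were lost; any 4 of the 7 are enough
    survivors = {i: frags[i] for i in (2, 5, 6, 7)}
    print(decode(survivors, m=4, length=len(message)))
```

With m = 4 and n = 7 the stored volume is n/m = 1.75 times the original, and the data survive the loss of any three fragments. Keeping three full replicas would cost 3 times the original and tolerate only two losses, which is the kind of arithmetic behind the storage cost advantage at scale. Production systems use byte-oriented Galois field arithmetic and heavily optimized libraries rather than this toy field of 257 elements.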
I hope the team continues to report on progress as they develop their data management system, as I think they will learn much of value to the rest of astronomy.
I wish to thank Dr Steve Berukoff for his assistance in preparing this post.