The current model of a data center, in which data sets generated by a mission are managed in a local computer room and served to astronomers who download them and work with them on their desktops, will break down in the coming years as PB-scale data sets become commonplace. This was one of the conclusions of the “Innovations in Data Intensive Astronomy” conference I attended. Of course, there are many people in science thinking about this problem, and innovative and imaginative new models of data centers are being pursued.
One of the most interesting data center models is that proposed for the Square Kilometer Array (SKA), an internationally developed radio telescope that, when complete, will provide a million square metres of collecting area. It is expected to transform our understanding of key science areas such as strong-field tests of gravity using pulsars and black holes, the origin and evolution of cosmic magnetism, and the formation of the first black holes and stars.
The SKA plans to use an HP Performance Optimised Datacentre modular data center and a GPU-based processing center to provide 200 teraflops of computing power and 1.5 petabytes of storage. But even this impressive data center is thought to be inadequate to meet the processing and storage needs of the project, which will deliver up to an exabyte a day of raw data, compressed to tens of PB of images.
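To get a feel for why 1.5 petabytes falls short, here is a rough back-of-the-envelope sketch in Python. It assumes decimal (SI) units and a steady flow of one exabyte per day, which is the upper bound quoted above rather than a measured rate. The function and variable names are mine, chosen for illustration only.

```python
# Back-of-the-envelope arithmetic only. Decimal (SI) units and a constant
# data rate are assumptions, not figures published by the project.
EXABYTE = 10**18   # bytes
PETABYTE = 10**15  # bytes
TERABYTE = 10**12  # bytes
SECONDS_PER_DAY = 24 * 60 * 60

raw_bytes_per_day = 1 * EXABYTE   # "up to an exabyte a day of raw data"
storage_bytes = 1.5 * PETABYTE    # planned on-site storage


def sustained_rate(bytes_per_day):
    """Average bytes per second needed to move a given daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


def seconds_to_fill(storage, bytes_per_day):
    """Time for the raw stream to fill the given storage, if nothing is discarded."""
    return storage / sustained_rate(bytes_per_day)


rate_tb_per_s = sustained_rate(raw_bytes_per_day) / TERABYTE
fill_seconds = seconds_to_fill(storage_bytes, raw_bytes_per_day)

print(f"Sustained raw data rate: {rate_tb_per_s:.1f} TB/s")
print(f"Time to fill 1.5 PB:     {fill_seconds:.0f} s")
```

Under these assumptions the raw stream averages roughly 11.6 TB per second and would fill the planned storage in a little over two minutes, which is why the raw data has to be reduced to images rather than kept.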
The SKA has rejected the use of the Amazon cloud as a data repository because storage costs will be prohibitive. Instead, after a successful prototyping experiment, it proposes to launch a “citizen science” application this year. The application is based on the open source Nereus V cloud computing technology, developed at Oxford University and selected because of its Java code base and its security features. The prototype testbed used 200 clients at the University of Western Australia, Curtin University and iVEC, with two servers deployed and managed through a NereusCloud domain. The clients included Mac Minis and Linux-based desktop machines. When complete, “theskynet,” as it has been called, would give open access to the SKA data sets.
The design is also very green, with an expected average power use of around 10 megawatts, about a tenth of the power use of large supercomputers.