A New Model For Data Centers In The Era Of “Big Data”?

The current model of a data center managing data sets generated by a mission in a local computer room and serving data to astronomers, who download and work with them on their desktops, will break down in the coming years when PB-scale data sets become commonplace. This was one of the conclusions of the “Innovations in Data Intensive Astronomy” conference I attended. Of course, there are many people in science thinking about this problem, and innovative and imaginative new models of data centers are being pursued.

What the SKA will look like. Image Credit: SPDO/Swinburne Astronomy Productions

One of the most interesting data center models is the one proposed for the Square Kilometre Array (SKA), an internationally developed radio telescope that, when complete, will provide a million square metres of collecting area and will transform our understanding of key science areas such as strong-field tests of gravity using pulsars and black holes, the origin and evolution of cosmic magnetism, and the formation of the first stars and black holes.

The SKA plans to use an HP Performance Optimised Datacentre (a modular data center) and a GPU-based processing center to provide 200 teraflops of computing power and 1.5 petabytes of storage. But even this impressive data center is expected to be inadequate for the processing and storage needs of the project, which will deliver up to an exabyte a day of raw data, compressed to tens of petabytes of images.
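The scale of those numbers is easier to appreciate with some back-of-envelope arithmetic. The figures below are the ones quoted above (with 10 PB assumed as a concrete value for "tens of petabytes"); the derived ratios are illustrative only.

```python
# Back-of-envelope arithmetic for the SKA data rates quoted in the post.
# "tens of petabytes" of images is assumed here to be 10 PB per day.

EB = 10**18  # bytes in an exabyte (decimal)
PB = 10**15  # bytes in a petabyte

raw_per_day = 1 * EB       # up to an exabyte of raw data per day
images_per_day = 10 * PB   # compressed image products per day (assumed)
storage = 1.5 * PB         # the modular data center's planned storage

compression_ratio = raw_per_day / images_per_day
hours_to_fill = storage / images_per_day * 24

print(f"compression ratio ~ {compression_ratio:.0f}:1")  # ~100:1
print(f"storage filled in ~ {hours_to_fill:.1f} hours")  # ~3.6 hours
```

Even at a roughly 100:1 compression ratio, the planned 1.5 PB of storage would fill in a matter of hours at full data rate, which is why the project is looking beyond a single data center.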

The SKA has rejected the use of the Amazon cloud as a data repository because storage costs would be prohibitive. Instead, after a successful prototyping experiment, it proposes to launch a "citizen science" application this year based on the open-source Nereus V cloud computing technology, developed at Oxford University and selected for its Java code base and its security features. The prototype testbed used 200 clients at the University of Western Australia, Curtin University and iVEC, with two servers deployed and managed through a NereusCloud domain. The clients included Mac Minis and Linux-based desktop machines. When complete, "theskynet," as it has been called, will give open access to the SKA data sets.
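The core idea of such a citizen-science cloud is simple: a server splits a large data set into independent work units and hands them out to volunteer clients, which process them and return results. The sketch below is a toy illustration of that pattern only; it does not use the Nereus V API, and every name in it is hypothetical.

```python
# Toy sketch of the volunteer-computing pattern behind a citizen-science
# cloud: a server queues independent work units, clients process them.
# This is NOT the Nereus V API; all names here are illustrative.

from queue import Queue

def make_work_units(n_units):
    """Server side: split a (pretend) observation into independent chunks."""
    pending = Queue()
    for i in range(n_units):
        pending.put({"unit_id": i, "data": list(range(i, i + 4))})
    return pending

def client_process(unit):
    """Client side: do some processing on one unit and report a result."""
    return {"unit_id": unit["unit_id"], "result": sum(unit["data"])}

def run(n_units=5):
    """Drive the loop in-process; real clients would pull over the network."""
    pending = make_work_units(n_units)
    results = []
    while not pending.empty():
        unit = pending.get()                  # server dispatches a unit
        results.append(client_process(unit))  # a volunteer client computes
    return results

print(run())
```

Because the units are independent, the work scales with the number of volunteer machines rather than with the capacity of a single data center, which is the appeal of the approach for PB-scale archives.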

The design is also very green, with an expected average power use of around 10 megawatts, roughly a tenth of the power use of large supercomputers.
