This week, I report on a creative use of cloud computing to support the development and preconstruction of the Square Kilometre Array (SKA). The work is described in an Amazon Web Services case study. The case study makes the computing and storage demands of the SKA clear with these numbers: “… the SKA is expected to gather and process as much data from the sky every day as the world currently produces in a year … A single SKA image could be as big as 600 TB, and each (derived) sky map will need thousands of images.”
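To put the quoted figures in perspective, here is a rough back-of-the-envelope calculation. The case study only says "thousands of images," so the count of 1,000 images per sky map below is my own lower-bound assumption, as is the use of decimal units (1 PB = 1,000 TB).

```python
# Rough storage estimate for one sky map, using the figures quoted above.
TB_PER_IMAGE = 600        # "A single SKA image could be as big as 600 TB"
IMAGES_PER_MAP = 1_000    # ASSUMPTION: lower bound for "thousands of images"
TB_PER_PB = 1_000         # ASSUMPTION: decimal units

total_tb = TB_PER_IMAGE * IMAGES_PER_MAP
total_pb = total_tb / TB_PER_PB

print(f"{total_pb:.0f} PB per sky map")
```

Even under this conservative assumption, a single sky map would draw on hundreds of petabytes of image data.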
To inform the design of the SKA, project partner the International Centre for Radio Astronomy Research (ICRAR) developed a community computing initiative called “theSkyNet.” ICRAR uses spare CPU cycles volunteered by the public to simulate the behavior of a supercomputer, and applies that computing power to analyzing images of galaxies from the Pan-STARRS1 telescope in Hawaii.
Now, one of the main difficulties in running crowd-sourced computing projects is that physical server capacity is not matched to the load imposed by incoming data.
Enter Amazon Web Services and the Amazon Elastic Compute Cloud (EC2). Broadly speaking, cloud computing providers offer resources that are elastic: they can expand and shrink on demand, and can therefore solve this matching problem. The figure below shows the architecture used, which allows ICRAR to process data from 400-500 galaxies simultaneously:
As the case study reports: “ICRAR uses Amazon Route 53 [a scalable Domain Name System (DNS) web service] to route all external users to its theSkyNet websites. The scientists then use one medium Amazon Elastic Compute Cloud (Amazon EC2) instance and on-demand Amazon Machine Images (Amazon AMIs) to process theSkyNet’s crowd-sourced CPU cycles, and another small Amazon EC2 instance as a network file server.”
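The idea of elastic capacity, matching the number of servers to the incoming load, can be illustrated with a small sketch. This is not ICRAR's code: the function name, the per-instance throughput, the instance limits, and the backlog figures are all hypothetical, chosen only to show how capacity can follow demand up and down.

```python
import math


def instances_needed(pending_jobs: int, jobs_per_instance: int,
                     min_instances: int = 1, max_instances: int = 20) -> int:
    """Return how many worker instances to run for the current backlog.

    All parameters are hypothetical; a real deployment would tune them.
    """
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    # Enough instances to cover the backlog, rounded up...
    wanted = math.ceil(pending_jobs / jobs_per_instance)
    # ...but never below the floor or above the ceiling.
    return max(min_instances, min(max_instances, wanted))


# Hypothetical backlog of pending work units at successive points in time.
backlog = [5, 120, 480, 900, 60, 0]
plan = [instances_needed(jobs, jobs_per_instance=50) for jobs in backlog]
print(plan)  # [1, 3, 10, 18, 2, 1]
```

With fixed physical servers, capacity would have to be sized for the peak (18 instances here) and would sit mostly idle the rest of the time; with elastic resources it can shrink back to a single instance when the load drops.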
For storage, ICRAR uses two 60 GB Amazon Elastic Block Store (Amazon EBS) volumes [block-level network-attached storage that persists outside the life of a compute instance]. ICRAR also manages the flow of data to the community, and archives the data using Amazon Glacier. The Amazon Simple Storage Service (Amazon S3) [object-based internet storage] is used as a key-value store that manages the input from the hundreds of thousands of contributing CPUs around the world.
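To show what using object storage as a key-value store can look like, here is a minimal sketch. It uses an in-memory stand-in rather than S3 itself so that it runs anywhere; the class, the key layout, and the sample values are all my own assumptions, not details from the case study. With the real service, the put and get operations would correspond to calls such as `put_object` and `get_object` in the boto3 S3 client.

```python
from typing import Dict, List


class ObjectStore:
    """In-memory stand-in for an object store used as a key-value store."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        # Writing to an existing key overwrites the previous object.
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # Object stores typically support listing keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


def result_key(galaxy_id: str, worker_id: str) -> str:
    """Build a hypothetical key: one object per (galaxy, contributor)."""
    return f"results/{galaxy_id}/{worker_id}"


store = ObjectStore()
store.put(result_key("galaxy-0001", "worker-a"), b"fit: 0.91")
store.put(result_key("galaxy-0001", "worker-b"), b"fit: 0.89")
store.put(result_key("galaxy-0002", "worker-a"), b"fit: 0.75")

# Collect everything the volunteers returned for one galaxy.
keys_for_first = store.list_keys("results/galaxy-0001/")
print(keys_for_first)
```

Giving each contribution its own key means many contributors can write at the same time without coordinating with each other, which is one reason this pattern suits input from a very large number of volunteer machines.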
I wish to thank Mr. Jamie Kinney of AWS for drawing my attention to this article.