Amazon Web Services and the Square Kilometer Array

This week, I report on what I think is a creative use of cloud computing to support the development and preconstruction of the Square Kilometer Array (SKA). The work is described in an Amazon Web Services case study, and the numbers it reports make the computing and storage demands of the SKA clear: “… the SKA is expected to gather and process as much data from the sky every day as the world currently produces in a year … A single SKA image could be as big as 600 TB, and each (derived) sky map will need thousands of images.”
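To put those figures in perspective, here is a back-of-envelope calculation. It is only a rough sketch. It assumes "thousands of images" means 1,000, which is the low end of that phrase. It also treats the quoted 600 TB, an upper bound, as the size of every image. Both numbers are illustrative assumptions, not values from the case study.

```python
# Back-of-envelope SKA data volume, using the case-study figures.
# Assumptions: 600 TB is the quoted *upper bound* per image, and
# "thousands of images" is taken as 1,000 (the low end) for illustration.
TB_PER_IMAGE = 600
IMAGES_PER_SKY_MAP = 1_000  # hypothetical low-end count


def sky_map_volume_pb(tb_per_image: float, n_images: int) -> float:
    """Return the raw image volume behind one sky map, in petabytes (1 PB = 1,000 TB)."""
    return tb_per_image * n_images / 1_000


if __name__ == "__main__":
    volume = sky_map_volume_pb(TB_PER_IMAGE, IMAGES_PER_SKY_MAP)
    print(f"{volume:.0f} PB per sky map")
```

Under these assumptions, a single sky map could involve on the order of 600 PB of image data. Computing needs at that scale motivate the design work described below.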

To inform the design of the SKA, project partner the International Centre for Radio Astronomy Research (ICRAR) developed a community computing initiative called “theSkyNet.” In this project, ICRAR pools spare CPU cycles volunteered by the public to emulate a supercomputer, which it uses to analyze images of galaxies from the Pan-STARRS1 telescope in Hawaii.
Now, one of the main difficulties in running crowd-sourced computing projects is that fixed physical server capacity does not track the load imposed by incoming data.

Enter Amazon Web Services and the Amazon Elastic Compute Cloud (EC2). Broadly speaking, cloud computing providers offer elastic resources. These can expand and shrink on demand, and so they can solve this matching problem. The figure below shows the architecture ICRAR uses, which allows it to process data from 400-500 galaxies simultaneously:

[Figure: ICRAR architecture diagram]
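A minimal sketch helps illustrate what "elastic" means in practice. The rule below decides how many server instances to run from the current backlog of work. The function name, the per-instance capacity, and the floor and ceiling are all hypothetical. The case study does not say which scaling mechanism or thresholds ICRAR used.

```python
import math


def desired_instances(pending_units: int, units_per_instance: int,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Hypothetical scaling rule: run just enough instances to clear the
    current backlog, clamped between a floor and a ceiling."""
    if units_per_instance <= 0:
        raise ValueError("units_per_instance must be positive")
    needed = math.ceil(pending_units / units_per_instance)
    # Keep at least min_instances alive, and cap cost at max_instances.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # Simulated backlog of work units over five time steps (made-up numbers).
    for backlog in [0, 50, 480, 2500, 120]:
        print(backlog, "->", desired_instances(backlog, units_per_instance=100))
```

With 100 units per instance, the fleet grows from 1 to 5 instances as the backlog rises to 480. At 2,500 units it is capped at 20. It then drops back to 2 when the backlog falls to 120. A fixed physical cluster cannot follow the load in this way.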

As the case study reports: “ICRAR uses Amazon Route 53 [a scalable Domain Name System (DNS) web service] to route all external users to its theSkyNet websites. The scientists then use one medium Amazon Elastic Compute Cloud (Amazon EC2) instance and on-demand Amazon Machine Images (Amazon AMIs) to process theSkyNet’s crowd-sourced CPU cycles, and another small Amazon EC2 instance as a network file server.”
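The case study does not describe the software running on the processing instance. The sketch below shows one common server-side pattern for volunteer computing, offered only as an assumption. The server hands out work units, collects results, and reissues any unit whose volunteer has gone silent. The class names, the timeout, and the payload format are all hypothetical. This is not ICRAR's code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional, Tuple


@dataclass
class WorkUnit:
    unit_id: str
    payload: str  # hypothetical: e.g. an identifier for one galaxy image


@dataclass
class Dispatcher:
    """Hypothetical volunteer-computing dispatcher: hands out work units,
    collects results, and reissues units whose volunteer went silent."""
    timeout_s: float = 3600.0
    pending: Deque[WorkUnit] = field(default_factory=deque)
    in_flight: Dict[str, Tuple[WorkUnit, float]] = field(default_factory=dict)
    results: Dict[str, str] = field(default_factory=dict)

    def submit(self, unit: WorkUnit) -> None:
        self.pending.append(unit)

    def request_work(self, now: float) -> Optional[WorkUnit]:
        """Called when a volunteer asks for work; returns None if nothing is queued."""
        self._requeue_expired(now)
        if not self.pending:
            return None
        unit = self.pending.popleft()
        self.in_flight[unit.unit_id] = (unit, now)
        return unit

    def report_result(self, unit_id: str, result: str) -> bool:
        """Accept a result only for a unit that is currently issued."""
        if unit_id not in self.in_flight:
            return False  # unknown id, or the unit timed out and is back in the pending queue
        del self.in_flight[unit_id]
        self.results[unit_id] = result
        return True

    def _requeue_expired(self, now: float) -> None:
        # Units issued more than timeout_s ago go back to the end of the queue.
        expired = [uid for uid, (_, issued) in self.in_flight.items()
                   if now - issued > self.timeout_s]
        for uid in expired:
            unit, _ = self.in_flight.pop(uid)
            self.pending.append(unit)
```

Reissuing work after a timeout keeps the pipeline moving when volunteers shut down their machines partway through a unit. Because volunteers come and go unpredictably, the load on this server is hard to predict.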

For storage, ICRAR uses two 60 GB Amazon Elastic Block Store (Amazon EBS) volumes [block-level network-attached storage that persists independently of any compute instance], manages the flow of data to the community, and archives the data using Amazon Glacier. Amazon Simple Storage Service (Amazon S3) [object-based internet storage] serves as a key-value store that manages the input from the hundreds of thousands of contributing CPUs around the world.
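The sketch below illustrates what using S3 "as a key-value store" can look like. It is a hypothetical example, not ICRAR's actual layout. A plain dictionary stands in for the bucket so the example runs offline. The `results/<galaxy>/<unit>/<volunteer>.json` key scheme is an illustrative assumption. In S3, keys form a flat namespace, and "/" is just a character that makes prefix listing convenient. With the boto3 S3 client, the corresponding calls would be `put_object(Bucket=..., Key=..., Body=...)`, `get_object(Bucket=..., Key=...)` and `list_objects_v2(Bucket=..., Prefix=...)`.

```python
from typing import Dict, List


class InMemoryObjectStore:
    """Offline stand-in for an S3 bucket: a flat mapping of string keys to bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body  # overwrites, like a PUT to an existing key

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if the key is absent

    def list_prefix(self, prefix: str) -> List[str]:
        # Prefix listing is how per-galaxy results would be gathered.
        return sorted(k for k in self._objects if k.startswith(prefix))


def result_key(galaxy_id: str, unit_id: str, volunteer_id: str) -> str:
    """Hypothetical key layout grouping every result for a galaxy under one prefix."""
    return f"results/{galaxy_id}/{unit_id}/{volunteer_id}.json"


if __name__ == "__main__":
    store = InMemoryObjectStore()
    store.put(result_key("g001", "wu-1", "v42"), b'{"flux": 1.0}')
    store.put(result_key("g001", "wu-2", "v7"), b'{"flux": 2.0}')
    store.put(result_key("g002", "wu-3", "v42"), b'{"flux": 3.0}')
    print(store.list_prefix("results/g001/"))
```

In this design, each contributing CPU writes to its own key, so concurrent uploads from many volunteers never contend for the same object. The results for one galaxy can then be collected with a single prefix query.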

I wish to thank Mr. Jamie Kinney of AWS for drawing my attention to this article.

This entry was posted in astroinformatics, Astronomy, astronomy surveys, Cloud computing, computer modeling, Computing, Crowdsourcing, cyberinfrastructure, data archives, Data Management, databases, galaxies, galaxy formation, High performance computing, informatics, information sharing, Observatories, Operations, Parallelization, programming, Scientific computing, SKA, software engineering, software maintenance, software sustainability, user communities.
