The Science Cloud 2011 workshop was held on June 8 in San Jose, California, co-located with the 20th International ACM Symposium on High-Performance Parallel and Distributed Computing (June 8-11). Following on from last year’s first workshop, the meeting provided a dedicated forum for discussing progress in running scientific computing workloads on cloud computing infrastructures. It was thus a technology workshop rather than a science workshop: it emphasized the use of cloud-based technologies to meet new compute-intensive and data-intensive scientific challenges that are not well served by current supercomputers, grids, or commercial clouds.
I will write more on some of the presentations in future posts, but today I will give an overview of the meeting. Ion Stoica gave the keynote address on “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.” Mesos is a platform for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI.
Lavanya Ramakrishnan et al. described their experiences using the Magellan Science Cloud and the challenges that resource providers face in delivering stable platforms.
Arkaitz Ruiz-Alvarez and Marty Humphrey described automated tools that allow application developers to select the most cost and performance effective storage choices, using Amazon EC2 as an example.
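At its core, that kind of automated selection is a cost comparison across candidate storage services for a given workload profile. The sketch below is my own illustration of the idea, not their tool: the service names, the `monthly_cost` and `best_option` helpers, and the prices are all made up.

```python
def monthly_cost(gb_stored, k_requests, pricing):
    # Simple linear cost model: capacity charge plus request charge.
    return gb_stored * pricing["gb_month"] + k_requests * pricing["per_1k"]

def best_option(gb_stored, k_requests, catalog):
    # Return the name of the cheapest service for this workload profile.
    return min(catalog,
               key=lambda name: monthly_cost(gb_stored, k_requests,
                                             catalog[name]))

# Illustrative (not actual) prices for two imaginary services: one cheap
# per request, the other cheap per gigabyte stored.
CATALOG = {
    "object-store": {"gb_month": 0.10, "per_1k": 0.01},
    "block-store":  {"gb_month": 0.05, "per_1k": 0.05},
}
```

With this toy catalog, a request-heavy workload selects "object-store" while a capacity-heavy one selects "block-store"; a real tool would also fold in performance measurements alongside price.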
John Bresnahan et al. described Cumulus, an open-source implementation of the Amazon S3 REST API, now the de facto interface for storage in the commercial data cloud, which lets science data centers offer S3-compatible storage on their own infrastructure.
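Because Cumulus speaks the same REST protocol, a client signs its requests exactly as it would for Amazon S3 and simply points them at a different host. The sketch below shows the (pre-v4) AWS request signature from the S3 REST specification; the bucket, key, and credentials are placeholders.

```python
import base64
import hashlib
import hmac

def sign_s3_request(verb, bucket, key, date, secret_key,
                    content_md5="", content_type=""):
    # Build the string-to-sign for a path-style S3 REST request and
    # return its base64-encoded HMAC-SHA1 signature.
    string_to_sign = "\n".join(
        [verb, content_md5, content_type, date, f"/{bucket}/{key}"])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# The Authorization header a client would send to S3 or to a Cumulus
# endpoint (all values here are hypothetical).
signature = sign_s3_request("GET", "climate-data", "run42/output.nc",
                            "Wed, 08 Jun 2011 12:00:00 GMT", "my-secret")
auth_header = f"AWS MY-ACCESS-KEY:{signature}"
```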
Kristi Morton, Magdalena Balazinska, and Dan Grossman studied the benefits and overheads of lazy MapReduce processing, in which the input data are partitioned and, at any time, only the smallest subset of those partitions needed to satisfy a user’s query is processed. They reported guidelines for reducing the processing times of analysis tasks.
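A minimal sketch of the idea, and only that (not their implementation): split the input into partitions and consume partitions one at a time, stopping as soon as enough results exist to answer the user, so the remaining input is never touched.

```python
from itertools import islice

def partitions(records, size):
    # Split an iterable of input records into fixed-size partitions.
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk

def lazy_process(records, needed, size=4):
    # Process partitions only until `needed` results are available,
    # returning the results and how many partitions were scanned.
    results, scanned = [], 0
    for part in partitions(records, size):
        scanned += 1
        results.extend(x * x for x in part)  # stand-in for a map task
        if len(results) >= needed:
            break
    return results[:needed], scanned
```

For example, asking for 6 results from 100 records with partitions of 4 touches only 2 of the 25 partitions; the overheads the authors measured come from the bookkeeping needed to resume over the untouched remainder when the user later asks for more.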
See the full set of talks and download the presentations here.