What Types of Science Applications Are Best Run On The Cloud?

The pay-as-you-go purchasing model offered by cloud computing services has obvious appeal. Why pay good money for computers to sit idle in your computer room? Businesses, educators and researchers are all taking advantage of this powerful new resource. But what types of science applications are best run on the cloud? Are there performance constraints or hidden costs in running the processing- or data-intensive applications common in astronomy?

I am working with my colleagues at the University of Southern California (see footnote) to address this issue. We have been comparing the performance and processing cost of three workflow applications with very different characteristics: Montage, an I/O-bound astronomy mosaic engine; Broadband, a memory-bound seismology simulation program; and Epigenomics, a CPU-bound biochemistry application.
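
These three labels describe which resource limits each application. As a hedged illustration only, one rough way to attach such a label to a single run is to compare its measured CPU time and peak memory with what the node provides. The RunProfile record, the classify function and the 80% thresholds below are hypothetical, chosen for illustration; this is not the method used in our study. "Memory-bound" is taken here to mean a memory footprint close to the node's physical memory.

```python
from dataclasses import dataclass


@dataclass
class RunProfile:
    wall_s: float       # elapsed wall-clock time of the run, in seconds
    cpu_s: float        # user + system CPU time summed over all cores, in seconds
    cores: int          # number of cores on the node
    peak_mem_gb: float  # peak resident memory used by the run, in GB
    node_mem_gb: float  # physical memory on the node, in GB


def classify(p: RunProfile, mem_frac: float = 0.8, cpu_frac: float = 0.8) -> str:
    """Return a rough bottleneck label; the default thresholds are illustrative."""
    if p.wall_s <= 0 or p.cores < 1:
        raise ValueError("wall_s must be positive and cores at least 1")
    # A footprint close to physical memory suggests memory is the limit.
    if p.peak_mem_gb >= mem_frac * p.node_mem_gb:
        return "memory-bound"
    # Fraction of the node's total CPU capacity actually kept busy.
    cpu_utilization = p.cpu_s / (p.wall_s * p.cores)
    if cpu_utilization >= cpu_frac:
        return "CPU-bound"
    # Cores mostly idle: often waiting on I/O, though a serial job on a
    # multi-core node also shows low utilization.
    return "I/O-bound"


if __name__ == "__main__":
    # Hypothetical measurements, not figures from the study.
    print(classify(RunProfile(wall_s=100, cpu_s=780, cores=8,
                              peak_mem_gb=1.0, node_mem_gb=7.0)))
```

A low CPU utilization only says that the cores were idle; confirming that they were actually waiting on I/O would need I/O measurements as well.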

In this first post, I will talk about the performance of these three applications on the Amazon EC2 cloud and on the Abe High Performance Cluster at the National Center for Supercomputing Applications (NCSA). Briefly, the processors on both platforms are commodity Linux servers, but Amazon uses virtualization, whereas Abe is equipped with a high-performance network. The figure below shows the wall-clock times for the workflows to run to completion. To make the comparison between the two platforms as direct as possible, each application's workflow was always run on a single node. In the figure legend, m1 and c1 refer to different types of virtual processors on Amazon EC2, ranging roughly from less powerful with less memory to more powerful with more memory. The two abe entries refer to processors on Abe: one configuration uses a local file system, and the other uses the high-performance Lustre (TM) parallel file system, accessed over Abe's high-performance network.
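
The figure reports end-to-end wall-clock times. A minimal sketch of how one might collect such numbers in Python, running the same workflow several times on one node, follows. It is not the harness used in the study: the benchmark helper, the platform label and the command being timed are hypothetical, and the placeholder command just starts an empty Python process.

```python
import subprocess
import sys
import time


def wall_clock_seconds(cmd: list[str]) -> float:
    """Run cmd to completion and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # check=True raises CalledProcessError on failure
    return time.perf_counter() - start


def benchmark(platform: str, cmd: list[str], repeats: int = 3) -> dict:
    """Time the same workflow `repeats` times on one node and summarize."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    times = [wall_clock_seconds(cmd) for _ in range(repeats)]
    return {
        "platform": platform,  # label only, e.g. an instance type name
        "runs": repeats,
        "best_s": min(times),
        "mean_s": sum(times) / len(times),
    }


if __name__ == "__main__":
    # Placeholder workload: a trivial Python process. Replace it with the
    # (hypothetical) command that launches your workflow on this node.
    print(benchmark("local-test", [sys.executable, "-c", "pass"], repeats=2))
```

Keeping both the best and the mean of several runs is a deliberate choice: the best run filters out interference from other activity on the node, while the mean shows how much run-to-run variation there is, which may matter on virtualized hosts.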

Processing times for three applications on Amazon EC2 and on a High Performance Cluster

For Broadband and Epigenomics, performance on Amazon EC2 generally improves as the machine becomes more powerful; the poorer performance of Broadband on c1.medium (green) arises because this machine has relatively little memory. The high-performance network on Abe offers no clear performance benefit for these two applications. Such is not the case for the I/O-bound Montage application. While its performance also generally improves as the machines become more powerful, the high-performance network on Abe delivers a factor-of-two performance gain over the Amazon processors. Altogether, then, Amazon provides excellent performance for memory- and processor-bound applications, but its performance suffers for I/O-bound applications. Montage's heavy I/O demands have further implications for cost and performance on Amazon EC2, and I will discuss these in my next post.
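
To see why a faster I/O subsystem helps Montage so much more than the other two applications, one can apply Amdahl's law. This is a standard model, offered here as an illustration rather than as part of the study. If a fraction f of a run's time is spent on I/O and only that part is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The sketch below encodes that formula; the function names are hypothetical. It also ignores the fact that Abe's processors differ from the EC2 ones, so treat it as a rough guide only.

```python
import math


def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Amdahl's law: only the I/O share of the run time gets faster."""
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must lie in [0, 1]")
    if io_speedup <= 0.0:
        raise ValueError("io_speedup must be positive")
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


def min_io_fraction(target_speedup: float) -> float:
    """Smallest I/O share for which faster I/O alone can reach target_speedup.

    Letting io_speedup go to infinity in Amdahl's law gives 1 / (1 - f) >= target,
    i.e. f >= 1 - 1 / target.
    """
    if target_speedup < 1.0:
        raise ValueError("target_speedup must be at least 1")
    return 1.0 - 1.0 / target_speedup


if __name__ == "__main__":
    # A job spending 10% of its time on I/O gains little even from 10x faster I/O.
    print(round(overall_speedup(0.1, 10.0), 3))
    # A factor-of-two gain from I/O alone needs at least half the run time in I/O.
    print(min_io_fraction(2.0))
    print(overall_speedup(0.5, math.inf))
```

Under this model, a job whose run time is dominated by computation or memory use gains almost nothing from faster I/O (overall_speedup(0.1, 10.0) is only about 1.1), which is consistent with the Broadband and Epigenomics results. A factor-of-two gain like Montage's would require I/O to account for at least half of the run time on the slower platform, if the I/O subsystem were the only difference between the platforms.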

This research is described in G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B. P. Berman, and P. Maechling, “Scientific Workflow Applications on Amazon EC2,” in Cloud Computing Workshop in Conjunction with e-Science, Oxford, UK: IEEE, 2009. I wish to thank my colleagues for permission to describe our work in this post.

This entry was posted in astroinformatics, Astronomy, Cloud computing, High performance computing.

2 Responses to What Types of Science Applications Are Best Run On The Cloud?

  1. mel says:

    Welcome to the blogosphere!
