The High Cost of Moving Data on the Cloud

Last week I talked about the performance of various types of applications on the Amazon EC2 cloud, compared with their performance on a high-performance cluster. This week I want to focus on the costs of processing, moving, and storing data on the Amazon EC2 cloud. In this study, we compare the costs of running three different applications on Amazon EC2: Montage, an I/O-bound astronomy mosaic engine; Broadband, a memory-bound seismology simulation program; and Epigenome, a CPU-bound biochemistry application. See the April 29 post for details.

The figure below shows the relative costs of running a single job for each application on a number of Amazon EC2 processor types. Generally, the processors become more powerful and have more memory from left to right on the chart.

The most important feature of the figure is that for Montage, the I/O-bound application, the most cost-effective processor is the one designated c1.medium, a 32-bit machine, because its processing time is similar to that of the more expensive machines.

Although the processing cost is only $0.60, by far the biggest costs for an I/O-bound application such as Montage come from moving data into the cloud, storing them while the processing is done, and then transferring the data out of the cloud. The table below shows the data costs for this particular job.

Data costs incurred in running a Montage job on Amazon EC2

For Broadband and Epigenome, the data costs are smaller than the processing costs because the output is much smaller than the input. But for Montage, where the output products are larger than the input files, the data costs are larger than the processing costs on the most cost-effective processor.
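The comparison above comes down to simple arithmetic: data costs scale with the gigabytes moved in, stored, and moved out, while processing costs scale with compute time. The Python sketch below models that split. The `DataRates` fields, the function names, and any prices you feed it are hypothetical placeholders, not Amazon's actual rates, and the model assumes that inputs and outputs both sit in storage for the duration of the job.

```python
from dataclasses import dataclass


@dataclass
class DataRates:
    """Per-GB prices in dollars. Hypothetical placeholders; use your provider's current rates."""
    transfer_in_per_gb: float
    transfer_out_per_gb: float
    storage_per_gb_month: float


def data_cost(input_gb: float, output_gb: float, storage_months: float,
              rates: DataRates) -> dict[str, float]:
    """Estimate the cost of moving data into the cloud, storing it, and moving it out."""
    transfer_in = input_gb * rates.transfer_in_per_gb
    # Assumption: inputs and outputs both occupy storage while the job runs.
    storage = (input_gb + output_gb) * storage_months * rates.storage_per_gb_month
    # Assumption: only the output products are transferred back out.
    transfer_out = output_gb * rates.transfer_out_per_gb
    return {
        "transfer_in": transfer_in,
        "storage": storage,
        "transfer_out": transfer_out,
        "total": transfer_in + storage + transfer_out,
    }


def data_dominates(input_gb: float, output_gb: float, storage_months: float,
                   rates: DataRates, processing_cost: float) -> bool:
    """True when the estimated data costs exceed the processing cost."""
    return data_cost(input_gb, output_gb, storage_months, rates)["total"] > processing_cost
```

Because the outbound transfer in this model is charged on the output size, a job whose outputs are larger than its inputs, like Montage, pushes more of its bill into data costs than a job like Broadband or Epigenome.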

Amazon EC2 offers end users a smorgasbord of processor choices. My advice is that it is always worth doing a cost-benefit analysis to make sure you get the best value for your application, and to be careful about data costs: they add up quickly.
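A cost-benefit analysis of this kind can start as a few lines of Python. This is a minimal sketch, assuming you have measured the runtime of one job on each candidate instance type; the `Instance` class, the instance names, and every rate below are hypothetical, so plug in your own measurements and current prices. The `bill_whole_hours` flag is also an assumption about the billing model (partial hours rounded up); turn it off if your provider bills in finer increments.

```python
import math
from dataclasses import dataclass


@dataclass
class Instance:
    name: str             # hypothetical label for an instance type
    hourly_rate: float    # dollars per hour (hypothetical)
    runtime_hours: float  # measured wall-clock time for one job


def processing_cost(inst: Instance, bill_whole_hours: bool = True) -> float:
    """Compute cost for one job, optionally rounding partial hours up to whole hours."""
    hours = math.ceil(inst.runtime_hours) if bill_whole_hours else inst.runtime_hours
    return hours * inst.hourly_rate


def cheapest(instances: list[Instance], data_cost_total: float,
             bill_whole_hours: bool = True) -> tuple[str, dict[str, float]]:
    """Return the lowest-cost instance name and the total (compute + data) per instance."""
    totals = {
        inst.name: processing_cost(inst, bill_whole_hours) + data_cost_total
        for inst in instances
    }
    best = min(totals, key=totals.get)
    return best, totals


if __name__ == "__main__":
    # Hypothetical candidates: a cheap, slightly slower machine and a pricier, faster one.
    candidates = [
        Instance("instance-a", hourly_rate=0.20, runtime_hours=2.5),
        Instance("instance-b", hourly_rate=0.80, runtime_hours=2.0),
    ]
    best, totals = cheapest(candidates, data_cost_total=1.0)
    print(best, totals)
```

Data cost is passed in as a single figure because, in this simple model, it does not depend on the instance type. It leaves the ranking unchanged, but including it shows how much of each total bill it accounts for, which is exactly where an I/O-bound job can be surprised.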

This research is described in G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B. P. Berman, and P. Maechling, “Scientific Workflow Applications on Amazon EC2,” in Cloud Computing Workshop in Conjunction with e-Science, Oxford, UK: IEEE, 2009. I wish to thank my colleagues for permission to describe our work in this post.

