The cloud provides a powerful environment for running workflow applications, which pass data between tasks as files. The processing environment must therefore provide access to those files, either through a shared file system or by transferring data between nodes. Amazon EC2 offers a number of storage options that differ widely in architecture, and in this post we will look at how cost and performance vary between them. This work is described in the paper by Juve et al. (2010), presented at Supercomputing ’10. The important conclusion is that performance and cost vary widely from file system to file system, and users should do a cost-benefit analysis to determine the optimum set-up for their application.
I will examine one application in this post, the Montage image mosaic engine. (I will look at others in future posts.) Montage is a workflow application that creates science-grade image mosaics from input FITS files. Figure 1 below illustrates the processing flow.
Montage is strongly I/O-bound. In the use case considered here, creating an 8-degree square mosaic of M17, Montage spends 95% of its time in I/O operations while executing over 10,000 tasks to produce 8 GB of output from 4.2 GB of input data.
Figure 2 shows how the runtime for the above workflow varies with the number of processing nodes for five different file systems available on Amazon EC2.
Broadly speaking, performance increases with the number of nodes for all the file systems, but it varies widely, by over a factor of 3, from file system to file system. The paper describes the details of these file systems. For this post, we note that the performance variation reflects each file system’s ability to handle the large number of small files in this workflow (29,000 files, each a few MB in size). Users running I/O-bound applications on Amazon EC2 should therefore do a performance analysis to determine the optimum file sharing option.
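As a rough illustration of what such a performance analysis might look like, here is a minimal Python sketch that times writing and reading back many small files in a directory on the file system under test. It is not the benchmark used in the paper; the function name `time_small_files`, the file count, and the file size are all assumptions chosen for illustration, and the read timing may be flattered by the operating system's page cache.

```python
import os
import tempfile
import time


def time_small_files(directory, n_files=200, size_bytes=64 * 1024):
    """Write, then read back, n_files files of size_bytes each.

    Hypothetical probe: returns wall-clock timings in seconds plus the
    number of bytes read, so the caller can check nothing was lost.
    """
    payload = os.urandom(size_bytes)
    paths = [os.path.join(directory, f"probe_{i:05d}.dat") for i in range(n_files)]

    # Time the writes, forcing each file out to the storage layer.
    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
    write_s = time.perf_counter() - start

    # Time the reads (these may be served from the page cache).
    start = time.perf_counter()
    bytes_read = 0
    for path in paths:
        with open(path, "rb") as fh:
            bytes_read += len(fh.read())
    read_s = time.perf_counter() - start

    # Clean up the probe files.
    for path in paths:
        os.remove(path)

    return {"write_s": write_s, "read_s": read_s, "bytes_read": bytes_read}


if __name__ == "__main__":
    # Replace the temporary directory with a path on the file system to test.
    with tempfile.TemporaryDirectory() as scratch:
        print(time_small_files(scratch))
```

Running the same probe against each candidate mount point, ideally from several nodes at once, gives a first indication of how each option copes with many small files before committing to a full workflow run.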
The same conclusion follows from examining the costs of running the workflow. Figure 3 shows that the costs can differ by a factor of 2, and in this case the slowest performers also cost more.
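To make the cost comparison concrete, here is a small Python sketch of the kind of calculation a user might do. It assumes each node is billed per started hour at a flat rate, which is a simplifying assumption and not necessarily the cost model used in the paper; the function names, option names, runtimes, node counts, and prices are all hypothetical placeholders, not measurements.

```python
import math


def run_cost(runtime_seconds, n_nodes, price_per_node_hour):
    """Cost of one run, assuming every node is billed per started hour."""
    billed_hours = math.ceil(runtime_seconds / 3600)
    return billed_hours * n_nodes * price_per_node_hour


def cheapest(options):
    """Return the name of the option with the lowest run cost.

    options maps a name to (runtime_seconds, n_nodes, price_per_node_hour).
    """
    return min(options, key=lambda name: run_cost(*options[name]))


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    options = {
        "option_a": (5400, 4, 0.68),  # slower run on 4 nodes
        "option_b": (3000, 4, 0.68),  # faster run on 4 nodes
        "option_c": (3000, 5, 0.68),  # same speed, plus a dedicated file server node
    }
    for name, params in options.items():
        print(name, round(run_cost(*params), 2))
    print("cheapest:", cheapest(options))
```

Under hourly billing, a slower file system costs more as soon as the run crosses an hour boundary, and an option that needs an extra server node pays for that node for the whole run.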