Recently, I came across an interesting paper by R.J. Sobie et al. (2011) called “Data Intensive High Energy Physics Analysis in a Distributed Cloud.” I am interested in applications of cloud computing to astronomy, so this paper caught my eye. Rather than use a single cloud, the authors showed how large data sets can be managed in a distributed cloud. The diagram below shows their architecture: they used clouds in Ottawa and Victoria, plus the Amazon EC2 cloud, with a Condor-based job scheduler that submits each job to the appropriate queue. The paper gives full technical details.
The input data, stored on a Lustre parallel file system in Victoria, were 4-5 TB subsets of data from BaBar, a detector built at SLAC to study the millions of B mesons produced by the PEP-II storage ring. The cloud jobs generally ran for 12 hours each and analyzed particles produced in the PEP-II collisions. The authors showed that the system ran one hundred jobs efficiently and reliably, and they anticipate that it should scale to thousands of jobs.
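To make the idea of one scheduler feeding several clouds a little more concrete, here is a toy Python sketch of my own. It is not the paper's scheduler and it does not use Condor: the `Cloud` class, the `dispatch` function, and the slot counts are all hypothetical, invented purely for illustration. It simply sends each job to whichever cloud currently has the most free slots and leaves the rest pending.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """A toy stand-in for one cloud site (hypothetical, not from the paper)."""
    name: str
    slots: int                                  # made-up number of VM slots
    queue: list = field(default_factory=list)   # jobs assigned to this cloud

    def free_slots(self) -> int:
        return self.slots - len(self.queue)


def dispatch(job_ids, clouds):
    """Assign each job to the cloud with the most free slots.

    Jobs that find no free slot anywhere are returned as pending.
    This is a simple greedy rule, not Condor's actual matchmaking.
    """
    pending = []
    for job in job_ids:
        target = max(clouds, key=lambda c: c.free_slots())
        if target.free_slots() > 0:
            target.queue.append(job)
        else:
            pending.append(job)
    return pending


# Site names follow the post; the slot counts are invented for the example.
clouds = [Cloud("victoria", 40), Cloud("ottawa", 30), Cloud("ec2", 20)]
pending = dispatch(range(100), clouds)

for cloud in clouds:
    print(cloud.name, len(cloud.queue))
print("pending", len(pending))
```

A real scheduler would also have to weigh where the data lives, since the input sits on a single file system in Victoria, and would release pending jobs as running ones finish. The sketch ignores both.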