I must congratulate my colleague Paul Hirst at Gemini – he is to my knowledge the first astronomer to use “cheap” and “cloud” in the same title. His talk on building a new archive for the Gemini telescopes was presented at ADASS XXV in Sydney.
Gemini operates two 8.1 m optical/IR telescopes, one on Mauna Kea, Hawaii, and the other on Cerro Pachon, Chile. The data sets these telescopes produce are not especially large by modern standards – 5GB per night of raw FITS files, with a total volume to date of 27.5TB raw FITS – yet they are diverse, covering imaging, spectroscopy (long-slit, cross-dispersed, fiber-fed, integral field), polarimetry, and adaptive optics over the 0.3 µm to 25 µm wavelength range.
The archive architecture looks like this:
The interesting part of the diagram is the “AWS S3” block, which represents the S3 storage system of Amazon Web Services (AWS), where the data are housed. The data flow from telescope to cloud may be described as follows:
- Local installs on the summit at each telescope ingest files from local disk during observing and export them (via HTTP POST) to the archive.
- An archive server on the Elastic Compute Cloud (EC2) at AWS stores the data on S3 and ingests them into the archive database. The server is a single 4-core, 16GB RAM EC2 instance (M3.xlarge in Amazon speak).
- The latency from the time a file is written at the telescope to its being available for user download from the archive is typically 20-60 seconds.
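The export step in the first bullet can be pictured with a short Python sketch. This is only an illustration, not Gemini's actual code: the function name `build_upload_request`, the `archive.example.org` URL, the URL layout, and the `X-Content-MD5` header are all assumptions made for the example. It builds the HTTP POST request without sending it.

```python
import hashlib
import urllib.request


def build_upload_request(filename: str, payload: bytes,
                         archive_url: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one raw FITS file."""
    headers = {
        # application/fits is the registered media type for FITS files.
        "Content-Type": "application/fits",
        # Hypothetical header that would let a server verify the transfer.
        "X-Content-MD5": hashlib.md5(payload).hexdigest(),
    }
    # Hypothetical URL layout: one upload endpoint per file name.
    url = f"{archive_url.rstrip('/')}/{filename}"
    return urllib.request.Request(url, data=payload,
                                  headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder bytes standing in for a real FITS file read from local disk.
    request = build_upload_request("example.fits", b"SIMPLE  = T",
                                   "https://archive.example.org/upload")
    print(request.get_method(), request.full_url)
```

Passing the returned request to `urllib.request.urlopen` would perform the actual transfer; the real summit software presumably also handles retries and authentication, which this sketch leaves out.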
AWS offers many options for scaling performance on demand, should that become necessary.
The really interesting part of this concerns the cost. From Paul’s slides:
- S3 storage: $0.03/GB/month = $2880 /8TB/yr
- EC2: M4.xl (4CPU, 16GB) = $2470 for 3 years = $0.09 per hour = $823/yr
- Hilo power: $0.40 /kWh. Say 250W => $0.10/hour just for power and cooling. Let alone buying the actual hardware!
- Data transfer in is Free
- Data transfer internally (eg S3 – EC2) is Free
- Data transfer out to internet: < $90 /TB.
- Expecting ~ 200GB/month => $200/yr
- EBS SSD: $0.10/GB/mo. Say 100GB = $120/yr
- Glacier backup – approx 0.25 * cost(S3) say $1000/yr.
- Allow say $250 / yr for extra CPU and EBS time for rebuilds / test, double up during upgrades, storing snapshots…
These charges total ~$6,000/yr. The bottom line is that, with Amazon’s current cost structure, the cost per hour of operating the archive is approximately the same as the power and cooling costs alone in Hilo, before buying any hardware. The cost-benefit analysis given above is a fine example of the analysis that needs to be done if you are thinking of migrating a project to the cloud.
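For readers who want to redo this arithmetic with their own numbers, here is a minimal Python sketch that tallies the line items quoted from the slides. The variable and function names are mine, and it assumes 8TB = 8,000GB and 8,760 hours in a year. With those assumptions the itemized figures sum to about $5,270 per year, a little under the ~$6,000 quoted, which presumably includes some rounding or headroom.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_costs() -> dict:
    """Yearly cost of each line item, in US dollars, using the slide figures."""
    return {
        "s3_storage": 0.03 * 8_000 * 12,   # $0.03/GB/month for 8TB
        "ec2_instance": 2470 / 3,          # 3-year price spread over 3 years
        "transfer_out": 200,               # ~200GB/month, rounded on the slide
        "ebs_ssd": 0.10 * 100 * 12,        # $0.10/GB/month for 100GB
        "glacier_backup": 1000,            # slide estimate
        "contingency": 250,                # rebuilds, tests, snapshots
    }


def ec2_cost_per_hour() -> float:
    """EC2 instance cost per hour, derived from the 3-year price."""
    return 2470 / 3 / HOURS_PER_YEAR


def hilo_power_cost_per_hour(watts: float = 250,
                             dollars_per_kwh: float = 0.40) -> float:
    """Power and cooling cost per hour for a local server in Hilo."""
    return watts / 1000 * dollars_per_kwh


if __name__ == "__main__":
    costs = annual_costs()
    for item, dollars in costs.items():
        print(f"{item:15s} ${dollars:8.2f}/yr")
    print(f"{'total':15s} ${sum(costs.values()):8.2f}/yr")
    print(f"EC2:        ${ec2_cost_per_hour():.3f}/hour")
    print(f"Hilo power: ${hilo_power_cost_per_hour():.3f}/hour")
```

The last two lines reproduce the comparison behind the bottom line: roughly $0.09 per hour for the EC2 instance against $0.10 per hour for power and cooling alone.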
Moreover, the archive is fast, with a typical search page response of < 1 second, and new data are generally available < 1 minute after readout. Staffing consumed 3 FTEs over 3 years from project start to deployment.
I wish to thank Dr Paul Hirst for supplying his charts and supporting the preparation of this blogpost.