This is the text of a presentation I gave at the 22nd Annual Astronomical Data Analysis Software and Systems Conference, held in Champaign, Illinois (Nov 4-8, 2012). You may download the slides here: Sagan e-Science 2012 gbb best.
The work was performed in collaboration with Carolyn Brinkworth, Dawn Gelino and Dennis Wittman (NASA Exoplanet Science Institute, Caltech), Ewa Deelman, Gideon Juve and Mats Rynge (Information Sciences Institute, USC), and Jamie Kinney (Amazon.com, Inc.).
The NASA Exoplanet Science Institute (NExScI) hosts the Sagan Workshops, annual themed conferences aimed at introducing the latest techniques in exoplanet astronomy to young researchers. The workshops emphasize interaction with data, and include hands-on sessions where participants use their laptops to follow step-by-step tutorials given by experts.
The 2012 workshop had the theme “Working With Exoplanet Light Curves,” and posed special challenges for the conference organizers because the three applications chosen for the tutorials run on different platforms, and because over 160 people attended, by far the largest attendance to date. One of the applications, PyKE, is a suite of Python tools designed to reduce and analyze Kepler light curves; it is called from PyRAF or from the Unix command line. The Transit Analysis Package (TAP) uses Markov Chain Monte Carlo (MCMC) techniques to fit light curves in the Interactive Data Language (IDL) environment, and Transit Timing Variations (TTV) uses IDL tools and Java-based GUIs to confirm and detect exoplanets from timing variations in light curve fitting.
Rather than attempt to run these diverse applications on the inevitably wide range of environments on attendees’ laptops, the conference organizers, in consultation with the Virtual Astronomical Observatory, chose instead to run the applications on the Amazon Elastic Compute Cloud (EC2). This paper summarizes the system architecture, the Amazon resources consumed, and the lessons learned and best practices.
2. The System Architecture
The Sagan Workshop took advantage of EC2’s ability to support Virtual Machines (VMs) that can be customized to meet local needs, replicated, and then released on completion of the jobs. Fig 1 shows the system architecture developed to support the Sagan Workshop.
Participants logged into one of four servers dedicated to the workshop via Virtual Network Computing (VNC). The Amazon Elastic Block Storage (EBS) system and the Network File System (NFS) were used to share common datasets and user home directories across all virtual machines. An IDL license server at IPAC provided licenses to the servers over ssh. The following list describes the architecture component by component, along with the rationale for the design choices.
- One master virtual machine image, built on the CentOS 64-bit operating system, was used for all servers. A boot script determined the VM’s identity. Usernames and passwords were the same on all machines.
- 1 TB of Elastic Block Storage (EBS), a block-based storage service where volumes appear as disk drives connected to VMs, contained applications, tutorial data, and user home directories. Applications and tutorial data are installed on VM images, and so data are not lost if a tutorial server fails.
- The EC2 m1.2xlarge instance type was chosen for the NFS server, to handle the load of 20 tutorial servers. It has enough memory to cache commonly accessed files, mounts all the partitions from the EBS volumes, and exports all partitions via NFS to the tutorial servers.
- The tutorial servers were of the EC2 c1.xlarge instance type, with 8 cores and 7 GB RAM, chosen because the applications were CPU-bound. Server performance was benchmarked with 8 users, but the servers were in fact able to support 25 users.
- A Virtual Network Computing (VNC) server provided remote desktop logins to the tutorial servers. VNC is similar to the X window system, but sends compressed images instead of drawing commands, and it proved more responsive than X in our tests. Each tutorial server ran one VNC server that supported up to 30 connections. Screen resolution was set to 1024×768 to balance usability and performance. In practice, TigerVNC provided the server and RealVNC was the client.
- The tutorial servers were connected via an ssh tunnel to an IDL license server at IPAC. IDL thinks the license server is on localhost, and the license server thinks IDL is inside the IPAC network. We used autossh to ensure that the tunnel was re-established if disconnected.
- The Amazon AWS Security Rules limited access only to the VNC and the SSH ports, and only from the Caltech and IPAC subnets used to support the workshop.
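The text does not say how the number of tutorial servers was decided. As a rough illustration only, the sketch below does the sizing arithmetic from figures quoted above: it treats "over 160" attendees as exactly 160 (an assumption) and compares the benchmarked load of 8 users per server with the 25 users the servers turned out to support. The function name `servers_needed` is hypothetical.

```python
def servers_needed(attendees: int, users_per_server: int) -> int:
    """Smallest number of servers that gives every attendee a seat."""
    if attendees < 0 or users_per_server <= 0:
        raise ValueError("attendees must be >= 0 and users_per_server > 0")
    # Integer ceiling division: round up so nobody is left without a server.
    return (attendees + users_per_server - 1) // users_per_server


ATTENDEES = 160          # assumption: "over 160" in the text, taken as 160
BENCHMARKED_USERS = 8    # users per server in the benchmark
OBSERVED_USERS = 25      # users per server actually supported

conservative = servers_needed(ATTENDEES, BENCHMARKED_USERS)
optimistic = servers_needed(ATTENDEES, OBSERVED_USERS)
print(f"sized at {BENCHMARKED_USERS} users/server: {conservative} servers")
print(f"sized at {OBSERVED_USERS} users/server: {optimistic} servers")
```

Sizing at the benchmarked load gives 20 servers, which matches the number of tutorial servers mentioned above, while the observed capacity suggests that 7 would have been enough. The gap is the kind of headroom that benchmarking at a conservative load buys.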
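The license tunnel described in the list can be pictured as an ssh local port forward kept alive by autossh. The sketch below only builds the command line; it does not run it. The host names, the user name, and the port number 1700 are placeholders chosen for illustration, not values taken from the workshop setup, and the helper name `autossh_tunnel_command` is hypothetical.

```python
import shlex
from typing import List, Optional


def autossh_tunnel_command(
    gateway: str,
    license_host: str,
    license_port: int,
    local_port: Optional[int] = None,
    user: Optional[str] = None,
) -> List[str]:
    """Build an autossh argument list for a local port forward.

    Connections to localhost:<local_port> on the tutorial server are
    carried over ssh to <gateway>, which connects to the license server.
    """
    if local_port is None:
        local_port = license_port
    target = f"{user}@{gateway}" if user else gateway
    return [
        "autossh",
        "-M", "0",                         # no autossh monitor port; rely on ssh keepalives
        "-N",                              # forwarding only, no remote command
        "-o", "ServerAliveInterval=30",    # send a keepalive every 30 s
        "-o", "ServerAliveCountMax=3",     # drop the link after 3 missed replies
        "-o", "ExitOnForwardFailure=yes",  # fail fast if the forward cannot be set up
        "-L", f"{local_port}:{license_host}:{license_port}",
        target,
    ]


# All names below are placeholders (assumptions), not the real hosts.
cmd = autossh_tunnel_command(
    gateway="gateway.example.org",
    license_host="license.example.org",
    license_port=1700,
    user="workshop",
)
print(shlex.join(cmd))
```

With a forward like this, the application is pointed at localhost, and the license server sees a connection that originates from the gateway inside its own network. A license manager that listens on more than one port would need one `-L` option per port.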
3. Cost of Using the Amazon EC2 Cloud
Had the Sagan Workshop’s Amazon EC2 costs not been met by an Educational Grant, the total cost of installation, testing and running the workshop sessions would have been $2,876. The breakdown of the costs is shown in Table 1.
4. Lessons Learned and Best Practices
- Automate processes wherever possible, as this allows easier management of large numbers of machines and easy recovery in the case of failure. Tutorial servers automatically mounted the NFS partitions when booted, and SSH tunnels were automatically re-established after a failure.
- Test, test, and test again. Document and test all the steps required to recover if a VM fails, and step through the tutorials under as close to operational conditions as possible.
- Develop a failover system. We copied the final software configuration to two local machines for use if Amazon failed.
- Give yourself plenty of time to solve problems. In our case, we needed to assure the IDL vendor that licenses would not persist on the cloud, and we needed to understand the poor performance of X for remote access to the cloud.
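The first lesson, automating recovery, can be illustrated with a generic retry wrapper of the kind a boot script might put around a step such as mounting a shared partition. This is a sketch under stated assumptions and not the workshop's actual boot script: the names `run_with_retry` and `flaky_mount` are hypothetical, and the failing "mount" is simulated so that the example runs anywhere.

```python
import time
from typing import Callable, List


def run_with_retry(
    step: Callable[[], None],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `step` until it succeeds and return the number of attempts used.

    Waits delay_s, 2*delay_s, 4*delay_s, ... between attempts and
    re-raises the last OSError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            step()
            return attempt
        except OSError:
            if attempt == attempts:
                raise
            sleep(delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Simulated boot step (assumption): fails twice, then succeeds.
calls = {"n": 0}


def flaky_mount() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("file server not reachable yet")


waits: List[float] = []
used = run_with_retry(flaky_mount, attempts=5, delay_s=1.0, sleep=waits.append)
print(f"succeeded after {used} attempts, waits were {waits}")
```

Passing `sleep` as a parameter lets the recovery path be exercised in tests without real delays, which fits the advice above to test every recovery step before the event.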