This is the title of a paper by Surace et al. (2015), currently available on astro-ph and presented as a paper at ADASS XXIV in October 2014. The Palomar Transient Factory (PTF) is an example of the kind of cost-constrained project that is now common in astronomy. It produces a high volume of data that need near real-time processing for maximum science return, and it must achieve all of this on a shoestring budget. In this post I will focus on how the cost constraints were managed, rather than give a technical description of the project and its infrastructure. The decisions made exploited many years of expertise at IPAC, developed in managing science operations centers for NASA missions.
The PTF itself is a generic term for several projects, with various observing cadences aimed at discovering supernovae, gamma-ray bursters and other objects. The original PTF was succeeded by the “intermediate” Palomar Transient Factory (or iPTF), which concentrates on specific focussed science campaigns, rotated on a quarterly basis. The iPTF was itself succeeded by the Zwicky Transient Facility, which operates with a new camera with a field of view of nearly 50 square degrees, composed of inexpensive “wafer-scale” CCDs.
The cost constraints were managed in all parts of the project, from the hardware on the telescope all the way through to the archive system. First of all, the project took advantage of existing hardware and facilities in the data acquisition system:
- It re-used the CFHT 12k Mosaic Camera, and replaced the liquid nitrogen dewar with a mechanical cryo-cooler.
- The system primarily surveyed the sky in one filter, the R-band, which maximizes survey volume.
- It took advantage of the 1.2-m Oschin-Schmidt telescope, rather than build a new one.
- Telescope operations are largely robotic.
- Transients discovered by PTF can be followed up in near real-time by other telescopes at Palomar.
All data acquired at the telescope are required for science analysis, and are transmitted to IPAC via a microwave link through the San Diego Supercomputer Center. At IPAC, the data are processed on twenty-four dual-CPU compute drones. The processing itself is embarrassingly parallel, with data for each CCD processed on a single drone. Mass storage is managed with a ZFS file system, with data compression. The long-term storage is dual-homed and connected to both the operations system and the archive system. This is done because disk is too expensive to maintain separate operations and archive copies; the trade-off is added complexity in controlling file ownership between operations and the archive. See the figure below for a schematic of the processing flow:
The data processing system was developed under an agile management process, with only a handful of core staff and with heavy involvement of scientists throughout. This is a key feature of IPAC’s institutional strategy and ensures alignment of software development with science goals. The system draws heavily on existing community software, with individual modules in various languages carrying out specific tasks. Utility was valued over elegance.
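To make the per-CCD parallelism described above concrete, here is a minimal Python sketch. It is not the PTF pipeline: `process_ccd` and `process_exposure` are hypothetical names, the count of 12 CCDs per exposure is an assumption for illustration, and a local thread pool stands in for the twenty-four separate compute drones. The point is only that each CCD image is an independent unit of work, so images can be farmed out with no coordination between workers.

```python
from concurrent.futures import ThreadPoolExecutor

N_DRONES = 24  # from the text: twenty-four compute drones


def process_ccd(exposure_id: str, ccd_id: int) -> dict:
    """Hypothetical stand-in for the real per-CCD processing step."""
    # Nothing here depends on any other CCD, which is what makes
    # the workload embarrassingly parallel.
    return {"exposure": exposure_id, "ccd": ccd_id, "status": "done"}


def process_exposure(exposure_id: str, ccd_ids: list[int],
                     n_workers: int = N_DRONES) -> list[dict]:
    """Process every CCD image of one exposure independently."""
    # Assumption: a thread pool models the drones; in a real system each
    # task would be dispatched to a separate machine.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(process_ccd, exposure_id, c) for c in ccd_ids]
        # Collect results in submission order, one per CCD.
        return [f.result() for f in futures]


# Assumed for illustration: 12 CCD images in one exposure.
results = process_exposure("exp-0001", list(range(12)))
```

Because no task waits on another, throughput in a design like this scales with the number of workers until I/O or storage becomes the bottleneck.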
The archive is managed within the Infrared Science Archive (IRSA) at IPAC. The PTF archive interface is essentially a thin layer built atop a reusable and portable science information system that has supported the archives of many missions and projects at IPAC for the past decade and a half.
Finally, a critical component of the PTF is its set of “science marshalls,” each organized around a particular topic. They organize and present results for those topics and, among other things, allow scientists to interact with the results and form collaborations.