A New Python Pipeline for Reducing NIRSPEC Data Acquired at the Keck Observatory

The Keck Observatory Archive (KOA) (https://koa.ipac.caltech.edu) has just released a new data reduction pipeline (NSDRP) for data acquired with the Keck Observatory NIRSPEC cross-dispersed infrared echelle spectrograph. The pipeline is intended to reduce high-resolution (0.947 – 2.63 μm) object spectra. All phases of reduction – including order location and tracing, flat fielding, background subtraction, spectral extraction, wavelength calibration and data product generation – proceed automatically without user intervention. Full details are on the project web page, and the code is freely available for download from GitHub. NSDRP is optimized for point sources, but extended objects are accommodated provided there is a discernible peak in the spatial intensity profile. It can reduce data taken over a wide range of echelle and cross-disperser angles and with all of the high-resolution slits. NSDRP works best on data that have:

  • A single point source in the slit
  • Reasonably bright targets with detectable continuum
  • NIRSPEC-1 through NIRSPEC-7 filters
  • Well-separated orders without overlapping
  • Sufficient exposure times (longer than about 30 s) with detectable sky lines

Single frames can be reduced, in which case sky windows adjacent to the spatial peak are used for background subtraction. “AB nod pairs” can also be reduced, in which case the sky background is removed by pair subtraction. OH sky emission lines are used for wavelength calibration. For short exposures with weak sky lines, NSDRP uses the grating equation to approximate the wavelength scale. A night of data can be reduced together, so that the wavelength calibration determined from long-exposure frames can be applied to short-exposure frames taken in the same temporally contiguous instrument configuration. Low-resolution mode and wavelength calibration using arc lamps have been identified for future development.
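The two background-subtraction modes and the grating-equation fallback described above can be sketched in a few lines of Python. This is a minimal illustration and not NSDRP's own code: the function names, the `gap` and `width` parameters, and the choice of a per-column median are all assumptions made for the example, and plain lists stand in for the numpy arrays a real pipeline would use. Frames are assumed to be stored as rows along the spatial (slit) direction and columns along the dispersion direction.

```python
import math
from statistics import median


def subtract_nod_pair(frame_a, frame_b):
    """AB nod pair: A - B cancels sky emission common to both nod positions."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def subtract_sky_windows(frame, peak_row, gap=2, width=3):
    """Single frame: estimate the sky in each column from windows of rows
    on either side of the spatial peak, then subtract it from every row.

    gap and width (in rows) are hypothetical tuning parameters.
    """
    n_rows = len(frame)
    n_cols = len(frame[0])
    below = range(max(0, peak_row - gap - width), max(0, peak_row - gap))
    above = range(min(n_rows, peak_row + gap + 1),
                  min(n_rows, peak_row + gap + 1 + width))
    sky_rows = list(below) + list(above)
    if not sky_rows:
        raise ValueError("no sky rows available around the spatial peak")
    # Median is used here (an assumption) because it resists outlier pixels.
    sky = [median(frame[r][c] for r in sky_rows) for c in range(n_cols)]
    return [[frame[r][c] - sky[c] for c in range(n_cols)]
            for r in range(n_rows)]


def grating_wavelength_um(order, grooves_per_mm, alpha_deg, beta_deg):
    """Grating equation m * lambda = d * (sin(alpha) + sin(beta)).

    Returns the wavelength in microns for diffraction order `order`,
    incidence angle alpha and diffraction angle beta (both in degrees).
    """
    d_um = 1000.0 / grooves_per_mm  # groove spacing in microns
    sines = math.sin(math.radians(alpha_deg)) + math.sin(math.radians(beta_deg))
    return d_um * sines / order
```

In this sketch a single frame would go through `subtract_sky_windows`, an AB pair through `subtract_nod_pair`, and `grating_wavelength_um` would supply an approximate wavelength for a given order and angle when too few sky lines are detected to fit a calibration.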

The NSDRP is written in the Python programming language and makes extensive use of common libraries including numpy, astropy, scipy and matplotlib. The software is written mainly in the procedural style, though the main application-specific data structures are implemented as Python classes.

Data products generated from the reduction results include flux and noise spectra and spatial profiles for each order, and wavelength calibration line identification tables, all in ASCII and FITS formats, as well as preview plots in PNG format. Two types of log files are produced: one to record per-night summary data and another to record details of data reduction on a per-object-frame basis.

KOA has used the pipeline to extract and serve browse-quality spectra from 1,821 nights of NIRSPEC high-dispersion data. The data products comprise over 4,400,000 files; one sample is shown below.


 Rectified image, spatial profile and flux spectrum of the asteroid 511 Davida, extracted with the KOA NIRSPEC Data Reduction Pipeline from observations acquired on April 29, 2010 (H. Roe, PI).

The NSDRP is described in detail in the software design document and the data products document.

The high level design is as follows:



Below are some figures illustrating steps in the reduction process, taken from the design document:






Disclosure: I am the manager of the Keck Observatory Archive.


The Montage Mosaic Engine and Visualization

Those of you who have used Montage probably know it as a toolkit for processing images to create mosaics. It is, however, finding applicability in the visualization of images, as described in “The Application of the Montage Image Mosaic Engine to the Visualization of Astronomical Images,” by Bruce Berriman and John Good. It is an Open Access paper published in a PASP Special Focus Edition on Visualization, edited by Brian Kent.

You can watch a video abstract of the paper here:

There are several areas where Montage brings value to visualization work, primarily because its design as a toolkit enables integration into software environments:

  • Integration into visualization environments, usually as a reprojection engine to co-register and re-project multi-wavelength images to a common set of image parameters, and represent them in a common coordinate system. Examples are JS9 and APLpy. Luciani et al. (2014) integrated Montage into a highly scalable client-server architecture intended as a demonstration of how a visualization environment would operate when extended to petascale processing.
  • Creation of cutouts of large images and down-sampled versions of large images that can then be visualized on desktops or in browsers.
  • Rectification of the sky background to a common level to reveal faint, diffuse features in large-scale images.
  • Creation of PNG representations of FITS images by mViewer, which allows full-color (three-image) displays of images with optional color enhancement, and integration with Python. mViewer can be used to create complex images in bulk. With a single command, it can create complex images such as this one



Three-color mosaic of M51 in the u, g, and r bands of the Sloan Digital Sky Survey (SDSS), shown with an Equatorial J2000 coordinate grid, overlaid with the positions of 2MASS point source catalog sources in the J-band, scaled according to brightness (yellow circles), and with footprints from the Spitzer InfraRed Spectrograph (IRS) Peak-up images (red boxes) and the Multiband Imaging Photometer for Spitzer (MIPS) Spectral Energy Distribution (SED) images (green boxes). The image was created with a single call to mViewer, the visualization tool included in Version 4 of Montage.

as well as maps of sky coverage, such as this one:



Sky coverage of the KELT-N fields (turquoise) and the KELT-S field (blue) compared with those of the Kepler and K2 missions (red). The fields are shown superposed on the 100 μm map of Schlegel et al. (1998).

mViewer includes an adaptive image stretch algorithm that optimizes three features at once: the structure of the brightest pixels; the definition of faint structures; and the definition of mid-brightness level structure. See how it compares with other commonly used stretches in this image:


Three 2MASS JHK color composite image mosaics of NGC 6357 shown side by side to compare three image presentation algorithms: (a) adaptive histogram matching used by Montage; (b) a stretch based on the hyperbolic sine function; and (c) application of the gamma correction.
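For readers unfamiliar with the two comparison stretches named in the caption, here is a minimal Python sketch of an inverse hyperbolic sine stretch and a gamma correction. It does not reproduce Montage's adaptive algorithm. The function names and the `softening` and `gamma` defaults are assumptions chosen for illustration, and pixel values are first normalized to the range 0 to 1.

```python
import math


def normalize(values, vmin, vmax):
    """Linearly map values to [0, 1], clipping anything outside [vmin, vmax]."""
    span = vmax - vmin
    if span <= 0:
        raise ValueError("vmax must be greater than vmin")
    return [min(1.0, max(0.0, (v - vmin) / span)) for v in values]


def asinh_stretch(values, softening=0.1):
    """Inverse hyperbolic sine stretch of normalized values.

    Roughly linear for values well below `softening` and logarithmic
    above it, so faint structure is lifted without saturating bright pixels.
    """
    scale = math.asinh(1.0 / softening)
    return [math.asinh(v / softening) / scale for v in values]


def gamma_stretch(values, gamma=2.2):
    """Gamma correction of normalized values: out = in ** (1 / gamma)."""
    return [v ** (1.0 / gamma) for v in values]
```

Both functions map 0 to 0 and 1 to 1 and differ only in how they distribute the intermediate brightness levels. Each uses one fixed transfer curve for the whole image, whereas an adaptive stretch derives its mapping from the image's own pixel distribution.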

Montage can be used to create animations of image cubes, such as this one, which represents an average of the central 10 velocity planes of a mosaic of five data cubes released as part of the Galactic Arecibo L-band Feed Array HI (GALFA-HI) survey. GALFA is a high-resolution, large-area, high spectral resolution, wide-band survey of the Galactic interstellar medium in the 21 cm hyperfine transition line of neutral hydrogen, conducted at Arecibo Observatory.
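The averaging step mentioned above, collapsing the central velocity planes of a cube into a single image, can be illustrated with a short Python sketch. This is not Montage code: the function name is hypothetical, and the cube is assumed to be a nested list indexed as cube[plane][row][column].

```python
def average_central_planes(cube, n_planes=10):
    """Average the n_planes planes centered on the middle of the cube.

    cube is indexed as cube[plane][row][column]; the result is a 2-D image.
    """
    n_total = len(cube)
    if not 0 < n_planes <= n_total:
        raise ValueError("n_planes must be between 1 and the cube depth")
    start = (n_total - n_planes) // 2
    planes = cube[start:start + n_planes]
    n_rows = len(planes[0])
    n_cols = len(planes[0][0])
    return [[sum(p[r][c] for p in planes) / n_planes for c in range(n_cols)]
            for r in range(n_rows)]
```

With numpy, the same operation would normally be a slice along the velocity axis followed by a mean over that axis.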


It can transform images in HEALPix format into any WCS-supported projection, as in this Planck All-Sky Map:


Planck 857 GHz skymap in Figure 6 reprojected to the Aitoff projection by Montage.

And it can prepare images for consumption by the WWT, as in the above Planck example:



Sample screenshot of the WWT Web interface presenting a section of the Planck 857 GHz map.

All images are reproduced under a Creative Commons 3.0 license. “The Application of the Montage Image Mosaic Engine to the Visualization of Astronomical Images,” by G. B. Berriman and J. C. Good. Published in PASP. https://doi.org/10.1088/1538-3873/aa5456. http://iopscience.iop.org/article/10.1088/1538-3873/aa5456/meta




Best Practices for HPC Software Developers

This is a series of seven videos describing best software engineering practices. I recommend it to all HPC software engineers. The series is the result of a collaboration between the IDEAS project, Argonne Leadership Computing Facility, NERSC, and Oak Ridge Leadership Computing Facility.

The videos cover all aspects of software engineering. Here is the first video in the series, “What All Codes Should Do”:

The remaining topics are:

  • “Developing, Configuring, Building, & Deploying”
  • “Distributed Version Control & CI Testing”
  • “Testing & Documenting Your Code”
  • “How HPC Environment is Different from Desktop”
  • “Introduction to High-Performance Parallel I/O”
  • “Basic Performance Analysis & Optimization”



Research Software Engineers: State of the Nation Report 2017

This report provides a history of the U.K.’s Research Software Engineers campaign and a snapshot of the state of the U.K.’s Research Software Engineer (RSE) community. As I understand it, the term was coined at a U.K. Software Sustainability Institute (SSI) workshop, and now refers to an 800-strong community in the U.K. You can download the report from Zenodo. I urge all of you interested in career paths for academic software engineers to read it.

Briefly, modern research is next to impossible without quality software (one survey in the U.K. reported that 92% of researchers across all disciplines used research software). By extension, a significant majority of research results are based on the skill of software engineering experts. Nevertheless, the work of software experts in academia remains poorly recognized, and they earn much less than their counterparts in industry. They are often hired as researchers, and their careers are measured by the quality of research they are left unable to perform.

The report details these difficult straits, such as restrictions related to human resources, finances and funding policies, as well as a culture in universities that tends to overlook the importance of software. Various solutions tried to date succeeded largely in making the software engineering community transient and unrecognized. Thus, since 2012, a community of these experts has grown around a campaign to raise awareness of the software community, who now in many cases identify as Research Software Engineers.

A grass-roots campaign, initiated and still supported by the Software Sustainability Institute and now coordinated by the research software engineering community itself, has led to the rapid growth of an active community of almost 800 Research Software Engineers, generated significant international interest, seen the proliferation of a new type of group in academia that increases access to software expertise, and led to the creation of a fellowship that builds on these successes. Case studies in the document describing the careers of RSEs illuminate all the issues described in this post.

Despite the successes, the report recognizes that much work still needs to be done in engaging all research stakeholders – universities and funding agencies – in implementing software career paths across the U.K.

Growth of the UK RSE Community:



My Favorite Computing Videos on YouTube

I have often been asked what my favorite computing videos on YouTube are. There are so many that it’s hard to give a definitive answer, and I keep discovering more, some posted several years ago. As of today only, here are some of my favorites.

The annual SciPy conferences, held each year in the US and Europe, post videos of the presentations in dedicated channels. They provide an enormous amount of information on the rapid developments in Python. You can find the SciPy 2016 talks here, and one of my favorites is “Python and R Together at Last: Writing Cross Language Tools,” by Bill Lattner:

There are many talks and videos on cloud computing. Eli the Computer Guy has a fine introduction to cloud computing, intended as a class:

Some of my favorites though are talks and interviews on the history and philosophy of computing that are hard to find anywhere else. Here is Grady Booch of IBM talking on The History (and the Future) of Software:

It is part of the Computer History Museum channel.

Here is Professor Linux himself, Linus Torvalds (very funny in parts):





Community Building Through Software Design

This is the title of a very interesting talk (which I wish I had given myself!) by Jed Brown of CU Boulder at the 2017 NSF SI2 meeting. Jed is involved in writing code for the Portable, Extensible Toolkit for Scientific Computation (PETSc), which is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. The practices he describes came largely out of supporting PETSc, yet are applicable across many disciplines. I would recommend that anyone wishing to build a software community read these slides before diving into development.

I am not able to post the slides here, but I will show some “slideshots” that I think have particularly broad applicability; in particular, note the comments about forking in the slide on “Upstreaming and community building,” something I haven’t given a lot of thought to.





A guide to sustainability models for research software projects.

... subtitled “How can we find funding for our open source project?”

This is a GitHub page started by my colleague Dan Katz of NCSA. This is a living document that lists, with examples, techniques for sustaining software once initial development funding has ended.  To maximize the value of the page, Dan is soliciting contributions and examples from the scientific community.  As of this writing, he lists the following categories:

  • Donation button
  • Crowdfunding (one-time)
  • Crowdfunding (recurring)
  • Books & merchandise
  • Advertising & sponsorships
  • Industry support
  • Consulting & services
  • Dual license
  • Open core
  • Foundations & consortiums
  • Venture capital
  • Trademark licensing & franchising

Here is a sample of the content, from Industry Support:

“Companies sometimes support particular projects via paying for some development in that project, or by supporting a PhD or researcher to undertake a specific development project. Find a company that uses your project, and determine something the project wants to do that the company is willing to put work into.
(Note that this might overlap Consulting & services below.)


  • Taps into those who have resources (i.e. companies)
  • Can be well-aligned with company needs
  • In certain areas, industrial support for PhD or researcher positions is already an established mechanism


  • Usually involves “getting lucky”: no clear, repeatable path to finding this arrangement
  • Project already needs to be well-known and used
  • Governance issues, company could have undue influence over project
  • Can affect project dynamics + balance

Case Studies

BoneJ – with F + P Specialist Modelling Group”
