Summer App Space Innovation Speaker Series

Summer App Space is a paid apprenticeship for LA students and teachers to learn to program while getting paid to do fun space-related projects. This year the program is funding 12 students to attend a six-week school where they learn to program in Python.

Part of this program is the Innovation Speaker Series, where “industry leaders in astronomy, astrophysics, aerospace, software engineering, investment, and entrepreneurship (startups) share their journey and technical knowledge.” I think these talks are of broad value and interest. Speakers to date include Adam Lichtl, CEO at Delta Brain; Athena Ahmadi, Software Engineer at Google; Sean M. Carroll, theoretical physicist at Caltech; Andrew Pryor-Miller, Quality Assurance Engineer Lead at Snapchat; D.A. Wallach, recording artist, songwriter, investor, and essayist; and my colleague at IPAC, Solange Ramirez. The talks are posted online at the Summer App Space web page and on the Caltech YouTube channel.

Here are my two favorite talks. The first is Sean Carroll speaking about his life in science and scientific outreach.



The second is Solange Ramirez on Bridging Science and Engineering, in which she describes how she came to realize the need for an interdisciplinary approach to both science and her career.


Scientific Workflows for Science and the Science of Scientific Workflows

In this era of compute intensive astronomy, tools for managing complex workflows are becoming a crucial part of our science infrastructure. This video offers a fine introduction to the Pegasus workflow manager by the team leader, Ewa Deelman (ISI, USC).

The abstract for Ewa’s talk follows:

“Modern science often requires the processing and analysis of vast amounts of data in search of postulated phenomena, and the validation of core principles through the simulation of complex system behaviours and interactions. This is the case in fields such as astronomy, bioinformatics, physics, and climate and ocean modelling, and others.

In order to support the computational and data needs of today’s science, new knowledge must be gained on how to deliver the growing high-performance and distributed computing resources to the scientist’s desktop in an accessible, reliable and scalable way.

In over a decade of working with domain scientists, the Pegasus project has developed tools and techniques that automate the computational processes used in data- and compute-intensive research. Among them is the scientific workflow management system, Pegasus, which is being used by researchers to discover gravitational waves, model seismic wave propagation, to discover new celestial objects, to study RNA critical to human brain development, and to investigate other important research questions.

This talk will review the conception and evolution of the Pegasus research program. It will touch upon the role of scientific workflow systems in advancing science, and will give specific examples of how the Pegasus Workflow Management System has done so. It will describe how the Pegasus project has adapted to changes in application needs and to advances in high performance and distributed computing systems. It will discuss the interleaving of Computer Science research and software development and how each benefits from the other while providing value to other science domains.”
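The core abstraction behind a workflow manager like Pegasus is a directed acyclic graph (DAG) of tasks, each of which runs only after all of its dependencies have finished. The toy sketch below illustrates just that idea using the Python standard library; the task names are hypothetical, and Pegasus itself adds far more (resource mapping, data staging, fault recovery, provenance).

```python
# Toy illustration of the DAG abstraction at the heart of workflow
# managers such as Pegasus: tasks run in dependency order.
# All task names are hypothetical; this is not the Pegasus API.

from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
dag = {
    "fetch_raw": set(),
    "calibrate": {"fetch_raw"},
    "mosaic":    {"calibrate"},
    "detect":    {"mosaic"},
    "catalog":   {"detect", "calibrate"},
}

def run(task):
    # A real workflow manager would dispatch this to a cluster or cloud.
    return f"ran {task}"

# static_order() yields a valid execution order for the whole graph.
order = list(TopologicalSorter(dag).static_order())
results = [run(t) for t in order]
print(order)
```

A production system executes independent branches of the graph in parallel and retries failed tasks, but the dependency ordering above is the invariant it must always preserve.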

If you want a more general overview of workflow managers, see this talk at NCSA by Scott Callaghan:


A New Python Pipeline for Reducing NIRSPEC Data Acquired at the Keck Observatory

The Keck Observatory Archive (KOA) has just released a new data reduction pipeline (NSDRP) for data acquired with the Keck Observatory NIRSPEC cross-dispersed infrared echelle spectrograph. The pipeline is intended to reduce high-resolution (0.947–2.63 µm) object spectra. All phases of reduction – including order location and tracing, flat fielding, background subtraction, spectral extraction, wavelength calibration, and data product generation – proceed automatically without user intervention. You can read full details on the project web page, and the code is freely available for download from GitHub. NSDRP is optimized for point sources, but extended objects are accommodated provided there is a discernible peak in the spatial intensity profile. It can reduce a wide range of echelle and cross-disperser angles and all of the high-resolution slits. NSDRP works best on data that have:

  • A single point source in the slit
  • Reasonably bright targets with detectable continuum
  • NIRSPEC-1 through NIRSPEC-7 filters
  • Well-separated orders without overlapping
  • Sufficient exposure times (≳ 30 s) with detectable sky lines

Single frames can be reduced, in which case sky windows adjacent to the spatial peak are used for background subtraction. “AB nod pairs” can be reduced, in which case the sky background is removed by pair subtraction. OH sky emission lines are used for wavelength calibration. For short exposures with weak sky lines, NSDRP uses the grating equation to approximate the wavelength scale. A night of data can be reduced together, in which case wavelength calibration determined from long-exposure frames can be applied to short-exposure frames taken in the same temporally contiguous instrument configuration. Low-resolution mode and wavelength calibration using arc lamps have been identified for future development.
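The grating-equation fallback mentioned above can be sketched in a few lines. The equation relates wavelength to the geometry of the grating, mλ = σ(sin α + sin β), so a wavelength scale follows from mapping detector columns to diffraction angle. All numbers below (groove spacing, order, angles, pixel scale) are illustrative, not NIRSPEC's actual optical parameters or NSDRP's code.

```python
# Hedged sketch of approximating a wavelength scale from the grating
# equation, the approach NSDRP falls back on for short exposures with
# weak sky lines. All optical parameters here are hypothetical.

import numpy as np

sigma = 1.0 / 23.2        # groove spacing in mm (23.2 grooves/mm, hypothetical)
m = 60                    # echelle order (hypothetical)
alpha = np.radians(63.0)  # angle of incidence (hypothetical)

# Map detector columns to small offsets in the diffraction angle beta.
pixels = np.arange(1024)
beta = alpha + (pixels - 512) * 1e-5  # radians per pixel, hypothetical

# Grating equation: m * lambda = sigma * (sin(alpha) + sin(beta)).
lam_mm = sigma * (np.sin(alpha) + np.sin(beta)) / m
lam_um = lam_mm * 1e3     # convert mm -> microns

print(lam_um[0], lam_um[-1])
```

With plausible echelle parameters the result lands in the near infrared, and the wavelength increases monotonically across the order, which is all a rough scale like this needs to provide before sky lines refine it.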

The NSDRP is written in the Python programming language and makes extensive use of common libraries including numpy, astropy, scipy and matplotlib. The software is written mainly in the procedural style, though the main application-specific data structures are implemented as Python classes.
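Two of the reduction steps described above, sky-window background subtraction and extraction of a 1-D spectrum around the spatial peak, are simple to express in numpy. The sketch below is purely illustrative of that kind of operation on a rectified order image (rows = spatial direction, columns = wavelength); it is not NSDRP's actual code, and the synthetic image and window sizes are hypothetical.

```python
# Minimal numpy sketch (not NSDRP's actual code) of two per-order steps:
# background subtraction from sky windows adjacent to the spatial peak,
# and extraction of a 1-D flux spectrum around that peak.

import numpy as np

rng = np.random.default_rng(0)

# Fake rectified order: rows = spatial direction, columns = wavelength.
nrows, ncols = 40, 200
image = rng.normal(100.0, 1.0, (nrows, ncols))  # sky level ~100 counts

# Add a point source with a Gaussian spatial profile centred on row 20.
rows = np.arange(nrows)
psf = 60.0 * np.exp(-0.5 * ((rows - 20.0) / 1.5) ** 2)
image += psf[:, None]

# Locate the spatial peak from the collapsed spatial profile.
profile = image.mean(axis=1)
peak = int(np.argmax(profile))

# Sky windows above and below the peak; the median along the spatial
# axis gives a per-column background estimate.
sky_rows = np.r_[peak - 12:peak - 6, peak + 6:peak + 12]
background = np.median(image[sky_rows, :], axis=0)

# Subtract the background and sum a 7-row window around the peak.
reduced = image - background
spectrum = reduced[peak - 3:peak + 4, :].sum(axis=0)

print(round(spectrum.mean(), 1))  # ~220 counts of source flux per column
```

The real pipeline does considerably more (order tracing, flat fielding, optimal rather than boxcar extraction), but the array manipulations are of this flavor, which is why numpy and astropy carry most of the load.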

Data products generated from the reduction results include flux and noise spectra and spatial profiles for each order, wavelength calibration line identification tables in ASCII and FITS formats, and preview plots in PNG format. Two types of log files are produced: one to record per-night summary data and another to record details of data reduction on a per-object-frame basis.

KOA has used the pipeline to extract and serve browse-quality spectra from 1,821 nights of NIRSPEC high-dispersion data. The data products comprise over 4,400,000 files; see one sample below.


Rectified image, spatial profile, and flux spectrum of the asteroid 511 Davida, extracted with the KOA NIRSPEC Data Reduction Pipeline from observations acquired on April 29, 2010 (H. Roe, PI).

The NSDRP is described in detail in the software design document and the data products document.

The high level design is as follows:



Below are some figures illustrating steps in the reduction process, taken from the design document:






Disclosure: I am the manager of the Keck Observatory Archive.


The Montage Mosaic Engine and Visualization

Those of you who have used Montage probably know it as a toolkit for processing images to create mosaics. It is, however, finding application in the visualization of images, as described in “The Application of Montage to the Visualization of Astronomical Images,” by Bruce Berriman and John Good. It is an Open Access paper published in a PASP Special Focus Edition on Visualization, edited by Brian Kent.

You can watch a video abstract of the paper here:

There are several areas where Montage brings value to visualization work, primarily because its design as a toolkit enables integration into software environments:

  • Integration into visualization environments, usually as a reprojection engine to co-register and re-project multi-wavelength images to a common set of image parameters, and represent them in a common coordinate system. Examples are JS9 and APLpy. Luciani et al. (2014) integrated Montage into a highly scalable client-server architecture intended as a demonstration of how a visualization environment would operate when extended to petascale processing.
  • Creation of cutouts of large images and down-sampled versions of large images that can then be visualized on desktops or in browsers.
  • Rectification of the sky background to a common level to reveal faint, diffuse features in large-scale images.
  • Creation of PNG representations of FITS images by mViewer, which allows full-color (three-image) displays of images with optional color enhancement, and integration with Python. mViewer can be used to create complex images in bulk. With a single command, it can create complex images such as this one:



Three-color mosaic of M51 in the u, g, and r bands of the Sloan Digital Sky Survey (SDSS), shown with an Equatorial J2000 coordinate grid, overlaid with the positions of 2MASS point source catalog sources in the J-band, scaled according to brightness (yellow circles), and with footprints from the Spitzer InfraRed Spectrograph (IRS) Peak-up images (red boxes) and the Multiband Imaging Photometer for Spitzer (MIPS) Spectral Energy Distribution (SED) images (green boxes). The image was created with a single call to mViewer, the visualization tool included in Version 4 of Montage.

as well as maps of sky coverage, such as this one:



Sky coverage of the KELT-N fields (turquoise) and the KELT-S field (blue) compared with those of the Kepler and K2 missions (red). The fields are shown superposed on the 100 μm map of Schlegel et al. (1998).

mViewer includes an adaptive image stretch algorithm that optimizes three features at once: the structure of the brightest pixels; the definition of faint structures; and the definition of mid-brightness level structure. See how it compares with other commonly used stretches in this image:


Three 2MASS JHK color composite image mosaics of NGC 6357 shown side by side to compare three image presentation algorithms: (a) adaptive histogram matching used by Montage; (b) a stretch based on the hyperbolic sine function; and (c) application of the gamma correction.
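The two comparison stretches in the figure, the hyperbolic-sine-based stretch and the gamma correction, are easy to write down in numpy; the sketch below does so on a synthetic image with a bright core on a faint background. This only illustrates the simpler alternatives Montage is compared against, not Montage's own adaptive histogram algorithm, and the parameter values are arbitrary.

```python
# Sketch of the two comparison stretches from the figure above: an asinh
# (inverse hyperbolic sine) stretch and a gamma correction. Montage's
# adaptive histogram algorithm is more involved and is not shown here.

import numpy as np

def asinh_stretch(img, beta=10.0):
    """Compress bright pixels while keeping faint structure visible."""
    scaled = (img - img.min()) / (img.max() - img.min())
    return np.arcsinh(beta * scaled) / np.arcsinh(beta)

def gamma_stretch(img, gamma=0.5):
    """Simple power-law (gamma) correction of normalized pixel values."""
    scaled = (img - img.min()) / (img.max() - img.min())
    return scaled ** gamma

# A fake image: a bright Gaussian core on a faint flat background.
y, x = np.mgrid[-50:50, -50:50]
img = 1000.0 * np.exp(-(x**2 + y**2) / 50.0) + 5.0

a = asinh_stretch(img)
g = gamma_stretch(img)
print(a.min(), a.max(), g.min(), g.max())  # both map into [0, 1]
```

Both stretches map the data into [0, 1] and lift the faint pixels relative to a linear scaling; the trade-off an adaptive algorithm tries to win is preserving bright-core structure, faint structure, and mid-level structure all at once.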

Montage can be used to create animations of image cubes, such as this one: it represents an average of the central 10 velocity planes of a mosaic of five data cubes released as part of the Galactic Arecibo L-band Feed Array HI (GALFA-HI) survey. GALFA is a high-resolution, large-area, high-spectral-resolution, wide-band survey of the Galactic interstellar medium in the 21 cm hyperfine transition of neutral hydrogen, conducted at Arecibo Observatory.
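With a spectral-line cube held as a numpy array ordered (velocity, y, x), an average over the central ten velocity planes of the kind described above is a one-line slice-and-mean. The cube below is synthetic, purely to show the indexing.

```python
# Averaging the central ten velocity planes of a spectral-line cube,
# as in the GALFA-HI animation frame described above. The cube here is
# synthetic; a real one would be read from FITS with astropy.io.fits.

import numpy as np

nv, ny, nx = 64, 8, 8
cube = np.arange(nv * ny * nx, dtype=float).reshape(nv, ny, nx)

centre = nv // 2
plane = cube[centre - 5:centre + 5].mean(axis=0)  # 10 central planes

print(plane.shape)
```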


It can transform images in HEALPix format into an image in any supported WCS projection, as in this Planck all-sky map:


Planck 857 GHz sky map in Figure 6 reprojected to the Aitoff projection by Montage.

And it can prepare images for consumption by the WorldWide Telescope (WWT), as in the above Planck example:



Sample screenshot of the WWT Web interface presenting a section of the Planck 857 GHz map.

All images are reproduced under a Creative Commons 3.0 license from “The Application of the Montage Image Mosaic Engine to the Visualization of Astronomical Images,” by G. B. Berriman and J. C. Good, published in PASP.




Best Practices for HPC Software Developers

This is a series of seven videos describing best software engineering practices. I would recommend it to all HPC software engineers. The series was the result of a collaboration between the IDEAS project, Argonne Leadership Computing Facility, NERSC, and Oak Ridge Leadership Computing Facility.

The videos cover all aspects of software engineering. Here is the first video in the series, “What All Codes Should Do”:

The remaining topics are:

  • “Developing, Configuring, Building, & Deploying”
  • “Distributed Version Control & CI Testing”
  • “Testing & Documenting Your Code”
  • “How HPC Environment is Different from Desktop”
  • “Introduction to High-Performance Parallel I/O”
  • “Basic Performance Analysis & Optimization”



Research Software Engineers: State of the Nation Report 2017

This report provides a history of the U.K.’s Research Software Engineers campaign and a snapshot of the state of the U.K.’s Research Software Engineer (RSE) community. As I understand it, the term was coined at a U.K. Software Sustainability Institute (SSI) workshop, and now refers to an 800-strong community in the U.K. You can download the report from Zenodo. I urge all of you interested in career paths for academic software engineers to read it.

Briefly, modern research is next to impossible without quality software (one survey in the U.K. reported that 92% of researchers across all disciplines used research software). By extension, a significant majority of research results depend on the skill of software engineering experts. Nevertheless, the work of software experts in academia remains poorly recognized, and they earn much less than their counterparts in industry. They are often hired as researchers, and their careers are measured by the quality of research they are left unable to perform.

The report details these difficult straits, such as restrictions related to human resources, finances, and funding policies, as well as a culture in universities that tends to overlook the importance of software. Various solutions tried to date have succeeded largely in making the software engineering community transient and unrecognized. Thus, since 2012, a community of these experts has grown around a campaign to raise awareness of the software community, and many of them now identify as Research Software Engineers.

A grass-roots campaign, initiated and still supported by the Software Sustainability Institute and now coordinated by the research software engineering community itself, has led to the rapid growth of an active community of almost 800 Research Software Engineers. It has generated significant international interest, seen the proliferation of a new type of group in academia that increases access to software expertise, and led to the creation of a fellowship that builds on these successes. Case studies in the document describing the careers of RSEs illuminate all the issues described in this post.

Despite the successes, the report recognizes that much work still needs to be done in engaging all research stakeholders – universities and funding agencies – in implementing software career paths across the U.K.

Growth of the UK RSE Community:



My Favorite Computing Videos on YouTube

I have often been asked what my favorite computing videos on YouTube are. There are so many that it’s hard to give a definitive answer, and I keep discovering more, some posted several years ago. As of today, here are some of my favorites.

The annual SciPy conferences, held each year in the US and Europe, each post videos of their presentations in dedicated channels. They provide an enormous amount of information on the rapid developments in Python. You can find the SciPy 2016 talks here, and one of my favorites is “Python and R Together at Last: Writing Cross Language Tools,” by Bill Lattner:

There are many talks and videos on cloud computing. Eli the Computer Guy has a fine introduction to cloud computing, intended as a class:

Some of my favorites, though, are talks and interviews on the history and philosophy of computing that are hard to find anywhere else. Here is Grady Booch of IBM talking on The History (and the Future) of Software:

It is part of the Computer History Museum channel.

Here is Professor Linux himself, Linus Torvalds (very funny in parts):



