Videos on Getting Started With Cloud Computing

A number of people have asked if I would post links to videos that provide useful introductions to Cloud Computing. Believe it or not, there are many, many videos on this topic posted on YouTube, and I will post here links to those that I think might be useful if you are getting started with cloud computing.

Cloud Computing Explained

What is the Cloud? (now with pictures!)

A bit more technical: The Three Ways to Cloud Compute

A longer but very good introduction:  Introduction to Cloud Computing

This last video is by Eli the Computer Guy. His channel at https://www.youtube.com/user/EliComputerGuyLive has a lot of interesting computer videos – well worth a look.

Posted in Cloud computing, computer videos, Computing, cyberinfrastructure, Data formats

The Palomar Transient Factory: High Quality Realtime Data Processing in a Cost-Constrained Environment

This is the title of a paper by Surace et al. (2015) currently available on astro-ph and presented as a paper at ADASS XXIV in October 2014. The Palomar Transient Factory (PTF) is an example of a cost-constrained project that is now common in astronomy. It produces a high volume of data, which need near real-time processing for maximum science return, and yet must achieve all of this on a shoestring budget. I will focus in this post on how the cost constraints were managed, rather than give a technical description of the project and its infrastructure. The decisions made exploited many years of expertise at IPAC, developed in managing science operations centers for NASA missions.

The PTF itself is a generic term for several projects, with various observing cadences aimed at discovering supernovae, gamma-ray bursters and other objects. The original PTF was succeeded by the “intermediate” Palomar Transient Factory (or iPTF), which concentrates on specific focussed science campaigns, rotated on a quarterly basis. The iPTF was itself succeeded by the Zwicky Transient Facility, which operates with a new camera with a field of view of nearly 50 square degrees, composed of inexpensive “wafer-scale” CCDs.

The cost constraints were managed on all parts of the project, from the hardware on the telescope all the way through to the archive system. First of all, the project took advantage of a lot of existing hardware in the data acquisition system and kept operations simple:

  • It re-used the CFHT 12k Mosaic Camera, and replaced the liquid nitrogen dewar with a mechanical cryo-cooler.
  • The system primarily surveyed the sky in one filter, the R-band, which maximizes survey volume.
  • It took advantage of the 1.2-m Oschin-Schmidt telescope, rather than build a new one.
  • Telescope operations are largely robotic.
  • Transients discovered by PTF can be followed up in near real-time by other telescopes at Palomar.

All data acquired at the telescope are required for science analysis, and are transmitted to IPAC via a microwave link through the San Diego Supercomputer Center. At IPAC, the data are processed on twenty-four dual-CPU compute drones. The processing itself is embarrassingly parallel, with data for each CCD processed on a single drone. Mass storage is managed with a ZFS file system, with data compression. The long-term storage is dual-homed and connected to both the operations system and the archive system. This is done because disk is too expensive to maintain separate operations and archive copies; the price is added complexity in controlling file ownership between operations and the archive. See the figure below for a schematic of the processing flow:

[Figure: schematic of the PTF data processing flow]
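The one-CCD-per-drone pattern described above can be sketched in a few lines of Python. This is only an illustration, not the PTF pipeline: `process_ccd`, the CCD count, and the worker count are hypothetical placeholders, and a local thread pool stands in for the cluster of compute drones.

```python
from concurrent.futures import ThreadPoolExecutor

N_CCDS = 12  # assumed CCD count for illustration; not taken from the paper


def process_ccd(ccd_id):
    """Hypothetical stand-in for the per-CCD reduction of one exposure."""
    # Each CCD is handled independently, so workers never need to communicate.
    return {"ccd": ccd_id, "status": "done"}


def process_exposure(n_ccds=N_CCDS, n_workers=4):
    """Process every CCD of one exposure in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves input order, so results[i] belongs to CCD i.
        return list(pool.map(process_ccd, range(n_ccds)))


results = process_exposure()
print(len(results), "CCDs processed")
```

Because no task depends on any other, adding workers (or machines) shortens the wall-clock time without any change to the per-CCD code, which is what makes this kind of workload cheap to scale.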

The data processing system was developed under an agile management process, with only a handful of core staff and with heavy involvement of scientists throughout. This is a key feature of IPAC’s institutional strategy and ensures alignment of software development with science goals. The system draws heavily on existing community software, with individual modules in various languages carrying out specific tasks. Utility was valued over elegance.

The archive is managed within the Infrared Science Archive (IRSA) at IPAC. The PTF archive interface is essentially a thin layer built atop a reusable and portable science information system that has supported the archives of many missions and projects at IPAC for the past decade and a half.

Finally, a critical component of the PTF is its set of “science marshals.” Each is organized around a particular topic; it collects and presents results for that topic and, among other things, allows scientists to interact with the results and form collaborations.

Posted in astroinformatics, Astronomy, astronomy surveys, cyberinfrastructure, data archives, Data Management, High performance computing, informatics, information sharing, Observatories, Operations, Parallelization, programming, Scientific computing, software engineering, software maintenance, software sustainability, Uncategorized, user communities

Montage Mosaics The Pillars of Creation!

I posted this on the Montage blog  the other day. A number of readers suggested that because the images looked spectacular, I should repost on Astronomy Computing Today. 

The Pillars of Creation in the Eagle Nebula (M16) remain one of the iconic images of the Hubble Space Telescope. Three pillars rise from a molecular cloud into an enormous HII region, powered by the massive young cluster NGC 6611. Such pillars are common in regions of massive star formation, where they form as a result of ionization and stellar winds.

In a paper that will shortly be published in MNRAS, entitled “The Pillars of Creation revisited with MUSE: gas kinematics and high-mass stellar feedback traced by optical spectroscopy,” McLeod et al. (2015) analyze new data acquired with the Multi Unit Spectroscopic Explorer (MUSE) instrument on the VLT. They used Montage to create integrated line maps of the single pointings obtained at the telescope. The figure below shows an example of these maps:

[Figure: example integrated line maps created with Montage]

The authors confirmed that the pillar tips are being ionized and photo-evaporated by the massive members of NGC 6611. They found a new bipolar outflow at the tip of the middle pillar and proposed that an embedded protostar is driving it. With the physical parameters and ionic abundances derived from the spectroscopic study, they estimated a mass loss rate due to photo-evaporation of 70 M⊙/Myr, which implies an expected lifetime of 3 Myr for these structures.
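A lifetime of this kind is the mass available in the pillars divided by the mass-loss rate. Turning the two quoted numbers around gives the pillar mass they imply; the value of M below is inferred from the figures in this post and is not quoted from the paper:

```latex
t_{\mathrm{life}} \simeq \frac{M}{\dot{M}}
\quad\Longrightarrow\quad
M \simeq \dot{M}\, t_{\mathrm{life}}
  = 70\ M_\odot\,\mathrm{Myr^{-1}} \times 3\ \mathrm{Myr}
  \approx 210\ M_\odot
```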

Posted in astronomy, astronomy images, Astronomy software, Image mosaic, Image processing, Images, Integral Field Spectrographs, software, star formation

Interactive Visualization of Billion-Point Data Sets: Nanocubes

One of the most interesting talks I heard at the “Tools for Astronomical Big Data” meeting I attended in March was by Carlos Scheidegger (University of Arizona) on “How do you look at a billion data points? Exploratory Visualization for Big Data.” It was my first detailed introduction to creating and querying nanocubes, a data structure that enables interactive visualizations of data sources with billions of elements. The idea behind them is that they are constructed to fit in a modern laptop’s main memory, even for hundreds of millions of data entries. In this way large and complex data sets can be visualized interactively, without the latency involved in reading from disk and without the need for high-performance computers to do the processing.

Scheidegger and his team of Lauro Lins, Horace Ip and James Klosowski have a web page, nanocubes.net, which explains their project. Their Open Source code is available on GitHub at https://github.com/laurolins/nanocube. If you want a detailed technical description, read their paper: Lauro Lins, James T. Klosowski, and Carlos Scheidegger, “Nanocubes for Real-Time Exploration of Spatiotemporal Datasets,” IEEE Transactions on Visualization and Computer Graphics 19, no. 12 (2013): 2456-2465 (free PDF). I found that their article “Into Main Memory: Nanocubes for Interactively Visualizing Billion-Point Data Sets” provides an excellent introduction to the topic.

There is also a YouTube video explaining the principles and showing some fascinating visualizations:

So how does it all work? Compared with a data cube that stores the original data records, a nanocube has a reduced memory footprint because it stores aggregate counts rather than the original records (e.g. total number of tweets in Chicago per hour and so on), and uses a tree structure made up of cells for only those subsets that exist in the data set. But there is more to it than that. Scheidegger takes up the story: “… we have merged a dwarf cube’s reduced memory footprint and fast querying with the ability to handle spatial and temporal dimensions. A nanocube retains the hierarchical tree structure of a dwarf cube but introduces a layering concept that includes three types of dimensions: spatial, categorical, and temporal, always traversed in that order. Each dimension has a start node and is subdivided into levels, the number of which depends on the data set. A query is a path through this tree.

The first dimension to be traversed is the spatial one, which in a nanocube can have up to 25 levels. The start point is the entire-world level followed by successively more granular levels all the way down to city blocks. (This is similar to Google Maps’ 17 spatial layers, with the highest resolution level translating, depending on the imagery, to about 10-30 meters.)

The nanocube uses the quadtree algorithm, a well-known spatial indexing technique in visualization to enable users to smoothly zoom in or out from one location to another. A quadtree divides the world into four quadrants, and each quadrant has four children, each of them likewise having four children also. In this way, the world is successively divided up into finer tiles, allowing users to visualize spatial data at arbitrary zoom levels.”
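To make the quadtree idea concrete, here is a minimal Python sketch that stores only aggregate counts, and only for tiles that actually contain data. It is a simplified illustration under stated assumptions, not the nanocube implementation: it covers the spatial dimension only, works on points in the unit square rather than on map coordinates, and the function names are made up for this example.

```python
from collections import Counter


def quadtree_path(x, y, levels):
    """Return the quadrant digits (0-3), root first, for a point in [0, 1) x [0, 1)."""
    path = []
    x0, y0, size = 0.0, 0.0, 1.0
    for _ in range(levels):
        size /= 2.0
        qx = 1 if x >= x0 + size else 0  # right half of the current tile?
        qy = 1 if y >= y0 + size else 0  # upper half of the current tile?
        path.append(qy * 2 + qx)
        x0 += qx * size
        y0 += qy * size
    return tuple(path)


def build_counts(points, levels):
    """Count points per tile at every zoom level; empty tiles are never stored."""
    counts = Counter()
    for x, y in points:
        path = quadtree_path(x, y, levels)
        for depth in range(levels + 1):
            counts[path[:depth]] += 1  # path[:0] == () is the whole-world tile
    return counts


points = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (0.6, 0.2)]
counts = build_counts(points, levels=3)
print(counts[()], counts[(0,)], counts[(0, 0, 1)])
```

A query at a given zoom level is then a lookup along one path of quadrant digits, so answering it never touches the original records.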

Here are some examples; see the web page for some interactive demos (from: “Nanocubes for Real-Time Exploration of Spatiotemporal Datasets,” Lauro Lins, James T. Klosowski, and Carlos Scheidegger; IEEE Transactions on Visualization and Computer Graphics 19, no. 12 (2013): 2456-2465; http://nanocubes.net/assets/pdf/nanocubes_paper.pdf)

[Figure: example nanocube visualizations from Lins, Klosowski, and Scheidegger (2013)]

Posted in astroinformatics, computer videos, Computing, cyberinfrastructure, Data mining, informatics, information sharing, time series data, visualization

Keynote From The Amazon Web Services Summit January 2015

If you are an Amazon Web Services user (aka AWS or Amazon Cloud or Amazon EC2) or are thinking about giving these services a test drive, you may be interested in this video of the Keynote address by Andy Jassy (Senior Vice President, Amazon Web Services) at the AWS Summit in  January 2015.

While the video may convey a sense of corporate cheerleading, I think it is worth a look because it provides an introduction to AWS’s new services, including the following (and I take the text verbatim from the material at the Summit):

Amazon Elastic File System
Amazon Elastic File System (Amazon EFS) is a new, fully-managed service that makes it easy to set up and scale shared file storage in the AWS Cloud. Amazon EFS supports NFSv4, and is designed to be highly available and durable. Amazon EFS can support thousands of concurrent EC2 client connections, making it ideal for use cases that require on-demand scaling of file system capacity and performance.

Amazon Machine Learning 
Amazon Machine Learning makes it easy for developers of all skill levels to use machine learning technology. Amazon Machine Learning is based on the same proven machine learning (ML) technology used by Amazon’s internal data scientists. The service guides you through the process of creating ML models and then generates predictions for your application without having to manage any infrastructure.

Amazon EC2 Container Service 
Amazon EC2 Container Service (Amazon ECS) is now generally available to all AWS customers. Amazon ECS is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. Amazon ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure. You can launch and stop container-enabled applications, query the complete state of your cluster, and access many familiar features like security groups, Elastic Load Balancing, EBS volumes and IAM roles.

Build Mobile Backends with AWS Lambda 
AWS Lambda is now generally available for production use and is introducing new features that make it even easier to build mobile, tablet, and IoT backends that scale automatically without provisioning or managing infrastructure. AWS Lambda now supports both real-time (synchronous) and asynchronous events, and has added additional features that make it even easier to configure and manage event sources.

New AWS Marketplace for Desktop Apps 
AWS Marketplace for Desktop Apps is a new category on the AWS Marketplace that makes it easy to search for and buy desktop applications for Amazon WorkSpaces. Amazon WorkSpaces Application Manager (Amazon WAM) makes it easy to manage and deploy these applications on-demand and at scale, or upload, deploy, and manage your own applications.

New Features for Mobile Developers 
With the general availability of AWS Lambda, and its integration with Amazon SNS and Amazon Cognito, mobile developers can build apps easily. Run custom code in response to events on mobile devices without worrying about compute, storage, load balancing, or performance. You can also trigger Lambda Functions using Amazon SNS and Amazon Cognito – enabling you to dynamically and automatically improve app experiences.
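For readers who have not seen one, a Lambda function is just a handler that the service calls with an event. The sketch below is a minimal illustration in Python, one of the runtimes Lambda supports today; the `(event, context)` signature is the standard Python handler convention, while the `name` key and the response shape are hypothetical choices made for this example.

```python
import json


def lambda_handler(event, context):
    """Minimal Lambda-style handler: return a greeting for the caller."""
    # `event` carries the invocation payload; the "name" key is hypothetical.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local smoke test; on AWS the service supplies `event` and `context`.
response = lambda_handler({"name": "astronomer"}, None)
print(response["body"])
```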

Many thanks to David Imel for pointing out the video to me.

Posted in astroinformatics, Cloud computing, computer videos, Computing, cyberinfrastructure

Get Credit For Your Code: The Growth and Impact of the Astrophysics Source Code Library (Annual Report 2014)

The Astrophysics Source Code Library (ASCL) recently published its 2014 report, and you can find it online at http://ascl.net/wordpress/?page_id=1337, along with reports from the previous two years. The 2014 report highlights the growth and impact of the ASCL on the astronomy community. This is a considerable achievement given that the ASCL is staffed by volunteers. The day-to-day work is carried out by the team of Alice Allen (Calverton, MD) and Kimberly DuPrie (formerly in Tokyo, Japan and now at Space Telescope Science Institute in Baltimore, MD), who are Editor and Associate Editor, respectively. Judy Schmidt (New York, NY) donates development and design work for the ASCL. This post is excerpted or adapted from the 2014 report, prepared by Alice Allen.

Overview

Founded in 1999 by Robert Nemiroff (Michigan Technological University) and John Wallin (Middle Tennessee State University), the ASCL is a free online registry for source codes of interest to astronomers and astrophysicists. ASCL source codes have been used to generate results published in or submitted to a refereed journal; the home page (http://ascl.net) offers information about the resource and access to the code entries.

The ASCL is indexed by the SAO/NASA Astrophysics Data System (ADS) and is citable by using the unique ascl ID assigned to each code. The ascl ID can be used to link to the code entry by prefacing the number with ascl.net (e.g. ascl.net/1201.001). If you have a code that you wish to see indexed, get in touch with the ASCL, have your code cited and receive credit for your work. The ASCL is active in the community in supporting citation and credit for publication of code. Working with publishers to develop code citation methods is a major goal for 2015. If you develop astronomy software, I encourage you to get in touch with the ASCL and see how you can support this goal.
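Building the link from an ascl ID is simple enough to automate, for example when generating a software section for a paper. The following is a small sketch: the helper name `ascl_url` is hypothetical, and the four-digits-dot-three-digits pattern is an assumption based only on the example ID above.

```python
import re

# Assumed ID shape, inferred from the single example 1201.001.
ASCL_ID_PATTERN = re.compile(r"^\d{4}\.\d{3}$")


def ascl_url(ascl_id):
    """Build the ascl.net link for an ascl ID such as '1201.001'."""
    ascl_id = ascl_id.strip()
    if not ASCL_ID_PATTERN.match(ascl_id):
        raise ValueError(f"not a recognised ascl ID: {ascl_id!r}")
    return f"ascl.net/{ascl_id}"


print(ascl_url("1201.001"))
```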


2014 Growth

The number of codes indexed in the ASCL grew by an average of 19 per month in 2014 (228 codes were added during the year), up from the average growth of 16.7 codes per month over the 2011-2013 timeframe and the 15 codes per month average for 2013. Use of the site in 2014 increased by 12% over 2013; the site was viewed 104,894 times in 2013 and 117,923 times in 2014. The percentage increase is much lower than the 45% growth from 2012-2013; we cannot account for the lower growth.
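The headline figures can be checked directly from the numbers quoted above. This short Python snippet uses only those numbers; the variable names are mine.

```python
views_2013 = 104_894
views_2014 = 117_923
codes_added_2014 = 228

# Year-over-year growth in page views, as a percentage of the 2013 total.
growth_pct = 100.0 * (views_2014 - views_2013) / views_2013

# Average number of codes added per month over the twelve months of 2014.
codes_per_month = codes_added_2014 / 12

print(f"Growth in views: {growth_pct:.1f}%")
print(f"Codes added per month: {codes_per_month:.1f}")
```

The growth works out to a little over 12%, consistent with the rounded figure in the report, and 228 additions over twelve months gives the quoted average of 19 per month.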

There were no changes in the membership of the Advisory Committee; one member, Robert Hanisch, changed his affiliation. The current members are:

Peter Teuben, University of Maryland, Chair
Bruce Berriman, Infrared Processing and Analysis Center/Caltech
Robert Hanisch, National Institute of Standards and Technology
Jessica Mink, Center for Astrophysics
Robert Nemiroff, Michigan Technological University
Lior Shamir, Lawrence Technological University
Keith Shortridge, Australian Astronomical Observatory
Mark Taylor, University of Bristol, UK
John Wallin, Middle Tennessee State University

Alice Allen (Calverton, MD) and Kimberly DuPrie (formerly in Tokyo, Japan and now at Space Telescope Science Institute in Baltimore, MD) are Editor and Associate Editor, respectively. Judy Schmidt (New York, NY) provides development and design work for the ASCL.

Funding

The ASCL remains unfunded. In January, the American Astronomical Society (AAS) provided support in the form of one-day registrations for the AAS meeting (Washington, DC) for non-AAS members speaking at the Special Session on software issues that the ASCL and the AAS Working Group on Astronomical Software (WGAS) organized. Later in the year, the editor’s participation in two events was supported by the organizations sponsoring those events: the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2) meeting in New Orleans in November, and .astronomy in Chicago in December. The ASCL and its editor are very grateful for the support.

2014 Highlights

January:
  • Special session held at 223rd AAS in Washington Harbor (DC)
  • Poster presented at 223rd AAS in Washington Harbor
  • ASCL blog published three posts about the session (announcement/report/tweets)
  • Astronomy Computing Today reposts report on special session
  • AAS reposts slightly shorter version of special session report

March:
  • The ASCL registers its 800th code
  • Teuben and Allen meet with Kelle Cruz, Matthew Turk, David Hogg at AMNH in NYC to coordinate winter AAS activities and decide that Hogg and Allen will propose a special session on software licensing and Turk will be a presenter
  • Hanisch and Allen attended National Academy of Sciences Strategies for Economic Sustainability of Publicly Funded Data Repositories: Asking the Right Questions
  • ASCL blog features guest post by Jonathan Petters on creating and evaluating data management plans

April:
  • Frossie Economou, the new AAS Working Group on Astronomical Software (WGAS) chair, requests the ASCL create a Special Interest Group (SIG) on software publishing

May:
  • Schmidt, Teuben, and Allen meet chiefly with Chris Erdmann and Alberto Accomazzi, also with Lars Holms Nielsen (via Skype), Gus Muench, and Jessica Mink in Boston to discuss possible collaboration, ASCL infrastructure under development, and projects to work on, with Schmidt taking on getting monthly data to ADS, Teuben working on getting monthly stats from ADS, and Allen generating files for uploading associated papers links to ADS.

June:
  • Sidebar for Special Edition on Computing in Astronomy (IEEE) accepted, authored by Allen, Shamir, and Teuben
  • Frossie Economou and Allen meet to discuss common goals and activities

July through December:
  • New infrastructure/website goes into production
  • Looking before leaping: Creating a software registry paper by Allen and Schmidt submitted to arXiv for WSSSPE2 consideration
  • Proposal for Special Session on code licensing for 225th AAS meeting is accepted
  • Enhancements for the Astrophysics Source Code Library article appears on AAS site
  • “Mini article” (sidebar) about the ASCL for Computer’s Special Edition on Computing in Astronomy (IEEE) published
  • Poster on ASCL’s new infrastructure, titled Astrophysics Source Code Library Enhancements, presented by Hanisch at ADASS XXIV
  • ASCL mentioned in the SIE de Investigación y Enseñanza newsletter
  • Allen attends WSSSPE2
  • Preprint for ADASS XXIV poster Astrophysics Source Code Library Enhancements appears on arXiv
  • Nemiroff and Allen attend .astronomy6; Allen leads sessions on improving recognition for software authors and software citation
  • List of AAS225 software events published in ASCL news; notice about the list posted on AstroBetter, AAS news, and the Astrostatistics and Astroinformatics Portal (ASAIP)

Impact on the community

Some authors are citing codes explicitly and independent of a code paper using ASCL entries. At the end of the year, 116 ASCL entries collectively had 254 citations, meaning that 12.1% of the 960 codes indexed in ADS at that time had citations, up from 7.5% in January.

Since implementation of the new infrastructure in mid-year, 65 codes have been submitted by their authors or representatives via the improved submissions page, more than all author submissions in previous years combined. Though a large number of these were submitted by one person on behalf of a large combined code package (Starlink), even the remaining 29 codes represent a remarkable number of submissions for the ASCL, more so considering that this number was achieved in less than six months. We take this as an indication that software authors are beginning to recognize value in ASCL registration and that having journals request registration is a successful strategy.

It became clear at the December .astronomy meeting that some community members look to the ASCL as a leader to resolve issues around software citation and greater recognition for code authors. The ASCL has taken this to heart and has stepped up efforts to engage publishers and improve software citation, work that will be a major focus for the ASCL in 2015.

2014 Plans Revisited

The following 2014 goals for the ASCL were achieved:

  • Write a style guide to better guide the editors and help standardize practices
    o This exists in a Google doc and a suitable public version has been published on the ASCL
  • Organize a session for the AAS January 2015 meeting in coordination with others, particularly Kelle Cruz and David Hogg, who had mentioned possible topics of interest at the 223rd AAS meeting, and Matt Turk, who has expertise that will be a valuable addition to topics we might cover
    o The result was a session on software licensing at the AAS 225 meeting sponsored by the Data Science Environment at NYU, WGAS, and the ASCL
  • Work more closely with staff at ADS to determine better ways of working together and moving data from the ASCL to ADS
    o ASCL and ADS personnel met in Boston in May
    o A dynamic report created as a result of the meeting and follow-up conversation allows ADS to pick up completely formatted data on its own schedule
  • Improve the infrastructure in some visible way, perhaps by moving the ASCL forum away from Asterisk and editing the display template for a more professional appearance
    o A completely new infrastructure was built by Judy Schmidt and implemented in July that integrated a MySQL database and new user interface with WordPress for content management and the discussion forum

The following 2014 goal was partially completed:

  • Write an editorial policy for the ASCL to better communicate the ASCL’s goals to the astrophysics community
    o The editorial policy is written in nearly final form and exists in a Google doc; it has not been published on the ASCL site as yet

The following 2014 goal was not met:

  • Build library to reach 1000 codes by the end of 2014

Previous Plans Still Outstanding

The following 2013 goal for the ASCL was not completed:

  • Add two or three people to the Advisory Committee to get input from outside the Anglosphere; suggested additions include a contact in Tokyo, the Netherlands, Germany, Italy, or France, and possibly a South American country

The following 2013 goal for the ASCL was partially completed:

  • Find ways to discover and track impact the ASCL has on the astrophysics community

2015 Plans

  • Publish WSSSPE2 paper in Journal of Open Research Software
  • Create ASCL index in 2014 ADASS proceedings
  • Organize a session for AAS 227th meeting on software issues
  • Present ASCL enhancements at ADASS XXV
  • Work with publishers to improve software citation in journals
  • Close out old sites and employ redirects to the new site
  • Sustain reasonable growth in number of entries (190-210 additions)
  • Upgrade current site to add two new features
  • Complete outstanding plans from previous years

 

Press and bibliography

You’ve Written a Cool Astronomy Code! Now What Do You Do with It?, Alice Allen et al, American Astronomical Society, AAS Meeting #223, January 2014, #255.25, poster
Astrophysics Code Sharing II: The Sequel at AAS 223, Astronomy Computing Today, January 18, 2014
Astrophysics Code Sharing II: The Sequel, AAS news and newsletter, January 28, 2014
Ideas for Advancing Code Sharing (A Different Kind of Hack Day), Peter Teuben et al, Astronomical Data Analysis Software and Systems XXIII Proceedings, May 2014, arXiv preprint | Blog post | Working documents
Astrophysics Source Code Library: Incite to Cite!, Kimberly DuPrie et al, Astronomical Data Analysis Software and Systems XXIII Proceedings, May 2014, arXiv preprint | poster
Looking before leaping: Creating a software registry, Alice Allen, Judy Schmidt, WSSSPE2, 21 Jul 2014, arXiv preprint
The Astrophysics Source Code Library: Where do we go from here?, Alice Allen et al, Astronomical Data Analysis Software and Systems XXIII Proceedings, May 2014
Enhancements for the Astrophysics Source Code Library, AAS news and newsletter, August 18, 2014
Enhancements to the Astronomy Source Code Library (ASCL), Astronomy Computing Today, September 11, 2014
Astrophysics Source Code Library Enhancements, Robert J. Hanisch et al, Astronomical Data Analysis Software and Systems XXIII October 2013, arXiv preprint | poster
Software Events at the 225th AAS Meeting in Seattle, AAS news and newsletter, December 22, 2014
Join the Software Publication Special Interest Group (SPSIG), AAS news and newsletter, December 23, 2014

Disclosure:  I am a member of the ASCL advisory board.

Posted in astroinformatics, Astronomy, BSD, Computing, GPL, informatics, information sharing, Licenses, Open Access, programming, Scientific computing, social media, social networking, Software citation, software engineering, software maintenance, software sustainability, user communities

Why developing research software is like a startup

This is the title of a keynote presentation by Neil Chue Hong (Director of the UK’s Software Sustainability Institute) at the International Symposium on Grids and Clouds 2015 in Taipei, Taiwan (March 15-20, 2015). Neil emphasizes the crucial point that startups and research software have many survival rules in common, but one of these surpasses the others in importance: understand your users. He describes in detail four stages of making software sustainable by understanding users, and gives real-world case studies for each:

  • Discovery – is the problem you are solving of interest to others?
  • Validation – have you implemented core features others want? This is the phase where you start giving demonstrations and presentations.
  • Efficiency – can you support new users by refining your processes? This is where user support processes are put in place.
  • Scale – are you ready to drive the growth in users, and reengineer code as needed?

[See also the transcript of the talk at http://www.slideshare.net/npch/why-developing-research-software-is-like-a-startup-and-why-this-matters]

A related issue is that scientific  research is all about reaching researchers. Software needs a user base to support funding proposals and to encourage contributions. While the above two statements may be obvious, they impose conditions on software providers to make this happen. Neil calls these conditions the four Golden Rules:

  • Act amiably
  • Break down barriers
  • Cultivate contributions
  • Diaspora and dispersion

 

Posted in astroinformatics, Computing, cyberinfrastructure, High performance computing, informatics, information sharing, Open Source, programming, publishing, Scientific computing, social networking, software engineering, software maintenance, software sustainability, user communities