Interactive Visualization of Billion-Point Data Sets: Nanocubes

One of the most interesting talks I heard at the “Tools for Astronomical Big Data” meeting I attended in March was by Carlos Scheidegger (University of Arizona) on “How do you look at a billion data points? Exploratory Visualization for Big Data.” It was my first detailed introduction to creating and querying nanocubes, a data structure that enables interactive visualizations of data sources with billions of elements. The idea behind them is that they are constructed to fit in a modern laptop’s main memory, even for hundreds of millions of data entries. In this way, large and complex data sets can be visualized interactively, without the latency involved in reading from disk and without the need for high-performance computers to do the processing.

Scheidegger and his team of Lauro Lins, Horace Ip and James Klosowski have a web page, nanocubes.net, which explains their project. Their open source code is available on GitHub at https://github.com/laurolins/nanocube. If you want a detailed technical description, read their paper: Lauro Lins, James T. Klosowski, and Carlos Scheidegger, “Nanocubes for Real-Time Exploration of Spatiotemporal Datasets,” Visualization and Computer Graphics, IEEE Transactions on 19, no. 12 (2013): 2456-2465 (free PDF). I found that their article “Into Main Memory: Nanocubes for Interactively Visualizing Billion-Point Data Sets” provides an excellent introduction to the topic.

There is also a YouTube video explaining the principles and showing some fascinating visualizations.

So how does it all work? Compared with a data cube that stores the original data records, a nanocube has a reduced memory footprint because it stores aggregate counts (e.g., the total number of tweets from Chicago per hour) rather than the records themselves, and it uses a tree structure made up of cells for only those subsets that exist in the data set. But there is more to it than that. Scheidegger takes up the story: “… we have merged a dwarf cube’s reduced memory footprint and fast querying with the ability to handle spatial and temporal dimensions. A nanocube retains the hierarchical tree structure of a dwarf cube but introduces a layering concept that includes three types of dimensions: spatial, categorical, and temporal, always traversed in that order. Each dimension has a start node and is subdivided into levels, the number of which depends on the data set. A query is a path through this tree.

The first dimension to be traversed is the spatial one, which in a nanocube can have up to 25 levels. The start point is the entire-world level followed by successively more granular levels all the way down to city blocks. (This is similar to Google Maps’ 17 spatial layers, with the highest resolution level translating, depending on the imagery, to about 10-30 meters.)

The nanocube uses the quadtree algorithm, a well-known spatial indexing technique in visualization to enable users to smoothly zoom in or out from one location to another. A quadtree divides the world into four quadrants, and each quadrant has four children, each of them likewise having four children also. In this way, the world is successively divided up into finer tiles, allowing users to visualize spatial data at arbitrary zoom levels.”
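To make the quadtree idea concrete, here is a minimal Python sketch of the spatial layer only. It is not the nanocube implementation: the function names, the unit-square coordinates, and the quadrant numbering are my own assumptions for illustration, and a real nanocube adds the categorical and temporal layers and shares nodes to save memory. The sketch shows how each point maps to a path of quadrants, and how storing a count for every prefix of that path gives an aggregate at every zoom level.

```python
from collections import Counter


def quadtree_path(x, y, levels):
    """Return the quadrant digits (0-3) leading to the tile that contains (x, y).

    Assumes x and y are already normalized to the unit square [0, 1).
    Quadrant numbering (hypothetical): 0 = lower left, 1 = lower right,
    2 = upper left, 3 = upper right.
    """
    path = []
    x0, y0, size = 0.0, 0.0, 1.0
    for _ in range(levels):
        size /= 2.0  # each level halves the tile along both axes
        right = x >= x0 + size
        upper = y >= y0 + size
        path.append((2 if upper else 0) + (1 if right else 0))
        if right:
            x0 += size
        if upper:
            y0 += size
    return tuple(path)


def build_counts(points, levels):
    """Count the points inside every occupied tile, from the whole world down to `levels`."""
    counts = Counter()
    for x, y in points:
        path = quadtree_path(x, y, levels)
        # Every prefix of the path is an ancestor tile; the empty prefix is the whole world.
        for depth in range(levels + 1):
            counts[path[:depth]] += 1
    return counts


points = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (0.6, 0.1)]
counts = build_counts(points, levels=3)
print(counts[()])      # all points: 4
print(counts[(0,)])    # lower-left quadrant: 2
print(counts[(0, 0)])  # lower-left tile of that quadrant: 2
```

Only tiles that actually contain data get an entry, and a query at a given zoom level is a lookup along a path rather than a scan over the original points.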

Here are some examples; see the web page for some interactive demos (from: Lauro Lins, James T. Klosowski, and Carlos Scheidegger, “Nanocubes for Real-Time Exploration of Spatiotemporal Datasets,” IEEE Transactions on Visualization and Computer Graphics 19, no. 12 (2013): 2456-2465; http://nanocubes.net/assets/pdf/nanocubes_paper.pdf)

[Image: example visualizations from the nanocubes paper]

Posted in astroinformatics, computer videos, Computing, cyberinfrastructure, Data mining, informatics, information sharing, time series data, visualization

Keynote From The Amazon Web Services Summit January 2015

If you are an Amazon Web Services user (aka AWS or Amazon Cloud or Amazon EC2) or are thinking about giving these services a test drive, you may be interested in this video of the Keynote address by Andy Jassy (Senior Vice President, Amazon Web Services) at the AWS Summit in  January 2015.

While the video may convey a sense of corporate cheerleading, I think it is worth a look because it provides an introduction to AWS’s new services, including the following (and I take the text verbatim from the material at the Summit):

Amazon Elastic File System
Amazon Elastic File System (Amazon EFS) is a new, fully-managed service that makes it easy to set up and scale shared file storage in the AWS Cloud. Amazon EFS supports NFSv4, and is designed to be highly available and durable. Amazon EFS can support thousands of concurrent EC2 client connections, making it ideal for use cases that require on-demand scaling of file system capacity and performance.

Amazon Machine Learning 
Amazon Machine Learning makes it easy for developers of all skill levels to use machine learning technology. Amazon Machine Learning is based on the same proven machine learning (ML) technology used by Amazon’s internal data scientists. The service guides you through the process of creating ML models and then generates predictions for your application without having to manage any infrastructure.

Amazon EC2 Container Service 
Amazon EC2 Container Service (Amazon ECS) is now generally available to all AWS customers. Amazon ECS is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. Amazon ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure. You can launch and stop container-enabled applications, query the complete state of your cluster, and access many familiar features like security groups, Elastic Load Balancing, EBS volumes and IAM roles.

Build Mobile Backends with AWS Lambda 
AWS Lambda is now generally available for production use and is introducing new features that make it even easier to build mobile, tablet, and IoT backends that scale automatically without provisioning or managing infrastructure. AWS Lambda now supports both real-time (synchronous) and asynchronous events, and has added additional features that make it even easier to configure and manage event sources.

New AWS Marketplace for Desktop Apps 
AWS Marketplace for Desktop Apps is a new category on the AWS Marketplace that makes it easy to search for and buy desktop applications for Amazon WorkSpaces. Amazon WorkSpaces Application Manager (Amazon WAM) makes it easy to manage and deploy these applications on-demand and at scale, or upload, deploy, and manage your own applications.

New Features for Mobile Developers 
With the general availability of AWS Lambda, and its integration with Amazon SNS and Amazon Cognito, mobile developers can build apps easily. Run custom code in response to events on mobile devices without worrying about compute, storage, load balancing, or performance. You can also trigger Lambda Functions using Amazon SNS and Amazon Cognito – enabling you to dynamically and automatically improve app experiences.

Many thanks to David Imel for pointing out the video to me.

Posted in astroinformatics, Cloud computing, computer videos, Computing, cyberinfrastructure

Get Credit For Your Code: The Growth and Impact of the Astrophysics Source Code Library (Annual Report 2014)

The Astrophysics Source Code Library (ASCL) recently published its 2014 report, and you can find it online at http://ascl.net/wordpress/?page_id=1337, along with reports from the previous two years. The 2014 report highlights the growth and impact of the ASCL on the astronomy community. This is a considerable achievement given that the ASCL is staffed by volunteers. The day-to-day work is carried out by the team of Alice Allen (Calverton, MD) and Kimberly DuPrie (formerly in Tokyo, Japan and now at Space Telescope Science Institute in Baltimore, MD), who are Editor and Associate Editor, respectively. Judy Schmidt (New York, NY) donates development and design work for the ASCL. This post is excerpted or adapted from the 2014 report, prepared by Alice Allen.

Overview

Founded in 1999 by Robert Nemiroff (Michigan Technological University) and John Wallin (Middle Tennessee State University), the ASCL is a free online registry for source codes of interest to astronomers and astrophysicists. ASCL source codes have been used to generate results published in or submitted to a refereed journal; the home page (http://ascl.net) offers information about the resource and access to the code entries.

The ASCL is indexed by the SAO/NASA Astrophysics Data System (ADS) and is citable by using the unique ascl ID assigned to each code. The ascl ID can be used to link to the code entry by prefacing the number with ascl.net (e.g., ascl.net/1201.001). If you have a code that you wish to see indexed, get in touch with the ASCL, have your code cited and receive credit for your work. The ASCL is active in the community in supporting citation and credit for publication of code. Working with publishers to develop code citation methods is a major goal for 2015. If you develop astronomy software, I encourage you to get in touch with the ASCL and see how you can support this goal.
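As a small illustration of the linking rule just described, the Python sketch below turns an ascl ID into its ascl.net link. The function name is hypothetical, and the ID pattern (four digits, a dot, three digits, optionally prefixed with "ascl:") is an assumption inferred from the single example 1201.001 rather than a documented specification.

```python
import re

# Assumed ID shape, inferred from the example 1201.001: four digits, a dot, three digits.
ASCL_ID_PATTERN = re.compile(r"\d{4}\.\d{3}")


def ascl_link(ascl_id: str) -> str:
    """Build the ascl.net link for an ascl ID such as '1201.001' or 'ascl:1201.001'."""
    number = ascl_id.strip()
    # Accept an optional 'ascl:' prefix (an assumption about how IDs are written in citations).
    if number.lower().startswith("ascl:"):
        number = number[len("ascl:"):]
    if not ASCL_ID_PATTERN.fullmatch(number):
        raise ValueError(f"not a recognizable ascl ID: {ascl_id!r}")
    return f"ascl.net/{number}"


print(ascl_link("ascl:1201.001"))  # ascl.net/1201.001
```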

2014 Growth

The number of codes indexed in the ASCL grew by an average of 19 per month in 2014, up from an average of 16.7 codes per month over the 2011-2013 timeframe and 15 codes per month in 2013; 228 codes were added in 2014. Use of the site in 2014 increased by 12% over 2013; the site was viewed 104,894 times in 2013 and 117,923 times in 2014. The percentage increase is much lower than the 45% growth from 2012-2013; we cannot account for the lower growth.
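The figures in this paragraph can be checked with a few lines of arithmetic. The Python snippet below uses only the numbers quoted above; the variable names are mine.

```python
# Numbers quoted in the 2014 report
views_2013 = 104_894
views_2014 = 117_923
codes_added_2014 = 228

# Average monthly additions over the twelve months of 2014
monthly_additions = codes_added_2014 / 12

# Year-over-year growth in site views, as a percentage
view_growth_pct = 100 * (views_2014 - views_2013) / views_2013

print(f"{monthly_additions:.1f} codes per month")  # 19.0 codes per month
print(f"{view_growth_pct:.1f}% more views")        # 12.4% more views
```

The view growth works out to about 12.4%, which the report rounds to 12%.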

There were no changes in the membership of the Advisory Committee; one member, Robert Hanisch, changed his affiliation. The current members are:

Peter Teuben, University of Maryland, Chair
Bruce Berriman, Infrared Processing and Analysis Center/Caltech
Robert Hanisch, National Institute of Standards and Technology
Jessica Mink, Center for Astrophysics
Robert Nemiroff, Michigan Technological University
Lior Shamir, Lawrence Technological University
Keith Shortridge, Australian Astronomical Observatory
Mark Taylor, University of Bristol, UK
John Wallin, Middle Tennessee State University

Alice Allen (Calverton, MD) and Kimberly DuPrie (formerly in Tokyo, Japan and now at Space Telescope Science Institute in Baltimore, MD) are Editor and Associate Editor, respectively. Judy Schmidt (New York, NY) provides development and design work for the ASCL.

Funding

The ASCL remains unfunded. In January, the American Astronomical Society (AAS) provided support in the form of one-day registrations for the AAS meeting (Washington, DC) for non-AAS members speaking at the Special Session on software issues organized by the ASCL and the AAS Working Group on Astronomical Software (WGAS). Later in the year, the editor’s participation in two events was supported by the organizations sponsoring those events: the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2) meeting in New Orleans in November, and .astronomy in Chicago in December. The ASCL and its editor are very grateful for the support.

2014 Highlights

January: Special session held at 223rd AAS in Washington Harbor (DC)
Poster presented at 223rd AAS in Washington Harbor
ASCL blog published three posts about the session (announcement/report/tweets)
Astronomy Computing Today reposts report on special session
AAS reposts slightly shorter version of special session report

March: The ASCL registers its 800th code
Teuben and Allen meet with Kelle Cruz, Matthew Turk, David Hogg at AMNH in NYC to coordinate winter AAS activities and decide that Hogg and Allen will propose a special session on software licensing and Turk will be a presenter
Hanisch and Allen attended National Academy of Sciences Strategies for Economic Sustainability of Publicly Funded Data Repositories: Asking the Right Questions
ASCL blog features guest post by Jonathan Petters on creating and evaluating data management plans

April: Frossie Economou, the new AAS Working Group on Astronomical Software (WGAS) chair, requests the ASCL create a Special Interest Group (SIG) on software publishing

May: Schmidt, Teuben, and Allen meet chiefly with Chris Erdmann and Alberto Accomazzi, also with Lars Holms Nielsen (via Skype), Gus Muench, and Jessica Mink in Boston to discuss possible collaboration, ASCL infrastructure under development, and projects to work on, with Schmidt taking on getting monthly data to ADS, Teuben working on getting monthly stats from ADS, and Allen generating files for uploading associated papers links to ADS.

June: Sidebar for Special Edition on Computing in Astronomy (IEEE) accepted, authored by Allen, Shamir, and Teuben
Frossie Economou and Allen meet to discuss common goals and activities

July-December: New infrastructure/website goes into production
Looking before leaping: Creating a software registry paper by Allen and Schmidt submitted to arXiv for WSSSPE2 consideration
Proposal for Special Session on code licensing for 225th AAS meeting is accepted
Enhancements for the Astrophysics Source Code Library article appears on AAS site
“Mini article” (sidebar) about the ASCL for Computer’s Special Edition on Computing in Astronomy (IEEE) published
Poster on ASCL’s new infrastructure, titled Astrophysics Source Code Library Enhancements, presented by Hanisch at ADASS XXIV
ASCL mentioned in the SIE de Investigación y Enseñanza newsletter
Allen attends WSSSPE2
Preprint for ADASS XXIV poster Astrophysics Source Code Library Enhancements appears on arXiv
Nemiroff and Allen attend .astronomy6; Allen leads sessions on improving recognition for software authors and software citation
List of AAS225 software events published in ASCL news; notice about the list posted on AstroBetter, AAS news, and the Astrostatistics and Astroinformatics Portal (ASAIP)

Impact on the community

Some authors are citing codes explicitly, and independently of a code paper, using ASCL entries. At the end of the year, 116 ASCL entries collectively had 254 citations, meaning that 12.1% of the 960 codes indexed in ADS at that time had citations, up from 7.5% in January.

Since implementation of the new infrastructure in mid-year, 65 codes have been submitted by their authors or representatives via the improved submissions page, more than all author submissions in previous years combined. Though a large number of these were submitted by one person on behalf of a large combined code package (Starlink), even the remaining 29 codes are a remarkable number of submissions for the ASCL, more so considering that this number was achieved in less than six months. We take this as an indication that software authors are beginning to recognize value in ASCL registration and that having journals request registration is a successful strategy.

It became clear at the December .astronomy meeting that some community members look to the ASCL as a leader to resolve issues around software citation and greater recognition for code authors. The ASCL has taken this to heart and has stepped up efforts to engage publishers and improve software citation, work that will be a major focus for the ASCL in 2015.

2014 Plans Revisited

The following 2014 goals for the ASCL were achieved:

  • Write a style guide to better guide the editors and help standardize practices
    o This exists in a Google doc and a suitable public version has been published on the ASCL
  • Organize a session for the AAS January 2015 meeting in coordination with others, particularly Kelle Cruz and David Hogg, who had mentioned possible topics of interest at the 223rd AAS meeting, and Matt Turk, who has expertise that will be a valuable addition to topics we might cover
    o The result was a session on software licensing at the AAS 225 meeting sponsored by the Data Science Environment at NYU, WGAS, and the ASCL
  • Work more closely with staff at ADS to determine better ways of working together and moving data from the ASCL to ADS
    o ASCL and ADS personnel met in Boston in May
    o A dynamic report created as a result of the meeting and follow-up conversation allows ADS to pick up completely formatted data on its own schedule
  • Improve the infrastructure in some visible way, perhaps by moving the ASCL forum away from Asterisk and editing the display template for a more professional appearance
    o A completely new infrastructure was built by Judy Schmidt and implemented in July that integrated a MySQL database and new user interface with WordPress for content management and the discussion forum

The following 2014 goal was partially completed:

  • Write an editorial policy for the ASCL to better communicate the ASCL’s goals to the astrophysics community
    o The editorial policy is written in nearly final form and exists in a Google doc; it has not been published on the ASCL site as yet

The following 2014 goal was not met:

  • Build library to reach 1000 codes by the end of 2014

Previous Plans Still Outstanding

The following 2013 goal for the ASCL was not completed:

  • Add two or three people to the Advisory Committee to get input from outside the Anglosphere; suggested additions include a contact in Tokyo, the Netherlands, Germany, Italy, or France, and possibly a South American country

The following 2013 goal for the ASCL was partially completed:

  • Find ways to discover and track impact the ASCL has on the astrophysics community

2015 Plans

  • Publish WSSSPE2 paper in Journal of Open Research Software
  • Create ASCL index in 2014 ADASS proceedings
  • Organize a session for AAS 227th meeting on software issues
  • Present ASCL enhancements at ADASS XXV
  • Work with publishers to improve software citation in journals
  • Close out old sites and employ redirects to the new site
  • Sustain reasonable growth in number of entries (190-210 additions)
  • Upgrade current site to add two new features
  • Complete outstanding plans from previous years

 

Press and bibliography

You’ve Written a Cool Astronomy Code! Now What Do You Do with It?, Alice Allen et al, American Astronomical Society, AAS Meeting #223, January 2014, #255.25, poster
Astrophysics Code Sharing II: The Sequel at AAS 223, Astronomy Computing Today, January 18, 2014
Astrophysics Code Sharing II: The Sequel, AAS news and newsletter, January 28, 2014
Ideas for Advancing Code Sharing (A Different Kind of Hack Day), Peter Teuben et al, Astronomical Data Analysis Software and Systems XXIII Proceedings, May 2014, arXiv preprint | Blog post | Working documents
Astrophysics Source Code Library: Incite to Cite!, Kimberly DuPrie et al, Astronomical Data Analysis Software and Systems XXIII Proceedings, May 2014, arXiv preprint | poster
Looking before leaping: Creating a software registry, Alice Allen, Judy Schmidt, WSSSPE2, 21 Jul 2014, arXiv preprint
The Astrophysics Source Code Library: Where do we go from here?, Alice Allen et al, Astronomical Data Analysis Software and Systems XXIII Proceedings, May 2014
Enhancements for the Astrophysics Source Code Library, AAS news and newsletter, August 18, 2014
Enhancements to the Astronomy Source Code Library (ASCL), Astronomy Computing Today, September 11, 2014
Astrophysics Source Code Library Enhancements, Robert J. Hanisch et al, Astronomical Data Analysis Software and Systems XXIII October 2013, arXiv preprint | poster
Software Events at the 225th AAS Meeting in Seattle, AAS news and newsletter, December 22, 2014
Join the Software Publication Special Interest Group (SPSIG), AAS news and newsletter, December 23, 2014

Disclosure:  I am a member of the ASCL advisory board.

Posted in astroinformatics, Astronomy, BSD, Computing, GPL, informatics, information sharing, Licenses, Open Access, programming, Scientific computing, social media, social networking, Software citation, software engineering, software maintenance, software sustainability, user communities

Why developing research software is like a startup

This is the title of a keynote presentation by Neil Chue Hong (Director of the UK’s Software Sustainability Institute) at the International Symposium on Grids and Clouds 2015 in Taipei, Taiwan (March 15-20, 2015). Neil emphasizes the crucial point that startups and research software have many survival rules in common, but one of these surpasses the others in importance: understand your users. He describes in detail four stages of making software sustainable by understanding users, and gives real-world case studies for each:

  • Discovery – is the problem you are solving of interest to others?
  • Validation – have you implemented core features others want? This is the phase where you start giving demonstrations and presentations.
  • Efficiency – can you support new users by refining your processes? This is where user support processes are put in place.
  • Scale – are you ready to drive the growth in users, and reengineer code as needed?

[See also the transcript of the talk at http://www.slideshare.net/npch/why-developing-research-software-is-like-a-startup-and-why-this-matters]

A related issue is that scientific  research is all about reaching researchers. Software needs a user base to support funding proposals and to encourage contributions. While the above two statements may be obvious, they impose conditions on software providers to make this happen. Neil calls these conditions the four Golden Rules:

  • Act amiably
  • Break down barriers
  • Cultivate contributions
  • Diaspora and dispersion

 

Posted in astroinformatics, Computing, cyberinfrastructure, High performance computing, informatics, information sharing, Open Source, programming, publishing, Scientific computing, social networking, software engineering, software maintenance, software sustainability, user communities

Tools For Astronomical Big Data

This week, I attended the Tools for Astronomical Big Data Workshop, held in Tucson from March 9-11 and sponsored by NOAO. Over 100 practitioners gathered to “survey the present state of the art in addressing the challenges of conducting research with large astronomical data sets.” Topics for discussion included:

  1. The efficient development of, and interaction with, large databases.
  2. A survey of efficient algorithms for processing large datasets.
  3. The visualization of large datasets.
  4. The use of “machine learning” methodologies.

In lieu of a formal set of proceedings, all the invited and contributed presentations have been posted on the conference website at  http://www.noao.edu/meetings/bigdata/schedule.php.

I will give an overview of some of the talks  here, primarily to give the flavor of the meeting. I recommend exploring the many excellent presentations to get the full sense of the topics discussed, and I plan to describe some of these presentations in more detail in future posts.

Alyssa Goodman described the value of combining data from different data sets, and discussed the growing number of tools that enable such data aggregation. While Alyssa emphasized visualization of “wide data,” Carlos Scheidegger spoke about the power of nanocubes in visualizing and exploring data sets with as many as 1 billion points.

Gautham Narayan spoke on “Hunting the Rarest of the Rare: From PS1 to LSST.” He emphasized that managing the transients that LSST is expected to generate is more a problem of rate than of scale. He described a prototype system based on the Pan-STARRS 1 (PS1) Medium Deep Survey (MDS) images from 2010-2014, which discovered over 5000 supernovae.

Sarah Loebman spoke about “MyMergerTree: A Cloud Service For Creating and Analyzing Galactic Merger Trees.” Her talk described what she expected to be difficult and what actually was difficult. Physical tuning turned out to be a difficult task, but visualization was not, because tools such as the D3 JavaScript visualization library were available.

Ben Williams talked about using the Amazon Cloud to reduce the Panchromatic Hubble Andromeda Treasury (PHAT)  6-band UV-IR photometry survey of over 100 million stars.  He described how he overcame problems with  security, file sharing, network communication, error trapping and more.

Yusra AlSayyad described how her team used the high-throughput computing power of XSEDE to classify high-redshift quasars in the repeated imaging of SDSS Stripe 82, generating 16 billion photometry measurements for 40 million objects. The project consumed 200,000 hours of processing time and used 120 TB of disk storage.

Sample slides:

Top to bottom:

Yusra AlSayyad – Pipeline for processing SDSS Stripe 82 data

Sarah Loebman – Schematic of how to use merger trees

Ben Williams – Processing organization for PHAT on the Amazon Cloud

Posted in astroinformatics, Astronomy, astronomy surveys, Catalogs, Cloud computing, cyberinfrastructure, Data Management, Data mining, databases, DBMS, Grid Computing, High performance computing, image mosaics, informatics, information sharing, Montage, Observatories, Parallelization, programming, Scientific computing, SDSS, social networking, software engineering, software maintenance, software sustainability, Uncategorized, user communities, visualization, XSEDE

A new blog on computational and software matters

My colleague Dan Katz has started a new blog, called Daniel S. Katz’s Blog, and I wanted to take advantage of Astronomy Computing Today to recommend it to you. Dan is one of the team that built the Montage software. He is now the lead at NSF for the Software Infrastructure for Sustained Innovation (SI2) program (see http://nsf.gov/si2), which funds around 100 projects, and he is one of the founders of the annual Working Towards Sustainable Software for Science: Practice and Experiences (WSSSPE) workshops.

Dan’s blog concerns itself with issues related to software sustainability, and he has so far written posts on topics such as

  • Open Source Software and University Intellectual Property Policies
  • Catalogs and Indices for Finding (Scientific) Software
  • Project CRediT and Contributorship Roles
  • Thinking about Open Peer Reviews

If you are interested in these topics, I recommend taking a look. I look forward to more interesting posts from Dan.

Posted in astroinformatics, Computing, cyberinfrastructure, High performance computing, informatics, information sharing, Open Access, Open Source, programming, publishing, Scientific computing, social media, social networking, software engineering, software maintenance, software sustainability, user communities

The Kid Is Back: The Next Generation of the Montage Image Mosaic Engine

I am delighted to say that we have received funding from the National Science Foundation (NSF) to deliver the next generation of the Montage Image Mosaic Engine. This new effort responds to the dramatic evolution in the computational landscape of astronomy in the past few years. We will deliver, over the next two years:

  • Support for data cubes.
  • Support for two sky partitioning schemes, the Hierarchical Equal Area isoLatitude Pixelization (HEALPix), standard in cosmic background experiments; and the Tessellated Octahedral Adaptive Subdivision Transform (TOAST), used in immersive platforms such as the World Wide Telescope.
  • A set of turnkey tools and associated tutorials that will enable astronomers who are not expert in distributed platforms and technologies to launch and manage processing at scale.
  • A library that will allow Montage to be run directly from languages such as Python.

Montage has recently been relicensed, and is now available under a BSD 3-clause license. We will be making the code available on GitHub.  We will also overhaul the web page and revive the Montage blog.

The project staff are: Bruce Berriman (PI), John Good (Architect), Marcy Harbut (Documentation), Tom Robitaille and Ewa Deelman (collaborators). We are guided by a Users’ Panel consisting of Adam Ginsburg, August Muench and Suzanne Jacoby.

Just to whet your appetite, we show  a short video that displays the structure of a molecular disk wind in HD 163296, measured by ALMA (PI: M. Rawlings). The video shows a re-projection by Montage of a data cube of the star that covers multiple velocities relative to the center of the CO J=3-2 line.

And here is a poster that describes some of the features we will be delivering, presented at the 2015 NSF SI2 PI Workshop, February 15 and 16 2015 in Arlington, VA.

Montage-SI2-PI-Meeting-2015-Feb-11-fixed

PDF version:

Montage-SI2-PI-Meeting-2015-Feb-11-fixed

 

 

Posted in Uncategorized