Tools For Astronomical Big Data

This week, I attended the Tools for Astronomical Big Data Workshop, held in Tucson from March 9-11 and sponsored by NOAO. Over 100 practitioners gathered to “survey the present state of the art in addressing the challenges of conducting research with large astronomical data sets.” Topics for discussion included:

  1. The efficient development of, and interaction with, large databases.
  2. A survey of efficient algorithms for processing large datasets.
  3. The visualization of large datasets.
  4. The use of “machine learning” methodologies.

In lieu of a formal set of proceedings, all the invited and contributed presentations have been posted on the conference website.

I will give an overview of some of the talks here, primarily to convey the flavor of the meeting. I recommend exploring the many excellent presentations to get the full sense of the topics discussed, and I plan to describe some of them in more detail in future posts.

Alyssa Goodman described the value of combining data from different data sets, and discussed the growing number of tools that enable such data aggregation. While Alyssa emphasized visualization of “wide data,” Carlos Scheidegger spoke about the power of nanocubes in visualizing and exploring data sets with as many as 1 billion points.

Gautham Narayan spoke on “Hunting the Rarest of the Rare: From PS1 to LSST.” He emphasized that managing the transients LSST is expected to generate is more a problem of rate than of scale. He described a prototype system based on the Pan-STARRS 1 (PS1) Medium Deep Survey (MDS) images from 2010-2014, which discovered over 5000 supernovae.

Sarah Loebman spoke about “MyMergerTree: A Cloud Service For Creating and Analyzing Galactic Merger Trees.” Her talk contrasted what she expected to be difficult with what actually was difficult. Physical tuning turned out to be a difficult task, but visualization was not, because tools such as the D3 JavaScript visualization library were available.

Ben Williams talked about using the Amazon Cloud to reduce the Panchromatic Hubble Andromeda Treasury (PHAT) 6-band UV-IR photometry survey of over 100 million stars. He described how he overcame problems with security, file sharing, network communication, error trapping and more.

Yusra AlSayyad described how her team used the high-throughput computing power of XSEDE to classify high-redshift quasars in the repeated imaging of SDSS Stripe 82, generating 16 billion photometry measurements for 40 million objects. The project consumed 200,000 hours of processing time and used 120 TB of disk storage.
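To get a feel for those numbers, here is a back-of-the-envelope sketch in Python that turns the quoted totals into per-object averages. It is only arithmetic on the figures above, not anything shown in the talk. The function name `summarize` is my own, and I am assuming decimal terabytes and that "processing time" means total CPU hours.

```python
# Totals quoted in the summary of the Stripe 82 talk.
N_MEASUREMENTS = 16_000_000_000   # photometry measurements
N_OBJECTS = 40_000_000            # objects
CPU_HOURS = 200_000               # processing time (assumed to be CPU hours)
DISK_BYTES = 120 * 10**12         # 120 TB, assuming decimal terabytes


def summarize():
    """Return rough per-object and per-measurement averages."""
    return {
        # Average number of photometry measurements per object.
        "measurements_per_object": N_MEASUREMENTS / N_OBJECTS,
        # Average processing time spent per object, in seconds.
        "cpu_seconds_per_object": CPU_HOURS * 3600 / N_OBJECTS,
        # Total disk divided by measurements; an upper bound on the
        # per-measurement footprint, since the disk presumably holds
        # images and intermediate files too (assumption).
        "disk_bytes_per_measurement": DISK_BYTES / N_MEASUREMENTS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.1f}")
```

On these assumptions, each object averages about 400 measurements and about 18 seconds of processing, which gives a sense of why a high-throughput facility was needed.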

Sample slides (top to bottom):

  1. Yusra AlSayyad – Pipeline for processing SDSS Stripe 82 data
  2. Sarah Loebman – Schematic of how to use merger trees
  3. Ben Williams – Processing organization for PHAT on the Amazon Cloud
