This week, I will be posting about the technical sessions at the Astronomical Data Analysis Software and Systems XXIV meeting (ADASS, http://adass2014.org) in Calgary, Alberta. This annual meeting “… is a forum for scientists, developers and programmers working in areas related to algorithms, software and systems for the acquisition, reduction, analysis, and dissemination of astronomical data.” This year’s themes are:
- Big Data Challenges
- Large surveys
- Web services and APIs
- How we code today / Data Analysis / Pipelines
- Meta-data management
- (2D & 3D) visualisation for astronomy
- File formats for astronomical data
I will be blogging about some of the sessions (sorry, I won’t be able to cover all of them), held from October 5th through October 9th.
Sunday afternoon, October 5, kicked off with Big Data Challenges. Patrick Ogle of IPAC spoke about “Rule Based Cross-Matching of Very Large Catalogues” in the NASA Extragalactic Database (NED). He described a new rule-based algorithm called MatchEx, based on two decades of experience at NED. By design, MatchEx extends traditional position-based cross-matching algorithms by combining available data with expert logic to determine which candidate match is the best. Furthermore, the local background density of sources is used to determine and minimize the false-positive match rate and to estimate match completeness. The logical outcome and statistical probability of each match decision is stored in the database, and is used inter alia to tune the algorithm and adjust match parameter thresholds.
The first production run cross-matched the GALEX All Sky Survey Catalog (GASC), containing nearly 40 million NUV-detected sources, with NED sources within a 7.5″ search radius. The candidates were filtered on position-based matching probability, and on other criteria including object type and object name. The match completeness was 97.7% and the match accuracy was 99.8%. Over the next year, NED plans to cross-match nearly 1 billion new catalog sources with NED sources, including the 2MASS point-source catalog, AllWISE, SDSS DR12, and the Spitzer Source List. At the same time, NED will add new capabilities to filter candidate matches based on source diameters, redshifts, etc., and will extend MatchEx to handle heterogeneous datasets federated from smaller catalogs through NED’s literature pipeline.
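To give a feel for this style of rule-based cross-matching, here is a minimal Python sketch. This is not NED’s actual code: the field names, the single object-type rule, and the helper functions are all illustrative. Candidates are first selected positionally within the search radius, then filtered by an expert rule, and the closest survivor wins.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (inputs in degrees),
    via the haversine formula (stable for small separations)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dra, ddec = ra2 - ra1, dec2 - dec1
    a = (math.sin(ddec / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(dra / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0

def best_match(source, candidates, radius_arcsec=7.5):
    """Position-based matching plus one toy expert rule:
    reject candidates whose object type is incompatible."""
    scored = []
    for cand in candidates:
        sep = angular_sep_arcsec(source["ra"], source["dec"],
                                 cand["ra"], cand["dec"])
        if sep > radius_arcsec:
            continue  # outside the positional search radius
        if cand["type"] not in source["compatible_types"]:
            continue  # rule: incompatible object type
        scored.append((sep, cand))
    return min(scored, key=lambda s: s[0])[1] if scored else None

src = {"ra": 150.0, "dec": 2.0, "compatible_types": {"G"}}
cands = [
    {"ra": 150.0005, "dec": 2.0, "type": "G", "name": "NGC 1"},   # ~1.8" away
    {"ra": 150.0001, "dec": 2.0, "type": "*", "name": "Star 1"},  # closer, wrong type
]
# The star is nearer, but the type rule rejects it in favour of the galaxy.
```

The point of the rule layer is exactly the case in the example: the nearest source is not always the right match, and expert logic breaks the tie.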
Jose Hernandez talked about data management challenges on Gaia, which was launched in December 2013. Gaia transmits 30 million observations daily; these are processed at ESAC, and the resulting data products are then sent to five other data processing centers. Managing the processing and tracking the data represents a formidable challenge and has required new approaches to data management. Data processing will be iterative and will last more than five years, with each data processing center responsible for one aspect of the data reduction. Jose explained how Gaia manages changes in, and the inter-dependencies between, the different pipelines running at each center.
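That inter-dependency bookkeeping can be pictured as a dependency graph over pipeline stages: a stage can only (re)run once everything it consumes is up to date. A toy sketch using Python’s standard-library graphlib — the stage names here are invented for illustration and are not Gaia’s actual pipeline names:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each pipeline stage maps to the
# upstream products it consumes (illustrative names only).
deps = {
    "astrometry":   {"initial_treatment"},
    "photometry":   {"initial_treatment"},
    "spectroscopy": {"initial_treatment", "astrometry"},
    "variability":  {"photometry"},
}

# A valid processing order respecting every dependency; a change to
# one stage tells you which downstream stages must be re-run.
order = list(TopologicalSorter(deps).static_order())
```

With such a graph, a change in one center’s pipeline immediately identifies the downstream stages (and centers) whose products are invalidated, which is the crux of iterating the reduction over several years.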
Arvind Gopu spoke about using QuickReduce for Reprocessing the ODI Pipeline – see my earlier post on the ODI pipeline here.
Jacek Becla talked about Qserv, the database system LSST is developing to store 10+ PB of LSST catalogs and enable complex analyses and data mining. As there are no suitable off-the-shelf database systems, LSST will build its own by taking advantage of existing tools (e.g. Apache Zookeeper) and by distributing the database across dedicated sky partitions. Jacek described the driving requirements, assumptions made, and design trade-offs that were considered in developing the baseline architecture, and highlighted its features, status, and potential for usage in astronomy and in other disciplines. You can see the design document at http://ls.st/lpj.
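The sky-partitioning idea behind such a distributed catalog database can be sketched in a few lines. This is a toy RA/Dec grid, not Qserv’s actual partitioning scheme: each object maps to one chunk (which lives on one node), and a spatially constrained query only needs to touch the chunks overlapping its region.

```python
def chunk_id(ra, dec, n_ra=360, n_dec=180):
    """Map a sky position (degrees) to a chunk on a simple RA/Dec grid.
    Real systems use more sophisticated partitioning; this shows the idea."""
    i = int((ra % 360.0) / 360.0 * n_ra)
    j = int((dec + 90.0) / 180.0 * n_dec)
    j = min(j, n_dec - 1)  # keep dec = +90 inside the last band
    return j * n_ra + i

def chunks_for_box(ra_min, ra_max, dec_min, dec_max, n_ra=360, n_dec=180):
    """Chunks overlapping a rectangular query region (assumes
    0 <= ra_min <= ra_max < 360): only the nodes holding these
    chunks need to participate in the query."""
    i0, i1 = int(ra_min / 360.0 * n_ra), int(ra_max / 360.0 * n_ra)
    j0, j1 = int((dec_min + 90.0) / 180.0 * n_dec), int((dec_max + 90.0) / 180.0 * n_dec)
    return {j * n_ra + i for j in range(j0, j1 + 1) for i in range(i0, i1 + 1)}
```

The payoff is that a cone search or small-area query fans out to a handful of nodes instead of scanning all 10+ PB, while full-sky scans parallelize naturally across every chunk.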