Guest Post: A Request for More Transparent Archives

This week, we have a Guest Post from Amit Kapadia. He is a software developer for the Zooniverse based at the Adler Planetarium in Chicago. His interests are in developing research-grade web applications and software. Recently he has started developing JavaScript astronomy modules to facilitate web-based analysis tools.

One task at the Zooniverse is to build a collection of science-grade tools for our volunteers. Each volunteer gravitates towards the Zooniverse for a different reason. We have the folks who are mildly curious, we have those who have classified nearly every piece of data for a given project, and we have everyone in between. These tools are being built for the volunteers who explore more – those who go straight to the source: the science archives.

Imagine Topcat or Aladin working on the web – an environment where quick analysis can be conducted. Part of the process is utilizing the vast information held in archives scattered around the world. This is not an easy task; astronomical data is complex. These archives make a best effort to open their data to the community. Services like NED and SIMBAD aggregate data from various missions and export results in various formats. Beautiful.

These services are integral to building useful software. We take archives for granted, but the idea behind them is great – a centralized place to access data. They are meant to be large and robust, and they are meant to provide data on request. They are built as a community resource so that we don’t all have to carry the entire Sloan Digital Sky Survey in our back pockets. But, as technology goes, the archives are dated. The services they provide are dated.

The web has changed dramatically in the last five years. An arms race broke out between browser vendors to build faster JavaScript engines. Now we have great computational power directly in the browser. With all this power it’s time to build astronomical web applications – not just web sites, but full-blown applications. The idea sounds great, and we even have some pretty cool tools being built at Zooniverse HQ, but bottlenecks exist.

Web applications require direct access to external data. We want to forget the notion of a server hosting the application, and run all functionality in the browser. The application needs to access services like NED, CAS and SIMBAD, which means making requests across various domains. Unfortunately, this is not currently possible without a server to proxy the requests. One quick (really quick) solution for the database developers is to provide a new export option – JSONP. This would permit web applications to receive data without a proxy. Another solution is to enable cross-origin resource sharing (CORS). A web application would register with the archives, and the archives would then permit that application to make cross-domain requests.
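To sketch what a JSONP export looks like: instead of returning raw JSON, the service wraps the payload in a function named by the caller, so the browser can load the response as a script and receive the data through a callback. The endpoint behaviour, callback name, and payload below are invented for illustration:

```javascript
// What a hypothetical archive might return for a query ending in
// "?callback=handleResult" -- the JSON payload wrapped in the callback:
var responseText =
  'handleResult({"object": "M51", "ra": 202.47, "dec": 47.19})';

// In the browser, the application injects a <script> tag pointing at the
// service URL; the response executes and the callback fires. We simulate
// that execution step directly:
var received = null;
function handleResult(data) {
  received = data;  // the payload arrives as a plain JavaScript object
}
new Function('handleResult', responseText)(handleResult);

console.log(received.object, received.ra);  // → M51 202.47
```

Because the response is delivered via a script tag rather than XMLHttpRequest, the browser’s same-origin policy never blocks it – which is exactly why it works without a proxy.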

We need these capabilities from the archives. With them we’ll be able to build sophisticated applications that aggregate large datasets. This would be a step forward in generating quick correlations, conducting quick analyses, and helping astronomers be more efficient in their research.

 


4 Responses to Guest Post: A Request for More Transparent Archives

  1. Thanks for this interesting post – it triggered me to read up on JSONP/CORS!

    I can see how JSONP is well-suited for catalog data, but would you also use it for images? Do you see a way to access binary data through JSONP, without having to text-encode the data … or is CORS the only real solution for opening up access to images?

  2. Interesting post. Thanks for your thoughts. Along the lines of enabling better interactivity and data mining using JSONP, I would advocate that all archives expose their services as fully documented APIs. This would allow the astronomy ecosystem to experience “competition” and innovation. Software developers would be able to build new, interactive interfaces that are de facto archive-aware. New pipelines could be produced that take into account, in real time, what we already “know” and have in the archive. Too often the biggest single barrier to archive data mining is the lack of a proper programmatic interface.

    • astrocompute says:

      Alberto, thanks for your comment. I’m glad you agree. The reason for this post is to make archive developers aware that we need just a hint more functionality. It would be helpful if you could advocate this at STScI.

      Geert, glad you found it interesting. I believe you are correct. Sharing binary data (e.g. images) across domains would require CORS. The alternative is, as you mention, text-encoded data. Encoding binary data as text requires base64 encoding, which inflates the data by approximately 33% – not a good option if we are transferring larger files across the wire.
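      The 33% figure follows directly from how base64 works: every 3 bytes of binary input become 4 ASCII output characters. A quick check in Node.js, using a zero-filled buffer as stand-in binary data:

```javascript
// Base64 maps each 3-byte group of input to 4 output characters,
// so the encoded text is 4/3 the size of the original binary data.
var binary = Buffer.alloc(3000);      // 3000 bytes of stand-in "image" data
var encoded = binary.toString('base64');

console.log(encoded.length);          // 4000 characters, i.e. +33%
```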

      Replies by Amit.

  3. astrocompute says:

    From Amit:
    Thomas: It’s not advised to allow all domains access to CDS services, as it exposes the web site to CSRF attacks. Rather, it would be great to register the specific domains that are permitted access to your services. This would require tight communication between application developers and the CDS. Providing a JSONP response would be a great place to start; then, once an application is ready for production, we can request access via CORS.
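    The per-domain registration described above amounts to a whitelist on the service side: echo the Access-Control-Allow-Origin header back only for origins that have registered, never a blanket wildcard. A minimal sketch of that policy – the origin list here is invented:

```javascript
// Origins that have registered with the archive (hypothetical list):
var allowedOrigins = ['https://apps.zooniverse.org'];

// Decide which CORS header, if any, to attach to a response.
function corsHeaderFor(requestOrigin) {
  if (allowedOrigins.indexOf(requestOrigin) !== -1) {
    // Echo the specific registered origin back -- never "*".
    return { 'Access-Control-Allow-Origin': requestOrigin };
  }
  return {};  // unregistered origins get no CORS header at all
}

console.log(corsHeaderFor('https://apps.zooniverse.org'));
console.log(corsHeaderFor('http://unregistered.example'));
```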
