Why Don’t You Use The Cloud?

One of the topics that came up at coffee at the recent SPIE was why astronomers are not using the cloud very much. I decided to ask some of my colleagues this very question.  Among the answers:

“What’s the cloud?”

“I can do all of my work at the desktop, so why bother with it?”

“It’s just a fad”

“Probably need to know a lot of IT to use. Don’t speak the language, and have no time to learn.”

“Is it reliable?”

“Don’t know what to use it for”

The responses alarmed me. Fair enough, a lot of science computing can be done on desktops – I do quite a bit of my own computing on my desktop too. But astronomers are missing out on a powerful tool.

I have been using the Amazon EC2 cloud a lot in the past year, in collaboration with my colleagues Ewa Deelman, Gideon Juve and Mats Rynge at ISI, and I can put a lot of misconceptions to rest. It is easy to get started. Just create an account the way you would for any on-line service, pick the processors you want to use, and launch a virtual machine (the same type of VM you use to, say, run Windows on your Mac). You are then ready to use Amazon EC2 as if it were running on your desktop. In all the time I have been using it, it has proven more reliable than the machines at my office.

It’s definitely not a fad: usage has grown and the cloud is now part of the computing mainstream. The chart below shows the growth in usage of Amazon EC2:

Growth in usage of Amazon EC2 (from http://blog.rightscale.com)

The question of what to use it for is more interesting. There is one simple answer: if you are doing processing that bogs down your local machine for any length of time, it is a candidate for running on the cloud. Some examples from my own work: we generated periodograms for over 200,000 light curves (variation of intensity with time) released by the Kepler project. It took an hour and cost $11. We did not want to process them on our local cluster because it is already in heavy day-to-day operations use. We are also generating multi-wavelength galactic plane mosaics to use in studies of the star formation history of our Galaxy. The processing would put excessive load on our local server.
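To give a feel for what one periodogram computation involves, here is a minimal sketch in plain Python. It assumes the Lomb-Scargle method, which is a common choice for unevenly sampled light curves but is not necessarily the algorithm used for the Kepler run described above. The function name, the synthetic light curve and the frequency grid are all illustrative assumptions.

```python
import math


def lomb_scargle(times, fluxes, frequencies):
    """Return the (unnormalized) Lomb-Scargle power at each trial frequency.

    times       -- observation times (any unit, e.g. days)
    fluxes      -- measured intensities at those times
    frequencies -- trial frequencies in cycles per time unit
    """
    mean_flux = sum(fluxes) / len(fluxes)
    residuals = [f - mean_flux for f in fluxes]
    powers = []
    for freq in frequencies:
        omega = 2.0 * math.pi * freq
        # The offset tau makes the sine and cosine terms orthogonal
        # for this particular set of sample times.
        s2 = sum(math.sin(2.0 * omega * t) for t in times)
        c2 = sum(math.cos(2.0 * omega * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * omega)
        cos_terms = [math.cos(omega * (t - tau)) for t in times]
        sin_terms = [math.sin(omega * (t - tau)) for t in times]
        yc = sum(r * c for r, c in zip(residuals, cos_terms))
        ys = sum(r * s for r, s in zip(residuals, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        power = 0.0
        if cc > 0.0:
            power += 0.5 * yc * yc / cc
        if ss > 0.0:
            power += 0.5 * ys * ys / ss
        powers.append(power)
    return powers


if __name__ == "__main__":
    # Synthetic, slightly irregular light curve with a 2.5-day period
    # (made-up values, not Kepler data).
    times = [0.1 * i + 0.03 * math.sin(i) for i in range(200)]
    fluxes = [1.0 + 0.01 * math.sin(2.0 * math.pi * t / 2.5) for t in times]
    frequencies = [0.05 + 0.005 * k for k in range(200)]
    powers = lomb_scargle(times, fluxes, frequencies)
    best = frequencies[powers.index(max(powers))]
    print("Strongest period (days):", 1.0 / best)
```

Each light curve is processed independently of the others, so a job like this splits naturally across as many cloud processors as you care to rent.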

A more difficult issue is assessing cost, as Amazon is pay-as-you-go. Users should really do a cost-benefit study to determine whether they are prepared to pay the costs. Such a study need not be hard and can use results from other cost-benefit studies – see my earlier posts. And the cloud can be cheap: raw processing costs very little, for instance, though transferring data can be expensive.
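A first-pass cost estimate can be as simple as adding up processing, data transfer and storage charges. The sketch below shows one way to lay that out. The class name, the function name and every rate in it are hypothetical placeholders, not Amazon's actual prices, so substitute the current published rates before drawing any conclusions.

```python
from dataclasses import dataclass


@dataclass
class CloudJob:
    instance_hours: float       # total machine-hours of processing
    gb_transferred_out: float   # data moved out of the cloud, in GB
    gb_stored: float            # data kept in cloud storage, in GB
    months_stored: float        # how long that data stays there


def estimate_cost(job, hourly_rate, transfer_rate_per_gb, storage_rate_per_gb_month):
    """Break a pay-as-you-go bill into its main components (all rates in dollars)."""
    compute = job.instance_hours * hourly_rate
    transfer = job.gb_transferred_out * transfer_rate_per_gb
    storage = job.gb_stored * job.months_stored * storage_rate_per_gb_month
    return {
        "compute": compute,
        "transfer": transfer,
        "storage": storage,
        "total": compute + transfer + storage,
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    job = CloudJob(instance_hours=20.0, gb_transferred_out=500.0,
                   gb_stored=100.0, months_stored=2.0)
    costs = estimate_cost(job, hourly_rate=0.5, transfer_rate_per_gb=0.1,
                          storage_rate_per_gb_month=0.05)
    for item, dollars in costs.items():
        print(f"{item:>8}: ${dollars:.2f}")
```

Keeping the components separate makes it easy to see which one dominates for a given job, which is usually the first thing a cost-benefit study needs to establish.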

The above discussion highlights a real concern of mine, that astronomers are missing out on powerful tools for doing science.  In this age of data-intensive science, astronomers armed with knowledge of these technologies have a real competitive advantage. So what to do about it? In my view:

– Introduce IT into PhD science curricula. The IT knowledge is as important as knowing your field.

– Have an on-line journal dedicated to science computing.

– Publish web-pages telling astronomers how to get the most out of the cloud.  A discussion forum would be useful too.

– Hands-on How To sessions at meetings and conferences.

This entry was posted in Astronomy, Cloud computing, High performance computing.

5 Responses to Why Don’t You Use The Cloud?

  1. archimedesbooks says:

    I disagree that for the working (or future) astronomer “The IT knowledge is as important as knowing your field.”

    Early in the days of personal computers I told a friend of mine “You need to learn how to talk to computers.” He replied, “No, computers need to learn how to talk to me.”

    In my opinion, it’s not the job of astronomers (or grad students) to learn IT. It’s the job of computer scientists like yourself to make that unnecessary.

  2. astrocompute says:

    Thank you for your comment. As a practicing astronomer and a practicing computer scientist, I think I can see this from both sides. In one sense I agree with you. An example: if I am making available a service for users to search a large astronomy database, the service should be in the language of the end-user, who should not need to know anything about the database structure, the server architecture and so on. But astronomers need to analyze data and write programs all the time: in this instance, they are talking to computers, and even though languages abstract details of processor architecture away from programmers (and a good thing too!), a knowledge of how computers work, such as knowledge of how data are managed in memory, helps programmers write more efficient code. In this age of data intensive science, where 1 TB of data is no longer “big,” the need to write efficient code is greater than ever.

    I am not proposing that astronomers become IT specialists. In grad school, we are all required to take classes in instruments and methods. Knowing how a spectrograph works prepares us for writing proposals and doing research. I think the same level of instruction should be required for computing. The computer is, after all, an instrument every astronomer now uses. Had I received such formal instruction even 30 years ago, when I entered grad school, it would have made me a better astronomer.

  3. John O'Meara says:

    “- Have an on-line journal dedicated to science computing.”

    Couldn’t agree more. But if at all possible make sure a fair portion of said journal relates to data archiving as well as processing 😉

  4. Michael Aye says:

    I just learned that a new NASA project again refuses to have any project-related data in the cloud; not even Dropbox may be used.
    So I guess, unless the data can be encrypted somewhere on the way, NASA won’t be using the cloud for the foreseeable future. 😦

  5. astrocompute says:

    It is disappointing that projects reject cloud processing. Security is generally cited as the primary reason, though I don’t know which project you refer to.

    Note that public data sets can be stored and processed on the cloud – I do this all the time with public data sets served by NASA and other organizations.
