Debunking Some Common Misconceptions of Science in the Cloud

This is the title of a presentation by Shane Canon (LBNL) and collaborators given at the ScienceCloud 2011 workshop in San Jose in June 2011. I gave an overview of this meeting in an earlier post.

Canon et al. give what I think is the most realistic assessment of the value of cloud computing to scientific applications. They point out that the cloud currently sits at the Peak of Inflated Expectations in the Hype Cycle, and they set out to move it along the path to the Plateau of Productivity.

They debunk the following misconceptions about the cloud:

  • Clouds are simple to use and don’t require system administrators.
  • My job will run immediately in the cloud.
  • Clouds are more efficient.
  • Clouds allow you to ride Moore’s Law without additional investment.
  • Commercial Clouds are much cheaper than operating your own system.

They conclude that "Cloud Computing as it exists today is not ready for High Performance Computing" because:

  •  there are large overheads in converting to cloud environments,
  •  virtual instances underperform bare-metal systems, and
  •  the cloud is less cost-effective than most large centers.

Commercial clouds are, however, effective in cases such as the following:

  • Individual projects with high-burst needs:
  •  avoid paying for idle hardware
  •  access to larger-scale resources (elasticity)
  • High-throughput applications with modest data needs:
  •  bioinformatics
  •  Monte-Carlo simulations
  • Infrastructure-challenged sites, where facilities costs are much greater than IT costs
  • Undetermined or volatile needs: use clouds to baseline requirements, then build the application in-house
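
To see why Monte-Carlo simulations are such a good fit, note that each worker runs independently and returns only a tiny result, so there is almost no communication to slow down virtualized networking. A minimal sketch (the π-estimation example and all names in it are my own illustration, not from the talk):

```python
import random
from multiprocessing import Pool


def count_hits(args):
    """One independent 'cloud instance': count darts landing in the unit quarter-circle."""
    n_samples, seed = args
    rng = random.Random(seed)  # per-worker seed so runs are reproducible
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_workers=4, samples_per_worker=100_000):
    """Fan the work out; only one integer per worker comes back (modest data needs)."""
    with Pool(n_workers) as pool:
        hits = pool.map(count_hits,
                        [(samples_per_worker, seed) for seed in range(n_workers)])
    total = n_workers * samples_per_worker
    return 4.0 * sum(hits) / total


if __name__ == "__main__":
    print(estimate_pi())
```

Here `multiprocessing` stands in for a fleet of cloud instances: the point is that the workers never talk to each other, which is exactly the communication pattern that survives cloud overheads, in contrast to tightly coupled HPC codes.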

4 Responses to Debunking Some Common Misconceptions of Science in the Cloud

  1. Andrew says:

    I’ve found commercial clouds to be very useful for my projects – they fall into the “I need a whole bunch of computers for a few days” category. One advantage to using a cloud service that is very important to me is the ability to have root access to the virtual instances so I can configure them and install software just the way I want. On more traditional platforms this isn’t possible and it’s frequently painful to get newer versions of software/libraries installed.

    • Jonathan Dursi says:

      This sounds to me exactly like the cases where they said it would work: "Infrastructure Challenged Sites where facilities cost much greater than IT costs" (you're happy to do the work as root as long as you can get access to cheap machines) and "Individual projects with high-burst needs".

      • Andrew says:

        Pretty much. My site isn't infrastructure challenged in principle, but the barrier to making use of that infrastructure can be high (writing proposals, waiting for access, working with them to get the system configured the way I want). Using a commercial cloud involves a much lower barrier to getting up and running. But it does require being willing and able to do the work yourself.

  2. carolune says:

    I just asked Erik Rosolowsky about Shane’s work – Erik just presented a cloud-based data processing pipeline that he proposed could be used for the SKA. His reply: ‘I haven’t yet seen a good distributed Fourier Transform’ 🙂
