Archive for June, 2010

Quick Computation Infusion

June 2nd, 2010

I remember having to run a word-sense disambiguation (WSD) experiment while doing my PhD. I was dealing with corpora (collections of text) of several million words, and when I started the WSD process, I realized that it would take me around two months to get the data I needed.

That was crazy. Luckily, Cambridge had access to a Condor supercomputing cluster. I parallelized my code and launched it on this cluster. After a couple of false starts I got all the data I needed: 42 computers working over 4 days (a weekend+!) got me the 60 days of data.
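As a back-of-the-envelope check on those numbers, here is a small sketch of the speedup and parallel efficiency. It assumes the two-month estimate was single-machine time (taken as 60 days) and that the 4 days was wall-clock time across all 42 machines; the function name is made up for illustration.

```python
def parallel_stats(serial_days, machines, wall_clock_days):
    """Return (ideal_days, speedup, efficiency) for a job split across machines."""
    ideal_days = serial_days / machines       # perfect scaling, no overhead
    speedup = serial_days / wall_clock_days   # how much faster than one machine
    efficiency = speedup / machines           # fraction of ideal scaling achieved
    return ideal_days, speedup, efficiency


ideal, speedup, eff = parallel_stats(serial_days=60, machines=42, wall_clock_days=4)
print(f"ideal: {ideal:.2f} days, speedup: {speedup:.0f}x, efficiency: {eff:.0%}")
```

Under those assumptions the job ran about 15 times faster than on one machine, well short of the ideal 42 times, which is plausible for a shared cluster with scheduling overhead and uneven job sizes.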

Nowadays, if you have a bit of cash to burn, you can use the cloud for the same purposes. Here is a case study of an individual who (two years ago!) employed a thousand nodes on Amazon to get their data processed.

It cost them $900 for the CPU and perhaps another $800 or so for the data transfer. Not bad. That’s pretty cheap for having a thousand computers working for you!


Cloud computing costs

June 1st, 2010

I made a quick estimate of the monthly cost of running a reasonably CPU-intensive web application on a cloud computing platform. The results are quite interesting:

Cloud Computing Costs Summary
Configuration: 10 CPU instances, 1 TB of data transfer, and 0.5 TB of data storage

Monthly cost (CPU + data transfer + storage):
Amazon EC2: $720 + $150 + $50 = $920
Google App Engine: $720 + $100 + $75 = $895
Slicehost: $70 * 10 = $700
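The totals above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming a 720-hour month and 1 TB = 1000 GB, with the per-unit rates taken from the pricing notes in this post; the function and variable names are illustrative only.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, matching the $72/month CPU figure
GB_PER_TB = 1000       # the per-TB figures in this post use 1 TB = 1000 GB


def monthly_cost(instances, transfer_tb, storage_tb,
                 cpu_per_hour, transfer_per_gb, storage_per_gb):
    """Monthly bill in dollars for a pay-per-use provider."""
    cpu = instances * cpu_per_hour * HOURS_PER_MONTH
    transfer = transfer_tb * GB_PER_TB * transfer_per_gb
    storage = storage_tb * GB_PER_TB * storage_per_gb
    return cpu + transfer + storage


# Configuration from the summary: 10 instances, 1 TB transfer, 0.5 TB storage
config = dict(instances=10, transfer_tb=1, storage_tb=0.5)

ec2 = monthly_cost(**config, cpu_per_hour=0.10,
                   transfer_per_gb=0.15, storage_per_gb=0.10)
# The summary's $100 transfer figure corresponds to a $0.10/GB rate
app_engine = monthly_cost(**config, cpu_per_hour=0.10,
                          transfer_per_gb=0.10, storage_per_gb=0.15)
slicehost = 10 * 70  # flat monthly fee per slice, no metered charges

for name, cost in [("Amazon EC2", ec2), ("Google App Engine", app_engine),
                   ("Slicehost", slicehost)]:
    print(f"{name}: ${cost:.0f}")
```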

Amazon EC2 pricing:

CPU (monthly):
$0.10/hour (Windows)
$72 per month

Data transfer:
$0.15 per GB
$150 per TB

Storage:
$0.10 per GB-month of provisioned storage
$100 per TB-month

Google App Engine

Resource              Unit                   Unit cost
Outgoing Bandwidth    gigabytes              $0.12
Incoming Bandwidth    gigabytes              $0.10
CPU Time              CPU hours              $0.10
Stored Data           gigabytes per month    $0.15
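Because incoming and outgoing bandwidth are billed at different rates, the App Engine total depends on how the 1 TB of transfer is split. The sketch below is a rough illustration using the rates listed above; treating 10 CPU instances as 10 x 720 CPU hours is an assumption, as are the function and dictionary names.

```python
RATES = {
    "outgoing_gb": 0.12,      # $ per GB of outgoing bandwidth
    "incoming_gb": 0.10,      # $ per GB of incoming bandwidth
    "cpu_hour": 0.10,         # $ per CPU hour
    "stored_gb_month": 0.15,  # $ per GB stored per month
}


def app_engine_monthly(cpu_hours, outgoing_gb, incoming_gb, stored_gb):
    """Monthly App Engine bill in dollars at the rates listed above."""
    return (cpu_hours * RATES["cpu_hour"]
            + outgoing_gb * RATES["outgoing_gb"]
            + incoming_gb * RATES["incoming_gb"]
            + stored_gb * RATES["stored_gb_month"])


cpu_hours = 10 * 720  # assumption: 10 instances busy for a 720-hour month

# Two extremes for 1 TB (1000 GB) of transfer with 500 GB stored
all_incoming = app_engine_monthly(cpu_hours, outgoing_gb=0,
                                  incoming_gb=1000, stored_gb=500)
all_outgoing = app_engine_monthly(cpu_hours, outgoing_gb=1000,
                                  incoming_gb=0, stored_gb=500)

print(f"all incoming: ${all_incoming:.0f}, all outgoing: ${all_outgoing:.0f}")
```

With these inputs the bill ranges from about $895 (all incoming) to about $915 (all outgoing), so the split changes the total by at most $20 for this configuration.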

CPU (monthly): 72
$163 per month

0.15 per GB
150 per TB

$0.10 per GB-month of provisioned storage
100 per TB-month


Slicehost

10 instances of the 1 GB slice
400 GB storage
6 TB bandwidth
10 CPU instances (burstable to the equivalent of 40)
$70 * 10 = $700

Note: The primary constraint on Slicehost is storage. Multiple slices working together give you more bandwidth and CPU capability than the standard cloud computing offered by EC2 or App Engine.
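To make the storage constraint concrete, here is a small sketch that compares the target configuration (10 CPU instances, 1 TB transfer, 0.5 TB storage) against the pooled resources of ten 1 GB slices as listed above. It assumes 1 TB = 1000 GB, and the dictionary names are illustrative.

```python
# Target configuration from the cost summary, in GB where applicable
required = {"storage_gb": 500, "bandwidth_gb": 1000, "cpu_instances": 10}

# Pooled resources of ten 1 GB slices, as listed above
pooled = {"storage_gb": 400, "bandwidth_gb": 6000, "cpu_instances": 10}

# Keep only the resources where the requirement exceeds what the slices offer
shortfalls = {
    resource: required[resource] - pooled[resource]
    for resource in required
    if required[resource] > pooled[resource]
}

print(shortfalls)  # {'storage_gb': 100}
```

Only storage falls short (by 100 GB), while bandwidth has 5 TB of headroom, so the $700 figure covers slightly less storage than the other two estimates.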