Wednesday, March 07, 2007

Amazon Web Services in Telephony

With all this talk about Amazon's Mechanical Turk, I wanted to push another idea out there which I think is just as important: EC2 and S3.

EC2 stands for the "Elastic Compute Cloud", which is Amazon's rent-a-server service. Need a server for an hour? Rent it for an hour. You package your standard web-like program in their image, upload it, and off you go. Need a thousand servers right now? Go get them. Not now? Get rid of them. I remember a few years ago I worked on an application for a "get out the vote" effort in Utah, where we had crazy traffic, but only for a few days. That is exactly the kind of load this service is made for.

S3 is the same sort of deal for storage. Need storage? Go get it. It's fairly inexpensive too. This is pretty interesting for a different reason: cash flow. As a service provider, you don't need to scale your resources until you scale your business. From a technical standpoint, nothing changes. All good.

Does the name Erlang ring a bell? If you happen to be a telephone engineer, it should. Mr. Erlang figured out how many telephone lines you need to handle the traffic from a set of subscribers. For instance, not everyone in a town picks up the phone and talks at the same time. On average, maybe one in a hundred people is using the phone at any one time. To be safe, telephone switches are typically set up to handle ten times that amount, just in case everyone decides to let their fingers do the walking. In Israel, they call it a "Scud Event": when a Scud missile flies across the country, all the grandmothers call to make sure everyone's OK. So, you actually have about ten times more hardware than you would typically use, just in case somebody decides to start a war, or the equivalent.
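
For the curious, the calculation Erlang worked out is known today as the Erlang B formula. Here is a minimal Python sketch of it, not anything a real switch runs. The town of 1,000 subscribers and the 1% blocking target are assumptions picked for illustration, and the function names are made up; the one-in-a-hundred usage figure is the rough average mentioned above.

```python
def erlang_b(traffic_erlangs: float, lines: int) -> float:
    """Probability that a new call finds all lines busy (Erlang B).

    Uses the standard recurrence, which avoids computing large factorials:
    B(0) = 1, B(n) = A * B(n-1) / (n + A * B(n-1)), where A is offered traffic.
    """
    blocking = 1.0  # with zero lines, every call is blocked
    for n in range(1, lines + 1):
        blocking = (traffic_erlangs * blocking) / (n + traffic_erlangs * blocking)
    return blocking


def lines_needed(traffic_erlangs: float, max_blocking: float = 0.01) -> int:
    """Smallest number of lines that keeps blocking at or below max_blocking."""
    lines = 0
    while erlang_b(traffic_erlangs, lines) > max_blocking:
        lines += 1
    return lines


if __name__ == "__main__":
    subscribers = 1000        # assumed town size
    fraction_on_phone = 0.01  # "one in a hundred" from the text
    offered = subscribers * fraction_on_phone  # offered traffic in erlangs
    print("lines for normal traffic:", lines_needed(offered))
    print("lines for ten times that:", lines_needed(10 * offered))
```

The second number is what you end up paying for if you size the switch for the Scud Event instead of the ordinary day.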

How does this apply? Well, consider EC2. If you could deploy servers "on demand", then you could ramp up capacity when you needed it. You might think that it takes longer than a few seconds to deploy a server, and it does need a little more time than that, but peak usage actually grows rather slowly on the phone system, as the growth comes not only from new calls, but longer ones. The impact? One tenth the amount of hardware? Not bad.
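
To put rough numbers on that, here is a small Python sketch comparing the server-hours you pay for when hardware is sized to the peak against servers rented hour by hour. The hourly call counts, the figure of 100 calls per server, the headroom value, and the function names are all made up for illustration; nothing here talks to the actual EC2 service.

```python
import math


def servers_needed(concurrent_calls: int, calls_per_server: int,
                   headroom: float = 0.2) -> int:
    """Servers required for a given load, with a safety margin, never below one."""
    wanted = concurrent_calls * (1.0 + headroom) / calls_per_server
    return max(1, math.ceil(wanted))


def server_hours(hourly_calls: list[int], calls_per_server: int,
                 headroom: float = 0.2) -> tuple[int, int]:
    """Return (fixed, on_demand) server-hours for an hourly load profile.

    fixed:     enough servers for the worst hour, kept running the whole time
    on_demand: just enough servers for each hour, resized every hour
    """
    peak = servers_needed(max(hourly_calls), calls_per_server, headroom)
    fixed = peak * len(hourly_calls)
    on_demand = sum(servers_needed(calls, calls_per_server, headroom)
                    for calls in hourly_calls)
    return fixed, on_demand


if __name__ == "__main__":
    # Hypothetical day: quiet most of the time, with one "Scud Event" hour.
    profile = [100] * 23 + [1000]
    fixed, on_demand = server_hours(profile, calls_per_server=100)
    print("sized for the peak:", fixed, "server-hours")
    print("rented on demand:  ", on_demand, "server-hours")
```

Resizing once an hour is a deliberately coarse assumption. Since peak load builds slowly, even a scheme that takes minutes to bring up a server can stay ahead of it.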

It doesn't stop there. A big thing in telephony is resilience, which typically means redundant hardware. (Note to self: this is the wrong time to get on a soap box about how idiotic it is that telephony guys are always fighting a fragile network by making each element stronger instead of making the system stronger. They never watched that Borg episode in Star Trek, apparently.) If you could ramp up other servers, you could radically increase reliability at a lower cost. Don't let your head stop at hardware failure; keep it moving to think about distributed denial of service attacks, or military applications. For this, EC2 and S3 shine.



Hey Tom,
Very interesting site! By the way, you did a fine job on your "white belt" test tonight on your "Mai Tai" exam! Hey, if you ever need a wing man or body guard on one of your trips, give me a shout. All the best, Captain Jim
aka/ judo jim

Paul Sweeney said...

This makes perfect sense and it is only a matter of time before somebody models it. BTW: congratulations on your mashup win.

Thomas Howe said...

Thank you!

Patrick said...

Interesting aside about the telco resilience mindset of building it component by component instead of holistically. Check out this very good video of Van Jacobson's presentation at Google.

Thomas Howe said...

Yes, yes, yes. I've seen Van Jacobson speak before, and that's where I came to understand the difference between the two architectural approaches. I think this deserves a blog post.