/Interview/ Q&A: Jeff Barr
18/09/2008

Amazon Web Services consume more bandwidth than all of Amazon’s global sites combined. Evangelist Jeff Barr explains who’s using them and why.
.net: This year you announced that the Amazon Web Services use more bandwidth than all of Amazon’s global sites combined. Why did Amazon decide to open up its web services in the first place?
JB: We spent over a decade building one of the world’s most reliable, scalable and cost-efficient web infrastructures to run Amazon.com. In that time we learned a great deal about how to both develop these services and run them. Amazon Web Services gives any software developer the keys to Amazon’s back-end infrastructure, which they can use to build and grow any business. This makes it possible for any business to reach the scale of major internet players such as Amazon.com, but without the expensive price tag these companies must pay to build and maintain such a reliable, secure and scalable infrastructure.
.net: Who are they aimed at?
JB: When we started this business we imagined that smaller companies, particularly startups, would be the first to take advantage of our services given their low cost nature and the fact that they get to leverage our massive scaling capabilities with no upfront investment. Now that our services have become more mature, we’re seeing larger companies take advantage as well. We thought that transition would take a couple of years and what we’re seeing is that it’s happening a lot more quickly than expected. It’s clear that even large companies don’t like dealing with the muck, so we’re happy to do it for them.
.net: What kind of technical knowledge do you need to use Amazon Web Services?
JB: These services are built for developers, so one needs to have some level of coding ability to use them. A number of powerful load management tools, high-level programming libraries, and complete programming systems have also been built by members of our developer community, both commercial and open source.
.net: What’s the take-up been like? How many people are using Amazon Web Services, and who are they?
JB: We have over 370,000 registered developers to date and this includes a broad spectrum of companies, ranging from startups like Ooyala and Animoto to large companies like The New York Times Company, NASDAQ, and SanDisk to partners like Red Hat and Sun Microsystems. Companies ranging from small startups to large enterprises are leveraging Amazon Web Services. We have found that companies of all sizes can benefit from the reliability, scalability and low cost of Amazon Web Services. We’re pleasantly surprised with the traction we’re starting to see with larger companies who are using AWS. We’ve been very happy with the number of users using multiple Amazon Web Services. Another indicator of growth is the number of objects stored in Amazon S3, which is currently 18 billion. Just two of our infrastructure services - Amazon S3 and Amazon EC2 - now consume more bandwidth than all of Amazon.com’s global websites combined.
.net: How profitable is this part of Amazon’s business?
JB: We don’t disclose that information.
.net: At the beginning of the year the S3 storage service was hit by a major outage. It only lasted a few hours but it meant that many websites went down as well. Then in April it hit the EC2 cloud computing service. What happened and what have you done to ensure it doesn’t happen again?
JB: I’ll take these separately. In regards to Amazon S3… In one of our locations we started seeing elevated levels of authenticated requests from multiple users. While we carefully monitor our overall request volumes and these remained within normal ranges, we had not been monitoring the proportion of authenticated requests. Importantly, these cryptographic requests consume more resources per call than other request types. Within a short amount of time, we began to see several other users significantly increase their volume of authenticated calls. The last of these pushed the authentication service over its maximum capacity before we could complete putting new capacity in place. In addition to processing authentication requests, the authentication service also performs account validation on every request Amazon S3 handles. This caused Amazon S3 to be unable to process any requests in that location. As part of the post mortem for this event, we identified a set of short-term actions as well as longer term improvements. We took immediate action on the following: (a) improving our monitoring of the proportion of authenticated requests; (b) further increasing our authentication service capacity; and (c) adding additional defensive measures around the authenticated calls. Though we’re proud of our uptime track record over the past two years with this service, any amount of downtime is unacceptable.
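The monitoring gap described in this answer can be illustrated with a minimal sketch. It is not Amazon's implementation: the RequestWindow type, the function names and the threshold values are all hypothetical, chosen only to show why a check on total volume alone can miss a change in the request mix.

```python
from dataclasses import dataclass


@dataclass
class RequestWindow:
    """Request counts seen in one monitoring interval (hypothetical shape)."""
    total: int
    authenticated: int


def authenticated_share(window):
    """Fraction of requests in the window that required authentication."""
    if window.total == 0:
        return 0.0
    return window.authenticated / window.total


def check_window(window, max_total, max_auth_share):
    """Return a list of alert messages for one window.

    Watching only `total` misses a shift in the request mix, so the
    second check also watches the proportion of authenticated calls.
    Both thresholds are illustrative assumptions.
    """
    alerts = []
    if window.total > max_total:
        alerts.append("total request volume above threshold")
    if authenticated_share(window) > max_auth_share:
        alerts.append("authenticated share above threshold")
    return alerts


# Same overall volume, very different mix: only the share check can fire.
normal = RequestWindow(total=10_000, authenticated=2_000)
skewed = RequestWindow(total=10_000, authenticated=7_000)
print(check_window(normal, max_total=12_000, max_auth_share=0.5))
print(check_window(skewed, max_total=12_000, max_auth_share=0.5))
```

In this sketch both windows stay under the volume threshold, so a volume-only monitor would report nothing for either of them; the second window is flagged only because the share of authenticated requests is tracked separately.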
In regards to Amazon EC2… We began a maintenance change with one of our redundant internet access points. Under normal circumstances, our network routing protocols automatically shift traffic away from these internet access points until a change is completed. In this case, a latent, incorrect configuration caused affected instances to route outbound traffic to the degraded internet access point. As a result, the outbound internet traffic routed via this internet access point was not successfully forwarded to the internet. Our monitoring correctly detected the initial loss of connectivity and the engineering teams were fully engaged within minutes. Because this unidirectional failure was not an anticipated failure mode of our network topology, our monitoring and debugging did not help us identify the problem quickly. The problem was correctly identified and traffic was forced away from the internet access point. At this point, affected instances fully recovered.
Unlike previous EC2 networking issues, this issue affected instances in multiple Availability Zones. It’s worth reiterating that our Availability Zones are engineered to fail independently. For example, EC2 Availability Zones do not share power transformers, generators or common cooling. Each Availability Zone is physically separated from other Availability Zones to prevent correlated failure from events like fires, floods or physical damage to a datacenter. Additionally, each Availability Zone has physically and logically redundant connections to multiple internet access points, and utilises routing protocols to independently choose which internet access point to use. This helps ensure high availability for each individual Availability Zone. This event affected multiple Availability Zones because more than one Availability Zone independently routed some traffic to the faulty internet access point.
As with any operational issue, our internal post mortem process helped us identify ways that we can prevent this sort of issue in the future, and more generally, reduce our recovery time when the unexpected does happen. In addition to correcting the routing protocol issue, we identified a failsafe that will help assure automatic failover of Availability Zones from degraded internet access points. We also identified a series of improvements to our internal networking monitoring that will help us more quickly isolate the cause of networking issues in this part of our infrastructure.
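The kind of failsafe mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of how EC2 routing works: the AccessPoint type, the two probe flags and choose_access_point are hypothetical. The point is only that an access point is treated as usable when it passes a health check in both directions, so a unidirectional failure is enough to steer traffic elsewhere.

```python
from dataclasses import dataclass


@dataclass
class AccessPoint:
    """Health of one internet access point (hypothetical probe results)."""
    name: str
    inbound_ok: bool   # probe from the internet into the zone succeeded
    outbound_ok: bool  # probe from the zone out to the internet succeeded


def usable(access_point):
    """An access point is usable only if traffic flows in both directions."""
    return access_point.inbound_ok and access_point.outbound_ok


def choose_access_point(candidates):
    """Return the first usable access point in preference order, or None."""
    for access_point in candidates:
        if usable(access_point):
            return access_point
    return None


# The preferred access point still accepts inbound traffic but drops
# outbound traffic, so the selection falls through to the second one.
degraded = AccessPoint("ap-1", inbound_ok=True, outbound_ok=False)
healthy = AccessPoint("ap-2", inbound_ok=True, outbound_ok=True)
chosen = choose_access_point([degraded, healthy])
print(chosen.name if chosen else "no usable access point")
```

Each zone would run a selection like this on its own, which matches the idea of zones choosing their access point independently; returning None when nothing is usable leaves the decision about what to do next to the caller.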
.net: Where do you see the future of cloud computing?
JB: Over the past two years, the Amazon Web Services platform has become increasingly robust and customer-friendly as we’ve listened to our developers about the services they are using and innovating on their behalf. We’ve added numerous features to our services as a result of this ongoing feedback loop. We’re pleased by the number of businesses in many different types of industries who are utilising the services. We are very excited about offering web scale computing services that free software developers and businesses from the heavy-lifting typically associated with launching and growing a successful web business. I think the future is bright for these types of services.
.net: How do you feel about Google’s App Engine?
JB: We have a long-standing policy of not discussing other companies.
.net: What kind of web services are you planning to add in the future?
JB: We don’t speculate much on future plans but I can tell you that we will continue to listen to our developers and innovate on their behalf.
.net: What’s your vision for Amazon Web Services?
JB: What AWS is doing is offering web scale computing services that free software developers and businesses from the heavy lifting typically associated with launching and growing a successful web business.
Jeff Barr
Job title: Amazon Web Services Evangelist
Age: 47
Education: Bachelor’s degree in computer science from the American University in Washington, DC; graduate studies at the George Washington University, Washington, DC
Previous career: Jeff operated his own consulting practice and also spent some time at Microsoft, where he worked on Visual Basic and .NET
Blogs: www.jeff-barr.com and aws.typepad.com






