Cloud load balancing is critical to delivering scalable and reliable services to customers, and yet many cloud providers need to modernize their approach to load balancers and application delivery controllers.
Amazon Web Services (AWS) experienced its fourth outage of the year on Christmas Eve. In an event summary on the AWS website, the provider attributed the disruption to the accidental deletion of configuration files for its Elastic Load Balancing (ELB) service, which distributes incoming application traffic across different computing hardware, at its Northern Virginia data center location.
Once AWS determined that more than 6% of its load balancers were not functioning properly, it disabled the rest of its production load balancers to prevent them all from being affected. Customers had no load balancing technology available to them in the AWS cloud in the meantime, much to the chagrin of high-profile AWS customers like Netflix. The failure caused a four-hour partial outage for users attempting to stream Netflix video on some devices, the company tweeted on Dec. 24.
While human error cannot be completely eradicated, cloud providers can adopt different strategies with their application delivery controllers (ADCs) and load balancers, such as giving customers more control with virtual appliances and improving automation.
Cloud load balancing: Control in the customers' hands
Amazon's ELB service is a multitenant service. When AWS experiences a hardware failure, it can affect some, if not all, customers in its data center. But many service providers are electing to offer their customers load balancing as a dedicated service that can be deployed by the customer instead of the provider, said Apurva Dave, vice president of products and marketing for the Stingray Business Unit of San Francisco-based Riverbed Technology.
"While some providers offer a fully managed approach, in which they take ownership of everything -- including networking and application delivery -- some cloud providers are choosing to give their customers access to those capabilities through virtual instances of those tools," said Sam Barnett, directing analyst for data center and cloud at Campbell, Calif.-based Infonetics Research Inc.
Joyent, a San Francisco-based cloud provider and Riverbed customer, offers its users access to virtual application delivery and load balancing from the Stingray product line either as a provider-managed service -- like Amazon's ELB -- or as a service controlled and managed by the customer, said Jason Hoffman, chief technology officer for Joyent.
The shared structure of Amazon's ELB service was a contributing factor in the Christmas Eve failure, Hoffman said. "Instead of having a monolithic service and scaling it across all customers, cloud providers can employ services on a per-customer basis so when one service goes down, it doesn't have a cascading effect."
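The difference Hoffman describes can be illustrated with a toy model. The sketch below is not how ELB or Stingray is implemented; the `LoadBalancer` class, the tenant names and the `config_ok` flag are all hypothetical, chosen only to show how the same fault touches every tenant on a shared instance but only one tenant when each has a dedicated instance.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancer:
    """Toy load balancer (hypothetical): it works only while its config exists."""
    name: str
    config_ok: bool = True

    def route(self, request: str) -> str:
        if not self.config_ok:
            raise RuntimeError(f"{self.name}: configuration missing")
        return f"{self.name} handled {request}"


def affected_tenants(assignment: dict[str, LoadBalancer]) -> list[str]:
    """Return the tenants whose requests currently fail."""
    failed = []
    for tenant, lb in assignment.items():
        try:
            lb.route("GET /")
        except RuntimeError:
            failed.append(tenant)
    return sorted(failed)


tenants = ["tenant-a", "tenant-b", "tenant-c"]

# Shared model: every tenant points at the same load balancer instance.
shared_lb = LoadBalancer("shared-lb")
shared = {t: shared_lb for t in tenants}

# Dedicated model: one load balancer instance per tenant.
dedicated = {t: LoadBalancer(f"lb-{t}") for t in tenants}

# Simulate the same fault in both models: one configuration is lost.
shared_lb.config_ok = False
dedicated["tenant-a"].config_ok = False

shared_failures = affected_tenants(shared)
dedicated_failures = affected_tenants(dedicated)

print(shared_failures)     # ['tenant-a', 'tenant-b', 'tenant-c']
print(dedicated_failures)  # ['tenant-a']
```

In this simplified model the fault is identical in both cases; only the number of tenants attached to the failed instance changes.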
"If a [Joyent] customer has their own instance of Stingray software, the customer has control and anything that impacts other customers will not affect their data path, even if they are on a multi-tenant infrastructure," he said.
More automation, less risk of failure
Granting customers more control over their cloud infrastructure is one way to eliminate catastrophic failures, but making configuration changes and managing networking elements don't always have to be the user's responsibility.
"It should really be about what policies and procedures the provider has in place to minimize the impact of failure, or prevent configuration change failures from happening," Infonetics' Barnett said.
Cloud providers need more automation and change management to reduce the risk of mistakes that could result in failures. Infoblox, a Santa Clara, Calif.-based network management vendor, offers NetMRI, a network configuration automation tool for both enterprises and service providers.
NetMRI can track all configuration, movement and deployment changes and trace them back to the employee who made the changes, said Stu Bailey, Infoblox CTO and founder. The tool is also designed to automate any configuration process that is manual today for cloud providers.
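The general idea of tracing every configuration change back to a person can be sketched in a few lines. This is not NetMRI's API or data model; the `AuditedConfig` class, its methods and the sample key are hypothetical, and the sketch assumes a simple key-value configuration store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One recorded configuration change (hypothetical record layout)."""
    author: str
    key: str
    old_value: Optional[str]
    new_value: Optional[str]  # None means the key was deleted
    timestamp: datetime


class AuditedConfig:
    """Key-value configuration store that logs who changed what."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}
        self.history: list[Change] = []

    def _record(self, author: str, key: str, new_value: Optional[str]) -> None:
        old_value = self._values.get(key)
        self.history.append(
            Change(author, key, old_value, new_value, datetime.now(timezone.utc))
        )
        if new_value is None:
            self._values.pop(key, None)
        else:
            self._values[key] = new_value

    def set(self, author: str, key: str, value: str) -> None:
        self._record(author, key, value)

    def delete(self, author: str, key: str) -> None:
        self._record(author, key, None)

    def get(self, key: str) -> Optional[str]:
        return self._values.get(key)

    def who_changed(self, key: str) -> list[str]:
        """Authors of every change to a key, oldest first."""
        return [c.author for c in self.history if c.key == key]

    def undo_last(self, author: str) -> None:
        """Revert the most recent change; the revert is itself logged."""
        if not self.history:
            return
        last = self.history[-1]
        self._record(author, last.key, last.old_value)


config = AuditedConfig()
config.set("alice", "lb/pool", "10.0.0.1,10.0.0.2")
config.delete("bob", "lb/pool")           # an accidental deletion
deleted_value = config.get("lb/pool")     # None: the setting is gone
culprits = config.who_changed("lb/pool")  # ['alice', 'bob']
config.undo_last("carol")                 # restore the previous value
restored_value = config.get("lb/pool")
```

Because the revert is recorded like any other change, the log stays complete even after a mistake has been corrected.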
Many processes have not yet been automated in cloud environments, Bailey said. "This is a growing indication of the need for multi-disciplinary automation in the network infrastructure and Web services space."
"Every little networking piece that is a manual process today… could mean a million-dollar outage tomorrow," he said.