
High Availability: Load Balancers and Auto Scaling

This is the third article in a series where I write about everything I’m learning about the AWS cloud. The last article was on storage options for EC2 instances. In this post, I’ll talk about high availability and how AWS load balancers can be used to achieve it.


  1. EC2 Fundamentals
  2. EC2 Storage
  3. This article

High Availability

High Availability in computer systems is the ability of a system to keep working and provide a reasonable level of service even in the event of failures, faults or challenges to normal operation. Five standard techniques to achieve High Availability are:

Redundancy: Having multiple systems or servers that perform the same function, so that if one fails, the others can take over.

Load Balancing: This prevents any single server from being overloaded by distributing incoming traffic across multiple servers.

Scalability: Scalability refers to the ability of a system to handle a growing amount of work by adding capacity. This allows the system to absorb peak demand without compromising performance or availability.

Failover Mechanisms: Failover refers to having backup systems that you can automatically switch to in case of failure. The transition to the backup systems should happen seamlessly.

Monitoring: This involves proactively monitoring the system performance to identify issues early, enabling you to take corrective actions before users are impacted.

In this post, I’ll focus on load balancing in an AWS cloud context. I’ll talk about the different load balancers offered by AWS and when to use each one. Setting up instances to scale automatically is covered in the next post.

Load Balancers

In AWS, load balancers are called Elastic Load Balancers. A load balancer is a single point of contact for your clients. It distributes incoming traffic across multiple targets such as EC2 instances (virtual machines), containers or IP addresses, which increases the availability of your application. Load balancers allow you to expose a single domain name or IP address to users. This way, users don’t need to know the DNS names or IP addresses of the specific instances serving your application.

Since load balancers spread incoming traffic across multiple instances, they also check on the health of each instance. If an instance or downstream server becomes unhealthy, the load balancer can stop sending it traffic or initiate a process to terminate it. AWS currently offers three main types of load balancers: the Application Load Balancer, the Network Load Balancer and the Gateway Load Balancer. There is also an older, previous-generation Classic Load Balancer.
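To make the health-check idea concrete, here is a minimal sketch of a round-robin balancer that skips targets marked unhealthy. This is a toy model I wrote for illustration, not how AWS implements it; the `Target` and `RoundRobinBalancer` names and the `web-1` style instance names are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    name: str             # hypothetical instance identifier
    healthy: bool = True  # would be updated by periodic health checks


class RoundRobinBalancer:
    """Toy balancer: rotates through targets, skipping unhealthy ones."""

    def __init__(self, targets: List[Target]) -> None:
        self.targets = targets
        self._next = 0  # index where the next search starts

    def pick(self) -> Target:
        # Try each target at most once, starting where the last pick left off.
        for offset in range(len(self.targets)):
            index = (self._next + offset) % len(self.targets)
            candidate = self.targets[index]
            if candidate.healthy:
                self._next = (index + 1) % len(self.targets)
                return candidate
        raise RuntimeError("no healthy targets available")


targets = [Target("web-1"), Target("web-2"), Target("web-3")]
balancer = RoundRobinBalancer(targets)
targets[1].healthy = False  # simulate a failed health check on web-2
picks = [balancer.pick().name for _ in range(4)]
print(picks)  # ['web-1', 'web-3', 'web-1', 'web-3']
```

Once `web-2` fails its health check it simply stops receiving requests, and it would start receiving them again as soon as it is marked healthy.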

Application Load Balancer

Application Load Balancers (ALB) work at the application layer, or layer 7, of the OSI model. This means they understand the content of HTTP and HTTPS requests and can decide how to route the traffic based on that content. An ALB can make decisions based on the URL, the request method or headers in the request and route the traffic to the appropriate servers. The servers fronted by a load balancer are organized into target groups, and an ALB allows you to create listener rules that determine which target group gets traffic based on information in the HTTP or HTTPS request.

For example, you can use listener rules to route traffic based on

  • The URL path, e.g. one path goes to one group of servers and another path to another group.
  • The hostname.
  • Query string parameters.
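The routing options above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not the ALB API: the `Rule` class, the `route` function, the target group names and the `example.com` URLs are all hypothetical. Rules are checked in order and a default target group catches everything else.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class Rule:
    target_group: str                      # hypothetical target group name
    host: Optional[str] = None             # match on hostname
    path_prefix: Optional[str] = None      # match on the start of the URL path
    query: Dict[str, str] = field(default_factory=dict)  # match on query params

    def matches(self, url: str) -> bool:
        parts = urlsplit(url)
        params = parse_qs(parts.query)
        if self.host is not None and parts.hostname != self.host:
            return False
        if self.path_prefix is not None and not parts.path.startswith(self.path_prefix):
            return False
        # Every required query parameter must be present with the given value.
        return all(value in params.get(key, []) for key, value in self.query.items())


def route(url: str, rules: List[Rule], default: str) -> str:
    """Return the target group of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(url):
            return rule.target_group
    return default


rules = [
    Rule("users-tg", path_prefix="/users"),
    Rule("posts-tg", path_prefix="/posts"),
    Rule("mobile-tg", host="m.example.com"),
    Rule("beta-tg", query={"version": "beta"}),
]

print(route("https://example.com/users/42", rules, "default-tg"))          # users-tg
print(route("https://m.example.com/home", rules, "default-tg"))            # mobile-tg
print(route("https://example.com/home?version=beta", rules, "default-tg")) # beta-tg
print(route("https://example.com/about", rules, "default-tg"))             # default-tg
```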

Targets in a target group can be EC2 instances, Lambda functions, ECS tasks, or IP addresses. ALBs offer a static DNS name but no static IP addresses. This means that your load balancer is reachable through a DNS name that doesn’t change even if the underlying infrastructure (the IP addresses of the targets it fronts) changes. When a load balancer receives a request and forwards it to a target, the target does not see the client IP directly; it sees the request’s source as being the load balancer itself. To work around this, the load balancer adds the X-Forwarded-For header, which contains the client’s IP address. It also records the port and protocol the client used to connect to the load balancer in the X-Forwarded-Port and X-Forwarded-Proto headers.
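On the application side, reading these headers might look something like the sketch below. The `original_request_info` function and the sample addresses are my own inventions. I'm assuming a single load balancer in front of the app that appends the client address to any existing X-Forwarded-For value, which is why the code takes the last entry rather than the first, and I'm assuming the headers arrive in a plain dict with these exact key names.

```python
from typing import Dict, Mapping, Optional


def original_request_info(headers: Mapping[str, str], peer_ip: str) -> Dict[str, Optional[str]]:
    """Recover client details from X-Forwarded-* headers.

    peer_ip is the address of the direct TCP peer (the load balancer when one
    is in front of the app). It is used as a fallback if the header is missing.
    """
    entries = [
        part.strip()
        for part in headers.get("X-Forwarded-For", "").split(",")
        if part.strip()
    ]
    return {
        # Assumption: the load balancer appended the address it saw, so the
        # last entry is the one it vouches for. Earlier entries may be
        # client-supplied and should not be trusted blindly.
        "client_ip": entries[-1] if entries else peer_ip,
        "proto": headers.get("X-Forwarded-Proto"),
        "port": headers.get("X-Forwarded-Port"),
    }


info = original_request_info(
    {
        "X-Forwarded-For": "203.0.113.7",
        "X-Forwarded-Proto": "https",
        "X-Forwarded-Port": "443",
    },
    peer_ip="10.0.1.25",
)
print(info)  # {'client_ip': '203.0.113.7', 'proto': 'https', 'port': '443'}
```

If your setup has more than one proxy in the chain, the position of the real client address in the list changes, so treat this as a starting point.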

Network Load Balancer

Network Load Balancers (NLB) work at layer 4 (transport) of the OSI model. This means that they only understand the IP addresses and ports in the requests; they don’t understand the content of the requests. NLBs handle TCP and UDP traffic and forward it to downstream servers. NLBs can handle millions of requests per second, making them a good fit for systems that need extreme performance, systems with high traffic volumes and low latency requirements, and systems that use different ports for different services.

NLBs can be used to front EC2 instances, private IP addresses, and Application Load Balancers. An NLB can be given static IP addresses as well as a DNS name.
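Because a layer 4 load balancer only sees addresses and ports, one common way to pick a target is to hash the connection details so that every packet of the same flow lands on the same server. The sketch below shows that general idea; it is a simplification I put together, not AWS's actual algorithm, and the `pick_target` function and all addresses are hypothetical.

```python
import hashlib
from typing import List, Tuple

# protocol, source IP, source port, destination IP, destination port
Flow = Tuple[str, str, int, str, int]


def pick_target(flow: Flow, targets: List[str]) -> str:
    """Map a flow to a target using a stable hash of its 5-tuple."""
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer and wrap it
    # around the number of targets.
    return targets[int.from_bytes(digest[:8], "big") % len(targets)]


targets = ["10.0.1.10", "10.0.1.11", "10.0.2.10"]
flow = ("tcp", "203.0.113.7", 51544, "198.51.100.20", 443)

first = pick_target(flow, targets)
second = pick_target(flow, targets)
print(first == second)  # True: the same flow always maps to the same target
```

Notice that nothing here looks at the request body, paths or headers, which is exactly the difference from the ALB.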

Gateway Load Balancer

This load balancer operates at layer 3 (network) of the OSI model. It allows you to deploy third-party network virtual appliances such as firewalls and deep packet inspection, intrusion detection and payload manipulation systems. It can be used as a single point of entry for all traffic going in or out of a system. It uses the GENEVE protocol on port 6081.

Other benefits of using a load balancer

Load balancers have additional benefits, such as enabling session affinity, cross-zone load balancing and connection draining.

Session Affinity (Sticky Sessions)

By default, AWS load balancers route each request independently to downstream targets based on the chosen routing algorithm. Load balancers can be configured to bind requests from the same client to the same server for a period of time. This is called session affinity or sticky sessions. Session affinity is useful for applications that keep session state on the server, such as login sessions or shopping carts. With cookie-based stickiness, the load balancer creates or tracks a cookie in the client.

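Sticky sessions work only for the Classic Load Balancer, Application Load Balancer, and Network Load Balancer. The Network Load Balancer works below HTTP and cannot read cookies, so its stickiness is based on the client’s source IP address instead.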
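Here is a rough sketch of how cookie-based stickiness can work. It is a toy model, not the real AWS behavior: the `LB_STICKY` cookie name, the `StickyBalancer` class and the plain-text `target|expiry` cookie format are all made up, and a real load balancer would not expose the target name in a readable cookie.

```python
from itertools import cycle
from typing import Dict, List, Tuple

COOKIE_NAME = "LB_STICKY"  # hypothetical cookie name


class StickyBalancer:
    def __init__(self, targets: List[str], duration: float) -> None:
        self.targets = targets
        self.duration = duration          # how long stickiness lasts, in seconds
        self._rotation = cycle(targets)   # round robin for clients without a cookie

    def _sticky_target(self, value: str, now: float) -> str:
        """Return the target named in a valid, unexpired cookie, else ''."""
        target, _, expires = value.partition("|")
        try:
            still_valid = now < float(expires)
        except ValueError:
            return ""
        return target if still_valid and target in self.targets else ""

    def route(self, cookies: Dict[str, str], now: float) -> Tuple[str, Dict[str, str]]:
        """Return the chosen target and any cookies to set on the response."""
        target = self._sticky_target(cookies.get(COOKIE_NAME, ""), now)
        if target:
            return target, {}
        target = next(self._rotation)
        return target, {COOKIE_NAME: f"{target}|{now + self.duration}"}


lb = StickyBalancer(["web-1", "web-2"], duration=60.0)
first, jar = lb.route({}, now=0.0)      # new client gets web-1 and a cookie
again, _ = lb.route(jar, now=30.0)      # same client, cookie still valid
other, _ = lb.route({}, now=30.0)       # a different client without a cookie
print(first, again, other)  # web-1 web-1 web-2
```

The trade-off is that sticky clients can pile up on one server, so the load may end up less evenly spread than with plain per-request routing.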

Cross Zone Load Balancing

Cross-zone load balancing is a feature that lets each load balancer node distribute traffic to registered targets in all enabled availability zones, not just its own. This can improve the availability of your application, since traffic can still reach healthy targets in other zones when one zone is experiencing problems. When cross-zone load balancing is on, traffic is distributed evenly across all registered targets in all the availability zones, so instances in an AZ with fewer instances don’t get overwhelmed. When it is off, each node only sends traffic to the targets in its own zone.

Figure: Load balancers in two different availability zones, shown with cross-zone load balancing enabled and with cross-zone load balancing disabled.
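A quick calculation shows why this matters. The sketch below assumes one load balancer node per availability zone with client traffic split evenly between the nodes; the `traffic_share_per_target` function and the zone names are hypothetical.

```python
from typing import Dict


def traffic_share_per_target(targets_per_az: Dict[str, int], cross_zone: bool) -> Dict[str, float]:
    """Percent of total traffic that each single target in an AZ receives.

    Assumes every AZ has at least one target and that each AZ's load
    balancer node receives an equal share of incoming traffic.
    """
    az_count = len(targets_per_az)
    total_targets = sum(targets_per_az.values())
    shares = {}
    for az, count in targets_per_az.items():
        if cross_zone:
            # Every target is treated the same, whatever zone it is in.
            shares[az] = 100.0 / total_targets
        else:
            # The AZ's node splits its slice among its own targets only.
            shares[az] = 100.0 / az_count / count
    return shares


zones = {"az-a": 2, "az-b": 8}
print(traffic_share_per_target(zones, cross_zone=True))   # {'az-a': 10.0, 'az-b': 10.0}
print(traffic_share_per_target(zones, cross_zone=False))  # {'az-a': 25.0, 'az-b': 6.25}
```

With cross-zone load balancing disabled, each of the two instances in the smaller zone handles 25% of all traffic while each instance in the larger zone handles only 6.25%.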


Load balancers distribute traffic to different servers, and this helps keep your application highly available and ensures that your application servers don’t get overloaded with requests. Load balancers also make it easier to scale your application, because servers can be added or removed behind them as needed. The next post will be on how to achieve automatic scaling.