AWS Elastic Load Balancing DEEP DIVE
- In AWS, ELB (and also Route 53) performs a health check on each instance at regular intervals. If an instance stays unhealthy for a defined period of time, ELB stops sending traffic to it. Auto Scaling also performs its own health check; if an instance is unhealthy, it terminates that instance and starts a new one. This helps achieve self-healing and high availability.
- You cannot register on-premises servers with a Classic ELB (ALB and NLB can target on-premises IP addresses that are reachable over Direct Connect or VPN). An ELB cannot span regions.
- When a client sends a request to a load balancer, it first resolves the load balancer's domain name through DNS. The DNS servers use round robin to determine which LB node, in which AZ, receives the request. The selected LB node then sends the request to a healthy instance within the same AZ, using round robin for TCP listeners and a least-outstanding-requests (leastconns-style) algorithm for HTTP/HTTPS listeners.
- Sticky sessions: ELB binds a user's session to a specific instance by means of a cookie, so that every request in that session goes to the same instance. E.g. the checkout process on an e-commerce website.
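The request flow described above (DNS picks a load balancer node, the node picks an instance in its own AZ) can be sketched as a small simulation. This is a toy model, not how ELB is implemented: the class name, AZ names, and instance IDs are made up for illustration, and the node uses a simple least-outstanding-requests choice.

```python
import itertools


class LoadBalancerNode:
    """Hypothetical model of one LB node serving the instances in its own AZ."""

    def __init__(self, az, instances):
        self.az = az
        # Outstanding (in-flight) request count per healthy instance.
        self.outstanding = {name: 0 for name in instances}

    def route(self):
        # Pick the instance with the fewest outstanding requests.
        # On a tie, min() returns the first one registered.
        target = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[target] += 1
        return target

    def complete(self, instance):
        # Call when the back end has answered, freeing one slot.
        self.outstanding[instance] -= 1


nodes = [
    LoadBalancerNode("az-a", ["i-a1", "i-a2"]),
    LoadBalancerNode("az-b", ["i-b1", "i-b2"]),
]

# DNS round robin: successive lookups rotate through the LB nodes.
dns_round_robin = itertools.cycle(nodes)


def handle_request():
    node = next(dns_round_robin)
    return node.az, node.route()


# Four requests, none of which has completed yet.
assignments = [handle_request() for _ in range(4)]
print(assignments)
```

Because no request completes in this run, each node spreads its two requests over its two instances.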
How to implement Service Discovery
- For an EC2-hosted service, a simple way to achieve service discovery is through the Elastic Load Balancing service. Because each load balancer gets its own hostname, you can consume a service through a stable endpoint. This can be combined with DNS and private Amazon Route 53 zones, so that even the particular load balancer's endpoint can be abstracted and modified at any point in time.
- Another option would be to use a service registration and discovery method to allow retrieval of the endpoint IP addresses and port number of any given service. Because service discovery becomes the glue between components, it is important that it is highly available and reliable. If load balancers are not used, service discovery should also cater for things like health checking. Example implementations include custom solutions using a combination of tags, a highly available database and custom scripts that call the AWS APIs, or open source tools like Netflix Eureka, Airbnb Synapse, or HashiCorp Consul.
- Terminate SSL or TLS traffic at the ELB for security. You need to upload a certificate to the load balancer for this to work.
- ELB runs load balancer nodes in every AZ that you enable for it.
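The registration-and-discovery option described above can be illustrated with a minimal in-memory registry. This is only a sketch under stated assumptions: `ServiceRegistry`, its methods, and the 30-second TTL are hypothetical names and values, and a real deployment would back this with a highly available store (or a tool such as Eureka or Consul). Heartbeats stand in for the health checking that a load balancer would otherwise provide.

```python
import time


class ServiceRegistry:
    """Hypothetical registry: endpoints expire unless they keep heartbeating."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the example can control time
        self._entries = {}   # service name -> {(host, port): last heartbeat}

    def register(self, service, host, port):
        self._entries.setdefault(service, {})[(host, port)] = self._clock()

    def heartbeat(self, service, host, port):
        # A heartbeat simply refreshes the registration timestamp.
        self.register(service, host, port)

    def lookup(self, service):
        # Return only endpoints whose last heartbeat is within the TTL.
        now = self._clock()
        live = [
            endpoint
            for endpoint, seen in self._entries.get(service, {}).items()
            if now - seen <= self._ttl
        ]
        return sorted(live)


# Usage with a fake clock so the expiry is easy to follow.
fake_now = [0.0]
registry = ServiceRegistry(ttl_seconds=30.0, clock=lambda: fake_now[0])
registry.register("orders", "10.0.1.5", 8080)
registry.register("orders", "10.0.2.7", 8080)

fake_now[0] = 20.0
registry.heartbeat("orders", "10.0.1.5", 8080)  # only one endpoint stays alive

fake_now[0] = 40.0
live_endpoints = registry.lookup("orders")
print(live_endpoints)
```

The endpoint that stopped heartbeating drops out of lookups once its TTL has passed, which is the health-checking behaviour the notes say service discovery must cater for.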
2 types of Load Balancing: Network (Layer 4) and Application Load Balancing (Layer 7)
Network Load Balancing
- Supports TCP and SSL. The incoming client connection is bound to a server connection, with no header modification. Proxy Protocol prepends the source and destination IP addresses and ports to the request.
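To show what the prepended Proxy Protocol information looks like to a back end, here is a minimal parser for the human-readable version 1 header line. It assumes version 1 is the format in use and does not handle the `PROXY UNKNOWN` form; the function and type names are made up for this sketch, and the addresses are documentation examples.

```python
from typing import NamedTuple, Tuple


class ProxyHeader(NamedTuple):
    protocol: str
    source_ip: str
    dest_ip: str
    source_port: int
    dest_port: int


def parse_proxy_v1(data: bytes) -> Tuple[ProxyHeader, bytes]:
    """Split a PROXY protocol v1 line off the front of a TCP payload."""
    line, sep, rest = data.partition(b"\r\n")
    if not sep:
        raise ValueError("incomplete PROXY header")
    parts = line.decode("ascii").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("not a supported PROXY v1 header")
    _, protocol, source_ip, dest_ip, source_port, dest_port = parts
    header = ProxyHeader(protocol, source_ip, dest_ip, int(source_port), int(dest_port))
    # 'rest' is the original client payload, untouched by the load balancer.
    return header, rest


payload = b"PROXY TCP4 198.51.100.22 203.0.113.7 35646 80\r\nGET / HTTP/1.1\r\n\r\n"
header, request = parse_proxy_v1(payload)
print(header.source_ip, header.source_port)
```

The back end reads the client address from the header line and then processes the remaining bytes as the normal request.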
Application Load Balancing:
- Supports HTTP and HTTPS, connection terminated at the load balancer and pooled to the server, headers may be modified, X-Forwarded-For header contains client IP address.
- Multiple applications can sit behind a single Application Load Balancer with path-based forwarding, which reduces costs.
- App LB provides native support for microservices and container-based apps. Instances can be registered with multiple ports, allowing requests to be routed to multiple containers on a single instance. ECS automatically registers tasks with the load balancer using dynamic port mapping.
- App LB is integrated with many other services, including Route 53.
- Request tracing: ALB inserts a unique trace identifier into each request in a custom header, X-Amzn-Trace-Id. Trace identifiers are preserved through the request chain to allow for request tracing. They are included in access logs and can also be logged by the apps themselves.
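The path forwarding mentioned above, where several applications share one Application Load Balancer, can be sketched as an ordered rule list in which the first matching rule wins. This is a simplification: the rules here are plain path prefixes rather than real ALB path patterns, and the rule table and target group names are hypothetical.

```python
# Ordered rules: earlier entries have higher priority (hypothetical names).
RULES = [
    ("/api/orders", "orders-targets"),
    ("/api", "api-targets"),
    ("/images", "static-targets"),
]
DEFAULT_TARGET_GROUP = "web-targets"


def path_matches(path, prefix):
    # Match the prefix itself or anything below it, but not "/apiary" for "/api".
    return path == prefix or path.startswith(prefix + "/")


def select_target_group(path, rules=RULES, default=DEFAULT_TARGET_GROUP):
    """Return the target group of the first rule whose prefix matches."""
    for prefix, target_group in rules:
        if path_matches(path, prefix):
            return target_group
    return default


for example_path in ["/api/orders/42", "/api/users", "/images/logo.png", "/"]:
    print(example_path, "->", select_target_group(example_path))
```

Putting the more specific `/api/orders` rule ahead of `/api` is what lets one load balancer front several services without ambiguity.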
When should we use ALB over ELB?
- ALB only supports HTTP/S. If you have a TCP or SSL load balancer, continue to use that. For all other use cases, use ALB.
- ELB Access Logging: network-level logs of the traffic coming in to and going out from ELBs. The logs contain a world of information that you can parse and leverage to secure your environment, including request time, client IP address, latencies, request path, server responses, ciphers and protocols, and user agents. Move all ELB logs to your S3 buckets with the right permissions and policies.
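As an illustration of parsing the access logs described above, the sketch below splits one log line into named fields. The field order is the Classic Load Balancer layout as I understand it and should be treated as an assumption to verify against the AWS documentation; the sample line is illustrative, and the function name is made up.

```python
import csv

# Assumed field order for a Classic Load Balancer access log entry.
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]


def parse_access_log_line(line):
    """Split a space-separated log line, honouring double-quoted fields."""
    values = next(csv.reader([line], delimiter=" ", quotechar='"'))
    if len(values) < len(FIELDS):
        raise ValueError("unexpected number of fields: %d" % len(values))
    entry = dict(zip(FIELDS, values))
    # Convenience conversions for the fields most useful in analysis.
    entry["client_ip"] = entry["client"].rsplit(":", 1)[0]
    entry["backend_processing_time"] = float(entry["backend_processing_time"])
    entry["elb_status_code"] = int(entry["elb_status_code"])
    return entry


sample = (
    "2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 "
    "10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 "
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -'
)
entry = parse_access_log_line(sample)
print(entry["client_ip"], entry["elb_status_code"], entry["request"])
```

From here it is straightforward to filter by client IP, status code, or latency once the log files have been pulled from S3.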
Scalability of ELB
- Global Scalability: Integrates with Route53 latency based routing and geo-based routing or traffic policies (recently launched). Use cases are within trading, online advertising bidding etc. where latency is critical
- Latency definition: It measures the time elapsed in seconds after the request leaves the load balancer until the response is received. Test by sending requests to the back-end instance from another instance.
- ELB is itself a scalability product: it balances the load across EC2 instances.
- Common causes of back-end latency: garbage collection pauses, too little RAM, too much load, etc.
- Caching: ELB itself does not cache application responses. Cache at other layers (for example a CDN or an in-app cache) when serving requests from the outside world.
- Least-connection weighted round robin: AWS determines the instances with the least load, and ELB sends the queued traffic to them as long as capacity is available. Beware of blackholing of traffic: an instance that fails requests quickly looks lightly loaded and so attracts even more traffic.
- ELB’s own scaling is a mix of pre-emptive, based on the instance capacity you add, and reactive, based on the load you receive.
- SurgeQueueLength: the number of inbound requests currently queued by the load balancer, waiting to be accepted and processed by a back-end instance (use the max statistic). The surge queue holds requests that the back end is not yet ready to process, up to 1024 requests. Once your back end is available, AWS slowly releases them.
- SpilloverCount: the number of requests that have been rejected due to a full surge queue during the selected time period (use the sum statistic). This happens when the request rate exceeds the capacity of the instances.
- Pay attention to the traffic coming in and scale accordingly. Every ELB is already 50% overscaled, just in case, for disaster recovery etc. Also make use of CloudWatch metrics to scale ahead of demand, such as HealthyHostCount: the count of healthy instances in each AZ, and a good metric to trigger an alarm on. The most common cause of unhealthy hosts is a health check exceeding its allocated timeout.
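The relationship between SurgeQueueLength and SpilloverCount described above can be made concrete with a toy model. It is only a sketch of the metric definitions in these notes, not of ELB internals; the class and attribute names are hypothetical, and the 1024 limit is the figure given above.

```python
from collections import deque


class SurgeQueueModel:
    """Toy model: a bounded queue in front of a back end that is not ready."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.queue = deque()
        self.spillover_count = 0          # rejected because the queue was full
        self.max_surge_queue_length = 0   # high-water mark of the queue

    def offer(self, request):
        if len(self.queue) >= self.capacity:
            self.spillover_count += 1
            return False
        self.queue.append(request)
        self.max_surge_queue_length = max(self.max_surge_queue_length, len(self.queue))
        return True

    def drain(self, count):
        # Release up to 'count' queued requests, oldest first.
        released = []
        while self.queue and len(released) < count:
            released.append(self.queue.popleft())
        return released


model = SurgeQueueModel()
for request_id in range(1100):   # the back end accepts nothing during the burst
    model.offer(request_id)

released = model.drain(100)      # the back end recovers and takes a first batch
print(model.max_surge_queue_length, model.spillover_count, len(model.queue))
```

In this run the first 1024 requests are queued and the remaining 76 spill over, which is why a non-zero SpilloverCount is a signal to add back-end capacity.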
Availability of ELB
- Replacing an instance: put the new instance behind the ELB; if its health checks are OK, move the app to the new instance and kill the old one.
- Health checks: TCP and HTTP checks. You can customize the frequency and the failure thresholds. An HTTP check must return a 200 response.
- Idle timeouts: allow connections to be closed by the load balancer when they are no longer in use. They can be used for both client and back-end connections.
- One ELB across 2 AZs is the best practice for high availability.
- Traffic imbalances: can happen between AZs. The cause is DNS caching and spreading. DNS TTLs are generally honored, but sometimes there simply are not enough DNS servers to spread the load around fairly. Mobile networks typically have a dozen or so top-level resolvers.
- Cross Zone LB: every ELB node registers the back-end instances in all AZs, not only those in its own AZ.
- Running on 3 zones with 3 instances each (9 instances) is cheaper than running on 2 zones with 6 instances each (12 instances) when you must keep a minimum of 6 instances running after losing one zone.
- Idle Timeouts: allow connections to be closed by ELB when they are no longer in use. Timeouts should decrease as you go up the stack; for example, the timeout between EC2 instances and S3 should be shorter.
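The health-check bullet above mentions customizable frequency and failure thresholds. A minimal sketch of how consecutive-result thresholds could flip an instance between healthy and unhealthy follows; the class name, the default threshold values, and the use of `None` for a timed-out check are assumptions made for illustration.

```python
class HealthTracker:
    """Tracks one instance's state from a stream of health check results."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, status_code):
        # Only an HTTP 200 counts as a pass; None stands for a timed-out check.
        passed = status_code == 200
        if passed == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if passed else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = passed
                self._streak = 0
        return self.healthy


tracker = HealthTracker(healthy_threshold=3, unhealthy_threshold=2)
results = [tracker.record(code) for code in [200, 503, None, 200, 200, 200]]
print(results)
```

Two consecutive failures take the instance out of service, and it takes three consecutive 200 responses to bring it back, so a single slow response does not cause flapping.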
Cross Zone Load Balancing
- Load balancer absorbs the impact of DNS caching. Eliminates imbalances in back-end instance utilisation. Requests are distributed evenly across multiple AZs. Check connection limits before enabling. No additional bandwidth charge for cross-zone traffic.
- Without cross-zone load balancing, ELB relies on DNS to distribute requests from clients to the Availability Zones where the back-end instances are located. This means that some instances can receive a disproportionately higher share of inbound requests than others because clients are caching DNS information incorrectly.
- In addition, if you do not have an equal number of healthy instances in each Availability Zone (e.g. because you have taken some instances down for maintenance), requests are balanced across a smaller number of healthy back-end instances in the affected Availability Zones, meaning that you have to closely monitor and adjust your back-end instance capacity.
- If you enable cross-zone load balancing, you no longer have to worry that clients caching DNS information will result in requests being distributed unevenly. ELB then ensures that requests are distributed equally to your back-end instances regardless of the Availability Zone in which they are located. This makes it easier to deploy and manage your applications across multiple Availability Zones.
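The imbalance described above is easy to see with a little arithmetic. The sketch below assumes, for simplicity, that DNS splits client traffic evenly across the Availability Zones that have instances; the function name and the fleet layout are hypothetical.

```python
def per_instance_share(instances_per_az, cross_zone):
    """Fraction of total traffic each instance receives, keyed by AZ."""
    active = {az: count for az, count in instances_per_az.items() if count > 0}
    total_instances = sum(active.values())
    shares = {}
    for az, count in active.items():
        if cross_zone:
            # Every node can reach every instance: an even split over the fleet.
            shares[az] = 1 / total_instances
        else:
            # Each AZ gets an equal slice from DNS, shared only by its own instances.
            shares[az] = (1 / len(active)) / count
    return shares


fleet = {"az-a": 2, "az-b": 8}   # e.g. az-a has instances down for maintenance
without_cross_zone = per_instance_share(fleet, cross_zone=False)
with_cross_zone = per_instance_share(fleet, cross_zone=True)
print(without_cross_zone)
print(with_cross_zone)
```

With this fleet, each instance in az-a carries 25% of the traffic without cross-zone load balancing while each instance in az-b carries 6.25%; with it enabled, all ten instances carry 10% each.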
DNS Caching and Spreading
- Sometimes only a small number of DNS servers answer requests, too few to spread the load fairly. Mobile networks typically have a dozen or so top-level resolvers. Enterprise networks have as few as one.
- Workaround, DNS optimization: DNS caching by clients and ISPs can often cause clients to target one specific IP address or stop resolving at all. Register a wildcard CNAME or ALIAS record within Route 53.