Deep Dive on EC2 Instances (Performance Optimization)

  • There are several angles from which to optimise the performance of your EC2 fleet: API, purchase options, networking and instances. AWS has let you define the physical topology of your instances since 2011, when “placement groups” arrived with the cc2 instance type. To find the right instance family, look at what your application is constrained by. If it needs a lot of memory, start with the R family (RX), and so on. Below are some of the key features for improving EC2 performance.
  • T2 instances for Amazon EC2 can dramatically reduce costs for applications that benefit from bursts of CPU power. Although the baseline performance of the t2.medium instance seems significantly lower than that of the c3.large instance, remember that it can burst above the performance of the c3.large instance. If your workloads don’t use the full CPU often but occasionally need to burst, t2.medium can give you the performance you need at a lower cost than a c3.large instance.
  • Although enhanced networking does result in higher performance, lower latency and lower jitter, these benefits apply between the EC2 instance and the EBS volume where your operating system is running. It would not give you lower latency when serving web content to your end users at significant scale. Such a setup would not easily scale horizontally, so it may not be able to keep up with unexpected heavy load. In addition, it lacks resiliency because cluster placement groups cannot span multiple Availability Zones.

AMIs

  • An AMI is the base for creating an EC2 instance; it contains all the information required to launch one.
  • It records the owner of the AMI, the launch permissions (public, explicit and/or implicit), the architecture and operating system, and a block device mapping of all required volumes.
  • AMIs are regional, but they can be copied between regions, which also copies any volume snapshots.
  • If you create an AMI from an instance, its block devices are snapshotted and the AMI then references those snapshots. The AMI itself is just metadata holding those details, including any additional volumes that were attached.
  • When you copy an AMI, the snapshots it references are copied to the destination region as well, and you can then launch new instances there. Launch permissions are not carried over to the copy: by default it has only the implicit permission that allows the owner of the copy to launch it.
  • Instance Store AMI: here the root volume of the instance is an instance store volume, and the volume type is set to instance store for both the root and additional volumes. To create an AMI from an instance with instance store volumes, you need to bundle the storage the instance store uses and put it in S3. You can’t take snapshots of instance store volumes.
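
The AMI-as-metadata idea above can be sketched as a small data model. This is only an illustration, not the EC2 API: the class names, the field names and the way copied snapshot IDs are derived are all hypothetical. It shows that a copy ends up in a new region with its own snapshots and without the source's launch permissions.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class BlockDevice:
    device_name: str   # illustrative, e.g. "/dev/xvda"
    snapshot_id: str   # snapshot the volume is restored from


@dataclass(frozen=True)
class Ami:
    image_id: str
    owner: str
    region: str
    architecture: str
    block_devices: Tuple[BlockDevice, ...]
    # Explicit or public grants only; the owner's access is implicit.
    launch_permissions: Tuple[str, ...] = ()


def copy_ami(src: Ami, dest_region: str, new_image_id: str) -> Ami:
    """Model a cross-region copy: snapshots are copied, permissions are not."""
    copied = tuple(
        # Hypothetical naming scheme for the copied snapshots.
        BlockDevice(d.device_name, f"{d.snapshot_id}-copy-{dest_region}")
        for d in src.block_devices
    )
    return replace(
        src,
        image_id=new_image_id,
        region=dest_region,
        block_devices=copied,
        launch_permissions=(),  # the copy starts with implicit owner access only
    )
```

In this model the source AMI is left untouched, which matches the fact that copying does not modify the original image.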

ENI - Elastic Network Interface

  • Security Groups are allocated to a specific ENI within an EC2 instance. You can attach additional ENIs to an instance so that it has more than one network interface. Security Groups are only associated with the network interface, not with the instance itself.

vCPU

  • A vCPU is typically one hyper-thread of a physical core. Hyper-threading helps when a process is blocked on I/O or waiting on a user: while one thread waits, the core can process other tasks as they come in. Disable hyper-threading for compute-heavy work such as financial risk calculations and engineering calculations.

Resource Allocation

  • If you are running the largest instance size, you get almost the whole physical server to yourself. All vCPUs are dedicated to you, so you get a consistent experience every time, no matter what else happens on the hardware. Network resources are partitioned to avoid “noisy neighbors” within AWS’ data centre.

Timekeeping

  • Timekeeping is used for processing interrupts and for getting the date and performance counters on the instance. Most AMIs use the Xen clock source because it is compatible with every instance type. TSC clock source: handled by the bare metal, talking directly to the physical processor. It is a CPU counter that is accessible from userspace.
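
As a rough way to check which clock source a Linux instance is using, the sketch below reads the standard sysfs files for clock sources. The helper names are hypothetical, and it assumes a Linux guest where these files exist.

```python
from pathlib import Path

# Standard Linux sysfs directory that exposes clock source information.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base: Path = CLOCKSOURCE_DIR) -> dict:
    """Return the active clock source and the ones the kernel offers."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return {"current": current, "available": available}


def could_switch_to_tsc(info: dict) -> bool:
    """True when TSC is offered by the kernel but is not the active source."""
    return "tsc" in info["available"] and info["current"] != "tsc"
```

On an instance, `read_clocksource()` with no arguments reads the live values. The `base` parameter exists only so the function can be pointed at a different directory, for example in a test.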

P-state and C-state Control

  • If you are running an application that requires very high clock speed, C-state control is useful: specific cores can turbo boost to clock frequencies about 300MHz higher when other cores enter deeper idle states. P-state control allows you to keep the clock speed at a constant rate. A useful scenario for this is gaming.

T2 Instances

  • Great general purpose instances with burstable CPU performance, suited to small databases, websites and development workloads. You start with a baseline level of performance that you get all the time. The magic of T2 is the burst credits that allow you to burst above that baseline. A CPU credit provides the performance of a full CPU core for one minute. An instance earns CPU credits at a steady rate and consumes credits when active. There are 2 CloudWatch metrics for this: CPUCreditUsage and CPUCreditBalance (useful for AutoScaling considerations).
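
The credit mechanics described above can be approximated with a short simulation. It is a simplified sketch: the function and parameter names are hypothetical, the earn rate and balance cap are inputs rather than authoritative figures, and it does not model throttling to the baseline once the balance reaches zero.

```python
def simulate_credit_balance(utilisation_per_minute, vcpus, earn_per_hour,
                            max_balance, start_balance=0.0):
    """Return the CPU credit balance after each simulated minute.

    utilisation_per_minute: average CPU utilisation (0.0 to 1.0) across all
    vCPUs for each minute. One credit is one vCPU at 100% for one minute.
    """
    balance = start_balance
    history = []
    earned = earn_per_hour / 60.0          # credits earned every minute
    for util in utilisation_per_minute:
        spent = util * vcpus               # credits consumed this minute
        # The balance can neither go negative nor exceed the cap.
        balance = min(max_balance, max(0.0, balance + earned - spent))
        history.append(balance)
    return history


# Illustrative values only: 2 vCPUs earning 24 credits per hour, cap of 576.
idle_hour = simulate_credit_balance([0.0] * 60, vcpus=2,
                                    earn_per_hour=24, max_balance=576)
burst = simulate_credit_balance([1.0] * 10, vcpus=2, earn_per_hour=24,
                                max_balance=576, start_balance=idle_hour[-1])
```

With these inputs an idle hour banks roughly 24 credits, and ten minutes at full utilisation on both vCPUs spends most of them again. That is the kind of pattern CPUCreditBalance makes visible.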

X1 Instances

  • The biggest instances, with 2TB of DRAM and 128 vCPUs. Suited to big in-memory databases, Big Data processing and HPC.

NUMA (Non-uniform memory access)

  • When you have that much memory, managing it effectively becomes even more important. On any system with multiple sockets, accessing memory attached to the socket close to you is always going to be faster than accessing memory attached to a remote socket. QPI links connect the sockets for this purpose. For example, in an x1.32xlarge with 4 sockets, things can get complex. You need to have 1 QPI to each socket as opposed to r3… which the connection will be slower
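
The cost of remote memory access can be reasoned about as a weighted average of local and remote latency. The sketch below is plain arithmetic; the function name and the latency figures in the usage line are hypothetical and are not measurements of any instance type.

```python
def effective_latency_ns(local_fraction, local_ns, remote_ns):
    """Average memory latency when only part of the accesses are local.

    local_fraction: share of accesses served by the local socket (0.0 to 1.0).
    local_ns / remote_ns: latency of a local and of a remote access.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0.0 and 1.0")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical figures: 100 ns local, 150 ns remote, half the accesses local.
half_local = effective_latency_ns(0.5, 100.0, 150.0)
```

The closer `local_fraction` gets to 1.0, the closer the average gets to the local latency, which is the motivation for keeping a process and its memory on the same socket.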

SR-IOV (Single Root I/O Virtualisation)

  • This feature allows a physical network device to be exposed directly to your OS. Packets do not need to go through the hypervisor; you talk directly to the bare-metal hardware. Application -> Sockets -> NIC Driver -> CPU Scheduling -> SR-IOV Network Device

Elastic Network Adapter

  • It was launched with the X1 instance. It offers 20Gbits of network performance, compared with the 10 you get with the earlier Enhanced Networking. A single traffic flow is limited to 5Gb/s outside a cluster placement group, so use placement groups when you need better performance.

Placement Groups

  • Cluster Placement Group: a group of specialised instances that can’t span AZs. It locates all the instances in close proximity, giving the highest throughput and the lowest latency. Capacity is allocated in advance. It’s better to provision the same type of instances to get the best performance. If the question asks for the very best performance, the answer is the cluster placement group. You can get a 10G flow between two instances.
  • Partition Placement Group: partitions in this case are isolated infrastructure blocks in a given AZ. If you want to provision a large number of instances and need to ensure high availability, choose a partition placement group. Used for replicated, large-scale workloads such as HDFS, HBase and Cassandra.
  • Spread Placement Group: similar to partition placement groups, but intended for applications with a small number of critical instances. Each instance is placed on distinct hardware, either within a single AZ or across AZs.
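
The three strategies map onto the `Strategy` parameter of the EC2 CreatePlacementGroup call. The helper below is a minimal sketch that only builds and validates the request parameters; the helper name and the group names are hypothetical, and nothing here contacts AWS.

```python
VALID_STRATEGIES = {"cluster", "partition", "spread"}


def placement_group_params(name, strategy, partition_count=None):
    """Build keyword arguments for an EC2 create-placement-group request."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    params = {"GroupName": name, "Strategy": strategy}
    if partition_count is not None:
        # A partition count only makes sense for the partition strategy.
        if strategy != "partition":
            raise ValueError("partition_count requires strategy='partition'")
        params["PartitionCount"] = partition_count
    return params


# Hypothetical group names for the two most common cases.
hpc_group = placement_group_params("hpc-cluster", "cluster")
hdfs_group = placement_group_params("hdfs-nodes", "partition", partition_count=3)
```

Assuming boto3 is installed and credentials are configured, the result could be passed on as `boto3.client("ec2").create_placement_group(**hdfs_group)`.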

EBS Performance

  • EBS performance is also a factor of instance size. EBS optimisation creates a dedicated path for EBS traffic that is separate from standard network traffic. EBS-backed instances can be stopped and started again later; you will not lose the data on the instance while it is stopped. By default, root volumes are deleted on termination; however, with EBS volumes you can tell AWS to keep the root device volume.
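
Keeping the root volume after termination is controlled by the `DeleteOnTermination` flag in the block device mapping. The sketch below only builds that mapping. The helper name is hypothetical and the device name is an assumption that has to match the root device name of the AMI in use.

```python
def root_volume_mapping(device_name, keep_on_termination):
    """Build a block device mapping entry for the root EBS volume."""
    return [
        {
            "DeviceName": device_name,
            # False here means the volume survives instance termination.
            "Ebs": {"DeleteOnTermination": not keep_on_termination},
        }
    ]


# "/dev/xvda" is only an example; check the AMI's actual root device name.
keep_root = root_volume_mapping("/dev/xvda", keep_on_termination=True)
```

With boto3, a list like this is what the `BlockDeviceMappings` argument of `run_instances` expects.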

AutoScaling

  • AutoScaling overrides the instance termination protection attribute and terminates the instance when it instructs the fleet to scale in.
  • If you use bootstrapping / user data functionality when spinning up EC2 instances, you have got flexibility but there will be a slight delay introduced compared to AMI baking. In the event of spinning up hundreds of EC2 instances with the same configuration, AMI baking is the right approach to take.
  • If you want to change the launch configuration of an ASG, you need to create a new launch configuration and associate it with the ASG; you can’t change the existing one.
  • Launch Templates are the newer way to configure Auto Scaling. You can pick the AMI, a specific EC2 instance type, the key pair name, VPC or classic, Security Groups and so on. You can access advanced EC2 settings as well. They have been designed to support modularity and versioning: you can create a new version based on an existing one and make slight adjustments.
  • Default Cooldown: It’s the time period you set for the auto-scaled EC2 instances to finish performing certain actions before any other scaling activity takes place.
  • You can detach certain EC2 instances from an ASG or mark them as standby for any maintenance activities.
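
The default cooldown can be pictured as a simple time gate on scaling activities. This is a sketch of the idea, not of Auto Scaling's internal logic: the function name is hypothetical, and the 300-second value is the documented default, which can be changed per group.

```python
from datetime import datetime, timedelta

# Default cooldown of an Auto Scaling group, in seconds.
DEFAULT_COOLDOWN = timedelta(seconds=300)


def can_start_scaling(now, last_activity_finished, cooldown=DEFAULT_COOLDOWN):
    """Return True when the cooldown since the last scaling activity is over."""
    if last_activity_finished is None:
        return True  # nothing has scaled yet, so there is nothing to wait for
    return now - last_activity_finished >= cooldown


last = datetime(2024, 1, 1, 12, 0, 0)  # arbitrary example timestamp
too_early = can_start_scaling(last + timedelta(seconds=120), last)
allowed = can_start_scaling(last + timedelta(seconds=300), last)
```

A longer cooldown gives freshly launched instances more time to finish bootstrapping before the group reacts to the same alarm again.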

Misc Notes

  • If a spot instance is terminated by Amazon EC2, you are not charged for the partial hour of usage; if you terminate it yourself, you are charged.
  • AMIs are regional. You can only launch an AMI from the region in which it is stored. However, you can copy AMIs to other regions using the console, the command line or the Amazon EC2 API.
  • Suppose you have 3 accounts: any single instance from any of them can get the benefit of AWS Reserved Instance pricing if it runs in the same zone and is of the same size as the reservation, but only one instance at a time!

How to get the most out of EC2 Instances

  • Choose HVM AMIs
  • Timekeeping: use TSC
  • Leverage C state and P state controls
  • Monitor T2 instance CPU credits
  • Use a modern Linux OS
  • Consider NUMA balancing options for improved CPU utilisation
  • Persistent grants for I/O performance
  • Enhanced networking
  • Profile your app to the right EC2 instance types

Hibernating an EC2 Instance

  • Hibernating saves the contents of RAM to the Amazon EBS root volume. That’s why the instance root volume must be an Amazon EBS volume, not an instance store volume. The instance cannot be in an ASG or ECS. Also, the root volume must be large enough to store the RAM contents.
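
The conditions above can be collected into a small pre-flight check. This is a sketch with hypothetical names that covers only the conditions listed in these notes; the real service has further prerequisites that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class InstanceFacts:
    root_is_ebs: bool        # False for an instance store root volume
    root_volume_gib: float   # total size of the root volume
    root_used_gib: float     # space already used on the root volume
    ram_gib: float           # instance memory that has to be saved
    in_asg_or_ecs: bool      # member of an ASG or used by ECS


def hibernation_blockers(facts: InstanceFacts) -> list:
    """Return the reasons, if any, why the instance cannot hibernate."""
    blockers = []
    if not facts.root_is_ebs:
        blockers.append("root volume must be an EBS volume")
    if facts.in_asg_or_ecs:
        blockers.append("instance must not be in an ASG or ECS")
    free_gib = facts.root_volume_gib - facts.root_used_gib
    if free_gib < facts.ram_gib:
        blockers.append("root volume is too small to hold the RAM contents")
    return blockers
```

An empty list means none of the listed conditions is violated.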