Deep Dive on EC2 Instances (Performance Optimization)
- Areas to consider when optimising the performance of your EC2 fleet: APIs, purchase options, networking and instances. AWS has let you define the physical topology of your instances since 2011, when “placement groups” arrived with the cc2 instance type. To find the right instance family, look at what your application is constrained by. If it needs a lot of memory, start with the R family (R3 and similar), and so on. Below are some of the key features for improving EC2 performance.
- T2 instances for Amazon EC2 will dramatically reduce costs for applications that can benefit from bursts of CPU power. Although the baseline performance of the t2.medium instance seems significantly lower than the c3.large instance, remember that it can burst above the performance of the c3.large instance. If your workloads don’t use the full CPU often, but occasionally need to burst, t2.medium can give you the performance you need while reducing the cost in comparison to using a c3.large instance
- Although enhanced networking does give higher performance, lower latency and lower jitter, these benefits mainly show up on traffic inside AWS, such as instance-to-instance traffic in a placement group. It would not give you lower latency when serving web content to your end users at significant scale. A design that relies on it does not easily scale horizontally, so it may not keep up with unexpected heavy load. In addition, it lacks resiliency because placement groups cannot span multiple Availability Zones.
- A vCPU is typically one hyper-thread of a physical core. Hyper-threading helps when a process is blocked on I/O or waiting on a user: instead of sitting idle, the core can process other tasks as they come in. Consider disabling hyper-threading for compute-heavy work such as financial risk calculations and engineering calculations.
- If you are running the largest instance size, you get almost the whole physical server allocated to yourself. All vCPUs are dedicated to you, so you get a consistent experience every time you use it, no matter what else happens on the hardware. Network resources are partitioned to avoid “noisy neighbors” within AWS’ data centres.
- Clock source: used for processing interrupts, getting the date and reading performance counters on the instance. Most AMIs use the Xen clock source because it is compatible with every instance type. TSC clock source: handled by the bare metal, talking directly to the physical processor. It is a CPU counter that is accessible from userspace.
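To see which clock source a Linux instance is using, the kernel exposes it through sysfs. The sketch below is a minimal example that assumes the standard `/sys/devices/system/clocksource/clocksource0` directory; the helper names `read_clocksource` and `tsc_recommendation` are made up for illustration.

```python
from pathlib import Path

# Standard Linux sysfs location for the active and available clock sources.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base_dir=CLOCKSOURCE_DIR):
    """Return (current, available) clock sources read from sysfs-style files."""
    base = Path(base_dir)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def tsc_recommendation(current, available):
    """Summarise whether switching to the TSC clock source is an option."""
    if current == "tsc":
        return "already using tsc"
    if "tsc" in available:
        return f"tsc available; currently using {current}"
    return f"tsc not available; currently using {current}"


if __name__ == "__main__":
    try:
        current, available = read_clocksource()
        print(tsc_recommendation(current, available))
    except FileNotFoundError:
        # Not a Linux host, or sysfs is not mounted.
        print("clocksource information not found on this system")
```

Whether TSC is the right choice depends on the instance type and kernel, so treat the output as a prompt to check rather than a rule.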
P-state and C-state Control
- C-state: if you are running an application that requires very high clock speed, C-state control is useful because specific cores can turbo boost to clock frequencies about 300 MHz higher when other cores are put into deeper idle states. P-state: allows you to set the clock speed at a constant rate. A useful scenario for this is gaming.
- T2: great general purpose instances with burstable CPU performance. Good for small databases, websites and development workloads. You start with a baseline level of performance, and you get that all the time. The magic of T2 is burst credits, which let you burst above that baseline. A CPU credit provides the performance of a full CPU core for one minute. An instance earns CPU credits at a steady rate and consumes credits when active. There are 2 CloudWatch metrics for this: CPUCreditUsage and CPUCreditBalance (useful for Auto Scaling considerations).
- X1: the biggest, with 2 TB of DRAM and 128 vCPUs. Suited to big in-memory databases, Big Data processing and HPC.
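The T2 credit accounting noted above (one CPU credit is one full core for one minute, earned at a steady rate and spent when active) can be sketched as a small simulation. This is a simplified model, not AWS's actual implementation: the function name, the per-minute granularity and the optional cap are assumptions, and throttling to baseline at zero credits is represented only by clamping the balance at zero.

```python
def simulate_credit_balance(initial_balance, credits_earned_per_hour,
                            utilisation_per_minute, vcpus=1, max_balance=None):
    """Return the credit balance after each minute of a utilisation trace.

    utilisation_per_minute: iterable of floats in [0, 1], the average
    utilisation across all vCPUs during each minute.
    """
    balance = float(initial_balance)
    earn_per_minute = credits_earned_per_hour / 60.0
    history = []
    for utilisation in utilisation_per_minute:
        # One credit = one vCPU at 100% for one minute.
        spent = utilisation * vcpus
        balance = balance + earn_per_minute - spent
        # The balance cannot go negative in this simplified model.
        balance = max(balance, 0.0)
        if max_balance is not None:
            balance = min(balance, max_balance)
        history.append(balance)
    return history


if __name__ == "__main__":
    # Illustrative numbers only; check the earn rate for your instance size.
    trace = [0.05] * 30 + [1.0] * 10 + [0.05] * 20
    balances = simulate_credit_balance(
        initial_balance=30, credits_earned_per_hour=24,
        utilisation_per_minute=trace, vcpus=2)
    print(f"lowest balance: {min(balances):.1f}, final: {balances[-1]:.1f}")
```

On a real instance the CPUCreditBalance metric is the source of truth. A model like this is only useful for reasoning about whether a workload's burst pattern is sustainable.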
NUMA (Non-uniform memory access)
- When you have that much memory, managing it effectively is even more important. On any system with multiple sockets, accessing memory attached to the socket close to you is always going to be faster than accessing memory attached to a remote socket in the same physical server. QPI links connect the sockets for this purpose. For example, in x1.32xlarge with 4 sockets, things can get complex. You need to have 1 QPI link to each socket as opposed to r3… which the connection will be slower
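One practical way to reason about local versus remote memory is to check whether the CPUs a process is pinned to sit on a single NUMA node. On Linux, each node lists its CPUs in `/sys/devices/system/node/nodeN/cpulist` using a range format such as `0-15,64-79`. The sketch below parses that format; the function names and the node layout in the usage example are hypothetical.

```python
def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nodes_spanned(process_cpus, node_cpulists):
    """Return the sorted NUMA node ids that own any of the given CPUs.

    node_cpulists maps node id -> cpulist string, e.g. the contents of
    /sys/devices/system/node/node0/cpulist.
    """
    wanted = set(process_cpus)
    return sorted(node for node, cpulist in node_cpulists.items()
                  if parse_cpulist(cpulist) & wanted)


if __name__ == "__main__":
    # Hypothetical two-node layout, for illustration only.
    layout = {0: "0-3", 1: "4-7"}
    print(nodes_spanned({2, 3}, layout))   # CPUs on a single node
    print(nodes_spanned({3, 4}, layout))   # CPUs straddling two nodes
```

If a latency-sensitive process spans more than one node, some of its memory accesses are likely to cross the socket interconnect.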
SR-IOV (Single Root I/O Virtualisation)
- This feature allows a physical network device to be exposed directly to your OS. Packets do not need to go through the hypervisor, so you are talking directly to the device on the bare-metal server. Path: Application -> Sockets -> NIC Driver -> CPU Scheduling -> SR-IOV Network Device
Elastic Network Adapter
- ENA launched with the X1 instance. It offers 20 Gbit/s of network performance, compared with the 10 Gbit/s you get with the earlier Enhanced Networking. Traffic is limited to 5 Gbit/s when it exits the EC2 instance to destinations outside a placement group, so use placement groups when necessary for better performance.
- EBS performance is also a factor of instance size. EBS optimisation creates a dedicated path for EBS traffic that is separate from standard network traffic. EBS-backed instances can be stopped and later started again, and you will not lose the data on the volume while the instance is stopped. By default, root volumes are deleted on termination; however, with EBS volumes you can tell AWS to keep the root device volume.
- Auto Scaling overrides the instance termination protection attribute and terminates the instance when it instructs the fleet to scale in.
- If a Spot instance is terminated by Amazon EC2, you are not charged for the partial hour of usage; if you terminate it yourself, you are charged.
- AMIs are regional. You can only launch an AMI from the region in which it is stored. However, you can copy AMIs to other regions using the console, the command line or the Amazon EC2 API.
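The EBS note above says you can tell AWS to keep the root device volume on termination. In the EC2 API this is expressed through the `DeleteOnTermination` flag in a block device mapping. The sketch below only builds that structure, in the shape accepted by the `BlockDeviceMappings` parameter of boto3's `run_instances`. The helper name is hypothetical, and the device name `/dev/xvda` is an assumption that depends on the AMI.

```python
def root_volume_mapping(device_name, delete_on_termination):
    """Build a BlockDeviceMappings list controlling root volume deletion."""
    return [
        {
            "DeviceName": device_name,
            "Ebs": {"DeleteOnTermination": bool(delete_on_termination)},
        }
    ]


if __name__ == "__main__":
    # Assumed root device name; check the AMI's actual root device first.
    mapping = root_volume_mapping("/dev/xvda", delete_on_termination=False)
    print(mapping)
    # With boto3 installed and credentials configured, this could be passed as
    # ec2.run_instances(..., BlockDeviceMappings=mapping).
```

Keeping the root volume means it continues to incur storage charges after the instance is gone, so this is best reserved for cases where the data must survive termination.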
How to get the most out of EC2 Instances
- Choose HVM AMIs
- Timekeeping: use TSC
- Leverage C-state and P-state controls
- Monitor T2 instance CPU credits
- Use a modern Linux OS
- Consider NUMA balancing options for improved CPU utilisation
- Persistent grants for I/O performance
- Enhanced networking
- Profile your app to the right EC2 instance types