Another Day, Another Billion Flows - How VPC technology works?

Misc Notes

  • The public IP of your instance is held on the Internet Gateway (IGW) array. The IGW array controls all the public IPs and handles the packets addressed to them.

VPC Encapsulation

  • AWS has its own proprietary VPC encapsulation to disambiguate overlapping IP addresses. The encapsulation carries information such as which VPC and which ENI (Elastic Network Interface) a packet belongs to.
  • Blackfoot Edge Devices: translate packets between the internal VPC encapsulation and normal IP networking on the outside. They have an embedded router which encapsulates and de-encapsulates packets, and they NAT internal IPs to public IP addresses.

Encapsulation in Direct Connect

  • It encapsulates the packet to make it ready for VLANs as Direct Connect is represented as a physical VLAN

Gateway Endpoints — S3 / DynamoDB Endpoints

  • Enable you to reach these services directly from your VPC without traversing the internet.
  • Completely private access to the S3 endpoints. Suppose you want to allow access to your S3 buckets only from a specific Endpoint ID and not from the public internet: a Gateway Endpoint makes that possible.
  • You associate the Gateway Endpoint with a subnet route table, and a prefix list for the service is added to that route table. Routes for this prefix list use the Gateway Endpoint as the target.
  • Endpoint policy can be used to restrict access to a specific S3 bucket for example.
  • It’s only Regional!
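
The two policy ideas above (an endpoint policy limited to one bucket, and a bucket that only accepts requests arriving through one endpoint) can be sketched as plain policy documents. This is a minimal sketch: the bucket name and endpoint ID are hypothetical placeholders, and the helper function names are made up for illustration. `aws:SourceVpce` is the condition key that matches the endpoint a request came through.

```python
import json


def endpoint_policy(bucket: str) -> dict:
    """Endpoint policy: traffic through the endpoint may only touch one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


def bucket_policy_endpoint_only(bucket: str, vpce_id: str) -> dict:
    """Bucket policy: deny every request that did not arrive via the endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnlessFromEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }


# Hypothetical names, for illustration only.
print(json.dumps(endpoint_policy("example-bucket"), indent=2))
print(json.dumps(bucket_policy_endpoint_only("example-bucket", "vpce-1a2b3c4d"), indent=2))
```

The first document is attached to the endpoint, the second to the bucket. Be careful with the deny statement: it also locks out console access that does not come through the endpoint.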

Interface Endpoints vs Gateway Endpoints

  • Gateway Endpoints cover S3 and DynamoDB and are highly available within a region by design. They rely on route tables rather than DNS.
  • Interface Endpoints are specific to a VPC and a specific subnet, whereas Gateway Endpoints are not. Interface Endpoints do not use route tables, whereas Gateway Endpoints do. Interface Endpoints support only TCP and IPv4.
  • Interface Endpoints occupy a network interface (ENI) in a specific subnet, so each one sits in a single AZ, whereas Gateway Endpoints are highly available. These are physical entities… They utilise a product called PrivateLink, through which you, 3rd parties or AWS can provide services over a secure private comms channel.
  • Interface Endpoints need to utilise security groups whereas Gateway Endpoints utilise IAM resource policies.
  • Interface Endpoints are a network-level service. The default service DNS names (such as sqs.us-east-1.amazonaws.com) resolve to the public IPs of the service. Interface Endpoints additionally provide unique private DNS names that resolve to the internal private IP of the interface endpoint. That way, you don’t need to modify application code to access public AWS services.
  • Interface Endpoints have a Regional DNS name covering all the AZs and Zonal DNS names, each covering only 1 AZ.
  • Applications can use these DNS names, or they can use Private DNS, which overrides the default DNS name of the service.

PrivateLink

  • Provides connectivity between VPCs, AWS services and on-premises applications securely on the Amazon network.
  • Use it where you have an app that you want to securely provide to another AWS account, or where you want to consume a service from a provider in another AWS account.
  • Endpoint Services are created and access is granted.
  • For HA, you need to deploy multiple endpoints.
  • Supports only TCP & IPv4.
  • Private DNS is supported with verified domains.

Mapping Service

  • Identifies which ENI corresponds to which instance and physical host at the moment. It also manages the routes. The embedded routers within the physical hosts talk to the Mapping Service. This is not like ARP, it is much more enhanced, and lookups are really fast. It is a distributed web service that handles the mapping between customers’ VPC routes and IPs and the physical destinations on the wire.
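
A toy sketch of why the lookup has to be keyed by VPC as well as IP: two customers can use the same private address, and only the VPC identifier carried in the encapsulation tells them apart. Every name here (`MappingService`, `Location`, the IDs) is hypothetical. The real service is proprietary and distributed, so this only illustrates the idea of the lookup.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Location:
    eni_id: str          # hypothetical ENI identifier
    physical_host: str   # hypothetical underlay address of the host


class MappingService:
    """Toy lookup table keyed by (VPC ID, private IP)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, str], Location] = {}

    def register(self, vpc_id: str, private_ip: str, location: Location) -> None:
        self._table[(vpc_id, private_ip)] = location

    def lookup(self, vpc_id: str, private_ip: str) -> Optional[Location]:
        # An unknown (VPC, IP) pair yields None: nothing to deliver the packet to.
        return self._table.get((vpc_id, private_ip))


mapping = MappingService()
# The same private IP exists in two different VPCs without any conflict.
mapping.register("vpc-a", "10.0.0.5", Location("eni-1", "host-17"))
mapping.register("vpc-b", "10.0.0.5", Location("eni-2", "host-42"))

print(mapping.lookup("vpc-a", "10.0.0.5"))
print(mapping.lookup("vpc-b", "10.0.0.5"))
```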

VPC Flow logs

  • Flow logs are data aggregated per ENI
  • Flow logs only provide metadata about the traffic such as version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action and log-status. They don’t provide the actual data itself.
  • DHCP, AWS DNS queries, metadata queries (EC2 instance metadata) and licence activation requests are not logged within VPC Flow Logs
  • You can place monitoring points at the following levels (the actual monitoring points are always ENIs):
    • VPC level
    • Subnet level
    • Specific ENI level
  • Flow logs can be sent to CloudWatch Logs or to an S3 bucket. You can use other tooling like Amazon Athena to query flow logs stored in S3.
  • You can configure ‘accept’, ‘deny’ or ‘all’ traffic for flow logs.
  • You need to select an IAM role that has permissions to store the data, either CloudWatch logs or S3 buckets.
  • Flow Logs are not realtime…
  • You will have a log file for each ENI
  • Reject / Accept - if you see an unusual distribution, there might be an anomaly in the NACLs or Security Groups.
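
A minimal sketch of reading flow log records in the field order listed above and counting ACCEPT/REJECT per ENI, which is one way to spot the unusual distributions mentioned in the last bullet. The sample records are invented for illustration, and the helper names are assumptions, not part of any AWS tooling.

```python
from collections import Counter
from typing import Dict, Iterable

# Field order as listed in the notes above (space-separated record).
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_record(line: str) -> Dict[str, str]:
    """Split one space-separated flow log record into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def action_counts(lines: Iterable[str]) -> Counter:
    """Count records per (ENI, action) pair."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_record(line)
        counts[(record["interface_id"], record["action"])] += 1
    return counts


# Invented sample records, for illustration only.
SAMPLE = [
    "2 123456789010 eni-aaaa1111 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-aaaa1111 172.31.9.69 172.31.16.21 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-aaaa1111 172.31.9.70 172.31.16.21 49762 3389 6 10 2100 1418530010 1418530070 REJECT OK",
]

for key, count in sorted(action_counts(SAMPLE).items()):
    print(key, count)
```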

Network Load Balancer

  • It is embedded in the VPC network. Can load balance flows natively and transparently in the VPC network

HyperPlane

  • Sits on the VPC network along with the EC2 physical hosts and Blackfoot. You can make an EC2 instance a HyperPlane. HyperPlane nodes do state tracking etc. It is a distributed system: HyperPlane nodes make transactional decisions and share state in tens of microseconds.
  • For NAT: HyperPlane guarantees that connections to the same destination IP/destination port pair have a unique source port. An example is software updates: during an Ubuntu update, many instances will want to hit the same IP on the same port at the same time
  • For NLB: When the first packet comes in, HyperPlane selects the target instance or container that should handle a connection. It does this by keeping the state info
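
The NAT guarantee above can be sketched as a small allocator: every flow towards the same destination IP/port pair receives a source port that no other flow to that destination is using, and a known flow keeps its port because state is tracked. This is a single-process toy under assumed names (`NatPortAllocator`, `translate`); the real system is distributed and shares this state between nodes.

```python
from typing import Dict, Set, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


class NatPortAllocator:
    """Toy per-destination source port allocator with flow state."""

    def __init__(self, first_port: int = 1024, last_port: int = 65535) -> None:
        self._ports = range(first_port, last_port + 1)
        self._in_use: Dict[Tuple[str, int], Set[int]] = {}
        self._flows: Dict[Flow, int] = {}

    def translate(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
        flow = (src_ip, src_port, dst_ip, dst_port)
        if flow in self._flows:
            # State tracking: later packets of a flow reuse the same port.
            return self._flows[flow]
        used = self._in_use.setdefault((dst_ip, dst_port), set())
        for port in self._ports:
            if port not in used:
                used.add(port)
                self._flows[flow] = port
                return port
        raise RuntimeError("no free source port left for this destination")


nat = NatPortAllocator()
# Two instances pick the same local port and hit the same update server.
first = nat.translate("10.0.1.10", 40000, "203.0.113.7", 80)
second = nat.translate("10.0.1.11", 40000, "203.0.113.7", 80)
print(first, second)
```

Uniqueness only needs to hold per destination pair, so the same source port can be reused towards a different destination.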

HyperPlane and Shuffle Sharding

  • Let’s say you have 8 HyperPlane nodes, all 8 shared by all customers. The problem with that is the noisy neighbour problem: one busy customer floods all 8 nodes! You can shard them into 4 groups of 2. Now only 1/4 of your customers have a bad day, which is better. Shuffle Sharding: you give each customer 3 of the 8 nodes at random. Another customer gets 3 nodes again, with perhaps 1 node overlapping between the two. The potential overlap decreases as the number of nodes increases!
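
The overlap argument can be checked with a little counting. Assuming each customer receives a uniformly random set of `shard_size` nodes out of `nodes` (an assumption of this sketch; the function name is made up), the chance that two customers share exactly `overlap` nodes follows the hypergeometric distribution:

```python
from fractions import Fraction
from math import comb


def overlap_probability(nodes: int, shard_size: int, overlap: int) -> Fraction:
    """P(two random shards of shard_size out of nodes share exactly `overlap` nodes)."""
    # Choose which nodes are shared, then the rest from the nodes outside the first shard.
    ways = comb(shard_size, overlap) * comb(nodes - shard_size, shard_size - overlap)
    return Fraction(ways, comb(nodes, shard_size))


for nodes in (8, 16, 32):
    full = overlap_probability(nodes, 3, 3)
    print(f"{nodes} nodes: P(full overlap) = {full} (about {float(full):.4%})")
```

With 8 nodes and shards of 3, two customers land on exactly the same three nodes with probability 1/56, compared with 1/4 for the plain 4 groups of 2 layout, and the figure keeps shrinking as nodes are added.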

Enhanced Networking

  • A lot of application connections look like request/reply: a web service calling a load balancer, or pulling data from an application server. There is a strong correlation between latency and the underlying packet rate you can obtain from the network, so the packets-per-second capability of the instance really matters in this case. With enhanced networking, AWS uses a technology called SR-IOV (Single Root I/O Virtualisation), which makes it possible to bypass part of the virtualisation layer and have your instance connect directly to the physical NIC of the infrastructure. Your instance needs to run a driver which knows how to use SR-IOV, and on the AWS side a virtual function runs on the physical NIC when you turn on enhanced networking. Enhanced networking also reduces jitter on the network, which further reduces latency.

VPC Peering

  • You can design your architecture in a way that shared services across your organisation would be based on VPC peering. Use cases are Authentication, Monitoring, Logging, Remote administration and Scanning
  • Provides infrastructure zoning by segregating Dev, Test and Production VPCs. You can have identical VPCs that literally use the same IP space, and get the same experience everywhere

VPC peering for VPC-to-VPC connectivity

  • You can use across accounts. You have to specify the peer owner account’s ID.
  • The peering ID is used for the handshake that confirms the peering connection between the 2 VPCs. Once the connection is created, update the route tables so that any traffic between the two VPCs goes through the peering connection. The VPC peering connection is the route target!
  • Security groups not supported: You need to specify security group rules by IP Prefix
  • No transit capability for VPN, AWS Direct Connect, or 3rd VPCs. You cannot access VPC C from VPC A via VPC B; you need to create a direct peering instead. Also, the address ranges of two peered VPCs cannot overlap, but you can peer with 2+ VPCs that overlap with each other
  • You can reference security groups only within the same region
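
The address range rule can be checked with Python’s standard `ipaddress` module before a peering request is made. A minimal sketch; the `can_peer` helper and the CIDR ranges are made up for illustration.

```python
import ipaddress


def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Two VPCs can only be peered if their CIDR ranges do not overlap."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


vpc_a = "10.0.0.0/16"
vpc_b = "172.16.0.0/16"
vpc_c = "172.16.0.0/16"   # same range as B

print(can_peer(vpc_a, vpc_b))  # A and B do not overlap
print(can_peer(vpc_a, vpc_c))  # A and C do not overlap either
print(can_peer(vpc_b, vpc_c))  # B and C overlap, so they cannot peer with each other
```

Here A can peer with both B and C even though B and C overlap with each other, which is the situation described in the notes. Routing from A then has to pick one of them per destination prefix.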

Transit Gateway

  • Transit GWs work by attaching themselves to other network types - VPC, Site-to-Site VPN and Direct Connect GW.
  • VPC peers are not transitive but only one to one. You need to use Transit Gateway for that purpose with ‘attachments’. You can have multiple route tables in Transit Gateway.
  • Attachment: connection from VPC to TGW.
  • Association: Route table used to route packets from the attachment to an AWS VPC or a VPN.
  • Propagation: Route table where the attachment’s routes are installed.
  • You need to configure DNS settings for your VPC to connect to the public DNS address of the Bastion host on a peering connection if you don’t have Internet Gateway.
  • VPC peering does not extend to Direct Connect or VPN connections.
  • Security Group Referencing on Amazon VPC is not supported at launch. Spoke Amazon VPCs cannot reference security groups in other spokes connected to the same AWS Transit Gateway
  • Transit Gateway: supports 1.25Gbps of bandwidth per VPN connection, and you can use multiple VPN connections to aggregate the bandwidth. It supports 50Gbps per VPC connection.
  • Transit Gateway: it’s quite a flexible product for solving problems. It allows you to aggregate VPN tunnels to increase the bandwidth.
  • Cross-region connectivity: You can peer two transit gateways and route traffic between them, which includes IPv4 and IPv6 traffic.
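
The three terms (attachment, association, propagation) are easier to keep apart with a toy model. In this sketch, which uses made-up names and CIDRs and is not the AWS API, an association decides which route table is consulted for traffic arriving from an attachment, and a propagation installs an attachment’s ranges into a route table.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple


class TransitGateway:
    """Toy model of TGW attachments, associations and propagations."""

    def __init__(self) -> None:
        self.attachment_cidrs: Dict[str, List[str]] = {}
        self.association: Dict[str, str] = {}                  # attachment -> route table
        self.route_tables: Dict[str, List[Tuple[str, str]]] = {}  # table -> (cidr, target)

    def attach(self, attachment: str, cidrs: List[str]) -> None:
        self.attachment_cidrs[attachment] = cidrs

    def associate(self, attachment: str, table: str) -> None:
        self.association[attachment] = table
        self.route_tables.setdefault(table, [])

    def propagate(self, attachment: str, table: str) -> None:
        routes = self.route_tables.setdefault(table, [])
        for cidr in self.attachment_cidrs[attachment]:
            routes.append((cidr, attachment))

    def route(self, from_attachment: str, dst_ip: str) -> Optional[str]:
        """Look up dst_ip in the table associated with the source attachment."""
        dst = ipaddress.ip_address(dst_ip)
        table = self.route_tables[self.association[from_attachment]]
        matches = [(ipaddress.ip_network(c), t) for c, t in table
                   if dst in ipaddress.ip_network(c)]
        if not matches:
            return None  # no route: the packet is dropped
        return max(matches, key=lambda m: m[0].prefixlen)[1]


tgw = TransitGateway()
tgw.attach("vpc-dev", ["10.1.0.0/16"])
tgw.attach("vpc-prod", ["10.2.0.0/16"])
tgw.attach("vpn-onprem", ["192.168.0.0/16"])

# Spokes use "spoke-rt", which only learns the on-premises range.
tgw.associate("vpc-dev", "spoke-rt")
tgw.associate("vpc-prod", "spoke-rt")
tgw.propagate("vpn-onprem", "spoke-rt")

# The VPN uses "onprem-rt", which learns both VPC ranges.
tgw.associate("vpn-onprem", "onprem-rt")
tgw.propagate("vpc-dev", "onprem-rt")
tgw.propagate("vpc-prod", "onprem-rt")

print(tgw.route("vpc-dev", "192.168.1.10"))
print(tgw.route("vpc-dev", "10.2.0.5"))
print(tgw.route("vpn-onprem", "10.2.0.5"))
```

With this layout both VPCs can reach on-premises, on-premises can reach both VPCs, but dev and prod cannot reach each other, which is what multiple route tables make possible.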

VPC Sharing

  • As an alternative to VPC peering, an account can now create a VPC with a CIDR range and share it with other accounts while keeping ownership of the VPC, i.e. only the owner account can modify the VPC, but the accounts it is shared with can create resources in it

Global Accelerator

  • Takes the CloudFront Edge Locations and puts a single Anycast IP at the edge. This IP can be advertised from many CloudFront POPs/Edge Locations. Whenever users send a request, they are routed to their local Edge Location via this IP. The IP is shared across regions (Elastic IPs, by contrast, are region-level). You can use Global Accelerator instead of Route 53.
  • Everything occurs within the AWS Global Network, which consists of fibre networks. Fewer hops, better performance.
  • CloudFront specifically moves the content, whereas Global Accelerator moves network traffic as quickly as possible using Anycast IP addresses.
  • CloudFront is caching and HTTP/S focused, whereas GA can work with TCP/UDP. GA does not cache content and has no HTTP/S-specific capability. It’s a network product.

Routing and Private Connections

Create a VPC which does not overlap with your existing network on premises

  • VPN Connection: Running a few APIs through AWS CLI, SDK or your tools to connect the gateway to the VPC. Define Customer gateway and create a connection between those two.
  • AWS Direct Connect: Either direct fiber connection or one of the AWS APN partners who provide circuit backhaul from Direct Connect locations. From AWS standpoint, it is again API driven. Tell AWS how much bandwidth you need and to where, you can create a private virtual interface between on premises and AWS. This is optional if you don’t need a private connectivity. Use this particularly for hybrid applications
  • Configuring Routing Tables: AWS route tables are highly available, redundant and abstracted, so there is no need for a second (standby) route table in your VPC.

Remote Connectivity Best Practices

  • By default when you create a VPN connection, AWS gives you 2 IPSec tunnels. Use BGP for failure recovery. BGP routed tunnel is recommended for high availability. BGP is just a routing protocol used between AWS VPN router and yours. BGP provides a hard layer 4 connectivity check. If anything happens, BGP will know and failover.
  • It is better to configure two VPN connections to protect against failure, which gives you 4 IPSec tunnels. You can also configure a single ingress path if you wish. If you need high confidentiality, go with Direct Connect, which is hard-wired/provisioned for your instances. Direct Connect with VPN backup provides more robust availability

VPC with Private and Public Connectivity

For public connectivity

  • You need to create an internet gateway and attach it to the VPC. Internet gateway is highly available, no throughput constraints, no latency, no single point of failure

For Hybrid App Connectivity

  • If you have a customer-facing app, you need a direct path to your VPC from outside. You can create an Internet Gateway for that purpose and point your route to it.

Automatic Route Propagation

  • Any prefixes that AWS receives from your data center, that is, the prefixes the customer advertises over BGP, are populated into the VPC route table. Let’s say you have a subnet in your home network and you decide to add an IP block: once you add that route to your corporate network, it is automatically populated in the VPC route table as advertised.

Isolating connectivity by subnet

  • If you don’t want an AWS subnet to have any kind of connectivity to home network, create a new route table and associate with the subnet. As soon as you associate with that subnet, any traffic coming from the instances from that subnet, will be routed according to that route table and will not use the default route table.

Software VPN for VPC-to-VPC Connectivity

  • Useful for connecting VPCs across regions. Software firewall to the Internet: NAT/Firewall functionality which provides routing all traffic from subnets to the Internet via a firewall is conceptually similar.

Site to Site VPN

  • VPNs use the internet for communication. AWS provides fully secure tunnels for this between the virtual private gateway and the customer gateway on premises (within AWS, you configure the logical representation of the latter).
  • Static Routing: manually tell each side of the VPN connection which subnets are available at the other end. Static (non-propagated) routes are prioritised over dynamic routes.
  • Dynamic Routing: You exchange this information (IP addressing and subnets) via BGP (Border Gateway Protocol). You would use dynamic VPNs in production systems. You need to provide ASN, unique identifier at both sides.
  • The AWS-side (VGW or TGW) endpoint outside IP and the Customer Gateway (CGW) outside IP are the ends of the encrypted IPSec channel, whereas the AWS and CGW inside IPs are used to establish BGP dynamic routing, and the raw data travels inside the tunnel.
  • VPN is quick to set up and they are cheap. There is a per/hour charge for the active connection and also data transfer out charge and more economical than Direct Connect. VPNs do use encryption. Performance is limited depending on the internet and compute power of the gateways. Generally, it’s easier to set up and use than Direct Connect. Data in to the AWS is free of charge but you would incur costs for data out. Data out rate is much cheaper in Direct Connect (+port hour charges) than the Internet. So, in some cases you may want to implement a site to site VPN over a Direct Connect connection.
  • Optimisation of VPN connections: create relay VGWs on AWS that connect to your on-premises network. Traffic terminates at the middle (relay) VPC, which lowers the cost. A better solution? Transit Gateway.
  • The VPN speed limit is 1.25Gbps. All VPN connections connecting to the same VGW share this limit.
  • Accelerated Site-to-Site VPN: Normally IPSec tunnels transit the public internet through a number of network hops, which results in variable performance, latency and consistency. With acceleration, the AWS Global Accelerator network of Edge Locations is used instead: 2 IPs are allocated across all Edge Locations, and the Edge Location closest to the customer is used. Acceleration can be enabled when creating a TGW VPN attachment.
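
The static versus dynamic priority mentioned above can be sketched as a route selection function. This sketch assumes the usual ordering in a VPC route table: the most specific matching prefix wins first, and only between routes with the same prefix does a static route beat a propagated one. The `Route` type, targets and CIDRs are made up for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    cidr: str
    target: str
    propagated: bool  # True if learned dynamically (e.g. via BGP)


def select_route(routes: List[Route], dst_ip: str) -> Optional[Route]:
    """Pick the longest matching prefix; on a tie, prefer the static route."""
    dst = ipaddress.ip_address(dst_ip)
    candidates = [r for r in routes if dst in ipaddress.ip_network(r.cidr)]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (ipaddress.ip_network(r.cidr).prefixlen, not r.propagated),
    )


routes = [
    Route("192.168.0.0/16", "static-target", propagated=False),
    Route("192.168.0.0/16", "vgw-bgp", propagated=True),
    Route("192.168.5.0/24", "vgw-bgp", propagated=True),
]

print(select_route(routes, "192.168.1.1"))   # same prefix twice: static wins
print(select_route(routes, "192.168.5.20"))  # the more specific /24 wins
```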

AWS Client VPN

  • Client VPN Endpoint: You have a user with an OpenVPN client. The tunnel between the user and the endpoint is TLS, not IPSec. The endpoint gets an IP from the VPC, and from there you can connect to on-premises networks, public endpoints and even the internet.
  • If you are connecting to AWS via a Client VPN Endpoint, it has to offer authentication (Active Directory, MFA) and authorisation (network based or SGs) services to access resources.
  • You need to apply Authorisation Rules to allow connections to certain CIDRs on AWS via your Client VPN endpoint
  • When you associate a Subnet from your VPC with the Client VPN Endpoint, you place an Elastic Network Interface on the subnet, pick an IP address from the subnet and the traffic will look like it’s coming from the subnet itself.

NAT Instance & Gateway Best Practices

NAT Instances

  • When creating a NAT instance, disable source destination check on the Instance
  • NAT instance must be in a public subnet to have a connectivity to the outside world. There must be a route out of the private subnet to the NAT instance, in order for this to work
  • If you are bottlenecking about the amount of traffic NAT instance supports, increase the instance size
  • You can create high availability using Auto Scaling groups, multiple subnets in different AZs and a script to automate failover. A NAT instance is always behind a security group

NAT Gateways

  • NAT Gateways are not resilient across AZs by design. You place one in a particular AZ and it provides hardware-level resiliency within that AZ, as there are multiple hardware NAT gateway nodes behind it. For resiliency across AZs, implement one in every AZ as well.
  • This is preferred by big organisations as it can scale automatically up to 10Gbps and there is no need to patch, as it’s a managed service
  • It is not associated with security groups
  • Automatically assigned a public IP
  • Remember to update your route tables as per NAT Gateway
  • No need to disable source/destination check
  • NAT Gateway brings per-flow stateful NAT to VPC. Many instances trying to hide behind one IP

Network ACLs

  • Your VPC automatically comes with a default network ACL and by default it allows all outbound and inbound traffic
  • Each subnet in your VPC is automatically associated with the default network ACL if you don’t associate with your network ACL
  • You can associate an ACL with multiple subnets; however, a subnet can be associated only with one network ACL
  • Network ACL rules are evaluated in order! Separate inbound and outbound rules! And each rule can either allow or deny traffic
  • Network ACLs are stateless, responses to allowed inbound traffic are subject to the rules for outbound traffic
  • Block specific IP addresses using network ACLs, not Security Groups
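
A small sketch of the two properties above: rules are evaluated in order of rule number with the first match deciding, and the ACL is stateless, so the reply to an allowed inbound request still has to pass the outbound rules. The `Rule` type, `evaluate` helper, rule numbers and addresses are made up for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    number: int
    cidr: str
    ports: Tuple[int, int]  # inclusive port range (low, high)
    allow: bool


def evaluate(rules: List[Rule], ip: str, port: int) -> bool:
    """Return True if traffic is allowed; the lowest-numbered matching rule decides."""
    addr = ipaddress.ip_address(ip)
    for rule in sorted(rules, key=lambda r: r.number):
        low, high = rule.ports
        if addr in ipaddress.ip_network(rule.cidr) and low <= port <= high:
            return rule.allow
    return False  # nothing matched: the final catch-all rule denies


inbound = [
    Rule(90, "198.51.100.7/32", (0, 65535), allow=False),  # block one specific IP
    Rule(100, "0.0.0.0/0", (443, 443), allow=True),        # HTTPS from everyone else
]
outbound = [
    Rule(100, "0.0.0.0/0", (1024, 65535), allow=True),     # replies to ephemeral ports
]

print(evaluate(inbound, "198.51.100.7", 443))   # blocked by rule 90 before rule 100
print(evaluate(inbound, "203.0.113.9", 443))    # allowed by rule 100
print(evaluate(outbound, "203.0.113.9", 50000)) # the reply needs its own outbound rule
```

If the outbound list were empty, the inbound request would still be accepted but the reply would be dropped, which is the practical meaning of stateless here.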

NATs vs Bastion Hosts

  • NAT is used to let instances in a private subnet reach out to the internet. However, people on the internet cannot SSH into those instances in the private subnet to administer them
  • If you want to do that, use jump boxes (bastion hosts). You SSH into your bastion host and from there initiate another connection to your instances over the private network. A bastion is used only for administration purposes. Harden the bastion server OS!

High Availability NAT - Squid Proxy Problem

  • Problem: Standard NAT inside a VPC is confined to a single instance, which could fail. You also need to perform large puts and gets to Amazon S3.
  • Solution: Run Squid in proxy configuration in an Auto Scaling group that scales on network I/O. Put a private proxy layer (a private load balancer) in front of it. Then, when you boot your EC2 instances, bootstrap them to proxy all their outbound HTTP/S connections through that private load balancer. If one Squid proxy fails, there are other instances to take over.

Availability pattern for private connections

  • A good pattern is to use BGP for failure recovery, with tight coupling. Every VPN connection is treated as an IPSec tunnel. 2 AZs for example.
  • A better availability scenario is to have 2 VPN connections from 2 routers at the edge of your network in your DCs. That gives 4 BGP and IPSec tunnels and protects against customer gateway failure.
  • Use Direct Connect to improve consistency in network performance, with VPN as a backup. 2 Direct Connect connections from two DCs etc. Direct Connect connections are direct connections, so you need to create two pairs of them for resiliency

  • Automatic route propagation from VGW: it listens to your routes and propagates them through the network. Some customers use an MPLS cloud and connect it to Direct Connect. They dynamically advertise prefixes etc. from customer DCs. With route propagation, there is no extra work.
  • When configuring your customer gateway to connect to your VPC, the Internet Key Exchange (IKE) Security Association is established between your customer gateway and the virtual private gateway using a set of pre-shared keys