CloudFront Deep Dive
Creating and Working with Distributions
- Use an RTMP distribution only when you stream media with Adobe Media Server over the RTMP protocol.
- You can place two origins, which may be of different types, in an origin group so that CloudFront fails over from the primary origin to the secondary one.
- CloudFront accepts only valid requests on the viewer connection (between end users and Edge Locations) before it fetches any content from the origin. In effect there are 2 layers of security: WAF and Edge Filtering.
- By default, there is an SSL certificate matching CloudFront's default domain name (*.cloudfront.net). If you use a custom domain name, you must supply a certificate for that name, either one held in ACM or your own imported certificate.
- If you want your viewers to use HTTPS and to use alternate domain names for your files, choose one of the options for how CloudFront serves HTTPS requests:
- If your viewers use browsers that do not support SNI (Server Name Indication), enable dedicated IP support, which allocates a dedicated IP address at each Edge Location to serve your content.
- Otherwise, use Server Name Indication (SNI), which requires SNI-capable browsers. This is the recommended option.
- Security Policy Values
- The recommended policy is TLSv1.1_2016, a good balance of compatibility and security.
- Any request matching a behavior's path pattern (for example images/*.jpg) is forwarded to the origin configured for that behavior, using that behavior's settings. For example, you can forward all requests for .jpg files to a given bucket. A common scenario is serving static and dynamic content from different origins.
- It is used to invalidate cached content after a set time.
- EC2 instances or on-premises web servers that are publicly accessible. Edge Locations need to be able to contact these services.
- An S3 origin offers fewer settings: you can only set the origin path, restrict bucket access and add custom headers/values. With an S3 origin, the origin connection uses the same protocol as the viewer connection. A custom origin exposes more options: you can choose the TLS protocols allowed for the origin connection, and between the Edge Location and the origin you can either force HTTPS or match the viewer protocol.
- There are 2 connections: the viewer connection, between the end user and the Edge Location, and the origin connection, between the Edge Location and the origin server.
- Viewer (SSL/HTTPS): edit the distribution settings so that the Edge Locations use a trusted, publicly issued certificate. You cannot use self-signed certificates or certificates issued by an internal (private) authority.
- Origin: an S3 origin works by default. With an ELB, you can use an ACM certificate. With EC2 or an on-premises server, you again need a publicly trusted certificate. The origin domain name and the domain name on the certificate must match.
- If you have an ALB with a custom domain and want HTTPS all the way from the origin to the viewer, follow these steps. On the ALB, use a certificate signed by a trusted third-party certificate authority and imported into ACM. In CloudFront, set the Viewer Protocol Policy to HTTPS Only (or Redirect HTTP to HTTPS) so that viewers always connect over HTTPS, and attach an SSL/TLS certificate from a third-party certificate authority that was imported into either ACM or the IAM certificate store. Match Viewer is an Origin Protocol Policy value, not a Viewer Protocol Policy value: it makes CloudFront connect to the origin with whichever protocol the viewer used.
- HTTPS between viewers and CloudFront
- You can use a certificate issued by a trusted third-party certificate authority
- You can use a certificate provided by AWS Certificate Manager (ACM)
- HTTPS between CloudFront and a custom origin
- If the origin is not an ELB load balancer (for example, an EC2 instance), the certificate must be issued by a trusted third-party CA
- If the origin is an ELB, you can use an ACM certificate.
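The path-pattern routing described in the behaviors bullet above can be sketched in a few lines of Python. This is only an illustration of the matching idea, not CloudFront's implementation. It assumes that `*` matches zero or more characters, `?` matches exactly one, matching is case-sensitive, a leading slash is optional, and behaviors are checked in order with a default origin as the fallback. The function names and origin IDs (`static-s3`, `dynamic-alb`, `default-origin`) are hypothetical.

```python
import re


def pattern_to_regex(path_pattern: str) -> "re.Pattern[str]":
    """Translate a path pattern such as images/*.jpg into a regex.

    Assumption: only * (any run of characters) and ? (one character)
    are wildcards; everything else is matched literally.
    """
    parts = []
    for ch in path_pattern.lstrip("/"):  # leading slash treated as optional
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def select_origin(request_path, behaviors, default_origin):
    """Return the origin ID of the first behavior whose pattern matches.

    behaviors is an ordered list of (path_pattern, origin_id) pairs;
    default_origin plays the role of the catch-all default behavior.
    """
    path = request_path.lstrip("/")
    for path_pattern, origin_id in behaviors:
        if pattern_to_regex(path_pattern).fullmatch(path):
            return origin_id
    return default_origin


if __name__ == "__main__":
    # Hypothetical split between static and dynamic content.
    behaviors = [("images/*.jpg", "static-s3"), ("api/*", "dynamic-alb")]
    for p in ("/images/cats/tom.jpg", "/api/users", "/index.html"):
        print(p, "->", select_origin(p, behaviors, "default-origin"))
```

Because the first matching pattern wins in this sketch, more specific patterns should be listed before broader ones.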
S3 Security - Origin Access Identity
- S3 origins let you restrict access so that objects can only be reached through the Edge Locations, using an Origin Access Identity (OAI). An OAI is a special CloudFront identity that you associate with the distribution; you then add an S3 bucket policy that denies all other public access and allows only the OAI. In that bucket policy, the principal is the OAI ID and the statement grants it the permission.
- Signed URLs allow an entity (generally an application) to create a URL that carries the information needed to give the holder of that URL read/write access to an object, even if they have no permissions on that object. Signed cookies extend this idea: they grant access to an object, area or folder without needing a specially formatted URL.
- Customers obtain the signed URL from your application, typically via an API.
- You can enable this via Behavior Settings!
- Trusted Signer: whenever a behavior has a trusted signer, treat that behavior as private; it cannot be used for public access. Privacy is set per behavior, so you can make only a specific path private.
- Signed URLs or Cookies
- At the distribution level, you can enable either a whitelist, which allows only specific countries, or a blacklist, which denies specific countries access to the distribution. The decision is based only on the country derived from the viewer's IP address.
- 3rd party geo-restriction: this builds on the signed URL concept. You set the default behavior to be private. When a customer requests an object from the browser, the request goes over the open internet to an application of some sort (API Gateway, Lambda etc.), which checks a 3rd party geo-location source, or uses session information, account, browser and so on. To deny access, the application returns an error code to the end user. To allow access, it generates a signed URL. This is useful whenever you need checks beyond the IP-based location.
- This lets you define a public key and associate it with a CloudFront distribution. The data is encrypted at the Edge Location and stays encrypted all the way to the origin. You can use this for personal identity, health, payment data etc.
- Edge Locations are distributed globally in densely populated cities.
- You can also change the caching behaviour based on selected request headers.
- Cache hit: the request is answered from the Edge Location or the regional cache and does not have to go back to the origin.
- Cache miss: the opposite case, where the data has to be fetched from the origin.
- Optimise the cache hit/miss ratio to get the best performance. You can edit the minimum, maximum and default TTL values for cached objects in the distribution settings. The origin can specify the TTL for a specific object in its response headers. If the origin does not supply any caching directives, CloudFront uses the default TTL, which is 86400 seconds (1 day). If you have a large static media collection that does not change over time, you can increase the default TTL.
- Query Strings: when you browse a website and choose English or French by adding a query string, as in www.catpictures.com/index.html/?lang=fr or ?lang=en, you are really requesting two different objects, and the origin could respond with two different objects. CloudFront needs to be told this. You configure it in the distribution settings; by default it is set to 'none', so query strings are neither forwarded to the origin nor used for caching. If you forward the query strings and cache based on them, CloudFront can cache each variant separately. Use a query string whitelist so that only the parameters that change the response influence caching: in this case you would whitelist the language parameter (lang in the URL above). By contrast, if a query string only identifies per-request data, such as the ID of a record, forward it to the origin but do not cache on it.
- You define the functions outside of CloudFront and they are pushed out to the Edge. You can trigger a function on four CloudFront events: Viewer Request, Origin Request, Origin Response and Viewer Response. Functions can work on the data passing between the client and the Edge Location or between the Edge Location and the origin.
- Use Cases
- You might use Lambda@Edge to inspect cookies on requests travelling from the client to the origin server, in order to deliver different versions of the response. Examples are A/B testing, different website layouts, different page formats based on location, and image quality based on the client's device.
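The A/B testing use case above might look roughly like the following viewer-request handler. It is a minimal sketch, not a production function. It assumes the Lambda@Edge event shape in which the request sits under `Records[0].cf.request` and each header is a list of `{"key", "value"}` entries keyed by the lowercase header name. The cookie name `experiment` and the object `/index-b.html` are hypothetical.

```python
def lambda_handler(event, context):
    """Viewer-request sketch: route variant-B visitors to another object."""
    request = event["Records"][0]["cf"]["request"]
    headers = request.get("headers", {})

    # Default to variant A unless a valid experiment cookie says otherwise.
    variant = "A"
    for cookie_header in headers.get("cookie", []):
        for pair in cookie_header["value"].split(";"):
            name, _, value = pair.strip().partition("=")
            if name == "experiment" and value in ("A", "B"):
                variant = value

    # Rewrite only the page under test; all other URIs pass through.
    if request["uri"] == "/index.html" and variant == "B":
        request["uri"] = "/index-b.html"

    # Returning the request lets processing continue with the new URI.
    return request
```

Running the check on the viewer request means the rewritten URI is what the cache sees, so each variant can be cached as its own object.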
Logging, Reporting and Monitoring
- Reporting & Analytics: CloudFront reports, per individual distribution, the total number of requests, the % of viewer requests, cache hits/misses and so on. You can download the reports as .csv files. You can break viewers down by device, OS, browser, location etc., see the number of HTTP/S requests by location, and filter by popular objects. You can also create CloudWatch alarms based on the metrics.
- Access Logging: you can enable access logging and store the logs in an S3 bucket under a prefix you choose. It is available for both web and RTMP distributions. The log files are detailed enough for audits and can also be used to drive event-driven systems.
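As a rough illustration of using the access logs, the sketch below computes a cache hit ratio from log lines. It assumes the standard log layout of tab-separated records preceded by `#Version` and `#Fields` header lines, with a field named `x-edge-result-type`. Counting `Hit` and `RefreshHit` as hits is a choice made for this example. Real log files are gzip-compressed and carry many more fields, and the sample lines in the test data are invented.

```python
from collections import Counter


def parse_log_lines(lines):
    """Yield one dict per log record, keyed by the names in #Fields."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Field names are space-separated after the "#Fields:" marker.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other header lines
        yield dict(zip(fields, line.split("\t")))


def cache_hit_ratio(records):
    """Fraction of records served from cache (assumed: Hit or RefreshHit)."""
    counts = Counter(r.get("x-edge-result-type", "") for r in records)
    total = sum(counts.values())
    hits = counts["Hit"] + counts["RefreshHit"]
    return hits / total if total else 0.0
```

To run it against a real log file, open the file with `gzip.open(path, "rt")` and pass the file object to `parse_log_lines`.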