Deep Dive on S3
Two ways to access S3 objects:
- Make the objects publicly readable by setting the appropriate permissions and ACLs
- Mount the bucket on an EC2 instance and expose it through an FTP service
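Data read/write consistency for S3
- Read-after-write consistency for PUTs of new objects
- Eventual consistency for overwrite PUTs and DELETEs (changes can take some time to propagate)
- Note: this is the original consistency model. Since December 2020, S3 provides strong read-after-write consistency for all PUTs and DELETEs, including overwrites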
S3 is a key-value store: the key is the name of the object and the value is the actual data
- If your requests to S3 are typically a mix of GET, PUT, DELETE, or GET Bucket (list objects), choosing appropriate key names for your objects will ensure better performance by providing low-latency access to the Amazon S3 index. It will also ensure scalability regardless of the number of requests you send per second.
- Workloads that are GET-intensive – If the bulk of your workload consists of GET requests, Amazon CloudFront is the recommended content delivery service.
- Object metadata is not searchable in S3; it is only a set of attributes attached to the object. For searching, an external index such as DynamoDB is recommended
How do you select the storage class for your use case?
- S3 Standard for Big Data Analysis, Content Distribution, Static Website Hosting
- Standard Infrequent Access for Backup & Archive, Disaster Recovery, File sync & share, Long-retained data
- Glacier for Long Term Archive, Digital preservation, magnetic tape replacement
- S3 Analytics observes the access patterns of your data and lets you visualize them. It also measures how old data is when it becomes infrequently accessed, and you can use that age to set a lifecycle policy on the data
Automate Data Management Policies
- Lifecycle policy: transition data to different storage classes, or expire (delete) objects after a specified time. You can scope policies to a bucket, a prefix, or even object tags
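A lifecycle policy like the one described above can be sketched as a plain Python dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the `logs/` prefix, the bucket name, and the day counts are hypothetical values chosen for illustration, not taken from these notes.

```python
import json


def build_lifecycle_rule(rule_id, prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Return one lifecycle rule: Standard -> Standard-IA -> Glacier -> expire."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("day counts must increase: IA < Glacier < expiration")
    return {
        "ID": rule_id,
        "Status": "Enabled",
        # Scope the rule to a key prefix; a tag filter is also possible.
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical example: archive and eventually delete objects under logs/.
lifecycle_config = {
    "Rules": [build_lifecycle_rule("archive-logs", "logs/", 30, 90, 365)]
}
print(json.dumps(lifecycle_config, indent=2))

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=lifecycle_config)
```

Building the rule as data first keeps the policy easy to review and test before it is sent to S3.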
Protect Your Data from Accidental Delete
- Versioning: protects against unintended deletes and application logic failures. Every upload is stored as a new version of the object, and a delete adds a delete marker instead of removing earlier versions.
Amazon S3 Event Notifications
- Automate with trigger-based workflows. You can set up event notifications for objects created via PUT, POST, COPY, or Multipart Upload, and for objects removed via DELETE.
- You can filter events by key prefix and suffix
- Publish the notification to an SNS topic, an SQS queue (processed asynchronously by a worker fleet), or a Lambda function
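As a rough sketch of the prefix and suffix filtering above, the snippet below builds a notification configuration in the shape boto3's `put_bucket_notification_configuration` accepts, plus a small local helper that mimics how the filter would match keys. The Lambda ARN, the `uploads/` prefix, and the `.jpg` suffix are hypothetical, and `key_matches` is only an illustration, not part of any AWS SDK.

```python
def notification_config(lambda_arn, prefix, suffix):
    """Notify a Lambda function about created objects matching prefix and suffix."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


def key_matches(key, entry):
    """Local illustration of the filter: both prefix and suffix must match."""
    rules = {r["Name"]: r["Value"] for r in entry["Filter"]["Key"]["FilterRules"]}
    return key.startswith(rules.get("prefix", "")) and key.endswith(rules.get("suffix", ""))


# Hypothetical ARN and filter values.
config = notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:make-thumbnail",
    prefix="uploads/",
    suffix=".jpg",
)
entry = config["LambdaFunctionConfigurations"][0]
for key in ["uploads/cat.jpg", "uploads/cat.png", "archive/cat.jpg"]:
    print(key, key_matches(key, entry))

# With boto3 this could be applied as:
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="example-bucket", NotificationConfiguration=config)
```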
Cross-region Replication in S3
- There are many reasons to replicate data across regions: compliance requirements to keep data in different regions, enhanced security through replication, taking advantage of Spot Instance pricing, low-latency access, etc.
- You add a replication policy that tells S3 the destination region and bucket. Objects are then replicated automatically and asynchronously to the destination bucket. You can replicate the full bucket or only selected prefixes, and the policy can place the replicas in a different storage class
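The replication policy described above might look roughly like the following dictionary, shaped for boto3's `put_bucket_replication`. The role ARN, bucket names, rule ID, and prefix are hypothetical placeholders. Note that replication requires versioning to be enabled on both the source and the destination bucket.

```python
def replication_config(role_arn, dest_bucket, rule_id, prefix="", storage_class="STANDARD_IA"):
    """Replicate objects under `prefix` to `dest_bucket` in another storage class."""
    return {
        # IAM role that S3 assumes to copy objects on your behalf.
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                # An empty prefix replicates the whole bucket.
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::" + dest_bucket,
                    "StorageClass": storage_class,
                },
            }
        ],
    }


# Hypothetical values for illustration only.
crr = replication_config(
    role_arn="arn:aws:iam::123456789012:role/example-replication-role",
    dest_bucket="example-dest-bucket",
    rule_id="replicate-reports",
    prefix="reports/",
)
print(crr["Rules"][0]["Destination"])

# With boto3 this could be applied to the source bucket as:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="example-source-bucket", ReplicationConfiguration=crr)
```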
S3 Performance Optimisation
- S3 Transfer Acceleration: useful when customers upload content to a centralised bucket or when you frequently transfer large amounts of data. It uses the AWS edge network: your data is automatically routed to the closest edge location, so it travels the shortest possible distance over the public internet. It uses standard TCP/HTTP, so no special client software is needed. All you need to do is enable S3 Transfer Acceleration on the bucket and send requests to its accelerate endpoint
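Once acceleration is enabled, clients send requests to the bucket's accelerate endpoint instead of the regular one. The helper below is a minimal sketch of how that hostname is formed; the function name and the bucket name are hypothetical. Bucket names containing periods are rejected because Transfer Acceleration does not support them.

```python
def accelerate_endpoint(bucket):
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support bucket names with periods")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("example-bucket"))

# With boto3, acceleration could be enabled and used roughly like this:
#   s3 = boto3.client("s3")
#   s3.put_bucket_accelerate_configuration(
#       Bucket="example-bucket", AccelerateConfiguration={"Status": "Enabled"})
#   fast = boto3.client(
#       "s3", config=botocore.config.Config(s3={"use_accelerate_endpoint": True}))
```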
Parallelize PUTs with multipart uploads
- Lets you upload a large object as a set of smaller parts. Upload the parts in parallel to get the most out of your network bandwidth. You can parallelize GETs too: for large objects, use range-based GETs and align the ranges with your part boundaries.
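To illustrate aligning range-based GETs with part boundaries, here is a small sketch that computes the HTTP `Range` header for each part and fetches the parts in parallel. The fetch is simulated against an in-memory byte string; `fetch_range` is a hypothetical stand-in for a real ranged GET, and the tiny sizes are for demonstration only (real multipart parts are at least 5 MiB, except the last).

```python
from concurrent.futures import ThreadPoolExecutor


def part_ranges(object_size, part_size):
    """Return one HTTP Range header value per part, covering the whole object."""
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    ranges = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1  # Range ends are inclusive.
        ranges.append(f"bytes={start}-{end}")
    return ranges


def fetch_range(blob, header):
    """Stand-in for a ranged GET: slice an in-memory bytes object."""
    start, end = header[len("bytes="):].split("-")
    return blob[int(start):int(end) + 1]


blob = bytes(range(25))  # pretend this is a 25-byte object
headers = part_ranges(len(blob), 10)
print(headers)  # ['bytes=0-9', 'bytes=10-19', 'bytes=20-24']

# Fetch all parts concurrently; pool.map returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(lambda h: fetch_range(blob, h), headers))

reassembled = b"".join(parts)
print(reassembled == blob)
```

With a real client, each header would be passed as the `Range` argument of a GET request for the same key.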
Higher Transaction per Second (TPS) by Distributing Key Names
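- Use a key naming scheme with randomness at the beginning for high TPS. This matters most if you regularly exceed 100 TPS on a bucket. Avoid starting key names with a date or a monotonically increasing number. (This guidance predates the 2018 request-rate increase; S3 now supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix.)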
- S3 tags are key-value pairs. You can use them in IAM policies and lifecycle policies, and to filter metrics
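One common way to add randomness at the beginning of a key, as the naming advice in this section suggests, is to prepend a few characters of a hash of the original key. This is a minimal sketch under that assumption; the function name, the 4-character prefix length, and the sample keys are hypothetical.

```python
import hashlib


def hashed_key(key, prefix_len=4):
    """Prepend a short, deterministic hex prefix derived from the key itself."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


# Date-ordered keys would otherwise share the same leading characters.
for original in ["2017-03-01/photo1.jpg", "2017-03-01/photo2.jpg", "2017-03-02/photo1.jpg"]:
    print(hashed_key(original))
```

Because the prefix is derived from the key, a reader can recompute the full key from the original name without a lookup table. The trade-off is that listing objects by date prefix no longer works directly.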
Audit and Monitor Access - AWS CloudTrail Data Events
Use cases
- Perform security analysis
- Meet your IT auditing and compliance needs
- Take immediate action on activity
How it works
- Capture S3 object-level requests
- Enable at the bucket level
- Logs delivered to your S3 bucket
Monitor performance and operation
Amazon CloudWatch metrics for S3 report metrics for the data of your choice. You can create a CloudWatch alarm on a metric to alert you when it crosses a threshold
Amazon S3 is not configured as a Trusted Signer for CloudFront. A Trusted Signer is an AWS account with a CloudFront key pair. The CloudFront cache behavior is then configured to accept signed URLs created with that key pair.