Cost Optimisation on AWS

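Use Reserved Instances for stable workloads: run your infrastructure for about 3 months first and study the usage patterns before you commit
Use Consolidated Billing: usage is aggregated across the linked accounts, so you reach volume pricing tiers sooner and pay less overall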
Economy of Architecture
- Trial and change during the lifetime of the system
- Radical changes are possible - driven by economics
- Transaction cost + operational cost
- What is your per-transaction cost? Cost per query, per user, per processing unit? Do you know it and track it?
- Same or better outcome for the lowest cost
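
Tracking a per-transaction cost can start as a single division over a billing period. The sketch below is only an illustration: the function name and the dollar and transaction figures are hypothetical, not taken from a real bill.

```python
def cost_per_transaction(total_cost: float, transactions: int) -> float:
    """Average cost of one transaction over a billing period."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return total_cost / transactions


# Hypothetical month: $4,200 total spend and 12 million transactions.
unit_cost = cost_per_transaction(4200.0, 12_000_000)
print(f"${unit_cost:.6f} per transaction")
```

Recomputing this figure after every architecture change shows whether the change actually paid off.
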
Operational Optimization
- System Administrator and DBA cost
- What is the admin effort for a minor DB version update?
  - Self-Managed
    - Back up primary and secondary server
    - Backup server OS
    - Assemble upgrade binaries
    - Create change record
    - Create rollback plan
    - Rehearse in development
    - Run against staging
    - Run against production standby
    - Verify and Failover
  - RDS Managed - Admin effort
    - Verify update windows
    - Create notification change record
    - Verify success in staging and production
Infrastructural Optimization
- Use AWS S3 Storage Tiers
  - Take advantage of Infrequent Access
  - Activate a lifecycle policy: move data after 30 days…
- Revisit and Right-Size EC2
  - Families change
  - Workloads change
- Spot Instances
  - Two Auto Scaling groups behind an ELB: one On-Demand, the other a Spot fleet.
  - Set your Spot bid above the current market price but below the On-Demand price. You always pay the market rate, which is at or under your bid.
  - CloudWatch alarm on the number of in-service Spot Instances in the group (GroupInServiceInstances)
  - Scale out the On-Demand group if the number of instances in the Spot group drops below a threshold
- Cache for Savings! Read-intensive workloads in particular
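
The Spot and On-Demand fallback described above can be sketched as two small helpers. This is a minimal illustration, not a full deployment: the group name, policy ARN, period, and evaluation settings are hypothetical. The dictionary keys follow the keyword arguments of boto3's CloudWatch `put_metric_alarm`, and the sketch assumes group metrics collection is enabled on the Auto Scaling group so that `GroupInServiceInstances` is published.

```python
def spot_alarm_params(spot_group: str, threshold: int, scale_out_policy_arn: str) -> dict:
    """Build alarm parameters that fire when the Spot group runs low on instances."""
    return {
        "AlarmName": f"{spot_group}-spot-capacity-low",
        "Namespace": "AWS/AutoScaling",
        "MetricName": "GroupInServiceInstances",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": spot_group}],
        "Statistic": "Minimum",
        "Period": 60,                # seconds; assumed value
        "EvaluationPeriods": 2,      # assumed value
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        # The action is the scale-out policy of the On-Demand group.
        "AlarmActions": [scale_out_policy_arn],
    }


def on_demand_shortfall(spot_in_service: int, threshold: int) -> int:
    """How many On-Demand instances are needed to cover missing Spot capacity."""
    return max(0, threshold - spot_in_service)


# Hypothetical names; with boto3 this dict would be passed as
# cloudwatch.put_metric_alarm(**params).
params = spot_alarm_params("web-spot", 4, "arn:aws:autoscaling:region:account:policy-id")
print(params["MetricName"], on_demand_shortfall(spot_in_service=1, threshold=4))
```
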
Architectural Optimization
- Code changes
- Architectural Tradeoffs
- S3 as a Static Web Site: it serves only static files, but those can include JavaScript that runs in the visitor's browser
  - Cost avoidance
    - Web Server patching
    - Capacity planning
    - Security Scanning
    - Content rollouts/updates
- Queues
  - Not just for decoupling
  - SQS-triggered Auto Scaling groups based on revenue
    - Say you have a mobile app that takes photos and stores them in S3. In this example, a single m4.large can process 1000 images in an hour, and paid customers should not wait more than 10 minutes for their results.
    - Create two ASGs fed by two different SQS queues, one for free customers and one for paid customers.
    - The free ASG scales on ApproximateNumberOfMessagesVisible: if more than 1000 messages are visible in the free queue, you scale up to a maximum of 1.
    - The paid ASG scales on a different metric, ApproximateAgeOfOldestMessage, which tells you how old the oldest message in the queue is: if it exceeds 600 seconds (10 minutes), the group scales up.
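
The two scaling rules from the queue example can be written down as plain predicates. The thresholds and the 1000 images per hour throughput come from the example above; the function names and the drain-time helper are illustrative assumptions, and in practice the rules would live in CloudWatch alarms rather than application code.

```python
FREE_BACKLOG_LIMIT = 1000        # ApproximateNumberOfMessagesVisible threshold
PAID_MAX_AGE_SECONDS = 600       # ApproximateAgeOfOldestMessage threshold (10 minutes)
IMAGES_PER_INSTANCE_HOUR = 1000  # throughput of one m4.large in the example


def free_tier_should_scale(visible_messages: int) -> bool:
    """Free tier scales on backlog size only."""
    return visible_messages > FREE_BACKLOG_LIMIT


def paid_tier_should_scale(oldest_message_age_seconds: int) -> bool:
    """Paid tier scales as soon as the oldest message has waited too long."""
    return oldest_message_age_seconds > PAID_MAX_AGE_SECONDS


def minutes_to_drain(backlog: int, instances: int) -> float:
    """Rough time for the given fleet to work through the backlog."""
    return backlog / (IMAGES_PER_INSTANCE_HOUR * instances) * 60


# 500 queued images on one instance take about 30 minutes, which already
# breaks a 10-minute target even though the backlog is below 1000 messages.
print(minutes_to_drain(500, 1))
```

This is why the paid queue watches message age rather than queue depth: the wait time is what the customer is paying for.
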
Simplify: Fewer Components, Lower Cost
- For example: each transaction takes 3750 ms, assuming the 1536 MB Lambda memory size (the largest available when these notes were written)
- Use S3 events with 2 buckets: image source and image result. An S3 event triggers a Lambda function whenever a picture is uploaded to the source bucket, and the function writes its output to the result bucket. You can halve the transaction cost!
- You can still keep the SQS queue for the “free” tier customers, with an ASG behind it spinning up Spot Instances to further reduce cost, and use the Lambda function for the paid tier.
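
To put a number on the Lambda transaction, the billed compute can be derived from duration and memory. The sketch below uses the 3750 ms and 1536 MB figures from the example; the two prices are assumptions for illustration only, since Lambda pricing varies by region and over time.

```python
def lambda_gb_seconds(duration_ms: float, memory_mb: int) -> float:
    """Billed compute for one invocation, in GB-seconds."""
    return (duration_ms / 1000) * (memory_mb / 1024)


def lambda_cost_per_transaction(
    duration_ms: float,
    memory_mb: int,
    price_per_gb_second: float,
    price_per_request: float,
) -> float:
    """Compute charge plus request charge for one invocation."""
    return lambda_gb_seconds(duration_ms, memory_mb) * price_per_gb_second + price_per_request


# Assumed prices, for illustration only; check the current price list.
ASSUMED_PRICE_PER_GB_SECOND = 0.0000166667
ASSUMED_PRICE_PER_REQUEST = 0.0000002

print(lambda_gb_seconds(3750, 1536))
print(lambda_cost_per_transaction(
    3750, 1536, ASSUMED_PRICE_PER_GB_SECOND, ASSUMED_PRICE_PER_REQUEST
))
```

Comparing this figure with your own measured cost per image on EC2 tells you whether the simpler design is also the cheaper one for your workload.
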
Database Economic Architecture
- Not everything is just relational or just NoSQL
- Caches are usually more efficient and cost effective
- Don’t conflate transactional DBs with analytical DBs
- You will typically have “hot spot” tables or datasets - look to put these in NoSQL or Cache storage
- Example: 10,000 IOPS (80% read, 20% write)
  - 3 TB capacity
  - Multi-AZ MySQL on RDS: $4.6K vs DynamoDB: $2.8K
- Immediate benefits
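
The IOPS example can be turned into a rough DynamoDB capacity estimate. This is a sketch under simplifying assumptions that are not in the notes: each I/O operation is treated as one item read or write, and the item size is a hypothetical 1 KB. It relies on the DynamoDB rule that one read capacity unit covers one strongly consistent read per second of up to 4 KB, and one write capacity unit covers one write per second of up to 1 KB.

```python
import math


def split_iops(total_iops: int, read_fraction: float) -> tuple[int, int]:
    """Split a total IOPS figure into (reads per second, writes per second)."""
    reads = round(total_iops * read_fraction)
    return reads, total_iops - reads


def dynamodb_capacity_units(
    reads_per_s: int,
    writes_per_s: int,
    item_kb: float,
    eventually_consistent: bool = False,
) -> tuple[int, int]:
    """Estimate provisioned (RCU, WCU) for a steady request rate."""
    rcu = reads_per_s * math.ceil(item_kb / 4)
    if eventually_consistent:
        # Eventually consistent reads cost half as much.
        rcu = math.ceil(rcu / 2)
    wcu = writes_per_s * math.ceil(item_kb)
    return rcu, wcu


reads, writes = split_iops(10_000, 0.8)
print(reads, writes)
print(dynamodb_capacity_units(reads, writes, item_kb=1))
```
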

7 STEPS TO REDUCE COST IN AWS

1) Turn off unused instances

Developer, test, training instances
Instance start/stop
Tear down and build up all together
Instances are disposable
Automate as much as you can
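
A start/stop schedule for development and test instances can be as small as the sketch below. The office hours are hypothetical, and the code only decides whether an instance should be running; the actual start and stop calls are left out.

```python
from datetime import datetime


def should_run(now: datetime, start_hour: int = 8, stop_hour: int = 18) -> bool:
    """True during weekday working hours, False at night and on weekends."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= now.hour < stop_hour


def weekly_savings_fraction(start_hour: int = 8, stop_hour: int = 18, days: int = 5) -> float:
    """Share of instance hours saved compared with running all week."""
    running_hours = (stop_hour - start_hour) * days
    return 1 - running_hours / (24 * 7)


print(should_run(datetime(2024, 1, 8, 9, 0)))   # a Monday morning
print(f"{weekly_savings_fraction():.0%} of instance hours saved")
```

The saving applies to instance hours only; storage attached to a stopped instance is still billed.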

2) Use Auto-Scaling

Launch Configuration with CloudFormation
Align your resources with demand
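Works best for gradual traffic increases rather than sudden spikes
Scale up fast, scale down slow (with hourly billing, a started hour is paid for in full)
Set triggers: CPU, network, memory (memory needs a custom CloudWatch metric)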
Avoid multiple scaling triggers per Auto Scaling group
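
"Scale up fast, scale down slow" can be illustrated with an asymmetric rule on a single metric. The thresholds, step sizes, and group limits below are hypothetical, and in a real setup this logic would be expressed as Auto Scaling policies rather than application code.

```python
def next_capacity(
    current: int,
    cpu_percent: float,
    low_periods: int,
    *,
    min_size: int = 1,
    max_size: int = 10,
) -> int:
    """Return the desired instance count for the next evaluation period."""
    if cpu_percent > 70:
        # Scale up fast: double the group as soon as CPU is high.
        return min(max_size, current * 2)
    if cpu_percent < 30 and low_periods >= 3:
        # Scale down slow: remove one instance, and only after CPU has
        # stayed low for several consecutive periods.
        return max(min_size, current - 1)
    return current


print(next_capacity(2, 85, 0))   # high load: grows quickly
print(next_capacity(4, 20, 3))   # sustained low load: shrinks by one
```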

3) Reserved Instances

Pay in advance for long-term usage once you can predict the workload. You save cost, and RIs are flexible
When you purchase, you choose the scope: a specific AZ or the whole Region. An AZ-scoped reservation is valid only in that AZ, but it also reserves capacity there, so you get priority when starting instances in each AZ you specified. A Region-scoped reservation applies across all AZs in the Region, but it gives you only the cost saving, not the capacity reservation.
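
Whether a reservation pays off depends on how many hours the instance actually runs. The sketch below computes the break-even utilization; both hourly prices are hypothetical placeholders, and the RI price is assumed to be the effective hourly rate with any upfront payment spread over the term.

```python
def break_even_utilization(on_demand_hourly: float, ri_effective_hourly: float) -> float:
    """Fraction of hours an instance must run for the RI to be cheaper.

    On-Demand is billed only for hours used; the RI is billed for every hour.
    """
    return ri_effective_hourly / on_demand_hourly


def cheaper_option(hours_used: float, hours_in_term: float,
                   on_demand_hourly: float, ri_effective_hourly: float) -> str:
    """Compare total cost over the term for a given number of running hours."""
    on_demand_total = hours_used * on_demand_hourly
    ri_total = hours_in_term * ri_effective_hourly
    return "reserved" if ri_total < on_demand_total else "on-demand"


# Hypothetical prices: $0.10/h On-Demand, $0.06/h effective RI rate.
print(f"{break_even_utilization(0.10, 0.06):.0%}")
print(cheaper_option(hours_used=8000, hours_in_term=8760,
                     on_demand_hourly=0.10, ri_effective_hourly=0.06))
```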

4) Spot Instances

AWS has a lot of unused capacity in its infrastructure, which it sells as Spot Instances
You set the maximum price you are willing to pay. Your instance starts when the Spot price is lower
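
The effect of a maximum price can be illustrated against a price history. This is a simplified model with made-up hourly prices: the instance is assumed to run in every hour where the Spot price does not exceed the maximum, and to be charged the Spot price for that hour, not the maximum.

```python
def spot_run(price_history: list[float], max_price: float) -> tuple[int, float]:
    """Return (hours run, total cost) for an hourly Spot price history."""
    paid_hours = [price for price in price_history if price <= max_price]
    return len(paid_hours), sum(paid_hours)


# Hypothetical hourly Spot prices in dollars.
history = [0.03, 0.04, 0.09, 0.05]
hours, cost = spot_run(history, max_price=0.06)
print(hours, round(cost, 2))
```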

5) AWS S3 Reduced Redundancy / Infrequent Access

Also AWS Glacier storage class
Use life-cycle rules to automatically move objects that are old enough to Glacier
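
A lifecycle rule of this kind can be described as a small configuration document. The sketch below builds one in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The rule ID, the empty prefix (meaning all objects), and the 90-day Glacier threshold are assumptions for illustration; only the 30-day move to Infrequent Access echoes the notes.

```python
def build_lifecycle_configuration(ia_after_days: int = 30, glacier_after_days: int = 90) -> dict:
    """Build a lifecycle configuration that tiers objects down as they age."""
    return {
        "Rules": [
            {
                "ID": "tier-down-old-objects",   # hypothetical rule name
                "Filter": {"Prefix": ""},         # apply to every object
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


lifecycle = build_lifecycle_configuration()
# With boto3 this would be applied roughly as:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle)
# where "my-bucket" is a placeholder bucket name.
print(lifecycle["Rules"][0]["Transitions"])
```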

6) Optimize Amazon DynamoDB capacity units

How much capacity do you need, and when do you expect peaks?
Read/write capacity units determine the cost of DynamoDB
Use caching to save read capacity units (for example, cache items in local RAM on your machines)
Use multiple tables to support varied access patterns. Use compression for large attribute values
Understand access patterns for time series data
Use SQS to buffer write requests to DynamoDB: the queue sits in front of the table and lets you schedule the writes at a steady rate
DynamoDB Auto Scaling: automatically adjusts the provisioned capacity units over time as it measures your load and calculates the right amount
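
The benefit of buffering writes through a queue can be seen in a small simulation. The arrival pattern and the provisioned write capacity below are invented for illustration, and each write is assumed to consume one write capacity unit.

```python
def simulate_buffer(arrivals_per_second: list[int], provisioned_wcu: int) -> tuple[int, int]:
    """Drain a write queue at a fixed rate; return (peak backlog, final backlog)."""
    backlog = 0
    peak = 0
    for arrivals in arrivals_per_second:
        backlog += arrivals
        # Write at most the provisioned number of items this second.
        backlog -= min(backlog, provisioned_wcu)
        peak = max(peak, backlog)
    return peak, backlog


# Hypothetical burst: 300 writes/s at peak, table provisioned for 150.
burst = [50, 300, 300, 50, 0, 0]
print(simulate_buffer(burst, provisioned_wcu=150))
print(max(burst))  # capacity needed to absorb the burst without a queue
```

In this made-up run the queue absorbs a burst of twice the provisioned rate and drains within a few seconds, at the price of delayed writes.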

7) Offload your architecture

Use Amazon CloudFront: offload some of the traffic
Introduce caching: especially useful for offloading databases
Leverage existing AWS services
If data is transferred between instances in different AZs, each instance is charged for its data in and data out
You are not billed for stopped EC2 instances, but you still pay for their EBS volumes and for S3 storage.
Cost allocation tags: each one has to be enabled individually.
There are AWS-generated tags, e.g. aws:createdBy or aws:cloudformation:stack-name. AWS adds these to resources once the tags have been enabled.
There are user-defined tags, such as user:something, which you can use in cost reports. These can take up to 24 hours to appear.