Deep Dive on AWS RDS

Why would you use self-managed DBs on EC2 instances?

  • When you want to run a database engine that AWS does not manage or support. You also get root-level access and full control over all OS and DB components.
  • Limitations: a single EC2 instance in a single AZ, so you have no high availability. If the AZ fails, you have to recover from an EBS snapshot (stored in S3); there is no active/passive failover capability. You also carry the admin overhead for backups, DR and so on.

How do I decide between GP2 and IO1 storage types for RDS?

  • GP2 is a great default choice, but be aware of the burst credits on volumes < 1 TB.
  • Credit depletion results in an IOPS drop: latency and queue depth metrics will spike until credits are replenished.
  • Monitor the BurstBalance metric to see the percentage of burst-bucket I/O credits remaining.
  • Think of the GP2 burst rate and the stated (provisioned) IOPS rate as maximum I/O rates. More IOPS is not necessarily better; the workload needs to be optimised.
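
To make the burst-credit bullets concrete, here is a minimal sketch of the arithmetic. It assumes the published EBS gp2 figures (a baseline of 3 IOPS per GiB with a floor of 100 and a ceiling of 16,000, bursts of up to 3,000 IOPS, and a bucket of 5.4 million I/O credits). RDS may stripe storage across several volumes, so treat the results as rough estimates. The function names are illustrative and not part of any AWS SDK.

```python
# Published EBS gp2 figures (assumed here; RDS striping can change the real numbers).
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP2_BURST_IOPS = 3_000
GP2_BUCKET_CREDITS = 5_400_000


def gp2_baseline_iops(size_gib):
    """Baseline IOPS a gp2 volume of this size earns continuously."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, size_gib * GP2_IOPS_PER_GIB))


def seconds_until_depleted(size_gib, demand_iops, burst_balance_pct=100.0):
    """Estimate how long a sustained workload can run before credits run out.

    Returns None when the bucket never drains (large volume or low demand).
    """
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return None  # baseline already matches the burst rate
    drain_per_second = min(demand_iops, GP2_BURST_IOPS) - baseline
    if drain_per_second <= 0:
        return None  # demand is at or below baseline, credits refill instead
    credits_left = GP2_BUCKET_CREDITS * burst_balance_pct / 100.0
    return credits_left / drain_per_second
```

For a 100 GiB volume the baseline is 300 IOPS, so a workload that sustains 3,000 IOPS drains a full bucket in about 2,000 seconds (roughly 33 minutes).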

What happens during a Multi-AZ failover?

  • The primary and standby instances running your relational database physically replicate the storage blocks. The replication is synchronous.
  • An external observer monitors the primary and standby instances. When it loses connectivity to the primary, it initiates the failover and the standby becomes the new primary. The DNS record for the endpoint is updated, so when your application is disconnected it queries DNS again and reconnects to the new primary. If you cache DNS lookups, set the TTL as low as possible so that failovers are picked up quickly.
  • The standby can’t be used directly! It gives no performance benefit, just an availability improvement.
  • Failover takes 60-120 seconds and stays within the same region.
  • Backups can be taken from the standby, which enhances performance on the primary.
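
On the application side, the failover flow above boils down to "reconnect and resolve the endpoint again". The following is a minimal sketch of that idea, not an official client pattern. `connect` is a hypothetical callable supplied by the caller (for example a thin wrapper around your DB driver), and the retry budget of 10 attempts at 15 seconds is an assumption chosen to outlast a 60-120 second failover.

```python
import socket
import time


def resolve_endpoint(hostname, port):
    """Look the endpoint up again so a DNS change after failover is picked up."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return infos[0][4][0]  # IP address of the first result


def connect_with_retry(hostname, port, connect, resolve=resolve_endpoint,
                       attempts=10, delay_s=15.0, sleep=time.sleep):
    """Retry the connection, resolving the hostname on every attempt.

    `connect(address, port)` is a hypothetical caller-supplied function that
    raises OSError (or a subclass) when the database is unreachable.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            address = resolve(hostname, port)
            return connect(address, port)
        except OSError as exc:  # covers DNS errors and refused connections
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay_s)
    raise ConnectionError(
        f"could not reach {hostname} after {attempts} attempts"
    ) from last_error
```

Most drivers resolve the hostname themselves, so in practice the important parts are the retry loop and making sure nothing in between (JVM, OS, connection pool) caches the old address for long.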

Why would I use Read Replica?

  • The primary goal is to relieve pressure on your primary database. Offload the app’s read-heavy workloads to Read Replicas, as most apps are read-heavy.
  • You can use cross-region Read Replicas.
  • You can also use them for Disaster Recovery purposes for global resilience.
  • You can also upgrade a Read Replica to a new engine version and run tests there, in case you don’t want to impact your primary database.
  • The CloudWatch metric for asynchronous replication lag is ReplicaLag. With MySQL you can get up to a minute of lag, but with AWS Aurora it is minimal. Database logs can also be exported to CloudWatch.
  • You can have up to 5 direct Read Replicas per DB instance.
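
Because replication is asynchronous, an application that offloads reads usually wants to avoid replicas that have fallen too far behind. Below is a minimal sketch of lag-aware read routing. The `Replica` class, the 5-second threshold and the endpoint names used with it are assumptions for illustration; the lag values would come from the ReplicaLag metric.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Replica:
    endpoint: str
    replica_lag_s: float  # most recent ReplicaLag value, in seconds


def pick_read_endpoint(primary: str, replicas: Sequence[Replica],
                       max_lag_s: float = 5.0) -> str:
    """Send reads to the freshest replica, or to the primary if all lag too much."""
    fresh = [r for r in replicas if r.replica_lag_s <= max_lag_s]
    if not fresh:
        return primary  # fall back rather than serve very stale data
    return min(fresh, key=lambda r: r.replica_lag_s).endpoint
```

Falling back to the primary trades some of the offloading benefit for fresher reads; a stricter design could fail the read instead.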

Backups in RDS

  • Two types: Automatic Backups and Manual Snapshots
  • The default retention period of automated backups is 7 days; if the instance was created from the CLI or API it is 1 day. The maximum is 35 days.
  • Transaction logs are shipped to Amazon S3 every 5 minutes to support point-in-time recovery (PITR), so you can go back to the exact point you want. The logs are part of the automated backups and hold the actual operational data. This means a 5 min RPO!!!
  • Amazon RDS backups leverage Amazon EBS snapshots, stored in S3 and managed by RDS.
  • If you delete a DB instance, you have the option to preserve the backups but they do have a retention period as always.
  • When should I use automated backups as opposed to snapshots?

Automated Backups                                            | Manual Snapshots
-------------------------------------------------------------+------------------------------------------------------------------------------------
Specify backup retention window per instance (7-day default) | Manually created through AWS console, CLI or RDS API
Deleted after window                                         | Always there
Supports PITR                                                | Restores to snapshot
Good for disaster recovery                                   | Use for testing, final copy before deleting a database, non-prod/test environments
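
The retention period and the 5-minute log shipping together define the window you can restore into. The sketch below works that window out under simplifying assumptions: logs are taken to be at most 5 minutes behind and retention is counted in whole days back from "now". In practice RDS reports the real latest restorable time itself; the function names here are illustrative.

```python
from datetime import datetime, timedelta, timezone

LOG_SHIP_INTERVAL = timedelta(minutes=5)  # worst-case gap, per the notes above


def pitr_window(now, retention_days):
    """Return (earliest, latest) restorable times for automated backups."""
    if not 1 <= retention_days <= 35:
        raise ValueError("automated backup retention must be 1 to 35 days")
    earliest = now - timedelta(days=retention_days)
    latest = now - LOG_SHIP_INTERVAL
    return earliest, latest


def can_restore_to(target, now, retention_days):
    """True if `target` falls inside the estimated PITR window."""
    earliest, latest = pitr_window(now, retention_days)
    return earliest <= target <= latest
```

With a 7-day retention, anything older than a week is only recoverable from a manual snapshot, which is the main reason to take one before risky changes.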

How do I restore a backup?

  • Restoring creates an entirely new database instance with a new endpoint name, while the old one keeps running.
  • New volumes are hydrated from AWS S3. The volume is made available to the database straight away, and blocks are pulled in from S3 when they are needed. The restored database can be slow until enough of those blocks have come in.
  • You can use a larger instance for that initial restore.
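
A restore like the one described above can be sketched with boto3's `restore_db_instance_to_point_in_time` call. The helper below only builds the request, so it can be inspected without AWS access; the instance identifiers and instance class in the usage note are hypothetical, and the sketch leaves out networking, parameter groups and other options a real restore would need.

```python
from datetime import datetime
from typing import Optional


def build_pitr_restore_request(source_id: str, target_id: str,
                               restore_time: Optional[datetime] = None,
                               instance_class: Optional[str] = None) -> dict:
    """Build keyword arguments for restore_db_instance_to_point_in_time."""
    if source_id == target_id:
        # A restore always creates a new instance with a new endpoint.
        raise ValueError("target must differ from the source instance")
    request = {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
    }
    if restore_time is None:
        request["UseLatestRestorableTime"] = True
    else:
        request["RestoreTime"] = restore_time
    if instance_class is not None:
        # Optionally restore onto a larger class for the initial hydration.
        request["DBInstanceClass"] = instance_class
    return request


def restore(request: dict, region: str):
    """Send the request; needs boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the builder works without boto3
    client = boto3.client("rds", region_name=region)
    return client.restore_db_instance_to_point_in_time(**request)
```

For example, `build_pitr_restore_request("orders-db", "orders-db-restored", instance_class="db.r5.2xlarge")` describes a restore to the latest restorable time on a larger instance. The application then has to be pointed at the new endpoint.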

Securing your AWS RDS?

  • Secure by default. Network isolation with VPC. AWS IAM based resource-level permission controls. Encryption at rest using KMS or Oracle/Microsoft TDE. Use SSL protection for data in transit. Encryption handled within RDS DB engine.

How can I save money on my RDS Database?

  • AWS Reserved Instances give up to a 60% discount for a 1- or 3-year commitment. It’s just a billing commitment, not literally an instance reserved for you.
  • Size flexibility: if you hold a Reserved Instance for an r4.large and you want to scale up to an r4.xlarge, AWS counts the reservation for the large against the usage of the larger size. This flexibility gives you better RI utilisation.
  • RI Utilisation report: shows how many of the RIs you purchased are being used, and by how much. It covers RDS together with EC2!
  • You can also stop and start a database. While it’s not running, you only pay for the storage!!! At the time of writing this applies to single-AZ DBs.
  • IAM database authentication: you can attach a policy to a user or role that maps that IAM identity onto a local RDS user. Authentication uses a token from generate-db-auth-token, valid for 15 minutes, in place of a DB user password. Note that this covers authentication only. Authorisation is still controlled by the DB engine itself.
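
Going back to the size-flexibility bullet above, the arithmetic can be sketched with AWS's published normalization units (large = 4, xlarge = 8, and so on). This is a simplified illustration only: it assumes a single-AZ instance and ignores the engine, Region and licensing conditions that size flexibility also depends on. The function names are made up for the example.

```python
# Published normalization units per instance size (a subset).
NORMALIZATION_UNITS = {
    "micro": 0.5, "small": 1, "medium": 2, "large": 4,
    "xlarge": 8, "2xlarge": 16, "4xlarge": 32,
}


def split_instance_class(instance_class):
    """'db.r4.large' or 'r4.large' -> ('r4', 'large')."""
    parts = instance_class.split(".")
    return parts[-2], parts[-1]


def ri_coverage(reserved_class, reserved_count, running_class):
    """Fraction (0.0 to 1.0) of one running instance covered by the RIs."""
    reserved_family, reserved_size = split_instance_class(reserved_class)
    running_family, running_size = split_instance_class(running_class)
    if reserved_family != running_family:
        return 0.0  # flexibility only applies within one instance family
    reserved_units = NORMALIZATION_UNITS[reserved_size] * reserved_count
    needed_units = NORMALIZATION_UNITS[running_size]
    return min(1.0, reserved_units / needed_units)
```

One r4.large reservation (4 units) covers half of an r4.xlarge (8 units), so after scaling up you would pay the on-demand rate for the other half unless you buy a second reservation.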

Scalability of RDS

  • Either you increase the instance size for more compute
  • Or you increase the storage attached to the RDS instance
  • You can increase the storage size but cannot decrease it. Shrinking would need defragmentation and similar work that the service cannot figure out automatically
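
Both scaling directions map onto boto3's `modify_db_instance` call. The sketch below builds such a request and enforces the "grow only" rule for storage up front. The helper name and the instance identifier in the usage note are hypothetical, and the current storage size is passed in by the caller rather than looked up.

```python
from typing import Optional


def build_scale_request(instance_id: str, current_storage_gib: int,
                        new_instance_class: Optional[str] = None,
                        new_storage_gib: Optional[int] = None,
                        apply_immediately: bool = False) -> dict:
    """Build keyword arguments for the RDS modify_db_instance call."""
    request = {
        "DBInstanceIdentifier": instance_id,
        # False defers the change to the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }
    if new_instance_class is not None:
        request["DBInstanceClass"] = new_instance_class  # scale compute
    if new_storage_gib is not None:
        if new_storage_gib < current_storage_gib:
            raise ValueError("RDS storage can be increased but not decreased")
        request["AllocatedStorage"] = new_storage_gib  # scale storage
    return request
```

A call such as `build_scale_request("orders-db", 100, new_storage_gib=200)` yields a request that could be passed as `client.modify_db_instance(**request)`.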


  • Be aware that there are different licensing models when you configure the DB engine. You can also select different major or minor DB versions depending on the app requirements.
  • RDS instance types mirror the traditional EC2 instance types.
  • Parameter and Option Groups: ways to configure your database instances. They expose lower-level DB options that you would normally need root access to change.
  • You can always adjust the configuration after you create the DB, even maintenance and backup windows.

Aurora - RDS Engine

  • Compatible with MySQL and PostgreSQL, with massively improved performance.
  • You have a primary and replicas. The primary performs read/write operations. AWS also intends to offer a master/master configuration. Replica instances can perform reads from the shared storage volume. These are not Read Replicas as in RDS: they actually access the same storage volume, which is why performance is better.
  • You can store up to 64 TB but, as opposed to RDS, you are only billed for the storage you actually use.
  • With Aurora, you can define replicas in the same region, and you can also use MySQL cross-region read replication.
  • Cluster Endpoint: points to the primary for read/write operations.
  • Reader Endpoint: directs your app to all of the replicas as a pool, so reads scale across the cluster.
  • The DB instances in an Aurora cluster share the same underlying storage platform: a common cluster storage volume across the entire set of AZs. The DB servers don’t need to replicate data between themselves.
  • You can enable IAM authentication.
  • Backtrack: roll back to a point before a corruption occurred, using the same cluster. The backtrack window determines how far back in time you can go. Aurora tries to retain enough log info to support that window of time.
  • Fast Clone: creates a new DB on top of the existing cluster storage volume and stores only the differences between the clone and the source cluster.
  • Backup: it’s continuous and incremental to S3. There is no performance impact at all, and you can restore to any point in time!
  • Parallel Query: optimisation feature! Parallelises a lot of the compute and I/O involved in intensive queries.
  • Auto Scaling: you can scale the number of Aurora Replicas in the cluster up and down based on average CPU utilisation or average number of connections.
  • Aurora Global Database: an extension of the idea of cross-region read replicas.
  • Aurora Serverless: the limitation of provisioned Aurora is that it is still a server-based architecture. If you invoke hundreds of Lambda functions per second and each one has to open a MySQL connection, performance can suffer.
    • ACU: Aurora Capacity Units, with a min and max ACU. The cluster automatically adjusts based on load.
    • It removes the need to manage servers. It introduces the Data API, which lets you access the DB over standard APIs. You don’t need to worry about the size of the instances, but you need to define the min and max number of Aurora Capacity Units (one ACU is approximately 2 GB of RAM with corresponding CPU and networking).
    • Pausing the cluster: after a certain idle time, all of the allocated compute can be removed so that you are billed only for storage. A snapshot is kept ready to re-create the cluster automatically.
    • A shared proxy layer accepts connections from your users/apps and brokers them to a pool of warm instances.
    • It’s only in one AZ! There will be failover time.
  • Multi-Master: all of the instances are capable of read/write operations. There is no cluster endpoint and no load balancer. The app connects to one or all of the instances inside the cluster. Any write is committed to all of the storage nodes in the cluster. Each node either confirms or rejects the proposed change/write.
    • Changes are replicated to the other master nodes, which update their in-memory caches as well.
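
The split between the cluster endpoint and the reader endpoint is something the application has to act on. Below is a minimal sketch of one way to do it, assuming a simple prefix check on the SQL text is good enough to tell reads from writes (real applications often decide per transaction or per connection pool instead). The endpoint strings in the usage note are placeholders, not real Aurora hostnames.

```python
# Statement prefixes treated as read-only in this simplified sketch.
READ_ONLY_PREFIXES = ("select", "show", "explain")


def pick_aurora_endpoint(sql, cluster_endpoint, reader_endpoint,
                         in_transaction=False):
    """Route plain reads to the reader endpoint, everything else to the primary."""
    statement = sql.lstrip().lower()
    is_read = statement.startswith(READ_ONLY_PREFIXES)
    takes_locks = "for update" in statement  # locking reads need the primary
    if is_read and not takes_locks and not in_transaction:
        return reader_endpoint
    return cluster_endpoint
```

For example, `pick_aurora_endpoint("SELECT * FROM orders", "writer.example.internal", "reader.example.internal")` returns the reader endpoint, while an `INSERT` or anything inside an open transaction goes to the cluster endpoint.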

AWS Athena

  • Athena is a serverless product that can query many structured, semi-structured and unstructured data formats stored in S3, using SQL. It allows you to define a schema, as a set of table definitions, over the top of data in S3. Whenever you run a query, Athena overlays the table definitions on the data in S3 and gives you access to that data through the schema. This is the so-called ‘schema on read’ approach.
  • You pay only for the amount of data your queries process, plus the storage on S3. There is no need for ETL.
  • However large the data set on S3 is, Athena can query it quickly without ETL and without any servers defined. A typical use case is reading data in bulk.
  • Athena can be used to query AWS logs: CloudTrail, CloudFront, all types of Load Balancer, Amazon VPC Flow logs, cost reports etc.
  • Athena can perform Federated Queries on other data sources.
  • It works with the AWS Glue Data Catalog and with web server logs.
  • Serverless and cost-conscious.
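
The ‘schema on read’ idea can be illustrated without any AWS services. In the toy sketch below a raw CSV string stands in for an object in S3 and is never modified; a schema (column names plus types) is only applied at the moment a query reads it. This is a conceptual illustration rather than how Athena is implemented, and the data, schema and function names are all made up.

```python
import csv
import io

# Stands in for an untouched log object in S3 (hypothetical data).
RAW_OBJECT = "2024-01-01,GET,200,512\n2024-01-01,POST,500,128\n"

# The "table definition": column names and types, kept apart from the data.
SCHEMA = [("day", str), ("method", str), ("status", int), ("bytes", int)]


def read_with_schema(raw, schema):
    """Overlay the schema on the raw rows while reading them."""
    for row in csv.reader(io.StringIO(raw)):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


def count_server_errors(raw, schema):
    """Roughly: SELECT COUNT(*) FROM logs WHERE status >= 500."""
    return sum(1 for rec in read_with_schema(raw, schema) if rec["status"] >= 500)
```

Swapping in a different schema changes how the same bytes are interpreted without rewriting or reloading the data, which is why no ETL step is needed before querying.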