AWS Cloud Backup and Disaster Recovery with Partner Solutions

  • Backups are not archives: a backup is just a point-in-time copy of the data. Backups are searchable and tracked in a catalog
  • Archived data is the only authoritative copy of the data

Hybrid backup to the Cloud

  • Backup data flows from on-premises into AWS object storage services
  • Customer trust is gained
  • Easy to scale out (elasticity!) and easy to scale back in. Scaling up is different: once you move to a larger instance size, it is hard to scale back down to a smaller one
  • Failover: the RDS endpoint is a DNS name! AWS takes care of failing that DNS name over from the primary to the secondary. You only need to point your connection string at the endpoint, that’s it.
  • With Multi-AZ, RDS backs up from your secondary, which avoids I/O suspension on the primary
  • You can force a failover from one AZ to another by rebooting your instance with the failover option
  • A Read Replica (RR) is just a read-only copy of your database. Use cases:
  • Scaling beyond the compute or I/O capacity of a single DB instance for read-heavy database workloads. This excess read traffic can be directed to one or more Read Replicas
  • Serving read traffic while the source DB instance is unavailable. If your source DB instance cannot take I/O requests, you can direct read traffic to a RR
  • Business reporting or data warehousing scenarios: you may want to run your report queries against a RR rather than your primary production DB
  • Multi-AZ is important: if enabled, there is no I/O impact when taking a snapshot because it’s done from the secondary.
  • Up to 5 RRs for MySQL and PostgreSQL. Only MySQL lets you have RRs in a different region
  • Replication to a RR is asynchronous: the source does not wait for an ack that the replica has applied the write, so there is no two-way handshake.
  • RRs can be built off a Multi-AZ deployment, but RRs themselves cannot be Multi-AZ
  • The key metric to watch in CloudWatch is Replica Lag (the ReplicaLag metric).
  • Services where you can log in as root: Elastic Beanstalk, Elastic MapReduce, OpsWorks and EC2. Services where you cannot: RDS, DynamoDB, S3 and Glacier
  • Pre-Warming the ELB: AWS staff can preconfigure the ELB to have the appropriate level of capacity based on expected traffic. Used in certain scenarios, such as when flash traffic is expected, or in the case where a load test cannot be configured to gradually increase traffic
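
Going back to the Read Replica notes above: because replication is asynchronous, an application that sends reads to a RR usually wants to check Replica Lag first. Below is a minimal sketch of that routing decision in plain Python. Everything in it is an assumption made for illustration: the Replica class, the pick_read_endpoint function, the endpoint names and the 30-second threshold are hypothetical, and the lag values are passed in by hand rather than fetched from CloudWatch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    endpoint: str                 # hypothetical RR DNS name
    lag_seconds: Optional[float]  # latest Replica Lag sample; None means no data


def pick_read_endpoint(primary: str,
                       replicas: List[Replica],
                       max_lag_seconds: float = 30.0) -> str:
    """Return the least-lagged RR within the threshold, else the primary."""
    # Keep only replicas that reported a lag value within the allowed threshold.
    usable = [r for r in replicas
              if r.lag_seconds is not None and r.lag_seconds <= max_lag_seconds]
    if not usable:
        # No replica is fresh enough, so reads fall back to the primary endpoint.
        return primary
    # Prefer the replica that is closest to the source.
    return min(usable, key=lambda r: r.lag_seconds).endpoint


if __name__ == "__main__":
    # Hypothetical endpoints and lag samples, for illustration only.
    replicas = [
        Replica("rr-1.example.internal", 4.0),
        Replica("rr-2.example.internal", 95.0),
        Replica("rr-3.example.internal", None),
    ]
    print(pick_read_endpoint("primary.example.internal", replicas))
```

Writes always go to the primary endpoint; only the read path uses this check. In a real setup the lag samples would come from monitoring, and the threshold depends on how stale a read the application can tolerate.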