Night Mode Labs Blue Book
Cloud Foundations

Backup and Disaster Recovery

Design backup and disaster recovery practices around business capabilities, not just infrastructure. A backup that cannot be restored within the recovery objective is not an effective control.

Recovery objectives

Define recovery targets for each critical capability.

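  • RTO (recovery time objective): maximum acceptable time to restore the capability after a disruption.
  • RPO (recovery point objective): maximum acceptable data loss, expressed as the time between the last usable recovery point and the disruption.
  • MTD (maximum tolerable downtime): the longest outage the business can absorb; the RTO must not exceed it.
  • Degraded mode: which functions can still operate, in reduced form, while full recovery is under way.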
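As a minimal sketch, assuming recovery targets and the measured characteristics of the backup setup are recorded per capability, the check below compares them. The `RecoveryObjectives` and `RecoveryDesign` types are hypothetical, and so is the simplification that worst-case data loss is roughly the backup interval plus the delay before a copy is usable in the recovery location.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    """Recovery targets for one critical capability (hypothetical structure)."""
    rto: timedelta  # maximum acceptable recovery time
    rpo: timedelta  # maximum acceptable data loss
    mtd: timedelta  # maximum tolerable downtime


@dataclass
class RecoveryDesign:
    """Planned or measured characteristics of the backup/DR setup (hypothetical)."""
    backup_interval: timedelta      # time between successive recovery points
    copy_lag: timedelta             # delay until a backup is usable in the recovery location
    tested_restore_time: timedelta  # duration observed in the most recent restore test


def check_objectives(obj: RecoveryObjectives, design: RecoveryDesign) -> list[str]:
    """Return the gaps found; an empty list means the design meets the targets."""
    gaps = []
    if obj.rto > obj.mtd:
        gaps.append("RTO exceeds MTD")
    # Approximate worst case: the failure hits just before the newest backup
    # finishes copying, so the previous recovery point is the last usable one.
    worst_case_loss = design.backup_interval + design.copy_lag
    if worst_case_loss > obj.rpo:
        gaps.append(f"worst-case data loss {worst_case_loss} exceeds RPO {obj.rpo}")
    if design.tested_restore_time > obj.rto:
        gaps.append(f"tested restore time {design.tested_restore_time} exceeds RTO {obj.rto}")
    return gaps


obj = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1), mtd=timedelta(hours=8))
design = RecoveryDesign(backup_interval=timedelta(hours=1),
                        copy_lag=timedelta(minutes=15),
                        tested_restore_time=timedelta(hours=3))
print(check_objectives(obj, design))
# ['worst-case data loss 1:15:00 exceeds RPO 1:00:00']
```

Here an hourly backup looks like it meets a one-hour RPO, but the copy delay pushes the worst case past it.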

Backup baseline

Every critical datastore or stateful system should define:

  • Backup frequency and retention.
  • Encryption and access controls.
  • Cross-region or cross-account copy requirements.
  • Restore procedure and owners.
  • Monitoring for backup success and age.
  • Evidence of periodic restore tests.
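One way to make this baseline checkable is to record it as data and alert on backup age and stale restore evidence. The sketch below is hypothetical throughout: the `BackupPolicy` fields, the 1.5x grace factor on the backup frequency, the 90-day restore-test window, and the `orders-db` example are illustrative assumptions to tune against your own requirements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class BackupPolicy:
    """Baseline fields for one critical datastore (hypothetical schema)."""
    datastore: str
    frequency: timedelta    # expected interval between successful backups
    retention: timedelta    # how long backups are kept
    encrypted: bool
    cross_region_copy: bool
    restore_owner: str      # team accountable for restores
    last_restore_test: Optional[datetime] = None


def backup_age_alert(policy: BackupPolicy, last_success: datetime,
                     now: datetime, grace: float = 1.5) -> Optional[str]:
    """Return an alert if the newest successful backup is older than frequency * grace."""
    age = now - last_success
    if age > policy.frequency * grace:
        return f"{policy.datastore}: last successful backup is {age} old"
    return None


def restore_test_overdue(policy: BackupPolicy, now: datetime,
                         max_interval: timedelta = timedelta(days=90)) -> bool:
    """True if there is no evidence of a restore test within max_interval."""
    if policy.last_restore_test is None:
        return True
    return now - policy.last_restore_test > max_interval


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
orders = BackupPolicy("orders-db", frequency=timedelta(hours=6),
                      retention=timedelta(days=35), encrypted=True,
                      cross_region_copy=True, restore_owner="payments-platform",
                      last_restore_test=datetime(2024, 1, 15, tzinfo=timezone.utc))
print(backup_age_alert(orders, datetime(2024, 4, 30, 20, 0, tzinfo=timezone.utc), now))
# orders-db: last successful backup is 16:00:00 old
print(restore_test_overdue(orders, now))
# True
```

Alerting on age rather than only on job failures also catches backups that silently stopped being scheduled.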

DR strategies

Common strategies include:

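  • Backup and restore: lowest cost and slowest recovery; infrastructure is rebuilt and data restored from backups.
  • Pilot light: core data is replicated and a minimal set of core components runs continuously; the rest of the infrastructure is provisioned during recovery.
  • Warm standby: a scaled-down but fully functional copy of the workload runs continuously and is scaled up for faster recovery.
  • Active-active: multiple sites serve production traffic at the same time; use it only when the business need justifies the added complexity and cost.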

Choose the simplest strategy that meets the recovery objective.
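The "simplest adequate strategy" rule can be expressed as a scan over strategies ordered from least to most complex. This is a sketch under the assumption that each strategy's achievable RTO and RPO come from your own failover and restore tests; the `StrategyCapability` record is hypothetical, and the figures in `measured` are placeholders, not benchmarks.

```python
from datetime import timedelta
from typing import NamedTuple, Optional, Sequence


class StrategyCapability(NamedTuple):
    """Achievable recovery figures for one DR strategy (hypothetical record)."""
    name: str
    achievable_rto: timedelta
    achievable_rpo: timedelta


def simplest_strategy(rto: timedelta, rpo: timedelta,
                      options: Sequence[StrategyCapability]) -> Optional[str]:
    """Return the first option that meets both targets, or None if none does.

    `options` must be ordered from simplest/cheapest to most complex.
    """
    for option in options:
        if option.achievable_rto <= rto and option.achievable_rpo <= rpo:
            return option.name
    return None


# Hypothetical placeholder figures; replace with results from your own tests.
measured = [
    StrategyCapability("backup and restore", timedelta(hours=24), timedelta(hours=24)),
    StrategyCapability("pilot light", timedelta(hours=4), timedelta(minutes=15)),
    StrategyCapability("warm standby", timedelta(minutes=30), timedelta(minutes=5)),
    StrategyCapability("active-active", timedelta(minutes=1), timedelta(seconds=0)),
]

print(simplest_strategy(timedelta(hours=8), timedelta(hours=1), measured))     # pilot light
print(simplest_strategy(timedelta(hours=2), timedelta(minutes=30), measured))  # warm standby
```

Returning `None` signals that no tested strategy meets the objective, which is a prompt to revisit either the design or the objective rather than to assume the most complex option will work.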

Restore testing

Watchouts

  • Untested restore procedures fail during real incidents.
  • Backups can replicate corruption or accidental deletion.
  • Access to backups must be restricted and audited.
  • DR runbooks need decision owners, not just technical commands.
  • Cross-region recovery may still depend on global identity, DNS, or deployment systems.
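Because backups faithfully capture corruption and deletions, recovery may need to target the last point taken before the problem began rather than the newest backup. The sketch below uses hypothetical helper names and a hypothetical timeline to show how silent corruption can push effective data loss well past the nominal backup interval.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

UTC = timezone.utc


def clean_restore_point(recovery_points: Sequence[datetime],
                        corruption_started: datetime) -> Optional[datetime]:
    """Return the latest recovery point taken strictly before corruption began."""
    candidates = [p for p in recovery_points if p < corruption_started]
    return max(candidates) if candidates else None


def effective_data_loss(restore_point: datetime, writes_stopped: datetime) -> timedelta:
    """Data written between the restore point and the halt of writes is lost."""
    return writes_stopped - restore_point


# Hypothetical timeline: backups every 6 hours; corruption starts silently at
# 10:30 on 1 May and is only detected (and writes halted) at 09:00 on 2 May.
points = ([datetime(2024, 5, 1, h, 0, tzinfo=UTC) for h in (0, 6, 12, 18)]
          + [datetime(2024, 5, 2, h, 0, tzinfo=UTC) for h in (0, 6)])
corruption = datetime(2024, 5, 1, 10, 30, tzinfo=UTC)
detected = datetime(2024, 5, 2, 9, 0, tzinfo=UTC)

restore_to = clean_restore_point(points, corruption)
print(restore_to)                                 # 2024-05-01 06:00:00+00:00
print(effective_data_loss(restore_to, detected))  # 1 day, 3:00:00
```

Restoring the newest backup (06:00 on 2 May) would bring the corruption back. If retention had been shorter than the detection delay, `clean_restore_point` would return `None`, because every retained backup would already contain the corruption.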
