Cloud Foundations
Backup and Disaster Recovery
Backup and disaster recovery practices should be designed around business capabilities, not only infrastructure. A backup that cannot be restored within the capability's recovery time objective is not a control.
Recovery objectives
Define recovery targets for each critical capability.
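- RTO (recovery time objective): maximum acceptable time to restore the capability after an outage.
- RPO (recovery point objective): maximum acceptable data loss, measured as the time between the last recoverable copy and the incident.
- MTD (maximum tolerable downtime): the longest outage the business can absorb. The RTO must fit within it.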
- Degraded mode: what can continue with partial functionality?
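Recording the targets above as structured data makes them easy to review and to check for consistency. The following is a minimal sketch, not a prescribed schema: the `RecoveryTargets` class, its field names, and the example capability are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTargets:
    """Recovery targets for one business capability (hypothetical schema)."""

    capability: str
    rto: timedelta       # maximum acceptable recovery time
    rpo: timedelta       # maximum acceptable data loss, as a time window
    mtd: timedelta       # maximum tolerable downtime
    degraded_mode: str   # what can continue with partial functionality

    def problems(self) -> list[str]:
        """Return inconsistencies; an empty list means the targets are coherent."""
        issues = []
        if min(self.rto, self.rpo, self.mtd) < timedelta(0):
            issues.append("targets must not be negative")
        if self.rto > self.mtd:
            # Recovery that takes longer than the tolerable outage defeats the target.
            issues.append("RTO exceeds MTD")
        return issues


# Example values are illustrative only.
checkout = RecoveryTargets(
    capability="checkout",
    rto=timedelta(hours=1),
    rpo=timedelta(minutes=15),
    mtd=timedelta(hours=4),
    degraded_mode="read-only catalog, orders queued",
)
print(checkout.problems())
```

Keeping one record per capability, rather than per server or database, keeps the targets tied to what the business needs to recover.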
Backup baseline
Every critical datastore or stateful system should define:
- Backup frequency and retention.
- Encryption and access controls.
- Cross-region or cross-account copy requirements.
- Restore procedure and owners.
- Monitoring for backup success and age.
- Evidence of periodic restore tests.
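Monitoring for backup success and age can be reduced to a simple comparison between the time of the last successful backup and the maximum age allowed. The sketch below assumes the last success timestamp is already available from whatever backup tooling is in use. The function name `backup_status` and its return values are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_status(
    last_success: Optional[datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> str:
    """Classify a datastore's backup as 'ok', 'stale', or 'missing'.

    last_success: timezone-aware time of the last successful backup, or None.
    max_age: oldest acceptable backup, typically no larger than the RPO.
    now: injectable clock for testing; defaults to the current UTC time.
    """
    if last_success is None:
        return "missing"
    current = now if now is not None else datetime.now(timezone.utc)
    age = current - last_success
    return "stale" if age > max_age else "ok"


# Illustrative check with a fixed clock.
clock = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(backup_status(datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc), timedelta(hours=24), clock))
```

Alerting on age rather than only on job failure also catches backups that silently stopped being scheduled.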
DR strategies
Common strategies include:
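- Backup and restore: lowest cost, slowest recovery.
- Pilot light: a minimal core kept running in the recovery environment, with the rest provisioned on failover.
- Warm standby: a scaled-down but running copy, for faster recovery.
- Active-active: when the business need justifies the complexity.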
Choose the simplest strategy that meets the recovery objective.
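The selection rule can be sketched as a walk through the strategies from simplest to most complex, stopping at the first one that meets both targets. The recovery figures in the table below are placeholders, not benchmarks: replace them with values measured in your own environment. The `Strategy` class and `simplest_strategy` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Strategy:
    name: str
    achievable_rto: timedelta  # assumed recovery time this strategy can deliver
    achievable_rpo: timedelta  # assumed data-loss window this strategy can deliver


# Ordered simplest first. All durations are ASSUMED placeholder values.
STRATEGIES = [
    Strategy("backup and restore", timedelta(hours=24), timedelta(hours=24)),
    Strategy("pilot light", timedelta(hours=4), timedelta(hours=1)),
    Strategy("warm standby", timedelta(minutes=30), timedelta(minutes=5)),
    Strategy("active-active", timedelta(minutes=1), timedelta(0)),
]


def simplest_strategy(
    rto: timedelta,
    rpo: timedelta,
    strategies: Sequence[Strategy] = STRATEGIES,
) -> Optional[Strategy]:
    """Return the first (simplest) strategy meeting both targets, or None."""
    for strategy in strategies:
        if strategy.achievable_rto <= rto and strategy.achievable_rpo <= rpo:
            return strategy
    return None


choice = simplest_strategy(rto=timedelta(hours=6), rpo=timedelta(hours=2))
print(choice.name if choice else "no listed strategy meets the targets")
```

A `None` result is useful in itself: it signals that either the targets or the available strategies need to be revisited with the business owner.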
Restore testing
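A restore test is only useful as evidence if it records whether the data came back intact and how long the restore took relative to the RTO. The following is a minimal sketch under stated assumptions: the `restore` callable stands in for whatever procedure actually performs the restore, the expected SHA-256 digest is assumed to have been recorded at backup time, and `run_restore_test` and `RestoreTestResult` are hypothetical names.

```python
import hashlib
import time
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable


@dataclass(frozen=True)
class RestoreTestResult:
    duration: timedelta   # how long the restore took
    checksum_ok: bool     # restored data matches the recorded digest
    within_rto: bool      # restore finished inside the RTO

    @property
    def passed(self) -> bool:
        return self.checksum_ok and self.within_rto


def run_restore_test(
    restore: Callable[[], bytes],
    expected_sha256: str,
    rto: timedelta,
) -> RestoreTestResult:
    """Run a restore, time it, and verify the restored bytes."""
    started = time.monotonic()
    data = restore()  # placeholder for the real restore procedure
    duration = timedelta(seconds=time.monotonic() - started)
    checksum_ok = hashlib.sha256(data).hexdigest() == expected_sha256
    return RestoreTestResult(duration, checksum_ok, duration <= rto)


# Illustrative run with an in-memory stand-in for a backup.
payload = b"orders-table-snapshot"
digest = hashlib.sha256(payload).hexdigest()
result = run_restore_test(lambda: payload, digest, rto=timedelta(minutes=5))
print(result.passed)
```

Storing each result alongside the date and the person who ran the test gives the periodic evidence the backup baseline asks for.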
Watchouts
- Untested restore procedures fail during real incidents.
- Backups can replicate corruption or accidental deletion.
- Access to backups must be restricted and audited.
- DR runbooks need decision owners, not just technical commands.
- Cross-region recovery may still depend on global identity, DNS, or deployment systems.
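The watchout about replicated corruption can be made concrete. If corruption or deletion happened at a known time, only restore points taken before that time are usable, so retention must reach back further than the time it takes to notice the problem. This sketch assumes restore point timestamps are available as a list. The function name `latest_clean_restore_point` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def latest_clean_restore_point(
    restore_points: Sequence[datetime],
    corrupted_at: datetime,
) -> Optional[datetime]:
    """Return the newest restore point taken strictly before the corruption.

    Returns None when every retained restore point already contains the damage.
    """
    clean = [point for point in restore_points if point < corrupted_at]
    return max(clean) if clean else None


# Illustrative daily restore points; corruption occurred on 3 January at 09:00 UTC.
points = [datetime(2024, 1, day, tzinfo=timezone.utc) for day in (1, 2, 3, 4)]
damage = datetime(2024, 1, 3, 9, 0, tzinfo=timezone.utc)
print(latest_clean_restore_point(points, damage))
```

If the function returns `None` for a realistic detection delay, the retention period is too short for the kind of incident it is meant to cover.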