Backup and Disaster Recovery

Backup and disaster recovery (DR) practices should be designed around the business capabilities they protect, not only around infrastructure. A backup that cannot be restored within the required time is not an effective control.

Recovery objectives

Define recovery targets for each critical capability.

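RTO (recovery time objective): maximum acceptable time to restore the capability after a disruption.
RPO (recovery point objective): maximum acceptable data loss, measured as the time between the last recoverable point and the disruption.
MTD (maximum tolerable downtime): the longest outage the business can absorb; the RTO must fit within it.
Degraded mode: which functions can continue with partial functionality while recovery is in progress.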
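These targets constrain each other and the backup schedule. The sketch below is a minimal illustration, assuming each capability's targets are recorded as durations; `RecoveryTargets` and `check_targets` are hypothetical names, not part of any tool. It flags an RTO that exceeds the MTD, and a backup interval longer than the RPO, since with periodic backups the worst-case data loss is at least one full interval.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class RecoveryTargets:
    """Hypothetical record of recovery targets for one business capability."""
    capability: str
    rto: timedelta  # maximum acceptable recovery time
    rpo: timedelta  # maximum acceptable data loss, expressed as time
    mtd: timedelta  # maximum tolerable downtime


def check_targets(t: RecoveryTargets, backup_interval: timedelta) -> List[str]:
    """Return inconsistencies between the targets and the backup schedule."""
    problems = []
    # Recovery has to complete before the tolerable-downtime limit is reached.
    if t.rto > t.mtd:
        problems.append(f"{t.capability}: RTO {t.rto} exceeds MTD {t.mtd}")
    # Worst-case loss with periodic backups is at least one full interval
    # (backup duration and copy lag make it worse; they are ignored here).
    if backup_interval > t.rpo:
        problems.append(
            f"{t.capability}: backup interval {backup_interval} exceeds RPO {t.rpo}"
        )
    return problems


# Illustrative values only.
orders = RecoveryTargets(
    "order capture",
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
    mtd=timedelta(hours=8),
)
print(check_targets(orders, backup_interval=timedelta(hours=1)))
```

Here an hourly backup cannot meet a 15-minute RPO, so the check reports one problem; a tighter RPO usually means continuous replication or log shipping rather than more frequent snapshots.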

Backup baseline

Every critical datastore or stateful system should define:

Backup frequency and retention.
Encryption and access controls.
Cross-region or cross-account copy requirements.
Restore procedure and owners.
Monitoring for backup success and age.
Evidence of periodic restore tests.
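Monitoring for backup age matters as much as monitoring for failures, because a job that silently stops running emits no failure event. The following is a minimal sketch, assuming the timestamp of the last successful backup is already available from your backup tooling; `backup_status` is a hypothetical helper, and how that timestamp is collected is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_status(
    last_success: Optional[datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> str:
    """Classify the most recent successful backup as OK, STALE, or MISSING.

    `last_success` is an assumed input; obtaining it depends on your tooling.
    """
    now = now or datetime.now(timezone.utc)
    if last_success is None:
        # No successful backup on record at all.
        return "MISSING"
    age = now - last_success
    # Alert on age so that a job which stopped running is still caught.
    return "STALE" if age > max_age else "OK"


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
print(backup_status(last, max_age=timedelta(hours=26), now=now))
```

Setting `max_age` slightly above the backup frequency (26 hours for a daily job in this illustration) tolerates normal jitter while still catching a missed run.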

DR strategies

Common strategies include:

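Backup and restore: lowest cost, slowest recovery; infrastructure is rebuilt from backups after the disruption.
Pilot light: core data is replicated continuously with minimal infrastructure running; the rest is provisioned during recovery.
Warm standby: a scaled-down but functional copy of the system runs continuously and is scaled up on failover, for faster recovery.
Active-active: full capacity serves traffic from multiple locations; use it only when the business need justifies the complexity.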

Choose the simplest strategy that meets the capability's recovery objectives.
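The selection rule can be expressed as a walk from the simplest strategy to the most complex, stopping at the first one whose demonstrated recovery meets both targets. This is a sketch under stated assumptions: the achievable RTO and RPO figures below are hypothetical placeholders, not benchmarks, and should be replaced with values measured in your own restore tests.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    achievable_rto: timedelta  # should come from your own DR tests
    achievable_rpo: timedelta


# Ordered from simplest to most complex. The durations are HYPOTHETICAL
# placeholders for illustration; replace them with measured values.
STRATEGIES = [
    Strategy("backup and restore", timedelta(hours=24), timedelta(hours=24)),
    Strategy("pilot light", timedelta(hours=4), timedelta(minutes=15)),
    Strategy("warm standby", timedelta(minutes=30), timedelta(minutes=5)),
    Strategy("active-active", timedelta(minutes=1), timedelta(seconds=1)),
]


def simplest_strategy(rto: timedelta, rpo: timedelta) -> Optional[str]:
    """Return the first (simplest) strategy meeting both targets, or None."""
    for s in STRATEGIES:
        if s.achievable_rto <= rto and s.achievable_rpo <= rpo:
            return s.name
    # No listed strategy meets the targets: revisit the targets or the design.
    return None


print(simplest_strategy(rto=timedelta(hours=8), rpo=timedelta(hours=1)))
```

Returning `None` rather than defaulting to the most complex option makes an unachievable objective visible, which is a business decision, not only a technical one.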

Restore testing

Watchouts

Untested restore procedures fail during real incidents.
Backups and replicas can copy corruption or accidental deletion along with good data; retention must reach back far enough to recover a clean version.
Access to backups must be restricted and audited.
DR runbooks need decision owners, not just technical commands.
Cross-region recovery may still depend on global identity, DNS, or deployment systems.
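When corruption or deletion has already been copied into recent backups, recovery means choosing the newest restore point that predates the problem. A minimal sketch, assuming you have the list of retained backup timestamps and an estimate of when the corruption was introduced; `last_clean_restore_point` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def last_clean_restore_point(
    backups: Sequence[datetime], corrupted_at: datetime
) -> Optional[datetime]:
    """Return the newest backup taken strictly before the corruption.

    Returns None when every retained backup postdates the corruption,
    meaning retention was too short to recover a clean copy.
    """
    clean = [b for b in backups if b < corrupted_at]
    return max(clean) if clean else None


retained = [datetime(2024, 1, d, tzinfo=timezone.utc) for d in (1, 2, 3, 4)]
print(last_clean_restore_point(retained, datetime(2024, 1, 3, 12, tzinfo=timezone.utc)))
```

Because `corrupted_at` is usually an estimate, it is safer to pass a time with some margin before the suspected onset and verify the restored data before cutting over.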