Checklists
Incident Review Checklist
Use this checklist after an incident to turn what the response revealed into system improvements. The review should focus on learning and prevention, not blame.
Facts
- Incident start and end times are recorded.
- Impacted users, services, and regions are identified.
- Detection source is documented.
- Timeline includes key decisions and mitigations.
- Recent changes are reviewed.
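The facts above can be captured in a small record so that durations are computed rather than estimated. This is a minimal sketch, not a prescribed schema: the `IncidentFacts` class, its field names, and the sample timestamps are all hypothetical and would be replaced by whatever your incident tracker stores.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentFacts:
    # Hypothetical fields; adapt names to your own incident record.
    started_at: datetime    # when impact began
    detected_at: datetime   # when the detection source first reported it
    ended_at: datetime      # when impact ended
    detection_source: str   # e.g. "alert" or "customer report"


def time_to_detect(facts: IncidentFacts) -> timedelta:
    """Time between start of impact and first detection."""
    return facts.detected_at - facts.started_at


def duration(facts: IncidentFacts) -> timedelta:
    """Total time users were impacted."""
    return facts.ended_at - facts.started_at


# Illustrative values only.
facts = IncidentFacts(
    started_at=datetime(2024, 1, 15, 9, 0),
    detected_at=datetime(2024, 1, 15, 9, 12),
    ended_at=datetime(2024, 1, 15, 10, 30),
    detection_source="alert",
)
print(time_to_detect(facts))  # 0:12:00
print(duration(facts))        # 1:30:00
```

Recording all three timestamps in one timezone (UTC is a common choice) avoids arithmetic errors when responders are in different regions.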
Response
- Incident roles were assigned, or gaps are noted.
- The escalation path worked, or gaps are captured.
- Communications were timely, or gaps are captured.
- Runbooks helped, or missing steps are identified.
- Access issues are recorded.
Technical learning
- Root causes and contributing factors are documented.
- Missing or noisy alerts are identified.
- Dashboard, log, or trace gaps are captured.
- Rollback, restore, or mitigation gaps are captured.
- Dependency failure behavior is understood.
Follow-up
- Action items have owners and due dates.
- High-risk actions are prioritized.
- Accepted risks are explicit.
- Service catalog, runbooks, and docs are updated.
- A follow-up review date is scheduled.
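The first two follow-up checks can be automated if action items are tracked as structured data. The sketch below assumes a hypothetical `ActionItem` record with `owner`, `due`, and `high_risk` fields; none of these names come from a specific tracker, and the sample items are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    # Hypothetical fields; map them to your tracker's own schema.
    title: str
    owner: Optional[str]
    due: Optional[date]
    high_risk: bool = False


def incomplete_items(items: List[ActionItem]) -> List[ActionItem]:
    """Return items that are missing an owner or a due date."""
    return [i for i in items if not i.owner or i.due is None]


def prioritized(items: List[ActionItem]) -> List[ActionItem]:
    """High-risk items first, then by due date; undated items sort last."""
    return sorted(
        items,
        key=lambda i: (not i.high_risk, i.due is None, i.due or date.max),
    )


# Illustrative items only.
items = [
    ActionItem("Update runbook", owner="dana", due=date(2024, 2, 1)),
    ActionItem("Add alert on queue depth", owner=None, due=date(2024, 1, 25)),
    ActionItem("Fix rollback script", owner="lee", due=date(2024, 2, 10), high_risk=True),
]

for item in incomplete_items(items):
    print(f"Missing owner or due date: {item.title}")

for item in prioritized(items):
    print(item.title)
```

Running a check like this before closing the review makes it harder for an unowned or undated action item to slip through.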