Incident Memory
Blameless postmortems that actually preserve learning
Training 010 · Core Practices
Time: 20–30 minutes
Core stance
Incidents are inevitable.
Forgetting why they happened is optional.
Incident memory is the practice of preserving causal understanding, not assigning fault.
Why this lesson exists
Many organizations run postmortems, yet still:
- Repeat the same failures
- Lose context after a few months
- Treat incidents as embarrassing anomalies
- Optimize reports for defensibility instead of learning
The problem is not the postmortem ritual.
It is the absence of memory continuity.
What incident memory is (and is not)
Incident memory is
- Causal, not narrative
- Durable beyond the people involved
- Accessible to future operators
- Explicit about assumptions and conditions
Incident memory is not
- A blame assignment
- A performance evaluation
- A legal defense memo
- A checklist exercise
Learning dies when incidents are treated as personal failures instead of system signals.
Why postmortems usually fail
Postmortems often fail because:
- They focus on timeline, not causality
- They stop at “human error”
- They are written once and never revisited
- They are stored but never retrieved
This creates the illusion of learning without its benefits.
The incident memory pattern
A continuity-safe incident memory answers five questions:
1. What failed?
   (Observed behavior, not interpretation)
2. Why did it fail?
   (Causal chain, including system and context)
3. What assumptions were wrong or stressed?
   (What we believed that no longer holds)
4. What changed as a result?
   (Decisions, safeguards, boundaries)
5. What would cause this to be revisited?
   (Conditions, not dates)
If these are preserved, learning survives turnover.
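One way to make the five questions hard to skip is to store each incident as a structured record rather than free text. The sketch below is a minimal illustration in Python, not a prescribed schema: the class name `IncidentMemory`, its field names, the helper `unanswered`, and the sample incident are all hypothetical, chosen only to mirror the questions above.

```python
from dataclasses import dataclass, fields


@dataclass
class IncidentMemory:
    """One incident stored as answers to the five questions (hypothetical schema)."""
    what_failed: str                 # observed behavior, not interpretation
    why_it_failed: list[str]         # causal chain, including system and context
    stressed_assumptions: list[str]  # what we believed that no longer holds
    what_changed: list[str]          # decisions, safeguards, boundaries
    revisit_conditions: list[str]    # conditions, not dates


def unanswered(record: IncidentMemory) -> list[str]:
    """Return the names of the questions that were left empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Invented sample data, for illustration only.
example = IncidentMemory(
    what_failed="Checkout requests timed out",
    why_it_failed=[
        "Payment client retried without backoff",
        "Retries saturated the shared connection pool",
    ],
    stressed_assumptions=["The payment provider always responds within one second"],
    what_changed=["Added retry backoff", "Gave checkout its own connection pool"],
    revisit_conditions=[],  # question 5 has not been answered yet
)

print(unanswered(example))  # prints ['revisit_conditions']
```

An empty answer is a signal that the discussion stopped early. Here the record explains the failure but does not yet say what conditions should reopen it, so a reviewer can see at a glance what is still missing.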
Blameless does not mean consequence-free
Blameless means:
- We do not punish people for system failures
- We do not erase responsibility
- We do not avoid hard truths
Accountability remains, but it targets systems and decisions, not individuals.
Incident memory and AI
AI systems:
- Fail in non-obvious ways
- Mask causality with performance
- Scale small errors quickly
Without incident memory:
- AI mistakes repeat silently
- Confidence replaces understanding
- Oversight erodes
Incident memory creates:
- Explainable failure
- Safer iteration
- Defensible automation
Exercises
Drill 1 — Rewrite an Old Incident
Pick a past incident report.
Rewrite it to clearly answer:
- Why it failed
- What assumption broke
- What changed
Ignore the timeline if needed.
Drill 2 — Assumption Capture
During your next incident discussion, ask:
“What did we assume that turned out not to be true?”
Write that down explicitly.
Drill 3 — Memory Placement
Decide where incident memory should live so it is:
- Discoverable
- Trusted
- Revisitable
Move one incident there.
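"Discoverable" can be tested concretely: can a future operator find the incident from the thing they are about to change? The following is a rough sketch, assuming incidents are labeled with free-form tags for the systems and assumptions they touch. The `IncidentIndex` class, its methods, and the incident ids are hypothetical and stand in for whatever wiki, tracker, or repository your team already trusts.

```python
from collections import defaultdict


class IncidentIndex:
    """Tiny in-memory lookup from tag to incident ids (illustrative only)."""

    def __init__(self) -> None:
        self._by_tag: dict[str, set[str]] = defaultdict(set)

    def add(self, incident_id: str, tags: list[str]) -> None:
        """Register an incident under each of its tags, case-insensitively."""
        for tag in tags:
            self._by_tag[tag.lower()].add(incident_id)

    def lookup(self, *tags: str) -> list[str]:
        """Return the sorted ids of incidents that carry any of the given tags."""
        found: set[str] = set()
        for tag in tags:
            found |= self._by_tag.get(tag.lower(), set())
        return sorted(found)


# Invented ids and tags, for illustration only.
index = IncidentIndex()
index.add("INC-1", ["payments", "retry"])
index.add("INC-2", ["retry", "cache"])

print(index.lookup("retry"))  # prints ['INC-1', 'INC-2']
```

The design choice worth copying is the direction of the lookup, not the code: memory is indexed by what a future change will touch, so it surfaces before the same assumption is stressed again.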
FAQ
Isn’t this just SRE practice?
SRE techniques are one implementation. Incident memory applies to all failures, not just outages.
Won’t this create legal risk?
In practice, clear causal understanding reduces repeated harm and exposure.
Who owns incident memory?
The incident owner captures it. Continuity ensures it persists.
Suggested next step
Take one recent incident.
Preserve its causal learning using the five-question pattern.
That single act makes recurrence far less likely.
Next: Training 011 — AI Mandates & Boundaries
How to prevent silent scope expansion in automated systems.