Diagnostic loop — trace → cluster → root cause → eval


Theory. A repeatable workflow for turning production failures into
improvements: collect traces, cluster them into failure patterns, find the root cause of
each pattern, and generate evaluation cases from real failures. Students learn that
production is where you discover what to test for offline, and that failing traces
become test cases.
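A minimal sketch of the clustering step follows. It assumes each trace has already been reduced to a dict with hypothetical `trace_id` and `error` fields, where `error` is `None` for successful runs. Failures are grouped by a normalized error signature, with quoted values and numbers masked, so that near-duplicate messages land in the same bucket. This is a deliberately simple stand-in. Real pipelines often cluster with embeddings or an LLM-based labeller instead, and this sketch does not attempt either.

```python
import re
from collections import defaultdict


def signature(error: str) -> str:
    """Normalize an error message so near-duplicate failures share one key."""
    sig = error.lower()
    sig = re.sub(r"'[^']*'|\"[^\"]*\"", "<str>", sig)  # mask quoted values (tool names, args)
    sig = re.sub(r"\d+", "<n>", sig)                    # mask numbers (ids, timings, offsets)
    return sig.strip()


def cluster_failures(traces: list[dict]) -> list[tuple[str, list[str]]]:
    """Group failed traces by signature, largest cluster first."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for trace in traces:
        if trace.get("error"):  # skip successful traces
            clusters[signature(trace["error"])].append(trace["trace_id"])
    return sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical mock traces for illustration only.
    mock = [
        {"trace_id": "t1", "error": "Tool 'search' timed out after 30s"},
        {"trace_id": "t2", "error": "Tool 'search' timed out after 45s"},
        {"trace_id": "t3", "error": None},
        {"trace_id": "t4", "error": "JSON parse error at position 112"},
        {"trace_id": "t5", "error": "Tool 'search' timed out after 12s"},
    ]
    for sig, ids in cluster_failures(mock):
        print(len(ids), sig, ids)
```

Sorting by cluster size puts the most frequent failure pattern first, which is usually the one worth a root-cause investigation before the long tail.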

Use cases. A team cut a recurring class of agent errors after clustering a week of
traces revealed a single bad prompt instruction. In another case, a flaky-tool pattern
surfaced only once traces were grouped rather than read one by one.
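The flaky-tool case can be sketched as an aggregation. Read one at a time, an occasional tool error looks like noise. Counted per tool across many traces, it shows up as an elevated failure rate. The sketch assumes each trace records its tool calls as hypothetical `(tool_name, ok)` pairs. The 0.2 threshold is an arbitrary illustration, not a recommended value. "Flaky" is taken to mean intermittent, so a tool that fails on every call is excluded as plainly broken rather than flaky.

```python
from collections import Counter


def tool_failure_rates(traces: list[list[tuple[str, bool]]]) -> dict[str, float]:
    """Return the fraction of failed calls per tool across all traces."""
    calls = Counter()
    failures = Counter()
    for tool_calls in traces:
        for tool, ok in tool_calls:
            calls[tool] += 1
            if not ok:
                failures[tool] += 1
    return {tool: failures[tool] / calls[tool] for tool in calls}


def flaky_tools(traces: list[list[tuple[str, bool]]], threshold: float = 0.2) -> list[str]:
    """Tools that fail intermittently: above the threshold but not on every call."""
    rates = tool_failure_rates(traces)
    return sorted(tool for tool, rate in rates.items() if threshold < rate < 1.0)


if __name__ == "__main__":
    # Hypothetical traces: "search" fails half the time, "legacy_api" always fails.
    mock = [
        [("search", True), ("calculator", True)],
        [("search", False)],
        [("search", True), ("calculator", True)],
        [("search", False), ("calculator", True), ("legacy_api", False)],
    ]
    print(tool_failure_rates(mock))
    print(flaky_tools(mock))
```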

Practical exercises.

Concept-check: given ~20 mock traces, cluster them into 3–4 failure patterns and
name the likely root cause of each.
Applied: take one clustered failure and write a concrete evaluation case (input,
expected behaviour, pass/fail check) that would catch a regression of it.
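One way to encode such an evaluation case is sketched below. It assumes the agent under test can be wrapped as a plain function from an input string to an output string. The `EvalCase` structure, the check, and the two stub agents are hypothetical illustrations. The example failure is an agent that emitted malformed JSON. The three parts named in the exercise map directly onto fields: the input copied from the failing trace, a human-readable expected behaviour, and an explicit pass/fail predicate.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """A regression test distilled from one failure cluster."""
    case_id: str                  # ideally points back to the source trace
    user_input: str               # real input copied from the failing trace
    expected_behaviour: str       # what "correct" means, in words
    check: Callable[[str], bool]  # machine-checkable pass/fail predicate


def returns_json_with_answer(output: str) -> bool:
    """Pass only if the output parses as a JSON object with an 'answer' key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and "answer" in parsed


def run_eval(cases: list[EvalCase], agent: Callable[[str], str]) -> dict[str, bool]:
    """Run every case through the agent and record pass/fail per case."""
    return {case.case_id: case.check(agent(case.user_input)) for case in cases}


# Hypothetical case built from a "malformed JSON output" failure cluster.
case = EvalCase(
    case_id="json-parse-001",
    user_input="Summarise order #1182 as JSON with an 'answer' field.",
    expected_behaviour="Returns a single JSON object containing an 'answer' key.",
    check=returns_json_with_answer,
)


def broken_agent(prompt: str) -> str:
    return '{"answer": "shipped",'  # stub reproducing the truncated-JSON failure


def fixed_agent(prompt: str) -> str:
    return '{"answer": "shipped"}'  # stub after the fix


if __name__ == "__main__":
    print(run_eval([case], broken_agent))  # the regression is caught: case fails
    print(run_eval([case], fixed_agent))   # the fixed behaviour passes
```

Keeping the check as a predicate, rather than an exact expected string, lets the case tolerate harmless variation in wording while still failing on the specific defect the cluster exposed.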
