
Production traces → test cases


Theory. Closing the loop between observability and evaluation. Students learn
to convert real production failures into permanent evaluation cases and to enforce
quality gates in CI so that regressions do not ship. The resulting suite tests across
the distribution of paths an agent actually takes rather than a single happy path.
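
As a rough illustration of what "converting a production failure into a permanent evaluation case" can look like, the sketch below reduces a failing trace to the few fields needed to reproduce it. The trace layout, the `EvalCase` fields, and the `trace_to_eval_case` helper are all hypothetical names chosen for this example; they are not the format of any particular observability tool or of the course harness.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical minimal eval case distilled from one production trace."""
    case_id: str
    user_input: str
    expected_tools: tuple[str, ...]    # tool path the agent should have taken
    must_not_contain: tuple[str, ...]  # phrases from the bad output to forbid


def trace_to_eval_case(
    trace: dict[str, Any],
    corrected_tools: list[str],
    forbidden: list[str],
) -> EvalCase:
    """Keep only what reproduces the failure; drop timings and other noise."""
    return EvalCase(
        case_id=f"prod-{trace['trace_id']}",
        user_input=trace["input"],
        expected_tools=tuple(corrected_tools),
        must_not_contain=tuple(forbidden),
    )


# Hypothetical failing trace: the agent refunded instead of cancelling.
trace = {
    "trace_id": "a1b2",
    "input": "Cancel my order 1234",
    "steps": [{"tool": "lookup_order"}, {"tool": "issue_refund"}],
    "output": "Your refund has been issued.",
    "latency_ms": 1840,
}

case = trace_to_eval_case(
    trace,
    corrected_tools=["lookup_order", "cancel_order"],
    forbidden=["refund has been issued"],
)
```

The expected tool path and the forbidden phrase are supplied by a human reviewer here, on the assumption that a trace records what went wrong but not what the correct behaviour should have been.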

Use cases. A regression caught in CI because a past production failure was
promoted to an eval; the difference between a happy-path-only suite and one built from
the real path distribution.

Practical exercises.

  • Concept-check: convert one production trace into a minimal eval case.
  • Applied: assemble a small eval set (5–8 cases) from a batch of traces and wire it
    as a CI gate that blocks merge on failure (provided harness + sample repo).
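
The applied exercise relies on a provided harness that is not shown here. As a stand-in, the sketch below shows one possible shape for a merge-blocking gate: run every case against the agent and return a non-zero exit code if the pass rate falls below a threshold. The case format, `run_gate`, and `stub_agent` are assumptions made for illustration only.

```python
import sys
from typing import Callable

# Hypothetical case format: {"id": str, "input": str, "must_contain": str}
Case = dict[str, str]


def run_gate(
    cases: list[Case],
    agent: Callable[[str], str],
    min_pass_rate: float = 1.0,
) -> int:
    """Return 0 if the suite passes, 1 otherwise (usable as a process exit code)."""
    if not cases:
        # An empty suite should not silently pass the gate.
        return 1
    failures = [
        c["id"]
        for c in cases
        if c["must_contain"].lower() not in agent(c["input"]).lower()
    ]
    for case_id in failures:
        print(f"FAIL {case_id}", file=sys.stderr)
    pass_rate = 1 - len(failures) / len(cases)
    return 0 if pass_rate >= min_pass_rate else 1


def stub_agent(prompt: str) -> str:
    """Stand-in for the real agent; only handles cancellations."""
    if "cancel" in prompt.lower():
        return "Order cancelled."
    return "Sorry, I can't help with that."


cases = [
    {"id": "prod-a1b2", "input": "Cancel my order 1234", "must_contain": "cancelled"},
    {"id": "prod-c3d4", "input": "Where is my package?", "must_contain": "tracking"},
]

exit_code = run_gate(cases, stub_agent)
```

In a CI job, the script would end with `sys.exit(exit_code)`; most CI systems treat a non-zero exit status as a failed step, which is what lets the gate block a merge.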

Sources