Block capstone — put into practice
Brief. Take a provided agent that has a known-but-undisclosed silent bug. Instrument
it with tracing, run a batch of tasks, read the traces to locate the failure, classify
it against the six failure modes, find the root cause, and convert it into a permanent
eval case wired as a CI quality gate that blocks the regression. Deliver a short
write-up: what the trace showed, which failure mode, the root cause, and the gate you
added.
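
As a rough illustration of the instrument, trace, and eval-case steps in the brief, here is a minimal sketch in Python. It assumes a toy agent built from plain functions; the names `traced`, `TRACE`, `lookup_price`, `run_agent`, and the bug itself (an unknown item silently priced at 0.0) are all hypothetical and stand in for whatever the provided agent actually does. A real setup would likely export spans to a tracing backend rather than an in-memory list.

```python
import functools
import time

TRACE = []  # in-memory span log (assumption: stands in for a real tracing backend)


def traced(fn):
    """Record each call's inputs, output or error, and duration as one span."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {"name": fn.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            span["output"] = fn(*args, **kwargs)
            return span["output"]
        except Exception as exc:
            span["error"] = repr(exc)
            raise
        finally:
            span["duration_s"] = time.perf_counter() - start
            TRACE.append(span)
    return wrapper


PRICES = {"widget": 4.0, "gadget": 7.5}  # hypothetical tool data


@traced
def lookup_price(item):
    # Hypothetical silent bug: an unknown item yields 0.0 instead of an error.
    return PRICES.get(item, 0.0)


@traced
def run_agent(items):
    # Hypothetical agent task: total the price of every requested item.
    return sum(lookup_price(item) for item in items)


def eval_unknown_item_is_not_priced_at_zero():
    """Regression eval case: returns False while the silent bug is present."""
    TRACE.clear()
    try:
        run_agent(["widget", "sprocket"])
    except KeyError:
        return True  # a fixed agent surfaces the unknown item as an error
    # The buggy agent returns a plausible total; only the trace shows the 0.0 lookup.
    zero_lookups = [
        span for span in TRACE
        if span["name"] == "lookup_price" and span.get("output") == 0.0
    ]
    return not zero_lookups
```

The final answer alone (a total of 4.0) looks plausible, which is what makes the bug silent; the eval case reads the trace rather than the output, so it keeps failing until the lookup stops returning 0.0 for unknown items.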
Tier scaling. Basic version: one task, one obvious failure, concept-level trace
reading and a single eval case. Basic+: a small trace batch and the full
trace→cluster→root-cause→eval loop. Basic++ / Production: a realistic trace corpus, a
multi-case eval set, a CI gate on a sample repo, and a chosen tooling platform. This
top tier is the same exercise that appears as the EB-6 Use case in the Production track
(Section 4.3).
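
For the tiers that ask for a multi-case eval set and a CI gate, one possible shape for the gate is a small function that turns eval results into a process exit code. This is a sketch under assumptions: the `gate` function, the `min_pass_rate` parameter, and the result format (a mapping from case name to pass/fail) are hypothetical, and the chosen tooling platform may provide its own equivalent.

```python
def gate(results, min_pass_rate=1.0):
    """Return an exit code: 0 if the pass rate meets the threshold, else 1.

    `results` maps an eval case name to True (passed) or False (failed).
    """
    if not results:
        return 1  # an empty eval set should not pass silently
    passed = sum(1 for ok in results.values() if ok)
    rate = passed / len(results)
    for name, ok in sorted(results.items()):
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
    print(f"pass rate: {passed}/{len(results)} = {rate:.0%}")
    return 0 if rate >= min_pass_rate else 1
```

A CI step could call `sys.exit(gate(results))` after running the eval set, so that a nonzero exit code fails the job and blocks the merge. Keeping the default threshold at 1.0 means a single regressed case is enough to block.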