Dark-factory failure modes and maintenance agents
Where autonomous pipelines break. The headline risk is evaluator overconfidence:
an evaluator that keeps approving subtly flawed code because its criteria do not
catch the flaw. Four safeguards mitigate it: running each scenario several times
against a majority threshold, an overall pass bar, a human audit of the first
auto-merged changes, and the existing CI/CD pipeline, which still runs after merge.
Students also learn to manage retry-cost blowup (hard attempt caps, token
monitoring) and slow codebase drift. Drift is addressed with maintenance agents:
background jobs that scan for stale docs and inconsistent patterns and file small
cleanup changes through the same gate, a kind of garbage collection for the
codebase. Connects directly to the silent-failure and observability material in EB-6.
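
The safeguards described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the function names (`scenario_passes`, `gate`, `attempt_with_caps`), the default thresholds, and the reading of "overall pass bar" as the share of scenarios that must pass are assumptions made for illustration, not the course's actual implementation. An evaluator is modelled as a plain callable returning `True` or `False`, and an attempt as a callable returning a success flag plus the tokens it consumed.

```python
from collections.abc import Callable, Sequence


def scenario_passes(run_once: Callable[[], bool], runs: int = 5,
                    majority: float = 0.6) -> bool:
    """Run one scenario several times; pass only if enough runs approve."""
    approvals = sum(1 for _ in range(runs) if run_once())
    return approvals / runs >= majority


def gate(scenarios: Sequence[Callable[[], bool]], runs: int = 5,
         majority: float = 0.6, pass_bar: float = 0.9) -> bool:
    """Approve a change only if the share of passing scenarios meets the bar."""
    if not scenarios:
        return False  # nothing was evaluated: refuse rather than auto-approve
    passed = sum(scenario_passes(s, runs, majority) for s in scenarios)
    return passed / len(scenarios) >= pass_bar


def attempt_with_caps(attempt: Callable[[], tuple[bool, int]],
                      max_attempts: int = 3,
                      token_budget: int = 50_000) -> bool:
    """Retry until success, stopping at a hard attempt cap or token budget."""
    tokens_used = 0
    for _ in range(max_attempts):
        ok, tokens = attempt()
        tokens_used += tokens  # token monitoring: count every attempt's spend
        if ok:
            return True
        if tokens_used >= token_budget:
            break  # budget exhausted: stop retrying even if attempts remain
    return False
```

Repeating each scenario reduces the chance that a single lucky evaluator run approves a flawed change, and the two caps bound the cost of a change that never converges. A maintenance agent's cleanup change would pass through the same `gate` call as any other change.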