Tooling landscape — LangSmith, Langfuse, Phoenix, Braintrust

Theory. A practical survey of observability and evaluation platforms,
organised by what distinguishes them: framework coupling, evaluation depth,
self-hosting, and attribution (knowing which agent and model version produced a
given output, and at what cost). Students learn to choose based on the full
debugging lifecycle, not just trace inspection.
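
To make "attribution" concrete, here is a minimal, platform-neutral sketch in plain Python. It assumes each traced call is stored with an agent name, a model version, and a cost. The `TraceRecord` type, its field names, and the sample values are hypothetical and do not correspond to the schema of any of the four platforms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """One traced LLM call. All field names are illustrative assumptions."""
    trace_id: str
    agent: str
    model_version: str
    cost_usd: float


def cost_by_agent_and_model(records):
    """Sum cost per (agent, model_version) pair."""
    totals = defaultdict(float)
    for record in records:
        totals[(record.agent, record.model_version)] += record.cost_usd
    return dict(totals)


# Made-up sample data, for illustration only.
records = [
    TraceRecord("t1", "planner", "model-a-v1", 0.25),
    TraceRecord("t2", "planner", "model-a-v1", 0.5),
    TraceRecord("t3", "retriever", "model-b-v2", 0.125),
]

totals = cost_by_agent_and_model(records)
for (agent, model_version), cost in sorted(totals.items()):
    print(f"{agent:10s} {model_version:12s} ${cost:.3f}")
```

A platform with good attribution lets you ask this kind of question (cost grouped by agent and model version) without exporting the traces and writing the rollup yourself.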

Use cases. Matching tool to need: a regulated team for which self-hosting is
mandatory; a team that needs cost attribution per agent; a team that wants tight
LangGraph coupling versus one that wants vendor-neutral OpenTelemetry.

Practical exercises.

Concept-check: given three team profiles, recommend a platform for each and say
why.
Applied: instrument the same sample agent on one chosen platform and produce a
one-screen dashboard of traces, cost, and failures.
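
As a rough guide to what the one-screen dashboard in the applied exercise should report, the sketch below computes the three headline numbers (traces, cost, failures) from exported span data. It is plain Python under stated assumptions: the `Span` type, its fields, and the sample values are hypothetical, and each platform exposes this data through its own UI or export format.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One step inside a trace. Field names are illustrative assumptions."""
    trace_id: str
    agent: str
    cost_usd: float
    ok: bool


def summarize(spans):
    """Reduce spans to the three headline numbers: traces, failures, cost."""
    trace_ids = {s.trace_id for s in spans}
    # A trace counts as failed if any of its spans failed.
    failed_ids = {s.trace_id for s in spans if not s.ok}
    return {
        "traces": len(trace_ids),
        "failed_traces": len(failed_ids),
        "total_cost_usd": sum(s.cost_usd for s in spans),
    }


def render(summary):
    """Format the summary as a small text panel."""
    return "\n".join([
        f"traces         : {summary['traces']}",
        f"failed traces  : {summary['failed_traces']}",
        f"total cost USD : {summary['total_cost_usd']:.3f}",
    ])


# Made-up sample data, for illustration only.
spans = [
    Span("t1", "planner", 0.25, True),
    Span("t1", "retriever", 0.125, True),
    Span("t2", "planner", 0.5, False),
]

summary = summarize(spans)
print(render(summary))
```

Counting failures per trace rather than per span is a design choice made here for illustration. A dashboard built on a real platform may define failure differently, so state the definition you used.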

Sources