Braintrust

DEVOPS
Velocity: 0.0

Braintrust is making LLM observability painless to adopt, with auto-instrumentation now covering Python, Ruby, Go, and TypeScript.

llm-observability, auto-instrumentation, agent-traces, evals, developer-experience
Current state
Braintrust's recent run is dominated by zero-code instrumentation work: Python, Ruby, Go, and TypeScript all gained auto-instrumentation, and Topics now classifies logs automatically without manual schema work. The product is also deepening agent-tooling integrations with Claude Code and Temporal, and adding operational features like trace translation, member session history, and dataset tagging. Monthly SDK releases continue with steady model-coverage updates.
Where it's heading
The trajectory is clear: Braintrust is making LLM evals and observability frictionless to start with (drop in an SDK, get traces) and then richer to work in day to day for engineers running multi-step agents. Auto-instrumentation across four languages, plus automated topic classification of logs, lowers the start-up cost. The Claude Code and Temporal integrations show Braintrust positioning itself to observe long-running agentic workflows specifically, not just one-shot chat completions.
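As a rough illustration of what zero-code instrumentation means mechanically, the sketch below patches a client method at runtime so that every call records a span without any change at the call site. Everything in it (`FakeLLMClient`, `auto_instrument`, the `SPANS` list) is hypothetical and stands in for a real provider client and tracing backend. It is not Braintrust's API, only a generic monkey-patching pattern of the kind auto-instrumentation libraries often use.

```python
import functools
import time

# Hypothetical in-memory span store; a real tool would ship spans to a backend.
SPANS = []


class FakeLLMClient:
    """Stand-in for a provider SDK client (hypothetical, not a real library)."""

    def complete(self, prompt):
        return f"echo: {prompt}"


def auto_instrument(cls, method_name):
    """Patch cls.method_name so each call appends a span record to SPANS."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(self, *args, **kwargs)
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            # Record the span whether the call succeeded or raised.
            SPANS.append({
                "name": f"{cls.__name__}.{method_name}",
                "input": {"args": args, "kwargs": kwargs},
                "error": error,
                "duration_s": time.perf_counter() - start,
            })

    setattr(cls, method_name, traced)


# One-time setup; the application code below it is unchanged.
auto_instrument(FakeLLMClient, "complete")

client = FakeLLMClient()
reply = client.complete("hello")
```

The point of the pattern is that the tracing call happens once at startup, which is why the on-ramp cost drops: existing call sites such as `client.complete(...)` do not need to be touched.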
Prediction
Expect more agent-framework integrations (LangGraph, CrewAI, OpenAI Agents SDK if not already covered) and a richer agent-aware UI: span trees that group reasoning steps, replay-from-step, and automatic eval generation from production traces. The member-activity work hints at SOC 2 and enterprise compliance pressure that will shape additional governance features.

Recent moves

  1. 1mo ago

    Translate message content in traces

    Trace messages can now be translated in-place across English, Spanish, French, German, Japanese, and others. Useful for debugging multilingual agents — a quietly important feature for teams shipping AI to international customers.

  2. 2mo ago

    Member activity and session history

    Member activity (last-active, IP, location, browser, OS) and session history land for organization owners. Combined with new dataset tagging and starring, these are governance features aimed at larger customers.

  3. 3mo ago

    TypeScript auto-instrumentation

    ⚡ SPARK

    TypeScript auto-instrumentation lands alongside Topics, which automatically classifies logs to surface patterns without manual schemas. Together they shorten the on-ramp to LLM observability considerably and give the clearest signal of direction in the recent run.

  4. 4mo ago

    Auto-instrumentation for Python, Ruby, and Go

    ⚡ SPARK

    Auto-instrumentation arrives for Python, Ruby, and Go simultaneously, alongside a Temporal integration for durable execution. This is the foundation release that the later TypeScript auto-instrumentation builds on, and it sets up the push toward frictionless adoption.

  5. 5mo ago

    Claude Code integration

    Claude Code sessions are now traced automatically, and Claude can query logs and fetch experiment results. This closes a useful loop for teams that already use both products and points to where Braintrust sees agentic-coding workflows landing.

  6. 6mo ago

    Python SDK 0.3.8: experiments page, trace timeline, dataset schemas

    Python SDK 0.3.8 bundles experiment list page upgrades, trace timeline UI work, and dataset schema visuals. A routine SDK release: small, but in line with the product's monthly cadence.
