Braintrust

DEVOPS

Velocity: 0.0

Braintrust is making LLM observability painless to adopt, with auto-instrumentation across Python, Ruby, Go, and TypeScript.

llm-observability, auto-instrumentation, agent-traces, evals, developer-experience

◆Current state

Braintrust's recent run is dominated by zero-code instrumentation work: Python, Ruby, Go, and TypeScript all gained auto-instrumentation, and the new Topics feature classifies logs automatically without manual schema work. The product is also deepening agent-tooling integrations with Claude Code and Temporal, and adding operational features such as trace translation, member session history, and dataset tagging. Monthly SDK releases continue, with steady model-coverage updates.

◆Where it's heading

The trajectory is clear: Braintrust wants LLM evals and observability to be frictionless to start with (drop in an SDK, get traces) and then deeper to work in for engineers running multi-step agents. Auto-instrumentation across four languages, plus automatic topic classification of logs, lowers the start-up cost. The Claude Code and Temporal integrations show Braintrust positioning itself to observe long-running agentic workflows specifically, not just one-shot chat completions.

◆Prediction

Expect more agent-framework integrations (LangGraph, CrewAI, OpenAI Agents SDK if not already covered) and a richer agent-aware UI: span trees that group reasoning steps, replay-from-step, and automatic eval generation from production traces. The member-activity work hints at SOC 2 and enterprise compliance pressure that will shape additional governance features.

◆Recent moves

1mo ago
Translate message content in traces
Trace messages can now be translated in place into English, Spanish, French, German, Japanese, and other languages. This is useful for debugging multilingual agents, and quietly important for teams shipping AI to international customers.
2mo ago
Member activity and session history
Organization owners can now see member activity (last-active time, IP, location, browser, OS) and session history. Combined with new dataset tagging and starring, these are governance features aimed at larger customers.
3mo ago
TypeScript auto-instrumentation
⚡ SPARK
TypeScript auto-instrumentation lands alongside Topics, which automatically classifies logs to surface patterns without manual schemas. Together they sharply shorten the on-ramp to LLM observability and are the clearest directional signal in the recent run.
4mo ago
Auto-instrumentation for Python, Ruby, and Go
⚡ SPARK
Auto-instrumentation arrives for Python, Ruby, and Go simultaneously, alongside a Temporal integration for durable execution. This is the foundation release that the later TypeScript auto-instrumentation builds on, and it sets the direction toward frictionless adoption.
5mo ago
Claude Code integration
Claude Code sessions are now traced automatically, and Claude can query logs and fetch experiment results. This closes a useful loop for teams that already use both products and points to where Braintrust sees agentic-coding workflows landing.
6mo ago
Python SDK 0.3.8: experiments page, trace timeline, dataset schemas
Python SDK 0.3.8 bundles experiment list page upgrades, trace timeline UI work, and dataset schema visuals. A routine SDK release: small, but in line with the product's monthly cadence.
