Developer Reference

Agentic Observability Platforms [2026 Cheat Sheet]

Dillip Chowdary
Tech Entrepreneur & Innovator · May 08, 2026 · 13 min read

Bottom Line

All three platforms now cover tracing, evaluation, and prompt iteration, so the real decision is workflow fit: LangSmith is the most integrated app-dev stack, Phoenix is the most OTEL-native and self-hosting-friendly, and W&B Weave is the cleanest bridge for teams already living in W&B.

Key Takeaways

  • As of May 08, 2026, all three platforms support tracing plus evaluation workflows.
  • Phoenix stands out for OTEL/OpenInference-first instrumentation and open-source self-hosting.
  • LangSmith has the tightest built-in loop across tracing, datasets, prompts, and experiments.
  • W&B Weave is strongest when your org already uses W&B projects, entities, and evaluation habits.
  • If trace payloads may include secrets or PII, sanitize them before ingestion.

As of May 08, 2026, the practical gap between agentic observability platforms is no longer “who has tracing?” but “which operating model fits your team?” LangSmith, Arize Phoenix, and W&B Weave all cover traces and evals; the differences show up in instrumentation style, workflow shape, and how naturally each product fits your existing stack. This cheat sheet focuses on those decision points, plus copy-ready setup references.


The decision matrix below is inferred from the platforms' official docs, not from vendor marketing copy.

Dimension | LangSmith | Arize Phoenix | W&B Weave | Edge
--- | --- | --- | --- | ---
Core posture | Integrated agent app dev platform | OTEL-native observability and evals | Observability plus evals in W&B workflow | Depends
Tracing model | Projects, traces, runs, threads | OTLP spans, projects, sessions | Ops, calls, traces inside W&B projects | Phoenix
Prompt iteration | Playground, prompt engineering, Studio | Prompt management, playground, replay | Playground and version tracking | LangSmith
Evaluation loop | Offline and online evaluation workflows | Evals plus datasets and experiments | Scorers, judges, and production feedback | Depends
Self-host posture | Cloud, hybrid, self-hosted options documented | Strong OSS and self-host story | Docs center on W&B account and project workflow | Phoenix
Best fit | App teams shipping agent products fast | Infra-minded teams wanting open instrumentation | Teams already standardized on W&B | Depends

At a Glance

LangSmith is the easiest all-in-one choice for many agent teams, Phoenix is the cleanest open instrumentation choice, and Weave is the right answer when observability should plug into existing W&B habits rather than replace them.

What the official docs make clear

  • LangSmith explicitly organizes work around observability, evaluation, prompt engineering, and deployment.
  • Phoenix explicitly centers tracing, evaluation, prompt engineering, and datasets/experiments on top of OpenTelemetry.
  • Weave explicitly centers tracing, evaluations, version tracking, feedback, and production monitoring for LLM apps.

One naming trap to fix early

  • In 2026, “Arize” is ambiguous in public docs: Phoenix is the open-source observability product, while Arize AX is the enterprise AI engineering platform.
  • If your team says “we use Arize,” pin down whether they mean Phoenix or AX before you wire instrumentation.

When to Choose Each

Choose LangSmith when:

  • You want tracing, datasets, evaluators, prompts, and experiments to live in one workflow.
  • You are already building with LangChain or LangGraph and want fast instrumentation via environment variables and wrappers.
  • You care about both offline evaluation and online evaluation in a single product model.
  • You want a more app-centric workflow than a raw telemetry-centric one.

Choose Phoenix when:

  • You want the most explicit OTLP, OpenTelemetry, and OpenInference path.
  • You want open-source posture and documented self-hosting on Docker, Kubernetes, or your own cloud.
  • You want prompt replay, datasets, and experiments without giving up an infra-friendly tracing model.
  • Your team prefers instrumentation that is easy to move or standardize across providers.

Choose Weave when:

  • Your org already uses Weights & Biases teams, entities, or evaluation workflows.
  • You want function-level tracking with weave.op and project-centric logging via weave.init().
  • You want prompts, versions, traces, scorers, and feedback to sit next to the rest of your W&B work.
  • You need a lighter bridge from LLM app tracing into an existing W&B culture.
Watch out: The fastest way to create expensive observability debt is to ship traces before deciding what data is safe to retain. Prompt bodies, retrieved documents, tool inputs, and user messages often contain material you should redact or hash first.

Quick Reference

Install and authenticate

LangSmith

pip install -U langsmith openai

export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="<your-langsmith-api-key>"
export OPENAI_API_KEY="<your-openai-api-key>"
export LANGSMITH_PROJECT="my-app"

Phoenix

pip install arize-phoenix openinference-instrumentation-openai openai

W&B Weave

pip install weave openai

export WANDB_API_KEY="<your_api_key>"

Basic tracing

LangSmith

from openai import OpenAI
from langsmith.wrappers import wrap_openai
from langsmith import traceable

client = wrap_openai(OpenAI())

@traceable
def assistant(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

Phoenix

from phoenix.otel import register

tracer_provider = register(
    project_name="my-llm-app",
    auto_instrument=True,  # activates any installed OpenInference instrumentors
)

W&B Weave

import weave
from openai import OpenAI

# Initialize Weave before creating the client so OpenAI calls are auto-patched.
weave.init('your-team/traces-quickstart')

client = OpenAI()

@weave.op()
def ask_model(prompt: str):
    return client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
    )

CLI and query workflows

LangSmith CLI

langsmith project list
langsmith trace list --project my-app --limit 5
langsmith run list --project my-app --run-type llm --include-metadata
langsmith dataset list
langsmith experiment list --dataset my-eval-set

Phoenix TypeScript instrumentation

npm install @arizeai/openinference-instrumentation-openai

Weave TypeScript install

npm install weave openai

Configuration

LangSmith configuration notes

  • LANGSMITH_TRACING enables tracing.
  • LANGSMITH_PROJECT routes traces into a named project.
  • LANGSMITH_WORKSPACE_ID is relevant when an API key is linked to multiple workspaces.
  • LANGSMITH_ENDPOINT matters for self-hosted or hybrid deployments.
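
The notes above combine roughly like this for a self-hosted, multi-workspace setup; the endpoint URL and workspace ID below are placeholders, not real values:

```shell
# Hypothetical self-hosted LangSmith setup; substitute your own values.
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="<your-langsmith-api-key>"
export LANGSMITH_ENDPOINT="https://langsmith.internal.example.com"  # self-hosted or hybrid endpoint
export LANGSMITH_WORKSPACE_ID="<workspace-uuid>"                    # only if the key spans workspaces
export LANGSMITH_PROJECT="my-app"
```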

Phoenix configuration notes

  • Phoenix docs now strongly emphasize using phoenix.otel and OTEL-aware defaults.
  • auto_instrument=True activates installed OpenInference instrumentors automatically.
  • For cloud setups, docs show PHOENIX_API_KEY and PHOENIX_COLLECTOR_ENDPOINT as the core connection values.
  • If you batch spans, flush on shutdown so data is not left in the exporter queue.
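
Putting the two cloud connection values together; the endpoint and key below are placeholders, not a real deployment:

```shell
# Hypothetical remote Phoenix collector setup; values are placeholders.
export PHOENIX_API_KEY="<your-phoenix-api-key>"
export PHOENIX_COLLECTOR_ENDPOINT="https://phoenix.example.com"
```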

Weave configuration notes

  • WANDB_API_KEY can log you in non-interactively.
  • weave.init('entity/project') is the project-routing primitive that matters first.
  • WEAVE_PARALLELISM controls worker parallelism.
  • WEAVE_PRINT_CALL_LINK=false disables terminal call-link output.
# Weave runtime environment variables
export WEAVE_PARALLELISM=10
export WEAVE_PRINT_CALL_LINK=false
Pro tip: Normalize your env var strategy before rollout. One shared .env.example for tracing, eval, and provider keys removes a lot of false platform friction during trials.

Advanced Usage

Evaluation strategy that travels across all three

  • Start with a small hand-built dataset of failures, not a giant synthetic benchmark.
  • Separate offline evaluation from online monitoring so regressions and live drift are not mixed together.
  • Track retrieval quality, tool selection, formatting validity, and final answer quality as separate signals.
  • Promote bad production traces into eval datasets quickly; all three platforms support some version of that feedback loop.
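
The loop above does not require any particular platform. A minimal platform-agnostic sketch, where the `model` stub and exact-match scorer are illustrative assumptions rather than any vendor's API:

```python
# Minimal offline eval harness: a hand-built failure dataset plus one scorer.
# `model` is a stub standing in for your real agent call.
def model(question: str) -> str:
    return {"What is 2+2?": "4"}.get(question, "I don't know")

def exact_match(expected: str, actual: str) -> bool:
    # One narrow signal per scorer; keep retrieval/formatting checks separate.
    return expected.strip().lower() == actual.strip().lower()

dataset = [
    {"input": "What is 2+2?", "expected": "4"},            # promoted from a prod trace
    {"input": "Capital of France?", "expected": "Paris"},  # known failure case
]

results = [exact_match(row["expected"], model(row["input"])) for row in dataset]
pass_rate = sum(results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # → pass rate: 50%
```

Once a harness like this exists, porting it onto LangSmith experiments, Phoenix datasets, or Weave scorers is mostly a matter of swapping the logging layer, not rewriting the loop.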

Where each platform pulls ahead in advanced workflows

  • LangSmith: strongest when you want traces, evaluators, prompts, and experiment comparison in one application workflow.
  • Phoenix: strongest when you want open instrumentation and portability around OTLP and OpenInference.
  • Weave: strongest when you want LLM observability to feel like an extension of existing W&B evaluation and project habits.

Security and retention checklist

  • Decide whether prompts, retrieved chunks, and tool arguments are safe to store before you enable tracing broadly.
  • Mask secrets and identifiers at the edge, not after ingestion.
  • Keep one documented policy for trace retention, evaluator retention, and dataset retention.
  • If you share snippets across teams, run them through a code formatter before pasting into internal docs or runbooks.
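
Edge-side masking can be as simple as a regex pass over span payloads before export. A sketch; the patterns below are illustrative only, and you should extend them for your own key and identifier formats:

```python
import re

# Illustrative redaction patterns; extend for your own secret formats.
PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{16,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]

def mask(payload: str) -> str:
    """Redact known-sensitive patterns before a span leaves the process."""
    for pattern, replacement in PATTERNS:
        payload = pattern.sub(replacement, payload)
    return payload

print(mask("user alice@example.com sent key sk-abcdef1234567890XYZ"))
# → user [REDACTED_EMAIL] sent key [REDACTED_API_KEY]
```

Hook a function like this into whatever span-processing interface your exporter offers, so redaction happens in-process rather than after ingestion.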


Frequently Asked Questions

Is LangSmith or Arize Phoenix better for OpenTelemetry-first teams?
Arize Phoenix is the clearer fit if your team wants an OpenTelemetry and OTLP-native workflow. Its docs center instrumentation around phoenix.otel and OpenInference, while LangSmith is more application-workflow-centric.
Does W&B Weave replace LangSmith for agent evaluation?
Not automatically. Weave covers tracing and evaluation well, but it fits best when your organization already uses Weights & Biases projects, teams, and evaluation patterns. If your team wants a tighter agent-app workflow with datasets, prompts, and tracing in one product language, LangSmith is often the cleaner default.
What does 'Arize' mean in 2026: Phoenix or AX?
It can mean either, and that ambiguity matters. Public docs distinguish Phoenix as the open-source observability product and Arize AX as the enterprise AI engineering platform, so teams should name the product explicitly before implementing tracing.
Which platform is easiest to self-host for agent observability?
Phoenix has the strongest public self-host posture in this comparison, with docs pointing to deployment on Docker, Kubernetes, or your own cloud. LangSmith also documents cloud, hybrid, and self-hosted setup options, while Weave documentation is more centered on the managed W&B account workflow.
What data should I avoid sending into observability traces?
Avoid raw secrets, access tokens, customer identifiers, and any document payload you cannot legally retain. In practice, the highest-risk fields are full prompts, retrieved context chunks, tool arguments, and user message bodies; sanitize them before ingestion, not after.
