Agentic Observability Platforms [2026 Cheat Sheet]
Bottom Line
All three platforms now cover tracing, evaluation, and prompt iteration, so the real decision is workflow fit: LangSmith is the most integrated app-dev stack, Phoenix is the most OTEL-native and self-hosting-friendly, and W&B Weave is the cleanest bridge for teams already living in W&B.
Key Takeaways
- As of May 08, 2026, all three platforms support tracing plus evaluation workflows.
- Phoenix stands out for OTEL/OpenInference-first instrumentation and open-source self-hosting.
- LangSmith has the tightest built-in loop across tracing, datasets, prompts, and experiments.
- W&B Weave is strongest when your org already uses W&B projects, entities, and evaluation habits.
- If trace payloads may include secrets or PII, sanitize them before ingestion.
As of May 08, 2026, the practical gap between agentic observability platforms is no longer "who has tracing?" but "which operating model fits your team?" LangSmith, Arize Phoenix, and W&B Weave all cover traces and evals; the differences show up in instrumentation style, workflow shape, and how naturally each product fits your existing stack. This cheat sheet focuses on those decision points, plus copy-ready setup references.
The decision matrix below is inferred from official docs, not vendor marketing copy.
| Dimension | LangSmith | Arize Phoenix | W&B Weave | Edge |
|---|---|---|---|---|
| Core posture | Integrated agent app dev platform | OTEL-native observability and evals | Observability plus evals in W&B workflow | Depends |
| Tracing model | Projects, traces, runs, threads | OTLP spans, projects, sessions | Ops, calls, traces inside W&B projects | Phoenix |
| Prompt iteration | Playground, prompt engineering, Studio | Prompt management, playground, replay | Playground and version tracking | LangSmith |
| Evaluation loop | Offline and online evaluation workflows | Evals plus datasets and experiments | Scorers, judges, and production feedback | Depends |
| Self-host posture | Cloud, hybrid, self-hosted options documented | Strong OSS and self-host story | Docs center on W&B account and project workflow | Phoenix |
| Best fit | App teams shipping agent products fast | Infra-minded teams wanting open instrumentation | Teams already standardized on W&B | Depends |
At a Glance
What the official docs make clear
- LangSmith organizes work around observability, evaluation, prompt engineering, and deployment.
- Phoenix centers tracing, evaluation, prompt engineering, and datasets/experiments on top of OpenTelemetry.
- Weave centers tracing, evaluations, version tracking, feedback, and production monitoring for LLM apps.
One naming trap to fix early
- In 2026, “Arize” is ambiguous in public docs: Phoenix is the open-source observability product, while Arize AX is the enterprise AI engineering platform.
- If your team says “we use Arize,” pin down whether they mean Phoenix or AX before you wire instrumentation.
When to Choose Each
Choose LangSmith when:
- You want tracing, datasets, evaluators, prompts, and experiments to live in one workflow.
- You are already building with LangChain or LangGraph and want fast instrumentation via environment variables and wrappers.
- You care about both offline evaluation and online evaluation in a single product model.
- You want a more app-centric workflow than a raw telemetry-centric one.
Choose Phoenix when:
- You want the most explicit OTLP, OpenTelemetry, and OpenInference path.
- You want open-source posture and documented self-hosting on Docker, Kubernetes, or your own cloud.
- You want prompt replay, datasets, and experiments without giving up an infra-friendly tracing model.
- Your team prefers instrumentation that is easy to move or standardize across providers.
Choose Weave when:
- Your org already uses Weights & Biases teams, entities, or evaluation workflows.
- You want function-level tracking with `weave.op` and project-centric logging via `weave.init()`.
- You want prompts, versions, traces, scorers, and feedback to sit next to the rest of your W&B work.
- You need a lighter bridge from LLM app tracing into an existing W&B culture.
Setup Reference
Install and authenticate
LangSmith
```bash
pip install -U langsmith openai

export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="<your-langsmith-api-key>"
export OPENAI_API_KEY="<your-openai-api-key>"
export LANGSMITH_PROJECT="my-app"
```
Phoenix
```bash
pip install arize-phoenix openinference-instrumentation-openai openai
```
W&B Weave
```bash
pip install weave openai

export WANDB_API_KEY="<your_api_key>"
```
Basic tracing
LangSmith
```python
from openai import OpenAI
from langsmith.wrappers import wrap_openai
from langsmith import traceable

# Wrap the client so every completion call is captured as a child run
client = wrap_openai(OpenAI())

@traceable
def assistant(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```
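Assuming the environment variables from the install step are set, calling the wrapped function should land as a single trace in the configured project:
```python
# Each invocation is recorded as one trace under LANGSMITH_PROJECT
print(assistant("What does the LANGSMITH_PROJECT variable control?"))
```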
Phoenix
```python
from phoenix.otel import register

# Registers a tracer provider and auto-enables installed OpenInference instrumentors
tracer_provider = register(
    project_name="my-llm-app",
    auto_instrument=True,
)
```
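With the OpenAI instrumentor installed and auto_instrument=True, ordinary client calls should be traced without wrapper code; a minimal sketch:
```python
from openai import OpenAI

# No wrapping needed: the OpenInference instrumentor patches the client globally
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Why use OTLP for traces?"}],
)
print(response.choices[0].message.content)
```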
W&B Weave
```python
import weave
from openai import OpenAI

client = OpenAI()
weave.init('your-team/traces-quickstart')

# Decorated functions are logged as ops with inputs, outputs, and timing
@weave.op()
def ask_model(prompt: str):
    return client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
    )
```
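Once weave.init() has run, each call to the decorated op should be logged with its inputs, output, and latency; for example:
```python
# The call, its arguments, and the returned completion are captured as one traced op
result = ask_model("What does weave.op capture?")
print(result.choices[0].message.content)
```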
CLI and query workflows
LangSmith CLI
```bash
langsmith project list
langsmith trace list --project my-app --limit 5
langsmith run list --project my-app --run-type llm --include-metadata
langsmith dataset list
langsmith experiment list --dataset my-eval-set
```
Phoenix TypeScript instrumentation
```bash
npm install @arizeai/openinference-instrumentation-openai
```
Weave TypeScript install
```bash
npm install weave openai
```
Configuration
LangSmith configuration notes
- `LANGSMITH_TRACING` enables tracing.
- `LANGSMITH_PROJECT` routes traces into a named project.
- `LANGSMITH_WORKSPACE_ID` is relevant when an API key is linked to multiple workspaces.
- `LANGSMITH_ENDPOINT` matters for self-hosted or hybrid deployments.
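A minimal Python equivalent of that configuration, with placeholder values, assuming it runs before any traced code is imported:
```python
import os

# Placeholders; LANGSMITH_WORKSPACE_ID and LANGSMITH_ENDPOINT are only needed
# for multi-workspace keys and self-hosted/hybrid deployments respectively
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGSMITH_PROJECT"] = "my-app"
os.environ["LANGSMITH_WORKSPACE_ID"] = "<workspace-id>"
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"
```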
Phoenix configuration notes
- Phoenix docs now strongly emphasize using `phoenix.otel` and OTEL-aware defaults.
- `auto_instrument=True` activates installed OpenInference instrumentors automatically.
- For cloud setups, docs show `PHOENIX_API_KEY` and `PHOENIX_COLLECTOR_ENDPOINT` as the core connection values.
- If you batch spans, flush on shutdown so data is not left in the exporter queue.
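A hedged sketch of the cloud connection plus a clean shutdown, assuming register() returns an OTEL SDK tracer provider (so force_flush() and shutdown() are available):
```python
import os
from phoenix.otel import register

# Placeholder connection values for a Phoenix cloud instance
os.environ["PHOENIX_API_KEY"] = "<your-phoenix-api-key>"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

tracer_provider = register(project_name="my-llm-app", auto_instrument=True)

# ... run your app ...

# Drain batched spans before the process exits
tracer_provider.force_flush()
tracer_provider.shutdown()
```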
Weave configuration notes
- `WANDB_API_KEY` can log you in non-interactively.
- `weave.init('entity/project')` is the project-routing primitive that matters first.
- `WEAVE_PARALLELISM` controls worker parallelism.
- `WEAVE_PRINT_CALL_LINK=false` disables terminal call-link output.
```bash
# Weave runtime environment variables
export WEAVE_PARALLELISM=10
export WEAVE_PRINT_CALL_LINK=false
```
A `.env.example` for tracing, eval, and provider keys removes a lot of false platform friction during trials.
Advanced Usage
Evaluation strategy that travels across all three
- Start with a small hand-built dataset of failures, not a giant synthetic benchmark.
- Separate offline evaluation from online monitoring so regressions and live drift are not mixed together.
- Track retrieval quality, tool selection, formatting validity, and final answer quality as separate signals.
- Promote bad production traces into eval datasets quickly; all three platforms support some version of that feedback loop.
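As one concrete version of that promotion loop, here is a minimal LangSmith-flavored sketch; the dataset name and run ID are placeholders, and Phoenix and Weave have analogous dataset APIs:
```python
from langsmith import Client

client = Client()

# Create (or reuse) a dataset that collects real production failures
dataset = client.create_dataset(dataset_name="agent-failures")

# Copy a failing production run's inputs and outputs into the dataset
run = client.read_run("<failing-run-id>")
client.create_example(
    inputs=run.inputs,
    outputs=run.outputs,
    dataset_id=dataset.id,
)
```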
Where each platform pulls ahead in advanced workflows
- LangSmith: strongest when you want traces, evaluators, prompts, and experiment comparison in one application workflow.
- Phoenix: strongest when you want open instrumentation and portability around OTLP and OpenInference.
- Weave: strongest when you want LLM observability to feel like an extension of existing W&B evaluation and project habits.
Security and retention checklist
- Decide whether prompts, retrieved chunks, and tool arguments are safe to store before you enable tracing broadly.
- Mask secrets and identifiers at the edge, not after ingestion; a minimal sketch follows this list.
- Keep one documented policy for trace retention, evaluator retention, and dataset retention.
- If you share snippets across teams, run them through the Code Formatter before pasting into internal docs or runbooks.
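A deliberately minimal masking sketch, assuming regex patterns are acceptable for your data; a production deployment should use a vetted masking tool and audited patterns:
```python
import re

# Illustrative patterns only; extend and audit for your own data shapes
PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "<API_KEY>"),    # provider-style secret keys
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"),  # email addresses
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),      # US SSN-shaped strings
]

def scrub(text: str) -> str:
    """Mask known-sensitive shapes before a span payload leaves the process."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```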
Frequently Asked Questions
Is LangSmith or Arize Phoenix better for OpenTelemetry-first teams?
Phoenix: its docs center on `phoenix.otel` and OpenInference, while LangSmith is more application-workflow-centric.
Does W&B Weave replace LangSmith for agent evaluation?
Not necessarily. Weave covers tracing, scorers, and feedback well, but its main pull is for teams already operating inside W&B; LangSmith keeps the tighter built-in loop across traces, datasets, prompts, and experiments.
What does "Arize" mean in 2026: Phoenix or AX?
It depends on context: Phoenix is the open-source observability product, while Arize AX is the enterprise AI engineering platform. Pin down which one your team means before wiring instrumentation.
Which platform is easiest to self-host for agent observability?
Phoenix, with its open-source posture and documented Docker and Kubernetes deployments; LangSmith documents cloud, hybrid, and self-hosted options, while Weave's docs center on the hosted W&B workflow.
What data should I avoid sending into observability traces?
Secrets and PII. Decide whether prompts, retrieved chunks, and tool arguments are safe to store, and mask sensitive fields at the edge before ingestion.