Developer Reference

Agent Observability Checklist [Developer Cheat Sheet]

Dillip Chowdary
Tech Entrepreneur & Innovator · June 18, 2026 · 10 min read

Bottom Line

Agent observability is complete only when traces, tool logs, cost meters, and replay bundles connect to the same workflow ID. That shared ID turns vague failures into reproducible engineering work.

Key Takeaways

  • One trace per user task; spans for model calls, tools, guardrails, handoffs, and retries.
  • Cost attribution belongs on model spans, not only on monthly provider invoices.
  • Replay bundles need sanitized inputs, fixed fixtures, prompt versions, and assertions.
  • Use OTLP for portable export, then add vendor-specific search and eval features.

Agent observability is the operating checklist for systems that reason, call tools, spend tokens, and fail in ways ordinary HTTP logs cannot explain. This reference, current as of June 18, 2026, maps what a production agent needs: trace every agent step, capture structured tool logs, meter cost per request, and save replay bundles that reproduce failures without exposing private data.

  • Capture one trace per user task, with spans for model calls, tool calls, guardrails, and handoffs.
  • Keep prompts, tool inputs, and outputs redacted by default; store replay fixtures separately from secrets.
  • Meter tokens and wall time at the span level so routing, retries, and tools can be cost-attributed.
  • Use OTLP for portable telemetry, then add vendor features for search, evals, and replay.

Agent Observability Checklist

Bottom Line

An observable agent has a trace tree, correlated tool logs, cost meters, and a sanitized replay bundle for every important failure path. If any one is missing, debugging turns into guesswork.

Minimum Signals

  • Trace root: one workflow trace per user request, job, conversation turn, or scheduled task.
  • Span taxonomy: create spans for planning, model calls, retrieval, tool calls, guardrails, handoffs, retries, and final response formatting.
  • Tool log: record tool name, arguments hash, permission scope, exit status, duration, result size, and sanitized error text.
  • Cost meter: attach input tokens, output tokens, cached tokens when available, model name, provider, retry count, and route choice to the model span.
  • Replay bundle: preserve prompt template version, normalized inputs, retrieved document IDs, tool fixtures, model settings, and seed or deterministic controls when supported.
  • Privacy gate: redact secrets, credentials, personal data, raw customer payloads, and proprietary context before telemetry leaves the process.
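The tool log fields above are easiest to keep consistent when one function assembles them at the call site. A minimal sketch, assuming Node.js and a hypothetical `buildToolLogRecord` helper with illustrative field names: it hashes a key-sorted JSON form of the arguments with SHA-256, so repeated calls with the same arguments share a hash while the raw values never reach the log.

```javascript
const { createHash } = require('node:crypto');

// Serialize with sorted object keys so { a, b } and { b, a } hash identically.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).filter((k) => value[k] !== undefined).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`).join(',')}}`;
  }
  // JSON.stringify(undefined) returns undefined; encode it as null instead.
  return JSON.stringify(value) ?? 'null';
}

// Hypothetical helper: one structured record per tool call; raw arguments are never stored.
function buildToolLogRecord({ traceId, tool, args, scope, status, startedAt, endedAt, result, error }) {
  const resultText =
    result === undefined ? '' : typeof result === 'string' ? result : JSON.stringify(result);
  return {
    type: 'tool_call',
    trace_id: traceId,
    tool,
    args_sha256: createHash('sha256').update(canonicalJson(args)).digest('hex'),
    permission_scope: scope,
    status,
    duration_ms: endedAt - startedAt,
    result_bytes: Buffer.byteLength(resultText),
    // Assumes error text was sanitized upstream; truncation only bounds the line size.
    error: error ? String(error).slice(0, 500) : null,
  };
}

const record = buildToolLogRecord({
  traceId: 'trace_abc123',
  tool: 'refund_tool',
  args: { orderId: 'ord_1', amount: 20 },
  scope: 'payments:write',
  status: 'ok',
  startedAt: 1000,
  endedAt: 1250,
  result: { refunded: true },
});
console.log(record.duration_ms, record.result_bytes); // 250 17
```

A plain hash of low-entropy arguments (a small amount, a short ID) can still be guessed by brute force, so a keyed HMAC is a reasonable upgrade when arguments are sensitive.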

Use portable telemetry first. The OpenTelemetry OTLP exporter configuration defines endpoints, headers, timeouts, and protocols for traces, metrics, and logs. Framework-specific tracing can sit on top: OpenAI Agents SDK tracing records generations, tool calls, handoffs, guardrails, and custom events, while LangSmith tracing provides project-based trace inspection.

Watch out: Failure replay is not a reason to store raw secrets. Mask payloads before export; for quick test fixtures, TechBytes' Data Masking Tool helps sanitize sample JSON before it lands in docs, tickets, or repro cases.
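Masking can run in-process, before any exporter sees the payload. A minimal sketch, assuming a hypothetical `redact` helper and an illustrative deny-list of key names and value shapes; the real rules belong to your security team. Note the key pattern is anchored so that cost fields such as `input_tokens` are not masked by a naive "token" match.

```javascript
// Illustrative rules only. Matches keys such as password, db_password, x-api-key, Authorization.
const FORBIDDEN_KEY =
  /(^|[_-])(password|secret|api[_-]?key|access[_-]?token|refresh[_-]?token|authorization|cookie)$/i;

// Substrings masked inside any string value, including error text.
const SENSITIVE_VALUES = [
  /\bsk-[A-Za-z0-9_-]{16,}/g, // API-key-shaped strings (example prefix)
  /\b[\w.+-]+@[\w-]+\.[\w.-]+\b/g, // email addresses
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi, // bearer tokens
];

// Returns a redacted deep copy; the input is not modified.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [
        key,
        FORBIDDEN_KEY.test(key) ? '[REDACTED]' : redact(inner),
      ]),
    );
  }
  if (typeof value === 'string') {
    return SENSITIVE_VALUES.reduce((text, pattern) => text.replace(pattern, '[REDACTED]'), value);
  }
  return value;
}

const safe = redact({
  user: 'jane@example.com',
  headers: { Authorization: 'Bearer abc.def' },
  input_tokens: 812,
  error: 'upstream rejected Bearer xyz123',
});
console.log(JSON.stringify(safe));
```

If redaction itself throws, dropping the payload is safer than exporting it unmasked.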

Live Search JS Filter

Cheat sheets work best when engineers can narrow the list fast. Drop this filter into an internal runbook page and tag each row with the signal, owner, and severity.

<label for='obs-filter'>Filter checklist</label>
<input id='obs-filter' type='search' placeholder='trace, cost, replay, tool log' autocomplete='off'>

<ul id='obs-list'>
  <li data-tags='trace model cost'>Model span includes provider, model, tokens, latency, retry count</li>
  <li data-tags='tool log replay'>Tool call stores sanitized args hash, exit status, fixture pointer</li>
  <li data-tags='privacy security'>Exporter redacts secrets before transport</li>
</ul>

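<script>
// Filter checklist rows by their visible text plus data-tags.
const input = document.querySelector('#obs-filter');
const items = [...document.querySelectorAll('#obs-list li')];
input.addEventListener('input', () => {
  // Split on spaces or commas so "trace, cost" keeps rows that contain both terms.
  const terms = input.value.toLowerCase().split(/[\s,]+/).filter(Boolean);
  for (const item of items) {
    const text = `${item.textContent} ${item.dataset.tags ?? ''}`.toLowerCase();
    // every() is true for an empty query, so clearing the box shows all rows again.
    item.hidden = !terms.every((term) => text.includes(term));
  }
});
</script>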

Filterable Fields

  • Signal: trace, metric, log, event, replay, eval, alert.
  • Owner: platform, AI engineering, security, data, product, support.
  • Severity: blocker, high, medium, low, hygiene.
  • Runtime: Node.js, Python, browser, worker, batch, queue consumer.
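Once rows carry those fields, a query can target one field at a time. A sketch of the matching logic, assuming rows are plain objects and a hypothetical `field:value` query syntax (for example `owner:security severity:blocker`); bare terms still match anywhere in the row. Keeping it free of DOM calls makes it unit-testable before wiring it into the input listener.

```javascript
// Hypothetical row shape: { text, signal, owner, severity, runtime }.
function matchesQuery(row, query) {
  const terms = query.toLowerCase().split(/[\s,]+/).filter(Boolean);
  const haystack = Object.values(row).join(' ').toLowerCase();
  return terms.every((term) => {
    const sep = term.indexOf(':');
    if (sep > 0) {
      // Field-scoped term: only the named field is searched.
      const field = term.slice(0, sep);
      const wanted = term.slice(sep + 1);
      return String(row[field] ?? '').toLowerCase().includes(wanted);
    }
    // Bare term: match anywhere in the row.
    return haystack.includes(term);
  });
}

const rows = [
  { text: 'Exporter redacts secrets before transport', signal: 'log', owner: 'security', severity: 'blocker', runtime: 'Node.js' },
  { text: 'Model span includes token counts', signal: 'trace', owner: 'AI engineering', severity: 'high', runtime: 'Python' },
];
console.log(rows.filter((r) => matchesQuery(r, 'owner:security severity:blocker')).length); // 1
```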

Keyboard Shortcuts Table

For an internal trace explorer, wire shortcuts to navigation and replay actions. Keep destructive actions behind confirmation and respect focused form fields.

| Shortcut | Action | Use when |
| --- | --- | --- |
| / | Focus trace search | Jump from a failure report to a trace ID, user ID hash, or workflow name. |
| j / k | Next or previous span | Move through a trace tree without losing detail-pane focus. |
| e | Open error span | Skip straight to the first failed tool, model, or guardrail span. |
| c | Copy trace link | Paste a stable permalink into an incident, pull request, or support ticket. |
| r | Open replay bundle | Inspect the sanitized fixture that reproduces the failure. |
| ? | Show shortcut help | Expose shortcuts without putting instructional text in the main workflow. |
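The "respect focused form fields" rule is easy to get wrong inside a listener, so it helps to keep key resolution pure. A sketch assuming a hypothetical `resolveShortcut` helper that takes a KeyboardEvent-like object and returns an illustrative action name from the table, or `null` when the key should be left to the browser.

```javascript
// Action names mirror the shortcut table; they are illustrative identifiers.
const SHORTCUTS = {
  '/': 'focusTraceSearch',
  j: 'nextSpan',
  k: 'previousSpan',
  e: 'openErrorSpan',
  c: 'copyTraceLink',
  r: 'openReplayBundle',
  '?': 'showShortcutHelp',
};

function resolveShortcut(event) {
  // Leave modified keys (Ctrl+C, Cmd+R, ...) to the browser.
  if (event.ctrlKey || event.metaKey || event.altKey) return null;
  const target = event.target || {};
  const tag = (target.tagName || '').toUpperCase();
  // Typing "j" into the search box must not move the span selection.
  if (target.isContentEditable || ['INPUT', 'TEXTAREA', 'SELECT'].includes(tag)) return null;
  return Object.hasOwn(SHORTCUTS, event.key) ? SHORTCUTS[event.key] : null;
}

// Browser wiring (not executed under Node); `handlers` is a hypothetical action map:
// document.addEventListener('keydown', (event) => {
//   const action = resolveShortcut(event);
//   if (action) { event.preventDefault(); handlers[action](); }
// });
```

A destructive action added later would return a name whose handler calls `confirm()` first, rather than acting directly.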

Commands Grouped By Purpose

Install And Bootstrap

These commands use documented OpenTelemetry JavaScript setup and standard shell tools. Replace placeholder endpoints and keys with your own values.

npm install --save @opentelemetry/api
npm install --save @opentelemetry/auto-instrumentations-node
export OTEL_SERVICE_NAME='agent-worker'
export OTEL_TRACES_EXPORTER='otlp'
export OTEL_EXPORTER_OTLP_ENDPOINT='http://localhost:4318'
export OTEL_EXPORTER_OTLP_PROTOCOL='http/protobuf'
export NODE_OPTIONS='--require @opentelemetry/auto-instrumentations-node/register'
node app.js

Trace Lookup

  • By trace ID: search the trace backend for the root trace, then inspect model and tool child spans.
  • By request ID: correlate application logs to the trace root using a shared request identifier.
  • By workflow: group traces by workflow name to compare planner, retrieval, and tool latency across releases.
TRACE_ID='trace_abc123'
rg "$TRACE_ID" ./logs ./replays

Tool Log Triage

jq 'select(.type == "tool_call" and .status != "ok") | {trace_id, tool, duration_ms, status, error}' agent-events.jsonl
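The jq filter lists individual failures; alerting usually needs a rate. A sketch of the same selection in Node.js, assuming the `type`, `tool`, and `status` fields from the jq command and treating `"ok"` as the only success status, that computes the error rate per tool.

```javascript
// Error rate per tool over tool_call events, worst tools first.
function toolErrorRates(events) {
  const stats = new Map();
  for (const e of events) {
    if (e.type !== 'tool_call') continue;
    const s = stats.get(e.tool) ?? { calls: 0, errors: 0 };
    s.calls += 1;
    if (e.status !== 'ok') s.errors += 1;
    stats.set(e.tool, s);
  }
  return [...stats]
    .map(([tool, s]) => ({ tool, ...s, errorRate: s.errors / s.calls }))
    .sort((a, b) => b.errorRate - a.errorRate);
}
```

Rates keep a high-volume tool from drowning out a rarely called one that fails most of the time.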

Cost Rollups

jq -s '
  map(select(.type == "model_call"))
  | group_by(.model)
  | map({
      model: .[0].model,
      calls: length,
      input_tokens: map(.input_tokens // 0) | add,
      output_tokens: map(.output_tokens // 0) | add
    })
' agent-events.jsonl
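Token counts become cost only after applying a price table. A sketch assuming the same `model_call` fields as the jq rollup and a hypothetical, hand-maintained `PRICES_PER_MILLION` map; the numbers are placeholders, not real provider pricing, and cached-token discounts would need an extra field.

```javascript
// Placeholder USD prices per million tokens. These are NOT real provider prices.
const PRICES_PER_MILLION = {
  'model-a': { input: 1.0, output: 4.0 },
  'model-b': { input: 0.2, output: 0.8 },
};

// Rolls up model_call events by model and estimates cost from the price table.
function costByModel(events, prices = PRICES_PER_MILLION) {
  const totals = Object.create(null);
  for (const e of events) {
    if (e.type !== 'model_call') continue;
    const t = (totals[e.model] ??= { calls: 0, input_tokens: 0, output_tokens: 0, cost_usd: 0, unpriced_calls: 0 });
    const input = e.input_tokens ?? 0;
    const output = e.output_tokens ?? 0;
    t.calls += 1;
    t.input_tokens += input;
    t.output_tokens += output;
    const price = Object.hasOwn(prices, e.model) ? prices[e.model] : undefined;
    // Count unknown models instead of silently pricing them at zero.
    if (!price) {
      t.unpriced_calls += 1;
      continue;
    }
    t.cost_usd += (input * price.input + output * price.output) / 1e6;
  }
  return totals;
}
```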

Replay Bundle Creation

mkdir -p "replays/$TRACE_ID"
jq --arg trace "$TRACE_ID" 'select(.trace_id == $trace)' agent-events.jsonl > "replays/$TRACE_ID/events.jsonl"
cp prompts/customer-support.current.md "replays/$TRACE_ID/prompt.md"
cp fixtures/retrieval-results.json "replays/$TRACE_ID/retrieval.json"

Configuration

Environment Variables

| Variable | Purpose | Notes |
| --- | --- | --- |
| OTEL_SERVICE_NAME | Names the emitting service | Use stable names such as agent-api, agent-worker, or agent-evals. |
| OTEL_EXPORTER_OTLP_ENDPOINT | Sets the base OTLP endpoint | The default depends on protocol (the spec uses http://localhost:4317 for gRPC and http://localhost:4318 for HTTP); set it explicitly per environment. |
| OTEL_EXPORTER_OTLP_TRACES_ENDPOINT | Overrides the trace export endpoint | For OTLP/HTTP it is used as-is, without appending /v1/traces. Use it when traces, metrics, and logs route to different collectors. |
| OTEL_EXPORTER_OTLP_HEADERS | Adds exporter headers | Keep tokens in a secret manager, not in committed config. |
| OTEL_EXPORTER_OTLP_TIMEOUT | Controls the exporter timeout in milliseconds | The spec default is 10000; set it low enough that telemetry cannot stall the agent hot path. |
| OTEL_LOG_LEVEL | Controls OpenTelemetry diagnostic logging | Use debug briefly during instrumentation work; keep production quiet. |
| LANGSMITH_TRACING | Enables LangSmith tracing | Set to true when using LangSmith projects. |
| OPENAI_AGENTS_DISABLE_TRACING | Disables OpenAI Agents SDK tracing | Set to 1 only when policy or environment requires it. |
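The interaction between the base and per-signal endpoints is a common source of misrouted telemetry. Per the OpenTelemetry exporter specification, an OTLP/HTTP exporter appends `v1/traces` to the base endpoint but uses a trace-specific endpoint exactly as given. A small sketch of that rule (hypothetical helper names, HTTP only, not the SDK's own code) fits in a startup check that logs where traces will go.

```javascript
// Default base endpoint for OTLP/HTTP in the exporter spec; gRPC uses port 4317.
const DEFAULT_HTTP_ENDPOINT = 'http://localhost:4318';
const DEFAULT_TIMEOUT_MS = 10000;

// Mirrors the spec's URL rule for OTLP/HTTP traces (illustration, not SDK code).
function resolveTracesEndpoint(env = process.env) {
  // A per-signal endpoint is used exactly as given.
  if (env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) return env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT;
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT || DEFAULT_HTTP_ENDPOINT;
  // The base endpoint gets v1/traces appended; trim trailing slashes to avoid "//".
  return `${base.replace(/\/+$/, '')}/v1/traces`;
}

function exporterTimeoutMs(env = process.env) {
  const parsed = Number.parseInt(env.OTEL_EXPORTER_OTLP_TIMEOUT ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUT_MS;
}

// Example startup log line so misrouted telemetry shows up in deploy output.
console.log(`traces -> ${resolveTracesEndpoint()} (timeout ${exporterTimeoutMs()} ms)`);
```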

Span Attribute Checklist

  • Identity: trace ID, span ID, parent span ID, workflow name, tenant hash, request hash.
  • Model: provider, model name, temperature, max output setting, input tokens, output tokens, finish reason.
  • Tool: tool name, version, permission scope, argument schema version, exit code, result bytes.
  • Retriever: index name, query hash, top-k setting, returned document IDs, score range.
  • Policy: guardrail name, decision, blocked reason, human review status.
  • Release: app version, prompt version, routing policy version, deployment region.
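Building these attributes in one place keeps spans consistent across agents. A sketch assuming the OpenTelemetry GenAI semantic conventions for the model fields (still experimental, so names may change between versions) and a hypothetical `app.` prefix for release fields; the returned object could be passed to `span.setAttributes(...)` from `@opentelemetry/api`.

```javascript
// Hypothetical helper: maps one model call plus release info to span attributes.
function modelSpanAttributes(call, release) {
  const attrs = {
    // Experimental GenAI conventions; older versions name the provider gen_ai.system.
    'gen_ai.provider.name': call.provider,
    'gen_ai.request.model': call.model,
    'gen_ai.request.temperature': call.temperature,
    'gen_ai.request.max_tokens': call.maxOutputTokens,
    'gen_ai.usage.input_tokens': call.inputTokens,
    'gen_ai.usage.output_tokens': call.outputTokens,
    'gen_ai.response.finish_reasons': call.finishReason ? [call.finishReason] : undefined,
    // Hypothetical app-level keys for release and retry attribution.
    'app.version': release.appVersion,
    'app.prompt_version': release.promptVersion,
    'app.routing_policy_version': release.routingPolicyVersion,
    'app.retry_count': call.retryCount ?? 0,
  };
  // null and undefined are not valid attribute values, so drop missing fields.
  return Object.fromEntries(Object.entries(attrs).filter(([, v]) => v !== undefined && v !== null));
}
```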

Advanced Usage

Failure Replay Contract

A replay bundle should be deterministic enough for debugging and sanitized enough for broad engineering access. Treat it as a contract between production, CI, and incident review.

  • Manifest: include trace ID, created time, app version, prompt version, model settings, and owners.
  • Inputs: store normalized user input after masking and policy classification.
  • Context: save retrieval document IDs and fixed snippets, not a live query that can drift.
  • Tools: capture fixture responses for external APIs, file reads, database rows, and permission checks.
  • Assertions: define expected failure, expected fix behavior, or expected guardrail decision.
{
  "trace_id": "trace_abc123",
  "workflow": "support_refund_agent",
  "release_channel": "current",
  "prompt_version": "customer-support.current",
  "model": "configured-by-runtime",
  "fixtures": {
    "retrieval": "retrieval.json",
    "tools": "tools.jsonl"
  },
  "assertions": [
    "refund_tool is not called without order ownership",
    "final_response includes escalation path"
  ]
}

Alerting Rules

  • Cost spike: page when cost per successful workflow exceeds the rolling budget threshold.
  • Tool failure: alert on a rising error rate for tools that mutate state or affect money movement.
  • Replay gap: open a ticket when a failed production trace has no replay bundle after the retention delay.
  • Trace break: alert when root traces lack model or tool child spans after a deploy.
  • Privacy violation: block export and notify security when telemetry contains forbidden key patterns.
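The cost-spike rule needs a concrete definition of "rolling budget threshold". A sketch assuming one value per time bucket (cost divided by successful workflows, computed upstream with zero-success buckets skipped) and a hypothetical rule: page when the latest bucket exceeds the mean of the previous `window` buckets by more than `factor`. Both parameters are placeholders to tune against your own baseline.

```javascript
// costPerSuccess: ascending-time list of cost-per-successful-workflow values.
function costSpike(costPerSuccess, window = 24, factor = 1.5) {
  if (costPerSuccess.length < window + 1) return { page: false, reason: 'not enough history' };
  const current = costPerSuccess[costPerSuccess.length - 1];
  // The `window` buckets immediately before the current one.
  const history = costPerSuccess.slice(-window - 1, -1);
  const baseline = history.reduce((sum, v) => sum + v, 0) / history.length;
  return { page: current > baseline * factor, baseline, current };
}
```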
Pro tip: Add a replay smoke test to CI for each incident class. It turns production failures into regression coverage instead of one-off debugging notes.

Review Cadence

  1. Review top cost traces weekly and decide whether routing, caching, or prompt shape should change.
  2. Sample failed tool traces daily until tool contracts and permissions stabilize.
  3. Run replay bundles before model, prompt, retriever, and tool schema changes.
  4. Audit redaction rules monthly with security and support examples.
  5. Delete expired telemetry and replay fixtures according to retention policy.
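Retention is easier to audit when expiry is computed from the manifest rather than from file timestamps, which change when bundles are copied. A sketch assuming each manifest carries a hypothetical `created_at` ISO-8601 field (the contract calls for a created time) and a retention period in days.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the trace IDs whose bundles are older than the retention period.
function expiredBundles(manifests, retentionDays, now = new Date()) {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return manifests
    .filter((m) => {
      const created = Date.parse(m.created_at);
      // Bundles without a parseable timestamp are never selected for deletion.
      return Number.isFinite(created) && created < cutoff;
    })
    .map((m) => m.trace_id);
}
```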

Frequently Asked Questions

What should I trace in an AI agent?
Trace the full workflow, not just the LLM call. A useful trace has spans for planning, model calls, retrieval, tool calls, guardrails, handoffs, retries, and final response formatting, all tied to one workflow ID.
How do I log agent tool calls safely?
Log tool metadata instead of raw sensitive payloads. Store the tool name, schema version, permission scope, arguments hash, duration, status, result size, and sanitized error text; keep secrets and private data out of the telemetry path.
Where should token cost tracking live?
Attach token and cost metadata to the model span and roll it up to the workflow trace. That lets you explain cost by model, route, retry, tenant, feature, and release instead of relying only on provider invoices.
What is failure replay for agents?
Failure replay is a sanitized fixture that can reproduce a production agent failure outside production. It usually includes normalized input, prompt version, model settings, retrieved context, tool fixtures, and assertions for the expected behavior.
Should I use OpenTelemetry or a dedicated LLM observability tool?
Use OpenTelemetry for portable traces, metrics, logs, and exporter configuration. Add a dedicated LLM or agent observability tool when you need richer trace search, prompt inspection, eval workflows, and replay UX.
