AI Engineering

Microsoft Foundry Agent Runtime Explained [2026] Guide

Dillip Chowdary
Tech Entrepreneur & Innovator · June 12, 2026 · 8 min read

Bottom Line

Microsoft Foundry Agent Service is becoming an operational runtime for AI agents, not just a model wrapper. Its value is the managed layer around memory, tools, identity, tracing, and governed data placement.

Key Takeaways

  • Prompt agents are managed; hosted agents run custom orchestration code in preview.
  • Memory stores support long-term memory with item CRUD, TTL defaults, and remember-or-forget behavior.
  • Tool choice can be auto, required, or none, giving teams deterministic control over tool calls.
  • Tracing captures tool use, retries, latency, token use, and cost signals for agent runs.
  • Standard setup places Cosmos DB, Storage, and AI Search in customer-owned Azure resources.

Microsoft Foundry Agent Service has moved from a model endpoint wrapper toward a managed runtime for stateful agents: prompt agents that Microsoft runs for you, and hosted agents where your containerized orchestration code runs behind a managed endpoint. The important shift is operational, not cosmetic. Memory, tool execution, trace capture, identity, storage placement, and publishing controls are becoming first-class runtime concerns instead of code every team rebuilds around an LLM API.

  • Prompt agents run fully managed; hosted agents run your container or source package in preview.
  • Memory is a long-term capability, currently in preview, organized through managed memory stores with CRUD and TTL controls.
  • Tooling spans File Search, Code Interpreter, Web Search, OpenAPI, Azure AI Search, MCP, and custom functions.
  • Tracing captures inputs, outputs, tool calls, retries, latency, token use, and cost signals.
  • Standard setup places storage, Cosmos DB, and AI Search in your Azure resources for stronger governance.

The Lead

Bottom Line

Foundry Agent Service is best understood as an agent control plane plus runtime: Microsoft handles state, tools, identity, publishing, and telemetry while teams decide how much orchestration code they still need to own.

The June 2026 architecture story is that agents are no longer just prompts with function calls. Microsoft Foundry Agent Service defines deployable agent versions, routes requests through the Responses API, persists conversation state, connects tools, and exposes observability hooks that operations teams can inspect. That matters because many production failures in agent systems originate outside the model: stale retrieval, unsafe tool calls, hidden latency, weak identity boundaries, or memory that remembers the wrong thing.

Foundry gives teams two operating modes. Prompt agents are the managed path: define instructions, model, tools, and safety controls, then publish an endpoint without maintaining app compute. Hosted agents, currently in preview, are the code-owning path: package an agent as a container image (or upload source code directly) and let the service provide endpoint management, scaling, identity, session persistence, logs, and traces.

The product is also opinionated about enterprise boundaries. A basic setup stores state in Microsoft-managed resources. A standard setup lets the organization bring Azure Storage for files, Azure Cosmos DB for conversation and operational state, Azure AI Search for vector stores, Key Vault, and optionally Application Insights.

Architecture & Implementation

Runtime components: agents, conversations, responses

The new developer model centers on three runtime primitives: agents, conversations, and responses. An agent combines model selection, instructions, parameters, tools, and governance configuration. A conversation persists multi-turn history. A response is the generated output for an input, optionally involving retrieval, tool execution, streaming, or background work.

  • Agent versioning snapshots changes so teams can compare or roll back behavior.
  • Publishing promotes an agent to a stable managed resource with endpoint access.
  • Distribution can expose published agents through Microsoft 365 Copilot, Teams, Entra Agent Registry, and supported invocation protocols.
  • Responses API becomes the common entry point for generation and agent execution.
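
To make the relationship between these primitives concrete, here is a minimal in-memory sketch of agents, versions, publishing, and conversations. It is not the Foundry SDK: every class and method name (AgentVersion, Agent.save, Agent.publish, Conversation.add) is a hypothetical stand-in chosen for illustration, and the real service stores and resolves these objects for you.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AgentVersion:
    """Immutable snapshot of an agent definition (hypothetical shape)."""
    version: int
    model: str
    instructions: str
    tools: Tuple[str, ...] = ()


@dataclass
class Agent:
    """Named agent with an append-only version history."""
    name: str
    versions: List[AgentVersion] = field(default_factory=list)
    published_version: Optional[int] = None

    def save(self, model: str, instructions: str,
             tools: Tuple[str, ...] = ()) -> AgentVersion:
        # Append a new snapshot instead of mutating the previous one,
        # so earlier behavior stays available for comparison or rollback.
        snapshot = AgentVersion(len(self.versions) + 1, model, instructions, tools)
        self.versions.append(snapshot)
        return snapshot

    def publish(self, version: int) -> None:
        # Publishing (or rolling back) only moves a pointer to a snapshot.
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.published_version = version

    def current(self) -> AgentVersion:
        if self.published_version is None:
            raise RuntimeError("agent has not been published")
        return self.versions[self.published_version - 1]


@dataclass
class Conversation:
    """Multi-turn history that outlives any single response."""
    turns: List[Dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
```

The design point worth copying is that a rollback is just publish() with an older version number: nothing is rewritten, so the trace history for each version remains comparable.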

Memory: long-term continuity with lifecycle controls

Foundry memory is a managed preview capability for long-term memory. It is not the same as the short context window inside a single chat turn. The memory system extracts durable facts from interactions, consolidates overlapping information, and retrieves relevant memories later so the agent can personalize future responses.

Microsoft describes three memory types:

  • User profile memory: durable preferences and user context, such as language, product defaults, or accessibility needs.
  • Chat summary memory: distilled summaries of prior conversation topics and threads.
  • Procedural memory: reusable routines and operating patterns inferred from previous work.

The operationally important detail is lifecycle control. Memory stores support create, read, update, list, and delete operations for individual items. A store-level retention setting defines the default TTL for new memory entries, and remember-or-forget behavior synchronizes explicit user instructions with the memory store.
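
The lifecycle semantics are easier to reason about with a toy model. The sketch below is an in-process stand-in, not the Foundry memory API: MemoryStore, its method names, and the injectable clock are all assumptions made for illustration. It shows item-level CRUD plus a store-level default TTL that applies when a new entry does not specify its own.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class MemoryItem:
    key: str
    value: str
    expires_at: Optional[float]  # None means the item never expires


class MemoryStore:
    """Hypothetical in-process stand-in for a managed memory store."""

    def __init__(self, default_ttl_seconds: Optional[float] = None,
                 clock: Callable[[], float] = time.time) -> None:
        self.default_ttl_seconds = default_ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._items: Dict[str, MemoryItem] = {}

    def create(self, key: str, value: str,
               ttl_seconds: Optional[float] = None) -> MemoryItem:
        # Fall back to the store-level default when no TTL is given.
        ttl = self.default_ttl_seconds if ttl_seconds is None else ttl_seconds
        expires_at = None if ttl is None else self._clock() + ttl
        item = MemoryItem(key, value, expires_at)
        self._items[key] = item
        return item

    def read(self, key: str) -> Optional[MemoryItem]:
        item = self._items.get(key)
        if item is None:
            return None
        if item.expires_at is not None and item.expires_at <= self._clock():
            del self._items[key]  # lazily evict expired entries
            return None
        return item

    def update(self, key: str, value: str) -> MemoryItem:
        item = self.read(key)
        if item is None:
            raise KeyError(key)
        item.value = value
        return item

    def list_items(self) -> List[MemoryItem]:
        live = (self.read(key) for key in list(self._items))
        return [item for item in live if item is not None]

    def delete(self, key: str) -> bool:
        # Returns True only if something was actually forgotten.
        return self._items.pop(key, None) is not None
```

In this model, an explicit "forget that" instruction from a user maps to delete(), and "remember that" maps to create() or update(). How the managed service implements that synchronization is not shown here.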

Watch out: Long-term memory creates a new attack surface. Prompt injection can become memory corruption if untrusted content is stored as a durable preference, rule, or workflow.

A good implementation pattern is to separate knowledge sources from personal memory. Use Foundry IQ or Azure AI Search for curated organizational content, File Search for user-provided documents, and memory for user-specific facts that should persist. Before storing production logs, trace exports, or support transcripts, strip secrets and personal data with a workflow such as TechBytes' Data Masking Tool.
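
As a rough illustration of the masking step, the sketch below redacts email addresses and obvious key=value secrets before text is stored. The two regular expressions are deliberately simple assumptions, not a complete PII or secret detector, and they do not describe how any particular masking tool works.

```python
import re

# Illustrative patterns only; production masking needs a vetted detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET = re.compile(r"(?i)\b(api[_-]?key|token|password)(\s*[:=]\s*)\S+")


def mask(text: str) -> str:
    """Redact secret values and email addresses from a log line or transcript."""
    # Keep the field name and separator, replace only the value.
    text = SECRET.sub(r"\1\2[REDACTED]", text)
    text = EMAIL.sub("[EMAIL]", text)
    return text
```

Running the masker before anything reaches a memory store or trace export keeps the redaction decision in one place instead of scattering it across tool handlers.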

Tools: from retrieval to action

Tools are the runtime boundary where an agent leaves pure language generation and touches data, code, or external services. Foundry's tool surface includes File Search, Code Interpreter, Web Search, Azure AI Search, OpenAPI, MCP, custom functions, and agent-to-agent patterns in supported scenarios.

Tool behavior should be configured deliberately rather than left entirely to model judgment. Foundry supports tool_choice values of auto, required, and none. That gives architects a simple but powerful control: let the model decide, force a tool path for grounded answers, or prevent tool use for sensitive interactions.
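
One way to keep that decision out of ad hoc prompt logic is a small policy function that maps request traits to a tool_choice value. The sketch below assumes two hypothetical flags, needs_grounding and sensitive, and a simplified request dictionary. Only the three tool_choice values come from the text above; the rest is illustrative.

```python
from typing import Any, Dict, Literal

ToolChoice = Literal["auto", "required", "none"]


def pick_tool_choice(needs_grounding: bool, sensitive: bool) -> ToolChoice:
    """Map request traits to a tool_choice value."""
    if sensitive:
        return "none"      # never touch tools for sensitive interactions
    if needs_grounding:
        return "required"  # force a tool path so answers are grounded
    return "auto"          # otherwise let the model decide


def build_request(user_input: str, *, needs_grounding: bool = False,
                  sensitive: bool = False) -> Dict[str, Any]:
    # Hypothetical request shape; real payloads carry more fields.
    return {
        "input": user_input,
        "tool_choice": pick_tool_choice(needs_grounding, sensitive),
    }
```

Note that the sensitive check runs first, so a request flagged both sensitive and grounding-dependent still gets "none". That ordering is a policy choice each team should make explicitly.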

  • File Search is appropriate for uploaded PDFs, Word files, code files, and other documents up to 512 MB without managing separate search infrastructure.
  • Azure AI Search is better for governed retrieval over an existing index, including vector, semantic, and hybrid query modes.
  • Code Interpreter runs Python in a sandbox for analysis, chart generation, and iterative computation, with separate session billing.
  • OpenAPI connects agents to HTTP APIs described by OpenAPI 3.0 or OpenAPI 3.1 specs with operation IDs.
  • Web Search grounds responses in current public web information, but administrators can disable it at the subscription level.

For hosted agents, Microsoft does not inject every tool into your process. Hosted agents access Foundry-managed tools through a Toolbox MCP endpoint, and your code connects with standard MCP client libraries. That preserves framework choice while still letting the platform manage credentials and tool infrastructure.

Identity, storage, and network placement

The governance model starts with identity. Hosted agents receive a dedicated Microsoft Entra agent identity. Prompt agents and tool connections can use project managed identity or configured connections. For private Azure AI Search, Microsoft requires Microsoft Entra project managed identity rather than key-based authentication.

Storage placement is the next decision. In the standard setup, Foundry uses customer-owned Azure resources: Azure Cosmos DB for conversation state and agent definitions, Azure Storage for files and attachments, and Azure AI Search for vector stores. Microsoft manages the schemas and service use, but the resources live in the customer's subscription. Standard setup also supports customer-managed keys where the underlying services support them.

Networking adds another layer. Virtual network isolation relies on Azure Container Apps infrastructure and requires a delegated agent runtime subnet. Microsoft documents /24 as the recommended subnet size and /27 as the minimum. Each Foundry resource needs its own dedicated agent runtime subnet.
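
These sizing rules are easy to check before deployment. The sketch below uses Python's standard ipaddress module to classify a candidate subnet against the /24 recommendation and /27 minimum, and to flag resources whose subnets overlap. The function names, return labels, and resource names are assumptions for illustration, not part of any Azure tooling.

```python
import ipaddress
from typing import Dict, List, Tuple

RECOMMENDED_PREFIX = 24
MINIMUM_PREFIX = 27  # a larger prefix number means a smaller subnet


def check_agent_subnet(cidr: str) -> str:
    """Classify a candidate agent runtime subnet by size."""
    network = ipaddress.ip_network(cidr, strict=True)
    if network.prefixlen > MINIMUM_PREFIX:
        return "too_small"
    if network.prefixlen > RECOMMENDED_PREFIX:
        return "below_recommended"
    return "ok"


def shared_subnets(assignments: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of resources whose subnets overlap.

    Each Foundry resource should have its own dedicated subnet,
    so any overlap is a configuration error.
    """
    names = sorted(assignments)
    clashes = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            a = ipaddress.ip_network(assignments[first])
            b = ipaddress.ip_network(assignments[second])
            if a.overlaps(b):
                clashes.append((first, second))
    return clashes
```

A check like this fits naturally into an infrastructure-as-code pipeline, where a bad CIDR is cheaper to catch than a failed deployment.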

Benchmarks & Metrics

Microsoft does not publish a universal latency target for Foundry Agent Service runs. Latency depends on the model, reasoning path, retrieval, tool count, network placement, and downstream APIs. The useful metrics are runtime observability metrics, not synthetic leaderboard scores.

For hosted agents, Microsoft documents concrete operational limits and behaviors:

  • 15 minutes: idle timeout after which hosted agent compute is deprovisioned.
  • 30 minutes: maximum connection duration for hosted-agent log streaming.
  • 2 minutes: idle timeout for the logstream connection.
  • 1 hour: default active duration for a Code Interpreter session.
  • 30 minutes: Code Interpreter idle timeout inside that default session window.
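
Client code that depends on these limits is less fragile when the numbers live in one place. The sketch below encodes the documented values as constants and adds two helpers. It assumes that a Code Interpreter session ends at whichever of the active or idle limits is reached first, and that a log stream should be reopened once either of its limits is hit. Both interpretations, and all function names, are assumptions rather than documented behavior.

```python
from datetime import datetime, timedelta

# Values taken from the limits listed above.
HOSTED_AGENT_IDLE_TIMEOUT = timedelta(minutes=15)
LOGSTREAM_MAX_DURATION = timedelta(minutes=30)
LOGSTREAM_IDLE_TIMEOUT = timedelta(minutes=2)
CODE_INTERPRETER_ACTIVE = timedelta(hours=1)
CODE_INTERPRETER_IDLE = timedelta(minutes=30)


def code_interpreter_expiry(started_at: datetime,
                            last_activity_at: datetime) -> datetime:
    """Estimate when a session ends (assumes the earlier limit wins)."""
    return min(started_at + CODE_INTERPRETER_ACTIVE,
               last_activity_at + CODE_INTERPRETER_IDLE)


def logstream_should_reconnect(connected_at: datetime,
                               last_event_at: datetime,
                               now: datetime) -> bool:
    """True once the stream has hit its max duration or gone idle."""
    return (now - connected_at >= LOGSTREAM_MAX_DURATION
            or now - last_event_at >= LOGSTREAM_IDLE_TIMEOUT)
```

The practical consequence for log tooling is that a quiet agent will drop its stream long before the 30-minute cap, so reconnect logic is not optional.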

Observability is where the runtime becomes debuggable. Foundry tracing captures the full request journey as traces and spans. It records inputs, outputs, tool usage, retries, latency, duration, token consumption, and cost signals. Prompt-agent tracing is generally available, while hosted, workflow, and external-agent tracing remain in preview.

Framework support is uneven by design. Agents built with Microsoft Agent Framework or Semantic Kernel automatically emit traces when tracing is enabled for the Foundry project. LangChain, LangGraph, and OpenAI Agents SDK integrations require explicit OpenTelemetry setup and Azure Monitor export configuration.

A practical production dashboard should track:

  • End-to-end latency: total response time by agent version, model, tool path, and user segment.
  • Tool latency: time spent in File Search, Azure AI Search, OpenAPI calls, Code Interpreter, or MCP tools.
  • Tool failure rate: failed calls, empty retrievals, authorization failures, and retry counts.
  • Token and cost signals: prompt, output, retrieval, and tool-session costs by workflow.
  • Memory events: create, update, delete, remember, forget, and retrieval actions by scope.
  • Safety interventions: content filtering, blocked tools, and policy-driven fallbacks.

The benchmark that matters is not only median latency. For agents, the costly failures hide in the tail: a tool timeout that forces a retry loop, a retrieval miss that produces an unsupported answer, or a memory recall that pulls stale context into a high-value workflow.
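
Tail behavior is straightforward to compute once latency samples and tool call outcomes are exported from traces. The sketch below assumes a hypothetical export format: a list of latencies in milliseconds and a list of dictionaries with "tool" and "ok" keys. It uses only the standard library and is not tied to any Foundry or Application Insights schema.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, Iterable, List, Mapping


def latency_percentiles(samples_ms: List[float]) -> Dict[str, float]:
    """Return p50, p95, and p99 for a batch of latency samples.

    Requires at least two samples; quantiles() raises otherwise.
    """
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def tool_failure_rate(calls: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Fraction of failed calls per tool.

    Each call is a mapping with a "tool" name and a boolean "ok" flag
    (a hypothetical export shape).
    """
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for call in calls:
        bucket = totals[str(call["tool"])]
        bucket[0] += 1                      # total calls
        bucket[1] += 0 if call["ok"] else 1  # failed calls
    return {tool: failed / total for tool, (total, failed) in totals.items()}
```

Grouping these numbers by agent version before and after a publish is a simple way to see whether a change moved the median, the tail, or both.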

Strategic Impact

Foundry Agent Service is Microsoft's attempt to make agent deployment look less like experimental scripting and more like cloud application operations. The strategic value is not that it offers a new way to call a model. It offers a managed envelope around the unstable parts of agent systems: state, tools, traceability, publishing, identity, and data placement.

That changes the build-versus-buy calculation for platform teams. A team can still build orchestration with Agent Framework, LangGraph, Semantic Kernel, OpenAI Agents SDK, Anthropic Agent SDK, GitHub Copilot SDK, or custom code. But the hosting, endpoint, scale, identity, and observability layers can move into Foundry.

The governance implications are larger than the developer ergonomics. Agents make decisions over time, call tools, and accumulate context. Change management therefore needs to cover memory policy, tool permission, model selection, network path, data retention, tracing access, and incident response.

  • For developers: the runtime reduces boilerplate around state and tool plumbing, but it increases the need for versioned agent definitions and testable tool contracts.
  • For security teams: Entra identities, RBAC, private networking, standard setup resources, and subscription-level tool controls create enforceable boundaries.
  • For data teams: Azure AI Search and memory stores turn retrieval quality and data lifecycle policy into first-class design decisions.
  • For operations teams: traces and Application Insights integration make agent behavior inspectable enough for on-call workflows.

The most important architectural discipline is to treat tools as production dependencies. An agent that can call payroll, search legal documents, run Python, and browse the web is not just a chatbot. It is an application with privileges.

Road Ahead

The roadmap signal is clear: Microsoft is converging agent development, model access, observability, and governance under Foundry. The new service already uses the Responses API as a common entry point, supports broader tool coverage than the older classic experience, and offers hosted agents for teams that need custom orchestration code without self-managing infrastructure.

The near-term caution is preview maturity. Memory, hosted agents, workflow agents, external-agent tracing, some tool combinations, source-code deployment, and agent-to-agent capabilities are not all equally production-hardened. Architects should label those dependencies explicitly in design reviews and avoid building compliance claims on preview behavior that Microsoft says can change.

A pragmatic adoption path looks like this:

  1. Start with a prompt agent for a narrow workflow with bounded tools and a clear evaluation set.
  2. Add File Search or Azure AI Search only after source quality, citations, and retrieval failure handling are tested.
  3. Enable tracing before broad rollout, not after the first production incident.
  4. Move to standard setup when data ownership, CMK, private networking, or auditability becomes a requirement.
  5. Adopt hosted agents when orchestration logic, framework choice, or custom protocols justify owning code.
  6. Introduce memory last, with retention, deletion, and corruption-mitigation controls already designed.

Foundry Agent Service will not remove the hard parts of agent engineering. It does, however, relocate many of them into managed Azure constructs that security, platform, and operations teams already understand. That is the real architectural milestone: agents are becoming deployable systems with memory, tools, telemetry, and governance boundaries instead of isolated model calls hidden inside application code.

Frequently Asked Questions

What is Microsoft Foundry Agent Service used for?
Microsoft Foundry Agent Service is used to build, deploy, and operate AI agents on Azure. It manages agent versions, conversations, responses, tools, identity, observability, and optional customer-owned storage for production workloads.
How does memory work in Foundry Agent Service?
Foundry memory is a preview long-term memory system backed by managed memory stores. It can store user profile memory, chat summaries, and procedural memory, with item-level CRUD operations and default TTL controls.
What tools can Microsoft Foundry agents use?
Foundry agents can use tools such as File Search, Code Interpreter, Web Search, Azure AI Search, OpenAPI, MCP, and custom functions. Tool use can be controlled with tool_choice values such as auto, required, and none.
Does Foundry Agent Service support observability?
Yes. Foundry tracing records inputs, outputs, tool calls, retries, latency, duration, token usage, and cost signals. Prompt-agent tracing is generally available, while hosted, workflow, and external-agent tracing are still in preview.
When should I use hosted agents instead of prompt agents?
Use prompt agents when the workflow can be expressed through managed instructions, tools, and model configuration. Use hosted agents when you need custom orchestration code, a specific agent framework, custom protocols, or containerized runtime behavior.
