NVIDIA DSX OS and Vera CPU Make AI Factories an Operations Problem

Published June 05, 2026 by Dillip Chowdary

NVIDIA's late-May technical blog updates put the operating layer of AI factories in focus. DSX OS is described as open, modular software for running token-producing infrastructure at scale, while Vera CPU targets the orchestration load created by agentic workloads.

Why software matters

A slow model is only one way large AI systems fail. They also fail because queues back up, GPU placement is inefficient, prefill and decode paths are poorly separated, tool calls have unpredictable latency, and telemetry surfaces a problem only after users have already felt it.

DSX OS points to the need for a real operations layer across accelerated clusters. That layer has to reason about scheduling, workload isolation, telemetry, fault recovery, and capacity planning.

The CPU signal

Agentic inference increases CPU-side work. Every agent session can involve tool routing, retrieval, sandbox coordination, policy checks, file operations, and result validation. Vera CPU is positioned for that surrounding orchestration, not just classic application serving.
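One way to see how much of a session goes to this surrounding work is to time each stage separately. The following is a minimal sketch, not anything NVIDIA describes: `StageTimer`, `orchestration_share`, and the stage names are hypothetical, and it measures wall-clock time per stage rather than actual CPU utilization.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage of an agent session."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raised.
            self.totals[name] += time.perf_counter() - start


def orchestration_share(timer, model_stage="model_call"):
    """Fraction of recorded time spent outside the model call itself."""
    total = sum(timer.totals.values())
    if total == 0:
        return 0.0
    return (total - timer.totals.get(model_stage, 0.0)) / total
```

In use, each step of a session would be wrapped in `with timer.stage("retrieval"):`, `with timer.stage("policy_check"):`, `with timer.stage("model_call"):`, and so on. A high `orchestration_share` suggests that latency is dominated by the work around the model rather than by the accelerator.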

For developers, the takeaway is architectural. Model calls should be wrapped in explicit queues, timeout budgets, trace identifiers, and backpressure rules. Otherwise, an application can spend heavily on GPUs and still deliver poor user latency.
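As a rough illustration of those four controls, here is a minimal asyncio sketch. Everything in it is an assumption for demonstration: `ModelGateway`, `Overloaded`, and `fake_model_call` are hypothetical names, and the stand-in model call would be replaced by a real inference client.

```python
import asyncio
import uuid


class Overloaded(Exception):
    """Raised when the bounded queue is full and a request is shed."""


async def fake_model_call(prompt):
    # Hypothetical stand-in for a real inference client.
    await asyncio.sleep(0.01)
    return f"echo:{prompt}"


class ModelGateway:
    def __init__(self, max_queue=8, timeout_s=2.0, model=fake_model_call):
        self.queue = asyncio.Queue(maxsize=max_queue)  # explicit, bounded queue
        self.timeout_s = timeout_s                     # per-request timeout budget
        self.model = model

    async def submit(self, prompt):
        loop = asyncio.get_running_loop()
        trace_id = uuid.uuid4().hex                    # trace identifier
        deadline = loop.time() + self.timeout_s
        done = loop.create_future()
        try:
            self.queue.put_nowait((trace_id, prompt, deadline, done))
        except asyncio.QueueFull:
            # Backpressure rule: reject immediately instead of queueing forever.
            raise Overloaded(f"queue full, shed request {trace_id}") from None
        return trace_id, await done

    async def worker(self):
        loop = asyncio.get_running_loop()
        while True:
            trace_id, prompt, deadline, done = await self.queue.get()
            try:
                # Time spent waiting in the queue counts against the budget.
                remaining = deadline - loop.time()
                if remaining <= 0:
                    raise asyncio.TimeoutError
                result = await asyncio.wait_for(self.model(prompt), timeout=remaining)
                if not done.done():
                    done.set_result(result)
            except asyncio.TimeoutError:
                if not done.done():
                    done.set_exception(
                        asyncio.TimeoutError(f"budget exceeded for {trace_id}")
                    )
            except Exception as exc:
                # Surface model errors to the caller; keep the worker alive.
                if not done.done():
                    done.set_exception(exc)
            finally:
                self.queue.task_done()
```

A caller would create the gateway inside a running event loop, start one or more `worker()` tasks, and await `submit(prompt)`. The design choice worth noting is that the deadline is fixed at submission, so a request that sat in the queue too long fails fast instead of consuming accelerator time after the user has stopped waiting.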

Builder takeaway

AI factory performance is full-stack performance. Token throughput depends on software operations as much as accelerator specifications.

Source: NVIDIA Technical Blog