NVIDIA's late-May technical blog updates put the operating layer of AI factories in focus. DSX OS is described as open, modular software for running token-producing infrastructure at scale, while Vera CPU targets the orchestration load created by agentic workloads.
Why software matters
Large AI systems do not fail only because a model is slow. They also fail because queues back up, GPU placement is inefficient, prefill and decode paths are poorly separated, tool calls are unpredictable, and observability data arrives too late to act on.
DSX OS points to the need for a real operations layer across accelerated clusters. That layer has to reason about scheduling, workload isolation, telemetry, fault recovery, and capacity planning.
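One way to make the capacity-planning item concrete is Little's law, which says the average number of busy serving slots equals the arrival rate multiplied by the average time each request occupies a slot. The sketch below is only an illustration of that arithmetic, not something DSX OS is documented to do; the function names and the traffic figures mentioned afterward are hypothetical.

```python
def offered_load(arrival_rate_rps: float, mean_service_s: float) -> float:
    """Average number of busy slots (Little's law: L = lambda * W).

    mean_service_s is the time one request occupies a serving slot,
    excluding any time spent waiting in a queue.
    """
    if arrival_rate_rps < 0 or mean_service_s < 0:
        raise ValueError("rate and service time must be non-negative")
    return arrival_rate_rps * mean_service_s


def headroom(available_slots: int, arrival_rate_rps: float, mean_service_s: float) -> float:
    """Fraction of slots idle on average.

    A value at or below zero means demand meets or exceeds capacity,
    so queues will keep growing until load drops or slots are added.
    """
    if available_slots <= 0:
        raise ValueError("available_slots must be positive")
    return 1.0 - offered_load(arrival_rate_rps, mean_service_s) / available_slots
```

With hypothetical traffic of 50 requests per second and 0.8 seconds of slot time per request, about 40 slots are busy on average. A pool of 64 slots would leave roughly 37% headroom, while a pool of 32 would be oversubscribed.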
The CPU signal
Agentic inference increases CPU-side work. Every agent session can involve tool routing, retrieval, sandbox coordination, policy checks, file operations, and result validation. Vera CPU is positioned for that surrounding orchestration, not just classic application serving.
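A simple way to see how much of an agent session is spent outside the model call is to time each stage separately. The following is a minimal sketch under stated assumptions: `StageTimer`, the stage names, and the `model_call` label are hypothetical, and the timer measures wall-clock time per stage rather than true CPU utilization.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage of one agent session."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised.
            self.totals[name] += time.perf_counter() - start

    def orchestration_share(self, model_stage="model_call"):
        """Fraction of recorded time spent in stages other than the model call."""
        total = sum(self.totals.values())
        if total == 0:
            return 0.0
        return 1.0 - self.totals.get(model_stage, 0.0) / total
```

In use, each step of a session would be wrapped in its own block, for example `with timer.stage("tool_routing"): ...` and `with timer.stage("model_call"): ...`. A high orchestration share suggests the surrounding host-side work, not the accelerator, is setting latency.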
For developers, the takeaway is architectural. Model calls should be wrapped in explicit queues, timeout budgets, trace identifiers, and backpressure rules. Otherwise, an application can spend heavily on GPUs and still deliver poor user latency.
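The four wrappers named above can be sketched with nothing more than Python's standard `asyncio` library. This is a minimal illustration, not a production gateway and not an NVIDIA API: `ModelGateway`, `Overloaded`, `call_model`, and the default queue size are all hypothetical names and values chosen for the example.

```python
import asyncio
import time
import uuid
from dataclasses import dataclass


class Overloaded(Exception):
    """Raised when the bounded queue is full and the request is shed."""


@dataclass
class Request:
    prompt: str
    deadline: float        # absolute time.monotonic() value
    trace_id: str          # carried through logs and error messages
    done: asyncio.Future   # resolved by the worker


class ModelGateway:
    def __init__(self, call_model, max_queued=8):
        # call_model is an assumed async callable: prompt -> result.
        self._call_model = call_model
        self._queue = asyncio.Queue(maxsize=max_queued)  # explicit, bounded queue

    async def submit(self, prompt, budget_s):
        loop = asyncio.get_running_loop()
        req = Request(prompt, time.monotonic() + budget_s,
                      uuid.uuid4().hex, loop.create_future())
        try:
            self._queue.put_nowait(req)  # backpressure: reject instead of piling up
        except asyncio.QueueFull:
            raise Overloaded(f"trace={req.trace_id}: queue full") from None
        return await req.done

    async def worker(self):
        while True:
            req = await self._queue.get()
            # Time spent waiting in the queue counts against the budget.
            remaining = req.deadline - time.monotonic()
            try:
                if remaining <= 0:
                    raise asyncio.TimeoutError
                result = await asyncio.wait_for(
                    self._call_model(req.prompt), timeout=remaining)
                if not req.done.done():
                    req.done.set_result((req.trace_id, result))
            except asyncio.TimeoutError:
                if not req.done.done():
                    req.done.set_exception(
                        TimeoutError(f"trace={req.trace_id}: budget exhausted"))
            except Exception as exc:
                if not req.done.done():
                    req.done.set_exception(exc)
            finally:
                self._queue.task_done()
```

A caller would start one or more `worker()` tasks and then `await gateway.submit(prompt, budget_s=2.0)`. Rejecting at the queue boundary is a deliberate choice in this sketch: a fast `Overloaded` error lets the caller retry or degrade, whereas an unbounded queue hides the overload until latency has already collapsed.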
Builder takeaway
AI factory performance is full-stack performance. Token throughput depends as much on software operations as on accelerator specifications.