Kubernetes v1.35: The AI Conformance Leap
By Dillip Chowdary • Mar 24, 2026
For years, running AI workloads on Kubernetes felt like forcing a square peg into a round hole. Kubernetes was designed for stateless web microservices, while AI models require stateful, high-throughput, and hardware-dependent environments. Today, the CNCF (Cloud Native Computing Foundation) has officially bridged that gap with the release of Kubernetes v1.35 and the accompanying AI Conformance v1.35 standard.
Dynamic Resource Allocation (DRA) goes GA
The headline feature of v1.35 is the General Availability of Dynamic Resource Allocation (DRA). Before DRA, managing GPUs in Kubernetes was limited to simple integer requests (e.g., nvidia.com/gpu: 1). This led to massive resource fragmentation and underutilization. DRA allows for much more granular control, enabling pods to request specific hardware features like NVLink topologies, fractional GPU slicing (MIG/MPS), and even specific memory bandwidth profiles.
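As a rough sketch of what such a granular request can look like, the following ResourceClaimTemplate asks for a GPU with at least 40Gi of device memory via a CEL selector. The driver name gpu.example.com and its attribute layout are hypothetical placeholders, and the field layout of the resource.k8s.io API has shifted between revisions, so treat this as illustrative rather than a copy-paste manifest:

```yaml
# Hypothetical DRA claim template: request one device from an
# example device class, filtered by a CEL expression on capacity.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: large-gpu-template
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com
        selectors:
        - cel:
            # Only match devices advertising >= 40Gi of memory
            expression: device.capacity["gpu.example.com"].memory.compareTo(quantity("40Gi")) >= 0
```

The key difference from the old integer model is that the selector runs against structured device attributes published by the vendor driver, so the scheduler can reason about *which* GPU, not just *how many*.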
Technically, DRA introduces a new resource claim model that decouples device discovery from the core kubelet. This allows vendors to write specialized resource drivers that can handle the complex "gang scheduling" required for large-scale distributed training jobs. If a pod needs 8 GPUs connected via a specific high-speed interconnect, DRA ensures that the scheduler only places that pod on a node that meets the exact topological requirements.
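A sketch of that topology-aware case follows. The claim below requests exactly eight devices and constrains them to share an attribute (standing in for "same NVLink domain"); the pod then consumes the claim. Again, the driver name, attribute name, and images are hypothetical:

```yaml
# Hypothetical eight-GPU claim where all allocated devices must
# report the same value for an interconnect-domain attribute.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: eight-gpu-island
spec:
  spec:
    devices:
      requests:
      - name: gpus
        deviceClassName: gpu.example.com
        count: 8
        allocationMode: ExactCount
      constraints:
      - requests: ["gpus"]
        # All eight devices must match on this attribute
        matchAttribute: gpu.example.com/interconnectDomain
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
  - name: gpus
    resourceClaimTemplateName: eight-gpu-island
  containers:
  - name: train
    image: registry.example.com/trainer:latest
    resources:
      claims:
      - name: gpus   # all eight co-located devices exposed to this container
```

If no node can satisfy the constraint, the pod stays pending rather than landing on a topologically wrong node, which is exactly the behavior distributed training needs.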
The AI Conformance Program
Alongside the software release, the CNCF has launched the AI Conformance Program. Similar to the standard Kubernetes conformance tests, this program ensures that cloud providers (AWS, Google, Azure) provide a consistent experience for AI developers. A "CNCF AI Certified" cluster must support specific APIs for VRAM isolation, automated driver lifecycle management, and native integration with the Kueue job queueing system.
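For context, the Kueue integration in practice looks roughly like the following: a LocalQueue feeding a ClusterQueue, and a training Job submitted to it via the queue-name label. Queue names, namespace, and images here are illustrative:

```yaml
# A namespaced LocalQueue pointing at a (pre-existing) ClusterQueue.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a-queue
  namespace: ml-training
spec:
  clusterQueue: shared-gpu-cluster-queue
---
# A training Job routed through Kueue. It is created suspended;
# Kueue unsuspends it once quota in the ClusterQueue is admitted.
apiVersion: batch/v1
kind: Job
metadata:
  name: resnet-train
  namespace: ml-training
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue
spec:
  suspend: true
  parallelism: 4
  completions: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: train
        image: registry.example.com/trainer:latest
        resources:
          requests:
            nvidia.com/gpu: "1"
```

Because admission happens at the queue level, a conformant cluster can enforce fair sharing of scarce accelerators across teams without per-job scheduler tweaks.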
This is a major win for Sovereign Cloud providers. By adhering to the AI Conformance standard, smaller providers can offer a "Kubernetes-native" AI experience that is compatible with the same Helm charts and Kubeflow pipelines used in the major public clouds. This reduces vendor lock-in and allows enterprises to move their training workloads to the most cost-effective region without rewriting their infrastructure code.
Technical Insight: Multi-Cluster AI Mesh
Kubernetes v1.35 also introduces alpha support for Multi-Cluster AI Mesh. This allows a single training job to span across multiple physical clusters, utilizing Submariner or Cilium ClusterMesh to handle the high-speed cross-cluster networking required for gradient synchronization.
Optimizing for Inference: Sidecar Containers and WASM
While training gets the headlines, v1.35 also brings significant improvements for AI inference. The new Inference Sidecar pattern has a sidecar container stage model weights into a shared volume before the inference server starts, cutting pod startup times (cold starts) by up to 80%. There is also improved support for WebAssembly (Wasm) runtimes, which are increasingly being used to run lightweight models at the edge with minimal overhead.
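The sidecar pattern builds on native sidecar support (an init container with restartPolicy: Always, stable in recent Kubernetes releases). A minimal sketch, with hypothetical images and loader command:

```yaml
# An inference pod where a native sidecar fetches model weights into a
# shared emptyDir before the server container starts serving.
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
spec:
  volumes:
  - name: model-cache
    emptyDir: {}
  initContainers:
  - name: weights-loader
    image: registry.example.com/weights-loader:latest
    restartPolicy: Always          # marks this init container as a sidecar
    command: ["/loader", "--dest=/models"]
    volumeMounts:
    - name: model-cache
      mountPath: /models
  containers:
  - name: server
    image: registry.example.com/inference-server:latest
    volumeMounts:
    - name: model-cache
      mountPath: /models
      readOnly: true
```

Because the sidecar starts (and is confirmed ready) before the main container, the server never boots against an empty model directory, and the weights survive server container restarts within the pod.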
With v1.35, Kubernetes has officially transitioned from a container orchestrator to an AI Operating System. The ecosystem is now focused on "Day 2" operations: observability for GPU metrics via Prometheus, automated scaling based on inference latency, and secure multi-tenancy for shared AI clusters. The road to Artificial General Intelligence (AGI) is being paved with YAML.