CVE-2026-15902: GPU Orchestrator Logic Flaws [2026]
Bottom Line
As of May 6, 2026, CVE-2026-15902 does not appear in public CVE records, but the underlying attack class is real and already documented in NVIDIA KAI Scheduler CVE-2026-24176. The critical lesson is that a scheduler-level tenant-binding flaw can undermine GPU isolation before the workload ever reaches the device.
Key Takeaways
- No public MITRE/NVD record matched CVE-2026-15902 on May 6, 2026.
- NVIDIA fixed the adjacent cross-namespace flaw in KAI Scheduler v0.13.0.
- The verified weakness is improper authorization via cross-namespace pod references.
- Multi-tenant risk comes from control-plane trust breaks, not only GPU driver bugs.
- Admission checks must bind namespace, queue, claim, and reservation as one security unit.
The headline CVE here needs one correction before any serious analysis starts: as of May 6, 2026, there is no public MITRE or NVD record for CVE-2026-15902. That does not make the threat model hypothetical. A closely matching, vendor-confirmed case already exists in NVIDIA KAI Scheduler's April 2026 security bulletin, where CVE-2026-24176 describes improper authorization through cross-namespace pod references in a multi-tenant GPU scheduler.
CVE Summary Card
Bottom Line
The missing public record for CVE-2026-15902 is itself a reminder to verify identifiers before triage. The actionable security story is the verified NVIDIA case: broken namespace-to-resource authorization in a GPU orchestrator can let one tenant tamper with another tenant's scheduling state.
- Requested identifier: CVE-2026-15902
- Public status on May 6, 2026: no matching public CVE entry located in MITRE/NVD searches
- Verified adjacent case: CVE-2026-24176 in NVIDIA KAI Scheduler
- Vendor description: improper authorization through cross-namespace pod references
- Severity: 4.3 (Medium), CVSS vector AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:L/A:N, per NVIDIA
- Affected versions: all versions prior to v0.13.0
- Remediation: update to v0.13.0 or later
Why spend time on a medium-severity scheduler bug? Because CVSS often underrates orchestrator flaws in shared AI platforms. A namespace-scoping error in a CPU-only batch scheduler is already dangerous. In a GPU control plane, it can become the first move in a larger chain involving unfair placement, claim hijacking, noisy-neighbor denial, or indirect data exposure via shared reservation state.
The official anchors for this analysis are NVIDIA's bulletin, the KAI Scheduler repository, and package metadata showing v0.13.0 published on Go Packages on March 2, 2026.
Vulnerable Code Anatomy
Where the trust boundary actually sits
Multi-tenant GPU orchestrators do much more than place pods. They map queue labels, pod groups, reservation objects, dynamic resource claims, admission mutations, and binder actions into a single state graph. The security boundary is not the GPU device node alone. It is the chain of references that says which tenant is allowed to consume, mutate, or wait on a particular reservation.
KAI Scheduler's public docs and issue traces expose the moving parts clearly enough to reason about the flaw class:
- The scheduler supports GPU Sharing, Dynamic Resource Allocation, queues, and PodGroups.
- The control plane includes components such as admission, binder, scheduler, and podgrouper.
- Public issue traces reference binder paths such as pkg/binder/binding/resourcereservation/resource_reservation.go, which is where reservation state is coordinated.
That architecture is efficient, but it creates a classic logic-flaw hazard: one controller resolves a reference, another authorizes it, and a third performs the action. If any stage treats namespace, queue, owner, or claim identity as optional context instead of mandatory identity, the system can accept a cross-tenant edge that should never exist.
Illustrative vulnerable pattern
The following pseudocode is conceptual, not vendor source, but it captures the failure mode implied by the advisory:
func authorizeReservationRef(req PodSpec) bool {
    refName := req.annotations["reservationRef"]
    obj, found := reservations.Get(refName) // name-only, cluster-wide lookup
    if !found {
        return false
    }
    // Trusts a user-controlled queue label instead of the pod's namespace.
    return obj.queue == req.labels["kai.scheduler/queue"]
}

The mistake is subtle:
- The lookup is effectively cluster-wide or insufficiently scoped.
- The authorization decision trusts a user-controlled or weakly bound queue label.
- The object identity is not tied to namespace + queue + owner + claim as a single composite key.
Once you see the pattern, the exploit path is obvious even without device-level escape primitives. The attacker does not need to break CUDA memory isolation first. They only need to coerce the control plane into believing a foreign reservation belongs to their workload.
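For contrast, here is a minimal sketch of the same lookup done with a composite identity key. This is illustrative code, not KAI Scheduler source: the type names, the `kai.scheduler/queue` label, and the store shape are all assumptions chosen to mirror the pseudocode above.

```go
package main

import "fmt"

// ReservationKey binds every identity field into one comparable unit.
// A miss on any single field means the lookup, and the authorization, fails.
type ReservationKey struct {
	Namespace string
	Queue     string
	OwnerUID  string
	ClaimUID  string
}

type Reservation struct {
	Key ReservationKey
}

// PodRef carries the requesting workload's server-observed identity,
// not just a user-supplied name or label. (Hypothetical type.)
type PodRef struct {
	Namespace string
	Queue     string
	OwnerUID  string
	ClaimUID  string
}

// ReservationStore is indexed by the full composite key instead of name alone.
type ReservationStore struct {
	byKey map[ReservationKey]*Reservation
}

func (s *ReservationStore) Get(k ReservationKey) (*Reservation, bool) {
	r, ok := s.byKey[k]
	return r, ok
}

// authorizeReservationRef succeeds only when every identity field the
// requester presents matches a server-owned reservation record.
func authorizeReservationRef(s *ReservationStore, req PodRef) bool {
	key := ReservationKey{
		Namespace: req.Namespace,
		Queue:     req.Queue,
		OwnerUID:  req.OwnerUID,
		ClaimUID:  req.ClaimUID,
	}
	_, ok := s.Get(key) // a mismatch on any field is simply a miss: deny
	return ok
}

func main() {
	store := &ReservationStore{byKey: map[ReservationKey]*Reservation{}}
	key := ReservationKey{Namespace: "tenant-a", Queue: "team-a", OwnerUID: "uid-1", ClaimUID: "claim-1"}
	store.byKey[key] = &Reservation{Key: key}

	// Same tenant, full identity match: allowed.
	fmt.Println(authorizeReservationRef(store, PodRef{Namespace: "tenant-a", Queue: "team-a", OwnerUID: "uid-1", ClaimUID: "claim-1"}))
	// Foreign namespace that copied the victim's queue label: denied.
	fmt.Println(authorizeReservationRef(store, PodRef{Namespace: "tenant-b", Queue: "team-a", OwnerUID: "uid-1", ClaimUID: "claim-1"}))
}
```

The design point is that the attacker's copied queue label buys nothing: the namespace mismatch alone breaks the composite key, so the lookup misses and the reference is denied.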
What changed in the fixed line
NVIDIA's public guidance is simply “upgrade to v0.13.0 or later,” but later release notes also show a broader hardening direction. The v0.13.0 line includes changes such as blocking pods with shared DRA GPU claims that lack a queue label or have a mismatched queue label. That is exactly the kind of tightening you would expect after discovering that identity fields were too loosely coupled.
Attack Timeline
- March 2, 2026: package metadata shows KAI Scheduler v0.13.0 published, establishing the fixed version line.
- April 2026: NVIDIA publishes its KAI Scheduler security bulletin covering CVE-2026-24176 and CVE-2026-24177.
- April 21, 2026: NVIDIA's bulletin revision history lists its initial release on this date.
- May 6, 2026: searches for CVE-2026-15902 still do not surface a public MITRE or NVD record, so defenders should avoid pinning response workflows to that identifier alone.
The practical takeaway from the timeline is that release artifacts can precede or outlast clean public vulnerability indexing. If your AI platform depends on third-party schedulers, package versions and vendor bulletins often matter more than whether every downstream database has caught up.
Exploitation Walkthrough
Preconditions
- The attacker already has legitimate access to one tenant namespace.
- The cluster uses a GPU-aware scheduler with reservation or claim indirection.
- Authorization is checked on references, but not on the full tenant context behind those references.
Conceptual attack sequence
- The attacker studies accepted workload fields such as labels, annotations, PodGroup references, reservation names, or DRA claim handles.
- They identify a field path where the scheduler dereferences an object by name or loose selector rather than by strict namespace-owner binding.
- They submit a crafted workload that points at, collides with, or shadows another tenant's scheduler object.
- The admission layer accepts the object because the reference looks structurally valid.
- The binder or scheduler resolves the reference and updates scheduling state as if the attacker's workload were entitled to it.
- The result is one of three outcomes: unauthorized placement, reservation theft, or state tampering that starves or blocks another tenant.
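Steps 3 through 5 can be condensed into a toy reproduction of the loose lookup. Everything here is hypothetical, modeled on the earlier illustrative pseudocode rather than on real scheduler source: a cluster-wide index keyed by name alone, and an authorization check that trusts the pod's own queue label.

```go
package main

import "fmt"

// reservation is a toy scheduler object owned by tenant-a.
type reservation struct {
	name, namespace, queue string
}

// byName is the hazardous convenience: a cluster-wide index keyed by name
// alone, with no namespace scoping.
var byName = map[string]reservation{
	"resv-victim": {name: "resv-victim", namespace: "tenant-a", queue: "team-a"},
}

type pod struct {
	namespace string
	labels    map[string]string
	annos     map[string]string
}

// looseAuthorize resolves the reference by name and then compares against a
// user-controlled label, never consulting the pod's namespace.
func looseAuthorize(p pod) bool {
	r, ok := byName[p.annos["reservationRef"]]
	return ok && r.queue == p.labels["kai.scheduler/queue"]
}

func main() {
	attacker := pod{
		namespace: "tenant-b",                                          // foreign tenant
		labels:    map[string]string{"kai.scheduler/queue": "team-a"},  // copied victim label
		annos:     map[string]string{"reservationRef": "resv-victim"},  // foreign reference
	}
	// The check passes even though the pod lives in the wrong namespace.
	fmt.Println(looseAuthorize(attacker))
}
```

Nothing in the crafted pod is malformed, which is why a purely structural admission check waves it through: the only thing wrong with it is ownership, and ownership is exactly what the loose check never verifies.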
Why this matters even without VRAM reads
Security teams often prioritize GPU bugs only when they promise direct memory disclosure. That is too narrow. A scheduling flaw can still cause serious damage:
- Availability impact: an attacker can delay, starve, or deadlock expensive training and inference jobs.
- Integrity impact: tenant B's job may run under an altered reservation topology or resource envelope.
- Confidentiality side effects: once placement guarantees break, downstream assumptions about data locality, dedicated nodes, or trusted peers may also fail.
In other words, the orchestrator is part of the trusted computing base. If it lies about ownership, the rest of the platform inherits that lie.
Hardening Guide
Immediate remediation
- Upgrade KAI Scheduler to v0.13.0 or later wherever NVIDIA's bulletin applies.
- Audit workloads for queue-label drift, especially where queue identity is set by user manifests rather than admission policy.
- Review RBAC on scheduler CRDs, reservation objects, and any API used by binder or pod-group controllers.
Control-plane defenses that actually matter
- Use composite authorization keys. Every reservation lookup should bind namespace + queue + owner UID + claim UID.
- Deny cross-namespace references by default. Make exceptions explicit, typed, and logged.
- Move trust from labels to server-owned fields. User-editable labels are useful selectors, not identity roots.
- Re-validate at action time. Authorization at admission is not enough if the binder can act on stale or reinterpreted references later.
- Write negative tests. Most orchestration teams test that valid references work. Fewer test that near-valid foreign references fail.
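The negative-test idea from the last bullet can be made concrete with a small table-driven check: submit near-valid foreign references and require a hard deny. The `authorize` function below is a stand-in under the composite-key assumption, not real scheduler code.

```go
package main

import "fmt"

// refCase describes one reference attempt and the verdict we require.
type refCase struct {
	podNS, podQueue string
	wantAllow       bool
}

// authorize is a stand-in policy: the reference is allowed only when both
// the namespace and the queue of the pod match the reservation's own fields.
func authorize(podNS, podQueue, resvNS, resvQueue string) bool {
	return podNS == resvNS && podQueue == resvQueue
}

func main() {
	// The reservation under test belongs to tenant-a / team-a.
	cases := []refCase{
		{podNS: "tenant-a", podQueue: "team-a", wantAllow: true},  // valid same-tenant reference
		{podNS: "tenant-b", podQueue: "team-a", wantAllow: false}, // foreign namespace, copied queue label
		{podNS: "tenant-a", podQueue: "team-b", wantAllow: false}, // own namespace, wrong queue
		{podNS: "tenant-b", podQueue: "team-b", wantAllow: false}, // fully foreign reference
	}
	for _, c := range cases {
		got := authorize(c.podNS, c.podQueue, "tenant-a", "team-a")
		if got != c.wantAllow {
			panic(fmt.Sprintf("ns=%s queue=%s: got allow=%v, want %v", c.podNS, c.podQueue, got, c.wantAllow))
		}
	}
	fmt.Println("all near-valid foreign references were denied")
}
```

The valuable rows are the middle ones: most suites only contain the first row, and it is the "almost right" foreign references that exercise the authorization edge the advisory describes.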
Operational hygiene for AI platforms
- Separate workload namespaces from scheduler system namespaces, following KAI's own installation guidance.
- Tag audit events when reservation creation, claim binding, and GPU assignment cross controller boundaries.
- Mask real tenant identifiers in shared incident artifacts before sending traces across teams; TechBytes' Data Masking Tool is useful when support logs contain namespace, queue, or dataset names.
- Continuously diff effective authorization state against desired tenancy policy, not just Kubernetes YAML.
Architectural Lessons
1. GPU security starts above the GPU
The industry still frames AI infrastructure risk around drivers, runtimes, and device isolation. Those layers matter, but modern GPU platforms are orchestrator-heavy systems. A scheduler that misbinds tenant identity can nullify perfect device isolation by sending the wrong work, claim, or reservation into the wrong execution path.
2. Reference graphs are attack surfaces
Kubernetes-native AI platforms are built from graphs: pods reference groups, groups reference queues, queues imply quotas, claims point to devices, binders create reservations, and controllers reconcile all of it asynchronously. Every edge in that graph is an authorization decision. If even one edge uses convenience semantics such as name-only lookup, inferred namespace, or mutable label trust, attackers get room to maneuver.
3. Medium CVSS can still mean high blast radius
CVE-2026-24176 is not scored like a critical RCE, but blast radius is environment-dependent. In a premium multi-tenant GPU cluster, a single unauthorized reservation mutation can disrupt jobs worth far more than a typical “medium” label suggests.
4. Release notes are security documents
One of the most useful signals in this case is not the CVE text itself but the surrounding release behavior: a fixed version line, stronger queue-label validation, and continued tightening around shared GPU claims. For platform teams, that is a reminder to read scheduler changelogs the same way they read kernel advisories.
The durable lesson is simple. Whether the identifier eventually lands as CVE-2026-15902, stays private, or turns out to be a mistaken reference, the exploit class is already here in public: multi-tenant GPU orchestrators fail dangerously when tenant identity is reconstructed from weak references instead of enforced as a first-class security boundary.
Frequently Asked Questions
Is CVE-2026-15902 a real public CVE as of May 6, 2026?
No. Searches of MITRE and NVD on that date surfaced no matching record; the verified adjacent case is CVE-2026-24176 in NVIDIA KAI Scheduler.
What was actually vulnerable in NVIDIA KAI Scheduler?
Improper authorization through cross-namespace pod references, affecting all versions prior to v0.13.0 and fixed in v0.13.0.
Why are scheduler logic flaws dangerous in multi-tenant GPU clusters?
Because the orchestrator is part of the trusted computing base: a misbound tenant identity enables unauthorized placement, reservation theft, or state tampering before any device-level isolation is ever tested.
How should teams harden GPU orchestrators against this class of bug?
Bind namespace, queue, owner UID, and claim identity together; reject cross-namespace edges by default; and add end-to-end negative tests that submit foreign references and expect a hard deny.