CVE-2026-9912 [Deep Dive]: GPU Shared Memory JIT Leak
Bottom Line
As of May 10, 2026, CVE-2026-9912 does not have a public record on CVE.org or NVD, so treat the identifier as unverified. The security lesson still stands: uninitialized GPU shared or local memory combined with JIT behavior can become a real cross-boundary disclosure risk.
Key Takeaways
- No public CVE.org or NVD entry for CVE-2026-9912 was visible on May 10, 2026
- The risk class is plausible because GPU shared or local memory is performance-critical and easy to misuse
- PTX can be JIT-compiled by the driver, which makes compiler behavior part of the attack surface
- OpenCL already exposes memory-initialization controls; CUDA offers sanitizer checks for shared-memory reads
- For high-assurance workloads, explicit zeroing and tenant isolation beat assumptions about compiler intent
A claimed issue tracked as CVE-2026-9912 describes information leakage from GPU shared memory after JIT-compiler optimization. The catch is important: as of May 10, 2026, there is no public record for that identifier on CVE.org or NVD. That does not make the underlying class imaginary. It makes this a useful case study in how accelerator runtimes, compiler passes, and scratchpad memory semantics can quietly turn a correctness bug into a tenant-isolation problem.
CVE Summary Card
Bottom Line
The identifier CVE-2026-9912 is not publicly documented as of May 10, 2026, but the claimed failure mode is credible enough that GPU teams should audit it now: if stale shared memory survives and an optimizer removes or weakens an intended clear, secrets can cross trust boundaries.
- Identifier: CVE-2026-9912
- Public status on May 10, 2026: no visible public record on CVE.org or NVD
- Claimed issue type: information disclosure via GPU shared or local memory
- Suspected trigger: JIT optimization interacting with incomplete initialization or divergent control flow
- Likely impact if real: disclosure of remnants from prior work-items, blocks, kernels, or tenants depending on scheduler and isolation boundaries
- What is verified from official docs: PTX can be JIT-compiled by the driver, CUDA shared memory persists for kernel execution, OpenCL exposes optional local-memory initialization controls, and NVIDIA tooling can inspect shared-memory initialization issues
The verified background matters. NVIDIA's CUDA documentation states that PTX may be JIT-compiled at application load time by the device driver, and the CUDA programming guide describes shared memory as a per-block scratchpad physically located on each SM. Khronos also documents cl_khr_initialize_memory, an OpenCL extension specifically meant to initialize local and private memory before kernel execution. Those are not rumors; they are official design surfaces that make this class of bug plausible.
Vulnerable Code Anatomy
Why this class is dangerous
GPU programmers often treat shared memory as a fast temporary cache. Security breaks appear when code assumes one of these conditions without enforcing it:
- Every element of the shared array is written before any thread reads it.
- The compiler will preserve a defensive clear because the programmer wrote one.
- Divergent control flow cannot expose stale bytes from inactive lanes.
- Per-kernel scratch state cannot become visible outside the intended trust boundary.
The official CUDA guide emphasizes that shared memory is accessible by all threads in a block and that developers must avoid data races with __syncthreads(). That is a correctness statement, but in multi-tenant systems it becomes a security statement too.
A representative buggy pattern
__global__ void score_kernel(const uint32_t* secret, uint32_t* out, int live) {
    __shared__ uint32_t tile[256];
    int tid = threadIdx.x;
    // Only some lanes write.
    if (tid < live) {
        tile[tid] = secret[tid];
    }
    __syncthreads();
    // Later logic assumes the whole tile is initialized.
    if (tid < 256) {
        out[tid] = tile[tid];
    }
}
This is not an exploit and not a working PoC. It is the minimal anatomy of the problem. If live is smaller than the tile width, part of tile is never initialized. If a cleanup or zero-fill path was intended but later optimized away, the kernel can export stale bytes that were never meant to be observable.
Where the compiler enters the picture
A JIT compiler becomes relevant when developers rely on a pattern like this:
- A pre-clear loop is emitted but only under a predicate the optimizer later proves redundant.
- A memset-like initialization is hoisted, merged, or removed because later stores appear to dominate all reads.
- One architecture path is ahead-of-time compiled while another falls back to PTX and is optimized by a newer driver.
- Control-flow simplification changes which lanes are considered active before a barrier.
That last point is what makes JIT behavior operationally uncomfortable: the same source kernel may pass review, then behave differently after a driver update because the compiler is part of the deployed product.
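To make the dead-store concern concrete, here is an illustrative sketch (a hypothetical reduction kernel, not taken from any advisory). The pre-clear exists only for hygiene: it is the sole thing protecting the tail slots of the tile when n is smaller than the block. A correct optimizer must keep it, but that safety rests entirely on the compiler's cross-lane analysis of the reads below the barrier, which is exactly the surface that can shift under a new JIT pass:

```cuda
#include <cstdint>

__global__ void sum_kernel(const float* in, float* out, int n) {
    __shared__ float buf[256];
    int tid = threadIdx.x;

    // Defensive clear, written purely for safety rather than for any
    // visible output on the common path. Whether it survives depends on
    // the optimizer proving (correctly) that the tree reduction below
    // reads slots that no later store dominates when n < 256.
    buf[tid] = 0.0f;
    __syncthreads();

    // Only lanes with valid input overwrite their slot.
    if (tid < n) {
        buf[tid] = in[tid];
    }
    __syncthreads();

    // Tree reduction reads the full tile, including tail slots that are
    // only safe because of the clear above. If a compiler pass wrongly
    // concludes the clear is redundant, stale scratchpad bytes feed
    // directly into the result.
    for (int stride = 128; stride > 0; stride >>= 1) {
        if (tid < stride) {
            buf[tid] += buf[tid + stride];
        }
        __syncthreads();
    }
    if (tid == 0) {
        out[blockIdx.x] = buf[0];
    }
}
```

The kernel assumes a block size of 256 threads. The uncomfortable property is that nothing in the source distinguishes "clear required for correctness" from "clear written out of caution," so a review cannot tell which one the compiler will see.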
Attack Timeline
What can actually be verified
- May 10, 2026: no public CVE.org record was visible for CVE-2026-9912.
- May 10, 2026: no public NVD detail entry was visible for CVE-2026-9912.
- May 10, 2026: no matching vendor advisory was found in official CUDA or Khronos references using that identifier.
- May 10, 2026: the underlying technical backdrop is public and current: driver-side PTX JIT is documented, shared memory behavior is documented, and OpenCL memory-initialization controls are documented.
That means the public timeline is not an exploit timeline yet. It is a verification timeline. For defenders, that distinction matters. A missing public record should not trigger complacency, but it should stop teams from repeating unsourced severity claims, invented version ranges, or fake vendor guidance.
Exploitation Walkthrough
Prerequisites an attacker would likely need
- Ability to submit or influence GPU kernels in a shared runtime, service, browser, ML platform, or plugin model.
- Reuse of the same physical GPU resources across trust boundaries without strong scrubbing guarantees.
- A code path where shared or local memory is only partially initialized before readback.
- A compilation path where JIT optimization changes the intended initialization behavior.
- A way to move leaked bytes into global memory, host-visible buffers, logs, or model outputs.
Conceptual attack chain
- An attacker identifies a kernel that uses shared memory as a tile, staging buffer, reduction cache, or temporary key schedule.
- The attacker forces an execution shape where only part of the shared allocation is written, often by manipulating dimensions, predicates, or sparse work.
- The runtime compiles the PTX on load, and an optimization pass removes, narrows, or reorders a defensive initialization that the developer thought was guaranteed.
- A later read copies the partially stale scratchpad into a host-visible result buffer.
- The attacker repeats the query to sample remnants and correlate bytes with previous workloads.
In a real incident, the practical target would be whatever high-value data briefly touches the GPU scratchpad: prompt fragments, embedding chunks, decrypted tokens, intermediate tensor slices, or application-specific secrets. That is why the blast radius depends less on the elegance of the bug and more on deployment architecture.
If your platform allows untrusted tenant code to run on the same accelerator fleet as sensitive inference or data-processing jobs, a shared-memory disclosure bug is not just a kernel bug. It is an isolation failure. Teams handling reproduction artifacts should also sanitize traces, dumps, and copied payload fragments before sharing them; TechBytes' Data Masking Tool is useful for redacting customer fields during incident handling.
Hardening Guide
Code-level fixes
- Initialize the full shared allocation on every path, not just the lanes expected to be live.
- Place initialization before the first read and follow it with a synchronization primitive such as __syncthreads() where required.
- Avoid reading inactive or tail elements from tiles, reductions, and staging buffers.
- Prefer explicit bounds-aware loops over clever control flow that depends on optimizer reasoning.
- Review dead-store elimination risk whenever a clear is written only for safety, not for visible program output.
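Applied to the earlier score_kernel anatomy, a minimal hardened sketch (illustrative, not vendor guidance; it assumes blockDim.x == 256 as the original does) initializes the whole tile on every path before the partial write:

```cuda
#include <cstdint>

__global__ void score_kernel_fixed(const uint32_t* secret, uint32_t* out, int live) {
    __shared__ uint32_t tile[256];
    int tid = threadIdx.x;

    // 1. Every slot is written unconditionally, so no lane can later
    //    observe stale scratchpad contents from a previous block or kernel.
    tile[tid] = 0u;
    __syncthreads();

    // 2. Live lanes overwrite their slot with real data.
    if (tid < live) {
        tile[tid] = secret[tid];
    }
    __syncthreads();

    // 3. Readback now exports either real data or an explicit zero,
    //    never uninitialized shared memory.
    out[tid] = tile[tid];
}
```

Note that the zero-fill here is observable program behavior (tail slots of out receive it), which makes it much harder for any optimizer to classify as a dead store than a clear that only "might" matter.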
Compiler and runtime controls
- For high-assurance paths, favor reproducible ahead-of-time binaries over relying purely on forward-compatible PTX.
- Pin driver and toolkit versions in production where security-sensitive kernels are involved.
- Re-test kernels after every driver upgrade that can change JIT behavior.
- On OpenCL stacks, evaluate cl_khr_initialize_memory and CL_CONTEXT_MEMORY_INITIALIZE_LOCAL_KHR where supported.
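Where the device reports the cl_khr_initialize_memory extension, the opt-in happens at context creation. A host-side sketch under that assumption (property and bitfield names follow the Khronos extension specification; extension discovery and error handling are elided):

```c
#include <CL/cl.h>
#include <stddef.h>

/* Sketch: request runtime-initialized local and private memory for every
 * kernel launched in this context. Requires the cl_khr_initialize_memory
 * device extension; real code should check the device's extension string
 * via clGetDeviceInfo(CL_DEVICE_EXTENSIONS, ...) before relying on it. */
static cl_context create_hardened_context(cl_platform_id platform,
                                          cl_device_id device,
                                          cl_int* err) {
    cl_context_properties props[] = {
        CL_CONTEXT_PLATFORM, (cl_context_properties)platform,
        /* Ask the runtime to scrub local (and private) memory before
         * each kernel begins executing. */
        CL_CONTEXT_MEMORY_INITIALIZE_KHR,
        (cl_context_properties)(CL_CONTEXT_MEMORY_INITIALIZE_LOCAL_KHR |
                                CL_CONTEXT_MEMORY_INITIALIZE_PRIVATE_KHR),
        0
    };
    return clCreateContext(props, 1, &device, NULL, NULL, err);
}
```

The design trade-off is explicit: this moves memory hygiene from "every kernel author remembers" to "the runtime enforces," at some initialization cost per launch.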
Detection and validation
- Run NVIDIA Compute Sanitizer with --tool initcheck --initcheck-address-space shared during CI for kernels that use shared memory heavily.
- Use racecheck and synccheck alongside init checks, because race conditions often mask as intermittent disclosure bugs.
- Maintain regression tests that vary block sizes, sparsity, tail lengths, and divergent branches rather than just happy-path tensor shapes.
- Audit logs and outputs for low-entropy fragments or repeated cross-request residue that may indicate stale scratchpad export.
Deployment controls
- Separate untrusted GPU workloads from sensitive ones at the device, node, or pool level whenever possible.
- Reduce code-injection surfaces that let users upload arbitrary kernels, custom ops, or unsafe plugins.
- Treat accelerator fleets as shared compute with memory-hygiene obligations, not as opaque coprocessors.
Architectural Lessons
What this case teaches even without a public advisory
- Scratchpad memory is a security boundary. Shared and local memory are often described as performance primitives, but in multi-tenant systems they also hold sensitive intermediate state.
- Undefined initialization is not a minor bug. On CPUs it may surface as flaky correctness; on GPUs it can become structured cross-request disclosure.
- JIT compilers expand the trusted surface. NVIDIA documents that PTX can be compiled by the driver at runtime, so compiler behavior is part of operational risk management.
- Portability features can complicate assurance. Shipping PTX improves forward compatibility, but it also means security-relevant code generation can change after deployment.
- Memory initialization should be explicit policy. Khronos exposing an extension for local-memory initialization is a signal that this is important enough to standardize.
The practical takeaway is straightforward. Even though CVE-2026-9912 is not publicly published as of May 10, 2026, engineering teams should use the claim as a forcing function to review shared-memory assumptions, inspect their PTX JIT paths, and decide which workloads truly deserve stronger accelerator isolation. Waiting for a perfect advisory is the wrong threshold when the memory model already tells you where the edge is.
For teams that want to review the official architecture backdrop, the relevant starting points are NVIDIA's CUDA programming guide on shared memory, NVIDIA's documentation on driver-side PTX JIT compilation, and Khronos documentation for cl_khr_initialize_memory.
Frequently Asked Questions
Is CVE-2026-9912 publicly listed anywhere?
As of May 10, 2026, no public record was visible on CVE.org or NVD for CVE-2026-9912. That means the identifier should be treated as unverified until a CNA or vendor publishes a record.
Can GPU shared memory really leak data across workloads?
Yes, if hygiene fails. Shared memory is a per-SM scratchpad that persists for a kernel's execution, and code that reads slots it never wrote can export remnants of earlier work. Whether that crosses a trust boundary depends on scheduling, scrubbing guarantees, and tenant isolation.
Why does JIT compilation matter for a memory leak bug?
Because PTX can be JIT-compiled by the device driver, optimization passes run after deployment. A defensive clear that passed code review can still be removed, narrowed, or reordered by a newer driver's compiler, silently changing the kernel's effective initialization behavior.
How do I test for uninitialized shared-memory reads in CUDA?
Run compute-sanitizer --tool initcheck --initcheck-address-space shared against kernels that rely on shared memory. Pair that with racecheck and synccheck, because synchronization bugs often hide the same root cause.