
NVIDIA Blackwell Ultra: MLPerf Records & the Marvell Optical Interconnect Deal

April 2, 2026 Dillip Chowdary

NVIDIA has once again redefined the ceiling of AI performance with the release of the Blackwell Ultra (B300) benchmarks in the latest MLPerf Training v6.0 suite. The B300 has shattered every existing record, particularly on the DeepSeek-R1 reasoning model, showing a staggering 2.7x performance gain over the original Blackwell (B200) launched just a year ago. This leap is attributed to the new FP4 Precision Engine and a massive architectural pivot toward optical networking.

In a move that solidifies its dominance over the data center, NVIDIA also announced a $2 billion strategic alliance with Marvell. This deal focuses on integrating Marvell's optical interconnect technology directly into the NVLink-6 fabric. This isn't just a hardware refresh; it's a fundamental shift from copper to photons, aiming to eliminate the I/O bottleneck that has plagued trillion-parameter model training.

MLPerf Breakdown: The FP4 Revolution

The headline story of the MLPerf results is the effectiveness of FP4 (4-bit Floating Point) precision. FP4 was previously thought too low-fidelity for complex reasoning, but NVIDIA's Quasar-4 Quantization algorithm allows the Blackwell Ultra to maintain 99.9% accuracy on DeepSeek-R1 while doubling throughput compared to FP8. This is a game-changer for Inference-Time Compute, where the cost of tokens is the primary constraint.
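
NVIDIA has not published Quasar-4's internals, but the general recipe behind FP4 formats like E2M1 is block scaling: each small block of values shares one scale factor, and individual values snap to a tiny 8-point magnitude grid. A minimal simulation of that idea (the block size of 16 and the E2M1 grid are standard assumptions here, not NVIDIA's actual algorithm):

```python
import numpy as np

# Magnitudes representable by an E2M1 (4-bit) float: sign * {0 ... 6}.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_quantize(x, block=16):
    """Simulate block-scaled FP4: each block of values shares one scale."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        chunk = x.flat[i:i + block]
        scale = np.abs(chunk).max() / FP4_GRID[-1]
        if scale == 0.0:
            scale = 1.0  # all-zero block: any scale works
        # Snap each magnitude to the nearest FP4 grid point, keep the sign.
        mag = np.abs(chunk) / scale
        idx = np.abs(mag[:, None] - FP4_GRID).argmin(axis=1)
        out.flat[i:i + block] = np.sign(chunk) * FP4_GRID[idx] * scale
    return out
```

The per-block scale is what preserves accuracy: an outlier in one block cannot crush the resolution available to values elsewhere in the tensor.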

In the Llama 3.5 (400B) training benchmark, a cluster of 32,768 Blackwell Ultra GPUs completed the training run in record time, achieving a **Model FLOPs Utilization (MFU)** of 68%. This is unheard of at this scale and is largely due to the improved HBM4 memory bandwidth, which now hits a peak of 8.5 TB/s per GPU. The "Ultra" branding is well-earned, as the thermal management system has also been redesigned to support a 1200W TDP per socket.
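
MFU is simply the FLOPs a training run actually performs divided by the cluster's theoretical peak. A quick sketch using the standard ~6 FLOPs-per-parameter-per-token estimate for a dense transformer (forward plus backward pass); the tokens-per-second and per-GPU peak below are hypothetical values chosen only to show how a 68% figure falls out, not reported numbers:

```python
def model_flops_utilization(params, tokens_per_sec, n_gpus, peak_flops_per_gpu):
    """MFU = achieved training FLOPs / peak hardware FLOPs.

    Uses the common ~6 * params FLOPs-per-token estimate for a
    dense transformer (forward + backward pass).
    """
    achieved = 6 * params * tokens_per_sec
    peak = n_gpus * peak_flops_per_gpu
    return achieved / peak

# Illustrative only: a 400B-parameter model on 32,768 GPUs at an
# assumed 2.5 PFLOPS of usable dense training throughput per GPU.
mfu = model_flops_utilization(
    params=400e9, tokens_per_sec=2.32e7,
    n_gpus=32_768, peak_flops_per_gpu=2.5e15)
print(f"MFU: {mfu:.0%}")
```

The formula also makes clear why bandwidth matters: any time a GPU stalls waiting on memory or the network, `achieved` drops while `peak` does not.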

Blackwell Ultra (B300) Specs

  • Transistors: 416 Billion (Dual-Die)
  • FP4 Performance: 40 PetaFLOPS
  • HBM4 Capacity: 288GB
  • Interconnect: 2.4 Tbps NVLink-6 (Optical Ready)
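
The spec list implies a steep compute-to-bandwidth ratio. A quick arithmetic-intensity check shows how many FLOPs a kernel must perform per byte fetched from HBM4 to stay compute-bound rather than bandwidth-bound (the specs are as listed above; the interpretation is a standard roofline argument, not an NVIDIA figure):

```python
# Arithmetic intensity needed to stay compute-bound on the listed specs.
peak_fp4_flops = 40e15      # 40 PetaFLOPS (FP4)
hbm4_bandwidth = 8.5e12     # 8.5 TB/s

ops_per_byte = peak_fp4_flops / hbm4_bandwidth
print(f"{ops_per_byte:.0f} FLOPs per byte")  # prints "4706 FLOPs per byte"
```

Kernels below roughly that intensity are limited by the 8.5 TB/s of memory bandwidth, not the 40 PFLOPS of compute, which is why the HBM4 upgrade matters as much as the FP4 engine.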

The Marvell Deal: Lighting Up the AI Fabric

The bottleneck for the next generation of AI "factories" is no longer the GPU itself, but the connectivity between racks. Copper cables are reaching their physical limits in terms of distance and signal integrity. The **NVIDIA-Marvell alliance** aims to solve this by moving the optical transceivers onto the GPU package itself. This Silicon Photonics approach reduces latency by 40% and power consumption by 30%.
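
The 30% power claim can be framed in energy-per-bit terms. A back-of-envelope comparison, using illustrative energy figures (the pJ/bit values below are assumptions chosen to match the cited 30% reduction, not vendor numbers):

```python
# Per-GPU interconnect power: link rate * energy per bit.
link_bps = 2.4e12           # 2.4 Tbps NVLink-6 per GPU
copper_pj_per_bit = 5.0     # assumed copper SerDes energy
optical_pj_per_bit = 3.5    # assumed co-packaged optics (~30% lower)

copper_w = link_bps * copper_pj_per_bit * 1e-12
optical_w = link_bps * optical_pj_per_bit * 1e-12
print(f"copper: {copper_w:.1f} W, optical: {optical_w:.1f} W")
```

A few watts per GPU sounds small, but multiplied across tens of thousands of GPUs per AI factory, the interconnect power budget becomes a rack-level design constraint.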

Marvell's Tenebris optical DSPs will be used to drive the new **1.6T and 3.2T optical links** that will debut with the **Vera Rubin** architecture in 2027. By securing this deal now, NVIDIA is effectively locking out competitors like AMD and Intel from the highest-performance optical interconnect supply chain for the next 24 months. This is a strategic "moat-building" exercise at the hardware layer.

DeepSeek-R1 Optimization: The New Benchmark

The choice of DeepSeek-R1 as a primary MLPerf target is a nod to the growing importance of Reasoning-as-a-Service. Unlike standard LLMs, reasoning models require high **KV-cache** performance and frequent synchronization across the GPU cluster. The Blackwell Ultra's **Confidential Computing** engine allows for these large-scale syncs to happen securely, a requirement for the enterprise customers NVIDIA is targeting.
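
Why is KV-cache performance the pressure point? The cache stores one K and one V tensor per layer for every generated token, so long reasoning traces grow it linearly. A sizing sketch (the model shapes below are illustrative, not DeepSeek-R1's actual configuration):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Attention KV-cache size: 2 tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Illustrative configuration: 64 layers, 8 KV heads of dim 128,
# a 64K-token reasoning trace, batch of 32, FP16 cache entries.
size = kv_cache_bytes(layers=64, kv_heads=8, head_dim=128,
                      seq_len=65_536, batch=32, bytes_per_elem=2)
print(f"{size / 2**30:.0f} GiB")  # prints "512 GiB"
```

Even this modest illustrative configuration exceeds a single B300's 288GB of HBM4, which is exactly why reasoning workloads force the cache to be sharded and synchronized across the cluster.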

With the B300, NVIDIA is also launching the NV-Reasoner SDK, a set of libraries optimized for Monte Carlo Tree Search (MCTS) and other "thinking" algorithms. By co-designing the hardware and the reasoning libraries, NVIDIA ensures that any developer building on Blackwell Ultra gets a "turn-key" performance advantage that is nearly impossible to replicate on generic hardware.
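
NV-Reasoner's internals are not described here, but the MCTS loop such libraries accelerate follows a well-known four-step pattern: select, expand, simulate, backpropagate. A generic, self-contained skeleton on a toy objective (everything below is a standard textbook MCTS, not the SDK's API):

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # partial action sequence (a tuple)
        self.parent = parent
        self.children = {}        # action -> Node
        self.visits = 0
        self.value = 0.0          # sum of rollout rewards

def ucb(child, parent_visits, c=1.4):
    """Upper Confidence Bound: exploit mean value, explore rare moves."""
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

ACTIONS = (0, 1)
DEPTH = 8

def reward(state):
    # Toy objective: reward sequences with many alternating bits.
    return sum(a != b for a, b in zip(state, state[1:])) / (DEPTH - 1)

def rollout(state):
    while len(state) < DEPTH:
        state = state + (random.choice(ACTIONS),)
    return reward(state)

def mcts(iters=1000, seed=0):
    random.seed(seed)
    root = Node(())
    for _ in range(iters):
        node = root
        # 1. Selection: descend via UCB until an unexpanded node.
        while len(node.children) == len(ACTIONS) and len(node.state) < DEPTH:
            node = max(node.children.values(),
                       key=lambda ch: ucb(ch, node.visits))
        # 2. Expansion: add one untried child (unless terminal).
        if len(node.state) < DEPTH:
            a = next(a for a in ACTIONS if a not in node.children)
            node.children[a] = Node(node.state + (a,), node)
            node = node.children[a]
        # 3. Simulation: random rollout to a terminal state.
        r = rollout(node.state)
        # 4. Backpropagation: credit the reward up the tree.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Best first move = most-visited child of the root.
    return max(root.children, key=lambda a: root.children[a].visits)
```

The co-design argument is visible even in this toy: every simulation step is a batch of model calls, so hardware that keeps rollouts and tree synchronization cheap directly multiplies how much "thinking" fits in a token budget.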

Technical Insight: FP4 vs. FP8

Is FP4 the new gold standard for inference? Read our deep dive into Quantization Scaling Laws to understand why NVIDIA is betting the future of Blackwell on 4-bit math.


Conclusion: The Infrastructure Lead Widens

The NVIDIA Blackwell Ultra is more than just a spec-bump; it is the first "optical-era" GPU. By shattering MLPerf records and securing the Marvell optical deal, NVIDIA has ensured that the road to Artificial General Intelligence continues to run through their silicon. The 2.7x gain on DeepSeek-R1 proves that architectural innovation is still far from the point of diminishing returns.

As we look toward the 2027 launch of Vera Rubin, the Blackwell Ultra serves as the perfect bridge, bringing **Silicon Photonics** and **FP4 math** into the mainstream. For the data center operators and AI labs, the message is clear: if you want to train the world's most intelligent models, there is only one choice.