NVIDIA Jetson Thor: The Sovereign Edge AI Standard
Dillip Chowdary • Mar 10, 2026
NVIDIA has released updated benchmarks for its **Jetson Thor** robotics platform, showcasing a new era of "Local-First" generative AI. By running frontier models like **Mistral 3** and **Gemma 3** natively on-device, NVIDIA is targeting industrial environments where cloud round-trip latency and data-security requirements rule out off-device inference.
Technical Performance: Token Throughput
The Thor architecture, built on the Blackwell GPU, features a dedicated Transformer Engine that optimizes local inference. Benchmarks show:
- **Mistral 3 (7B):** 42 tokens per second (tps) at 4-bit quantization.
- **Gemma 3 (2B):** 115 tokens per second, enabling near-instantaneous decision loops for autonomous mobile robots (AMRs).
- **Energy efficiency:** 80% reduction in power-per-inference compared to cloud-connected edge gateways.
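To make the throughput numbers above concrete, here is a minimal sketch of how decode tokens-per-second is typically measured: time a generation loop and divide token count by elapsed time. The `generate_tokens` function is a hypothetical stand-in, not a Jetson API; a real measurement would call into the local inference runtime serving the quantized model.

```python
import time

def generate_tokens(prompt, max_tokens):
    """Hypothetical stub for an on-device LLM decode loop.

    A real deployment would stream tokens from a locally hosted,
    quantized model; here we just emit placeholders so the timing
    harness is self-contained and runnable.
    """
    for _ in range(max_tokens):
        yield "token"

def measure_tps(prompt, max_tokens=64):
    """Return decode throughput in tokens per second."""
    start = time.perf_counter()
    count = sum(1 for _ in generate_tokens(prompt, max_tokens))
    elapsed = time.perf_counter() - start
    return count / elapsed

tps = measure_tps("Inspect conveyor belt 7 for anomalies.")
print(f"{tps:.1f} tokens/sec")
```

For published figures like the 42 tps Mistral 3 number, vendors typically average over many runs and exclude the prompt-processing (prefill) phase, so a single-shot measurement like this is only a rough proxy.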
Physical AI: The GTC Vision
At the heart of this release is the concept of **Physical AI**. By eliminating the cloud link, Jetson Thor allows robots to perform multimodal reasoning—interpreting visual sensors and voice commands simultaneously—within a single on-device compute cycle. This is critical for human-robot interaction in manufacturing and healthcare.
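The "single on-device compute cycle" idea can be sketched as a fused perception-to-action loop: camera frames and voice commands are encoded locally and combined in one decision step, with no network round-trip. Every name below is an illustrative stub under that assumption, not a real Jetson or NVIDIA API.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Fused multimodal input for one control cycle."""
    image_features: list
    voice_command: str

def encode_image(frame):
    # Hypothetical stand-in for an on-device vision encoder;
    # here it just summarizes the frame size as a feature.
    return [len(frame)]

def fuse_and_decide(obs):
    # Hypothetical stand-in for one multimodal inference pass:
    # vision features and the parsed command yield a single action.
    if "stop" in obs.voice_command.lower():
        return "halt"
    return "continue"

frame = [0] * (640 * 480)  # dummy camera frame
obs = Observation(encode_image(frame), "Stop at the next station")
action = fuse_and_decide(obs)
print(action)  # → halt
```

The design point the article makes is that because encoding and decision-making both run locally, the latency of this loop is bounded by on-device compute rather than network conditions, which is what makes it viable for close human-robot interaction.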