Archive 2026-02-10

The Silicon Schism: Why OpenAI is Testing Cerebras and Groq for Inference

Author

Dillip Chowdary


Founder & AI Researcher

Beyond the GPU: The Race for Inference Efficiency

Internal reports suggest a growing strategic shift at OpenAI away from general-purpose GPUs for large-scale inference. The company has reportedly begun pilot programs with specialized silicon from Cerebras and Groq to power high-demand services like Codex and ChatGPT. While NVIDIA remains the king of training, the sheer cost and energy consumption of running trillion-parameter models on H100s are driving labs to seek hardware optimized specifically for inference.

This "Silicon Schism" could redefine the AI hardware market in 2026. Startups like Groq offer deterministic latency and high throughput through their LPU (Language Processing Unit) architecture, which aligns well with OpenAI's need for rapid agentic responses. As infrastructure evolves, engineers' roles will shift from managing clusters to optimizing model-to-hardware mappings. Read more on Morningstar →
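The latency argument can be made concrete with a back-of-envelope calculation. The sketch below is illustrative only: the token rates (100 vs. 500 tokens/s) are placeholder assumptions, not benchmarked figures for any GPU or LPU. It shows why per-token generation speed compounds across a sequential agent chain, where each step must finish before the next begins:

```python
# Back-of-envelope: latency of a sequential agent chain.
# All rates below are illustrative assumptions, not measured hardware figures.

def chain_latency_s(steps: int, tokens_per_step: int, tokens_per_s: float) -> float:
    """Wall-clock seconds for `steps` sequential LLM calls,
    each generating `tokens_per_step` tokens at `tokens_per_s`."""
    return steps * tokens_per_step / tokens_per_s

# An agent making 8 sequential calls of ~500 generated tokens each:
slower = chain_latency_s(8, 500, 100.0)  # assumed ~100 tok/s -> 40 s
faster = chain_latency_s(8, 500, 500.0)  # assumed ~500 tok/s -> 8 s
print(f"{slower:.0f}s vs {faster:.0f}s end-to-end")
```

A 5x difference in per-token throughput becomes a 5x difference in end-to-end agent latency, which is why deterministic, high-speed decoding matters more for agentic workloads than raw batch throughput.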

Tech Bytes

Empowering developers and tech enthusiasts with data-driven insights.

© 2026 Tech Bytes. All rights reserved.