Tech

OpenAI launches custom Jalapeño chip to cut inference costs and reduce Nvidia reliance

Early testing shows improved performance-per-watt, marking a strategic shift toward vertical integration in infrastructure as OpenAI seeks to optimise the economics of AI.

Author

Owen Mercer

Markets and Finance Editor

Published

Draft

Source: Hacker News · original

Artificial Intelligence Media Research

Related coverage

Explore Artificial Intelligence coverage Explore Media coverage Explore Research coverage More from the Tech desk

Tech

No image available

Partnering with Broadcom, the AI giant unveils its first bespoke processor designed for real-time coding workloads

OpenAI has officially unveiled Jalapeño, its first custom-built inference processor developed in collaboration with Broadcom. Designed specifically for the company’s inference systems, the chip utilised OpenAI’s own AI models during its development phase. Early testing indicates significantly improved performance-per-watt compared to current state-of-the-art alternatives, with a strategic focus on reducing operating costs for real-time coding models.

The partnership was formally announced in October, though OpenAI’s plans for custom silicon have long been rumoured as a method to decrease dependence on Nvidia’s GPUs. Google and Amazon have previously adopted similar strategies, building custom chips often referred to as “AI accelerators” to speed up machine learning workloads. Jalapeño is specifically engineered for inference, the process of running pre-built AI models in response to user commands, rather than the training phase.

OpenAI president Greg Brockman outlined the rationale behind the development on the company’s in-house podcast. He stated that the company sought to accelerate underserved workloads by leveraging its deep understanding of specific operational demands. “We’ve really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what’s possible?” Brockman said.

While performance-intensive tasks such as pre-training are expected to continue relying on Nvidia hardware, optimising inference systems is viewed as a critical factor in the economics of AI. OpenAI emphasised that even small reductions in inference costs could significantly improve the company’s bottom line, particularly when running agentic products like Codex. The chip is still undergoing testing, and final performance metrics have not yet been confirmed.

This move represents a broader effort by OpenAI to design infrastructure across the entire stack. The company is developing chip architecture, kernels, memory systems, networking, scheduling, and deployment systems to ensure each layer is optimised around the same goal: making models faster, more reliable, and more affordable for users.

OpenAI launches custom Jalapeño chip to cut inference costs and reduce Nvidia reliance

More from Tech

US and Chinese AI experts urge cooperation to avert 'Chernobyl moment'

Rockstar Games Confirms GTA VI Physical Copies Will Be Digital-Only

Hulu’s default data settings quietly degrade picture quality for subscribers