OpenAI launches custom Jalapeño chip to cut inference costs and reduce Nvidia reliance
Early testing shows improved performance-per-watt, marking a strategic shift toward vertical integration in infrastructure as OpenAI seeks to optimise the economics of AI.
OpenAI has officially unveiled Jalapeño, its first custom-built inference processor developed in collaboration with Broadcom. Designed specifically for the company’s inference systems, the chip utilised OpenAI’s own AI models during its development phase. Early testing indicates significantly improved performance-per-watt compared to current state-of-the-art alternatives, with a strategic focus on reducing operating costs for real-time coding models.
The partnership was formally announced in October, though OpenAI’s plans for custom silicon have long been rumoured as a method to decrease dependence on Nvidia’s GPUs. Google and Amazon have previously adopted similar strategies, building custom chips often referred to as “AI accelerators” to speed up machine learning workloads. Jalapeño is specifically engineered for inference, the process of running pre-built AI models in response to user commands, rather than the training phase.
OpenAI president Greg Brockman outlined the rationale behind the development on the company’s in-house podcast. He stated that the company sought to accelerate underserved workloads by leveraging its deep understanding of specific operational demands. “We’ve really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what’s possible?” Brockman said.
While performance-intensive tasks such as pre-training are expected to continue relying on Nvidia hardware, optimising inference systems is viewed as a critical factor in the economics of AI. OpenAI emphasised that even small reductions in inference costs could significantly improve the company’s bottom line, particularly when running agentic products like Codex. The chip is still undergoing testing, and final performance metrics have not yet been confirmed.
This move represents a broader effort by OpenAI to design infrastructure across the entire stack. The company is developing chip architecture, kernels, memory systems, networking, scheduling, and deployment systems to ensure each layer is optimised around the same goal: making models faster, more reliable, and more affordable for users.


