OpenAI’s financial future depends significantly on managing infrastructure expenses, a pressing concern that spurred the creation of its custom “Jalapeño” chip. Built in partnership with Broadcom, this application-specific integrated circuit (ASIC) is a deliberate move to reduce the steep costs of relying on third-party hardware.
Nvidia enjoys profit margins estimated at 75% on its premium chips, whereas OpenAI faces slimmer margins—retaining about 33 cents in profit per dollar earned after covering enormous running costs. The economic challenge of operating large language models at a massive scale is formidable.
Keeping ChatGPT’s servers responsive cost OpenAI $8.4 billion last year. With weekly users now at 900 million, that running cost is expected to climb to roughly $14 billion this year. Over the next eight years, OpenAI has pledged approximately $1.4 trillion toward computing capacity, a huge gamble for a firm currently bringing in $25 billion in annual revenue.
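As a rough check on the scale of these figures, the short Python sketch below recomputes the ratios implied by the numbers quoted above. It assumes the $1.4 trillion commitment is spread evenly across the eight years, which the article does not state, and all function and variable names are illustrative only.

```python
# Figures quoted in the article, in billions of US dollars.
RUNNING_COST_LAST_YEAR = 8.4
RUNNING_COST_THIS_YEAR = 14.0   # projected figure
COMPUTE_COMMITMENT = 1400.0     # roughly $1.4 trillion
COMMITMENT_YEARS = 8
YEARLY_INTAKE = 25.0            # the $25 billion yearly figure


def percent_increase(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0


def summarize() -> dict:
    """Derive simple ratios from the quoted figures."""
    # ASSUMPTION: the commitment is spread evenly over the period.
    yearly_commitment = COMPUTE_COMMITMENT / COMMITMENT_YEARS
    return {
        "cost_growth_pct": percent_increase(
            RUNNING_COST_LAST_YEAR, RUNNING_COST_THIS_YEAR
        ),
        "yearly_commitment_bn": yearly_commitment,
        "commitment_to_intake": yearly_commitment / YEARLY_INTAKE,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.1f}")
```

Under that even-spread assumption, the pledge works out to about $175 billion per year, roughly seven times the $25 billion yearly figure, while the running cost rises by about two thirds year over year.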
Crafting Hardware for LLM Inference
The OpenAI Jalapeño chip, labeled the company’s inaugural “Intelligence Processor,” is tailored specifically for large language model (LLM) inference rather than broad AI tasks. OpenAI supplied the foundational architectural blueprint based on its own model plans and serving systems, while Broadcom oversaw the silicon design and high-speed networking integration.
TSMC manufactures the physical chips in Taiwan, and Celestica assembles the boards and rack systems. OpenAI reports that initial lab prototypes are already handling advanced workloads—including an unreleased GPT-5.3-Codex-Spark model—at the intended production speed and power levels.
Richard Ho, who leads OpenAI’s hardware division, explained that the design reduces unnecessary data movement to push actual utilization nearer to its theoretical maximum. Unlike general-purpose accelerators repurposed from older AI tasks, this architecture carefully balances processing power, memory, and networking resources to address the data-transfer bottlenecks inherent in interactive LLM serving.
To make this work at scale, the platform incorporates Broadcom’s Tomahawk networking silicon directly, enabling the custom processors to communicate across vast, clustered data center environments.
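One common way to reason about the utilization gap Ho describes is the roofline model, in which attainable throughput is capped either by peak compute or by memory bandwidth multiplied by arithmetic intensity (operations performed per byte moved). The sketch below is a generic illustration, not OpenAI's or Broadcom's method, and every number in it is a hypothetical placeholder rather than a Jalapeño specification.

```python
# HYPOTHETICAL hardware figures, chosen only for illustration.
PEAK_FLOPS = 1e15        # assumed peak compute, operations per second
MEM_BANDWIDTH = 4e12     # assumed memory bandwidth, bytes per second


def attainable_flops(peak: float, bandwidth: float, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak, bandwidth * intensity)


def utilization(peak: float, bandwidth: float, intensity: float) -> float:
    """Fraction of peak compute reachable at a given arithmetic intensity."""
    return attainable_flops(peak, bandwidth, intensity) / peak


def ridge_point(peak: float, bandwidth: float) -> float:
    """Arithmetic intensity above which the workload is compute-bound."""
    return peak / bandwidth


if __name__ == "__main__":
    # ASSUMED intensities: low for small-batch token generation,
    # high for large batched matrix multiplies.
    for label, intensity in [("low intensity", 2.0), ("high intensity", 500.0)]:
        share = utilization(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
        print(f"{label}: {share:.1%} of peak")
    print(f"ridge point: {ridge_point(PEAK_FLOPS, MEM_BANDWIDTH)} ops/byte")
```

In this simplified picture, a workload that moves many bytes per operation sits far below the compute roof no matter how fast the arithmetic units are, which is why cutting data movement raises real utilization.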
The Vertical Integration Flywheel
Venturing into custom silicon transforms OpenAI from a pure software entity into a vertically integrated infrastructure company. This full-stack approach covers the entire chain: chip design, software kernels, memory systems, network scheduling, and the end-user application layer. Similar to Apple’s seamless integration of proprietary hardware with iOS, OpenAI can now fine-tune its infrastructure around its precise internal model roadmaps.
This integration powers a self-reinforcing operational cycle. Better infrastructure efficiency reduces the cost of both training and deploying models. Cheaper serving enables superior, snappier products, which attracts more users and generates revenue that gets funneled back into developing the next wave of custom infrastructure.
Overcoming the Late-Mover Disadvantage
By launching its own chips, OpenAI steps into a field where key rivals have been refining proprietary hardware for close to ten years. Google started rolling out its Tensor Processing Units (TPUs) in 2015 and now manages about a quarter of global AI computing capacity outside Nvidia’s ecosystem.
Amazon has deployed over a million of its custom chips, while Meta and Microsoft continue expanding their own hardware infrastructures.
“Jalapeño is a key piece of our long-term, full-stack infrastructure strategy aimed at making compute more abundant,” stated Greg Brockman, OpenAI’s president and co-founder. “By designing more of the stack in-house, we can deliver greater intelligence more efficiently.”
To narrow this time gap, OpenAI fast-tracked the development timeline. The Jalapeño chip went from an initial concept to manufacturing tape-out—the final stage before physical production—in a mere nine months. The engineering teams hit this aggressive deadline by employing OpenAI’s own language models to automate and streamline parts of the hardware design workflow.
This establishes a distinctive feedback loop: the models being served to users are actively used to help construct the physical infrastructure that will power future versions. The first deployment of this hardware into data centers is slated to start by the close of 2026.
Broadcom CEO Hock Tan confirmed that the rollout will scale in tandem with infrastructure partners, including Microsoft, to gear up for gigawatt-scale data center integration.
(Photo by OpenAI)
AI News is powered by TechForge Media.



