OpenAI, Broadcom Unveil Custom Inference Chip Jalapeño for LLM Workloads

OpenAI and Broadcom have unveiled Jalapeño, a custom AI accelerator built for large language model (LLM) inference. The chip is OpenAI’s first in-house Intelligence Processor and reflects the company’s effort to build more of the infrastructure behind its AI products.

The companies said the chip was designed specifically for LLM inference rather than being adapted from existing AI accelerators. Engineering samples are already running machine learning workloads in the lab, including GPT-5.3-Codex-Spark. OpenAI claims early tests indicate the chip delivers higher performance per watt than current state-of-the-art systems.

Jalapeño is the first product in a multi-generation compute platform being developed by OpenAI and Broadcom, with deployments planned at gigawatt-scale data centres beginning in 2026 alongside partners including Microsoft.

“The world is moving to a compute-powered economy,” said Greg Brockman, president and co-founder of OpenAI. “Jalapeño is part of our long-term full-stack infrastructure strategy to make compute more abundant, resulting in AI which is faster, more reliable, more affordable for people and businesses, and can be used to solve more important problems.”

OpenAI said the chip was designed using insights from its model roadmap, serving systems, kernels and product requirements. Broadcom contributed silicon implementation and networking technologies, while Celestica worked on board, rack and system integration.

According to the companies, the architecture is intended to reduce data movement and better balance compute, memory and networking resources, allowing the hardware to operate closer to its theoretical performance limits.

“Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers,” said Richard Ho, who leads OpenAI’s hardware program. “We optimised the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models.”

The announcement comes as AI companies look to reduce their reliance on third-party accelerators for inference workloads. OpenAI said designing its own hardware allows it to optimise the stack across models, software, networking and infrastructure.

The company also claimed the accelerator moved from initial design to manufacturing tape-out in nine months, which it described as one of the fastest ASIC development cycles in advanced semiconductors. OpenAI said its own AI models were used to accelerate parts of the chip design and optimisation process.

“The same models served to users are helping improve the infrastructure used to run future models,” the company said, adding that AI-assisted chip development could lower compute costs across the industry.

Broadcom CEO Hock Tan said the partnership extends beyond a single chip generation.

“Our collaboration with OpenAI represents a fundamental commitment to scaling the physical infrastructure required for the next decade of AI,” Tan said. “This is just the beginning of a multi-generation roadmap.”

OpenAI said improvements in inference efficiency could translate into faster ChatGPT responses, lower API costs, and more dependable access during periods of high demand. The company plans to share detailed technical performance data for Jalapeño in the coming months.
