Amazon Web Services (AWS) and Cerebras Systems have announced a partnership to introduce a new AI inference solution, expected to launch in the coming months.
The solution combines AWS Trainium chips and Cerebras CS-3 systems, delivered through Amazon Bedrock in AWS data centres.
The companies said the system separates inference workloads into two stages—prompt processing and output generation—and assigns each stage to different hardware.
AWS Trainium will process the prompt, while the Cerebras CS-3 will generate the output tokens. The two systems will be connected through AWS’s Elastic Fabric Adapter (EFA) networking.
“Inference is where AI delivers real value to customers, but speed remains a critical bottleneck for demanding workloads like real-time coding assistance and interactive applications,” said David Brown, Vice President of Compute and Machine Learning Services at AWS.
“What we’re building with Cerebras solves that: by splitting the inference workload across Trainium and CS-3, and connecting them with Amazon’s Elastic Fabric Adapter, each system does what it’s best at,” Brown said. “The result will be inference that’s an order of magnitude faster and higher performance than what’s available today.”
The system uses a technique called inference disaggregation, which splits inference into two phases: prefill and decode. Prefill processes the prompt and is highly parallel and compute-intensive, while decode generates output tokens one at a time and is limited mainly by memory bandwidth.
AWS said assigning the prefill stage to Trainium chips and the decode stage to the Cerebras CS-3 allows each architecture to handle the workload suited to its design.
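To make the disaggregation idea concrete, here is a minimal sketch in Python. It is not AWS or Cerebras code; every class and function name (PrefillWorker, DecodeWorker, KVCache, generate) is illustrative, and the token logic is a stand-in for a real model. The sketch shows the pattern the companies describe: a parallel prefill pass builds the attention state on one device, that state is handed off, and a sequential decode loop consumes it on another.

```python
from dataclasses import dataclass

# Illustrative sketch of disaggregated inference. All names here are
# hypothetical, not part of any AWS or Cerebras API.

@dataclass
class KVCache:
    """Key/value attention state produced by prefill and consumed by decode."""
    tokens: list[int]

class PrefillWorker:
    """The compute-intensive, parallel prefill phase (Trainium, in the announced design)."""
    def run(self, prompt_tokens: list[int]) -> KVCache:
        # A real implementation would run the full forward pass over the
        # prompt in parallel and materialise the attention KV cache.
        return KVCache(tokens=list(prompt_tokens))

class DecodeWorker:
    """The memory-bandwidth-bound, sequential decode phase (CS-3, in the announced design)."""
    def step(self, cache: KVCache) -> int:
        # A real implementation would produce one token per forward pass,
        # reading the growing KV cache on every step. Dummy arithmetic here.
        next_token = (sum(cache.tokens) + len(cache.tokens)) % 50_000
        cache.tokens.append(next_token)
        return next_token

def generate(prompt_tokens: list[int], max_new_tokens: int, eos: int = 0) -> list[int]:
    prefill, decode = PrefillWorker(), DecodeWorker()
    # Phase 1: prefill on the compute-optimised device, then hand the cache
    # across the interconnect (EFA, per the announcement).
    cache = prefill.run(prompt_tokens)
    # Phase 2: sequential decode on the bandwidth-optimised device.
    output = []
    for _ in range(max_new_tokens):
        tok = decode.step(cache)
        if tok == eos:
            break
        output.append(tok)
    return output

print(generate([101, 202, 303], max_new_tokens=5))
```

In the announced system, the cache handoff would traverse EFA between Trainium and CS-3 hosts; the sketch elides serialisation, batching, and transport entirely.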
“Partnering with AWS to build a disaggregated inference solution will bring the fastest inference to a global customer base,” said Andrew Feldman, Founder and Chief Executive Officer of Cerebras Systems. “Every enterprise around the world will be able to benefit from fast inference within their existing AWS environment.”
The system will run on AWS infrastructure built on the Nitro System, which AWS said maintains the same security and operational environment used across its services.
AWS said it will later offer open-source large language models and its Amazon Nova models on Cerebras hardware through Bedrock.
Trainium is AWS’s AI chip for training and inference workloads. The company said AI labs, including Anthropic and OpenAI, are using Trainium capacity on AWS infrastructure.
Cerebras’s CS-3 system is used by companies including OpenAI, Cognition, and Mistral for AI workloads such as code generation and reasoning models.