Amazon Web Services (AWS) and Cerebras Systems have announced a partnership to introduce a new AI inference solution, expected to launch in the coming months.
The solution combines AWS Trainium chips and Cerebras CS-3 systems, delivered through Amazon Bedrock in AWS data centres.
The companies said the system separates inference workloads into two stages—prompt processing and output generation—and assigns each stage to different hardware.
AWS Trainium will process the prompt, while the Cerebras CS-3 will generate the output tokens. The two systems will be connected through AWS’s Elastic Fabric Adapter (EFA) networking.
“Inference is where AI delivers real value to customers, but speed remains a critical bottleneck for demanding workloads like real-time coding assistance and interactive applications,” said David Brown, Vice President of Compute and Machine Learning Services at AWS.
“What we’re building with Cerebras solves that: by splitting the inference workload across Trainium and CS-3, and connecting them with Amazon’s Elastic Fabric Adapter, each system does what it’s best at,” Brown said. “The result will be inference that’s an order of magnitude faster and higher performance than what’s available today.”
The system uses a technique called inference disaggregation, which splits inference into two phases: prefill and decode. Prefill processes the prompt and is highly parallel and compute-intensive, while decode generates output tokens one at a time and is limited mainly by memory bandwidth.
AWS said assigning the prefill stage to Trainium chips and the decode stage to the Cerebras CS-3 allows each architecture to handle the workload suited to its design.
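To make the disaggregation idea concrete, here is a minimal sketch in Python. It is not AWS or Cerebras code; every class and function name (PrefillWorker, DecodeWorker, KVCache, generate) is illustrative, and the token logic is a stand-in for a real model. The sketch shows the pattern the companies describe: a parallel prefill pass builds the attention state on one device, that state is handed off, and a sequential decode loop consumes it on another.

```python
from dataclasses import dataclass

# Illustrative sketch of disaggregated inference. All names here are
# hypothetical, not part of any AWS or Cerebras API.

@dataclass
class KVCache:
    """Key/value attention state produced by prefill and consumed by decode."""
    tokens: list[int]

class PrefillWorker:
    """The compute-intensive, parallel prefill phase (Trainium, in the announced design)."""
    def run(self, prompt_tokens: list[int]) -> KVCache:
        # A real implementation would run the full forward pass over the
        # prompt in parallel and materialise the attention KV cache.
        return KVCache(tokens=list(prompt_tokens))

class DecodeWorker:
    """The memory-bandwidth-bound, sequential decode phase (CS-3, in the announced design)."""
    def step(self, cache: KVCache) -> int:
        # A real implementation would produce one token per forward pass,
        # reading the growing KV cache on every step. Dummy arithmetic here.
        next_token = (sum(cache.tokens) + len(cache.tokens)) % 50_000
        cache.tokens.append(next_token)
        return next_token

def generate(prompt_tokens: list[int], max_new_tokens: int, eos: int = 0) -> list[int]:
    prefill, decode = PrefillWorker(), DecodeWorker()
    # Phase 1: prefill on the compute-optimised device, then hand the cache
    # across the interconnect (EFA, per the announcement).
    cache = prefill.run(prompt_tokens)
    # Phase 2: sequential decode on the bandwidth-optimised device.
    output = []
    for _ in range(max_new_tokens):
        tok = decode.step(cache)
        if tok == eos:
            break
        output.append(tok)
    return output

print(generate([101, 202, 303], max_new_tokens=5))
```

In the announced system, the cache handoff would traverse EFA between Trainium and CS-3 hosts; the sketch elides serialisation, batching, and transport entirely.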
“Partnering with AWS to build a disaggregated inference solution will bring the fastest inference to a global customer base,” said Andrew Feldman, Founder and Chief Executive Officer of Cerebras Systems. “Every enterprise around the world will be able to benefit from fast inference within their existing AWS environment.”
The system will run on AWS infrastructure built on the Nitro System, which AWS said maintains the same security and operational environment used across its services.
AWS said it will later offer open-source large language models and its Amazon Nova models on Cerebras hardware through Bedrock.
Trainium is AWS’s AI chip for training and inference workloads. The company said AI labs, including Anthropic and OpenAI, are using Trainium capacity on AWS infrastructure.
Cerebras’s CS-3 system is used by companies including OpenAI, Cognition, and Mistral for AI workloads such as code generation and reasoning models.