Shopify Uses Qwen 3 to Cut Inference Cost by 75x

Shopify said it has replaced its GPT-5-based merchant data extraction pipeline with a multi-agent framework running on an open-weight model, using programmatic prompt optimisation to achieve better performance than the system it replaced.

According to the company, the change cut its per-unit large language model costs by a factor of 75 while doubling the quality of its automated outputs.

Kshetrajna Raghavan, an applied machine learning engineer at the company, disclosed the metrics during a presentation at the Bay Area DSPy Meetup, detailing how Shopify systematically overhauled its generative artificial intelligence architecture.

It achieved these results by migrating from a single-shot prompting approach using OpenAI’s GPT-5 to a multi-agent system powered by a self-hosted Qwen 3 model.

The company relies on these artificial intelligence tools to extract structured data from millions of highly customised, unstructured storefronts, answering questions about a merchant’s return policies, potential fraud signals, and tax categories.

Raghavan noted that relying on a monolithic system to process entire websites created unnecessary computational overhead and limited visibility.

“Moving from a single-shot GPT-5 prompt to a multi-agent architecture fundamentally changed our inference economics,” he said.

This architectural shift relies on DSPy, an open-source framework developed by Stanford University researchers that algorithmically optimises language model prompts and weights. By adopting DSPy, Shopify engineers replaced manually crafted prompt instructions with programmatically optimised ones.
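The core idea behind programmatic prompt optimisation is to treat the prompt as a tunable parameter and choose it by measuring a metric on labelled examples, rather than by hand. The sketch below is a minimal, stdlib-only illustration of that idea and is not DSPy's API: the dev set, the candidate instructions, `toy_model` (a deterministic stand-in for an LLM call), and the helper names are all hypothetical. DSPy's real optimisers go further, for example by bootstrapping few-shot demonstrations and proposing new instructions.

```python
import re
from typing import Callable

# Hypothetical labelled dev set: (storefront text, expected return window).
DEV_SET = [
    ("Returns accepted within 30 days of delivery.", "30"),
    ("We offer a 14 day return window on all items.", "14"),
    ("All sales are final.", "none"),
]

# Hypothetical candidate instructions the optimiser chooses between.
CANDIDATES = [
    "Extract the return window in days.",
    "Extract the return window in days, or answer 'none' if returns are not accepted.",
]

Model = Callable[[str, str], str]
Metric = Callable[[str, str], bool]


def toy_model(instruction: str, text: str) -> str:
    """Deterministic stand-in for an LLM call (not a real model)."""
    match = re.search(r"\d+", text)
    if match:
        return match.group()
    # Only answers "none" when the instruction says that answer is allowed.
    return "none" if "'none'" in instruction else "unknown"


def exact_match(prediction: str, gold: str) -> bool:
    """Metric: case- and whitespace-insensitive string equality."""
    return prediction.strip().lower() == gold.strip().lower()


def score(instruction: str, model: Model, dev_set, metric: Metric) -> float:
    """Fraction of dev examples the model gets right under this instruction."""
    hits = sum(metric(model(instruction, text), gold) for text, gold in dev_set)
    return hits / len(dev_set)


def optimise(candidates, model: Model, dev_set, metric: Metric):
    """Pick the instruction with the highest dev-set score (first one wins ties)."""
    scored = [(score(c, model, dev_set, metric), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


if __name__ == "__main__":
    best, best_score = optimise(CANDIDATES, toy_model, DEV_SET, exact_match)
    print(f"{best_score:.2f}  {best}")
```

Here the second instruction wins because it handles the "no returns" case the first one misses; the selection is driven entirely by the metric, which is what makes the process repeatable when the model underneath changes.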

The engineering team deployed Alibaba Cloud's open-weight Qwen 3 model on Shopify's own server infrastructure, removing the round trip to a third-party API along with vendor markups.

Discussing the financial impact of this internal hosting strategy, Raghavan stated, “Replacing the proprietary API with a self-hosted Qwen 3 model reduced our per-unit LLM costs by 75 times.”
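A multiplicative figure is easier to compare with percentage-based savings once converted. Assuming "75 times" means the new per-unit cost is one seventy-fifth of the old one (the exact definition of a "unit" is not given), the saving works out as:

```latex
c_{\text{new}} = \frac{c_{\text{old}}}{75}
\quad\Longrightarrow\quad
1 - \frac{c_{\text{new}}}{c_{\text{old}}} = 1 - \frac{1}{75} \approx 0.987
```

In other words, a 75-fold reduction corresponds to roughly a 98.7% cut in per-unit LLM spend.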

Rather than relying on one large model to read an entire website at once, the new programmatic system breaks the extraction process into smaller, specialised agent workflows.

In this setup, autonomous artificial intelligence modules, such as dedicated agents for fraud detection and tax coding, are given specific tools to browse a storefront and extract only the data they need.

Raghavan said the specialised nature of these agents also reduced hallucination rates and improved task adherence.
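The decomposition can be pictured with a small sketch. Everything in it is hypothetical: the storefront pages, the `fetch_page` tool, the agent names, and the extractor functions (deterministic stand-ins for LLM calls) are illustrative, since Shopify's actual tool interfaces were not described. What it shows is the structural point: each agent fetches only the pages its task needs, so every call sees a fraction of the site instead of the whole thing.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical storefront: page path -> page text.
STOREFRONT: dict[str, str] = {
    "/policies/refund-policy": "Returns accepted within 30 days of delivery.",
    "/pages/about": "Handmade ceramic mugs, shipped from Portugal since 2015.",
    "/collections/all": "Speckled mug - $24. Espresso cup set - $40.",
}


def fetch_page(store: dict[str, str], path: str) -> str:
    """The only tool agents get: fetch one page by path ("" if missing)."""
    return store.get(path, "")


def extract_return_window(text: str) -> str:
    # Stand-in for an LLM extraction call.
    match = re.search(r"(\d+)\s*days?", text)
    return f"{match.group(1)} days" if match else "unknown"


def extract_product_type(text: str) -> str:
    # Stand-in for an LLM classification call; labels are illustrative only.
    lowered = text.lower()
    return "drinkware" if "mug" in lowered or "cup" in lowered else "unknown"


@dataclass
class Agent:
    name: str
    pages: list[str]  # pages this agent is allowed to fetch
    extract: Callable[[str], str]

    def run(self, store: dict[str, str]) -> tuple[str, int]:
        """Fetch only this agent's pages, then extract; also report chars read."""
        context = "\n".join(fetch_page(store, path) for path in self.pages)
        return self.extract(context), len(context)


AGENTS = [
    Agent("return_policy", ["/policies/refund-policy"], extract_return_window),
    Agent("tax_category", ["/collections/all"], extract_product_type),
]


def orchestrate(store: dict[str, str], agents: list[Agent]) -> dict[str, tuple[str, int]]:
    """Run each specialised agent independently and collect its answer."""
    return {agent.name: agent.run(store) for agent in agents}


if __name__ == "__main__":
    whole_site = len("\n".join(STOREFRONT.values()))
    for name, (answer, chars) in orchestrate(STOREFRONT, AGENTS).items():
        print(f"{name}: {answer} (read {chars} of {whole_site} chars)")
```

A fraud-signal agent would slot in the same way, with its own page list and extractor. Keeping each agent's context narrow is one plausible reason specialised agents stay on task, though the sketch makes no claim about the size of that effect.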

Assessing the performance gains across Shopify's merchant-facing applications, Raghavan added that adopting the framework meant the engineering team “doubled our quality compared to the previous single-prompt baseline.”
