Shopify said it has replaced its merchant data extraction pipeline, previously based on OpenAI’s GPT-5, with a multi-agent framework that uses programmatic prompt optimisation to get better performance from an open-weight model.
The company has cut its per-unit large language model costs by a factor of 75 while doubling the quality of its automated outputs.
Kshetrajna Raghavan, an applied machine learning engineer at the company, disclosed the metrics during a presentation at the Bay Area DSPy Meetup, detailing how the platform systematically overhauled its generative artificial intelligence architecture.
Shopify achieved these results by migrating from a single-shot prompting approach using OpenAI’s GPT-5 to a multi-agent system powered by a self-hosted Qwen 3 model.
The company relies on these artificial intelligence tools to extract structured data from millions of highly customised, unstructured storefronts, answering questions about a merchant’s return policies, potential fraud signals, and tax categories.
Raghavan noted that relying on a monolithic system to process entire websites created unnecessary computational overhead and limited visibility.
“Moving from a single-shot GPT-5 prompt to a multi-agent architecture fundamentally changed our inference economics,” he said.
This architectural shift relies on DSPy, an open-source framework developed by Stanford University researchers that algorithmically optimises language model prompts and weights. By deploying DSPy, Shopify engineers transitioned away from manually crafted instructions to a programmatic system.
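To illustrate what "programmatic" prompt optimisation means in general terms, the following is a minimal sketch, not DSPy's API and not Shopify's code. It assumes a handful of labelled examples, two candidate instructions, and a toy stand-in for the model (`toy_model`); all of these names and data are hypothetical. The idea is that the program scores each candidate instruction against the examples and keeps the best one, rather than an engineer tuning the wording by hand.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical labelled examples: (storefront text, expected return window in days).
EXAMPLES: List[Tuple[str, str]] = [
    ("Returns accepted within 30 days of delivery.", "30"),
    ("You may send items back within 14 days.", "14"),
]

# Hypothetical candidate instructions the optimiser chooses between.
CANDIDATES: List[str] = [
    "Summarise the return policy.",
    "Return only the number of days in the return window.",
]

Model = Callable[[str, str], str]


def toy_model(instruction: str, text: str) -> str:
    """Stand-in for an LLM call; a real system would query a model here."""
    if "number of days" in instruction:
        match = re.search(r"(\d+) days", text)
        return match.group(1) if match else ""
    return text  # an unfocused instruction yields an unfocused answer


def score(instruction: str, model: Model) -> float:
    """Fraction of labelled examples the instruction answers exactly."""
    hits = sum(model(instruction, text) == expected for text, expected in EXAMPLES)
    return hits / len(EXAMPLES)


def best_instruction(candidates: List[str], model: Model) -> str:
    """Pick the candidate instruction with the highest score."""
    return max(candidates, key=lambda c: score(c, model))


best = best_instruction(CANDIDATES, toy_model)
print(best, score(best, toy_model))
```

Real optimisers search a far larger space and use an actual model, but the loop has the same shape: propose, score against a metric, keep the winner.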
The engineering team deployed the open-weight Qwen 3 model, developed by Alibaba Cloud, on its own server infrastructure, eliminating both third-party vendor markups and the latency of external API calls.
Discussing the financial impact of this internal hosting strategy, Raghavan stated, “Replacing the proprietary API with a self-hosted Qwen 3 model reduced our per-unit LLM costs by 75 times.”
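For a sense of scale, a 75-fold reduction can be worked through with a short calculation. The baseline figure below is purely hypothetical, since no absolute costs were disclosed; only the factor of 75 comes from the presentation.

```python
def cost_after_reduction(baseline_cost: float, factor: float) -> float:
    """Cost per unit after dividing the baseline by the reduction factor."""
    return baseline_cost / factor


baseline = 150.0  # hypothetical baseline cost per unit of work, in any currency
new_cost = cost_after_reduction(baseline, 75)

# Share of the original spend that is saved, as a percentage.
savings_pct = (1 - new_cost / baseline) * 100

print(new_cost, round(savings_pct, 1))
```

Whatever the baseline, dividing by 75 means roughly 98.7% of the original per-unit spend is removed.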
Rather than relying on one massive model to read an entire website simultaneously, the new programmatic system breaks the extraction process into smaller, specialised agent workflows.
In this setup, autonomous artificial intelligence modules—such as dedicated agents for fraud detection and tax coding—are given specific tools to browse a storefront and extract only the relevant data they need.
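The pattern of specialised agents reading only what they need can be sketched as follows. This is an illustrative outline under stated assumptions, not Shopify's implementation: the storefront pages, the `Agent` class, the `fetch_page` tool, and the pass-through `extract` functions (stand-ins for model calls) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical storefront: page path -> page text.
STOREFRONT: Dict[str, str] = {
    "/returns": "Returns accepted within 30 days.",
    "/shipping": "Ships worldwide in 5 days.",
    "/products": "Handmade ceramic mugs and plates.",
}


def fetch_page(path: str) -> str:
    """Hypothetical browsing tool: returns the text of one page."""
    return STOREFRONT.get(path, "")


@dataclass
class Agent:
    name: str
    pages: List[str]               # pages this agent chooses to open
    extract: Callable[[str], str]  # stand-in for a model call

    def run(self, fetch: Callable[[str], str]) -> Dict[str, object]:
        # Read only the pages relevant to this agent's task.
        texts = [fetch(path) for path in self.pages]
        return {
            "agent": self.name,
            "pages_read": len(texts),
            "answer": self.extract(" ".join(texts)),
        }


agents = [
    Agent("returns", ["/returns"], lambda text: text),
    Agent("tax", ["/products"], lambda text: text),
]

results = {agent.name: agent.run(fetch_page) for agent in agents}
print(results)
```

Each agent here opens one page out of three, whereas a single-shot prompt would have to take in the whole site at once.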
The specialised nature of these agents also reduced hallucination rates and improved task adherence, Raghavan said.
Assessing the performance gains across the company’s merchant-facing applications, Raghavan added that by implementing the framework, the engineering team “doubled our quality compared to the previous single-prompt baseline.”
