NVIDIA GTC 2026: From GPUs to AI Factories

From agentic operating systems to shifting inference economics, we break down the most critical enterprise takeaways from Jensen Huang’s latest showcase.

NVIDIA’s 2026 GPU Technology Conference (GTC) in San Jose marked a decisive shift from selling standalone GPUs to orchestrating full “AI factories” that industrialise token generation, inference, and agentic workflows at planetary scale. Across four days and more than 1,000 sessions, over 30,000 developers, researchers, and executives converged under the theme “It all starts here,” with millions more watching Jensen Huang’s two-hour keynote online. The core narrative: demand for AI infrastructure is accelerating toward a projected 1 trillion dollars in GPU and system orders through 2027, driven by the rise of agentic AI, structured enterprise data, and physical AI in robotics and autonomous systems.

GTC 2026 crystallised three big arcs for AI and data leaders: the Vera Rubin platform and a roadmap beyond Blackwell for inference-centric compute; the emergence of AI factories and agentic operating systems such as OpenClaw and NemoClaw; and a rapidly maturing ecosystem spanning clouds, sovereign AI deployments, and robotics partners who are betting on NVIDIA’s stack as the default substrate for intelligent systems. For data and AI teams, the message was clear: infrastructure, models, and applications are converging into vertically integrated platforms where governance of structured data, operational discipline, and ecosystem choices matter as much as raw FLOPS.

The Keynote: Tokens, Trillion‑dollar Demand, and an Inference Pivot

Jensen Huang’s keynote opened with tokens as the new atomic unit of computing, positioning AI infrastructure as a global capacity build‑out to generate, manipulate, and reason over tokens across text, vision, simulation, and control tasks. He argued that computing demand has grown by a factor of one million in recent years, underpinned by more than 150 billion dollars in venture funding flowing into AI‑native startups that will ride on this infrastructure. Against that backdrop, Huang raised NVIDIA’s AI demand outlook to at least 1 trillion dollars of GPU and system orders from 2025 through 2027, doubling earlier projections.

Crucially, Huang framed GTC 2026 as the moment when inference overtakes training as the dominant workload, with enterprises shifting from a handful of large training runs to continuous, latency‑sensitive inference for agentic systems in production. This “inference inflection” is reflected not just in marketing language but in hardware design choices: Vera Rubin, Groq LPUs, and rack‑scale DSX AI Factory blueprints are all tuned for deterministic, efficient token serving at scale rather than general‑purpose compute alone.

Vera Rubin: NVIDIA’s Next‑Generation Full‑stack AI Platform

The centerpiece of GTC 2026 was Vera Rubin, introduced as the full‑stack successor to Blackwell and the engine for agentic AI at rack scale. Vera Rubin is presented as an integrated platform comprising seven new chips, five rack‑scale system designs, and a supercomputer blueprint, all co‑engineered as a single vertically optimised system. NVIDIA emphasises that “Vera Rubin” refers to the entire stack rather than a single GPU: CPUs, GPUs, network fabrics, storage adapters, and LPUs are tuned together to maximise end‑to‑end inference throughput per watt.

At the silicon level, the platform includes the new Vera CPU for sequential, control‑heavy workloads, Rubin GPUs for accelerated tensor compute, NVLink 6 switches for high‑bandwidth interconnect, ConnectX‑9 SuperNICs, Spectrum‑6 Ethernet switches, BlueField‑4 DPUs, and Groq 3 language processing units (LPUs) for low‑latency decode paths. NVIDIA stated that all seven chips in the Vera Rubin stack are in full‑scale production, underscoring that this is not a distant roadmap but a deployable product line for hyperscalers and enterprises planning multi‑year capacity. Early cloud partners, including Google Cloud, plan to offer Vera Rubin NVL72 rack‑scale systems in the second half of 2026 as part of their AI hypercomputer architectures.

A Roadmap from Blackwell to Feynman

GTC 2026 also provided a clearer long‑term roadmap, giving enterprise buyers more confidence to commit to multi‑year AI factory investments. The Vera Rubin generation is positioned as the 2026–2027 platform for agentic AI and inference‑heavy workloads, while the subsequent Feynman architecture, expected around 2028, introduces components such as the Rosa CPU, LP40 LPUs, BlueField‑5 DPUs, and next‑generation networking with both copper and co‑packaged optics. This staged roadmap, from Blackwell through Vera Rubin and on to Feynman, is designed to give hyperscalers, sovereigns, and large enterprises a predictable cadence for planning capex cycles and data center designs.

Industry analysts noted that this degree of transparency around architectures and reference designs is critical as AI infrastructure transitions from opportunistic spending to industrial‑scale deployment decisions measured in tens of billions of dollars. One example highlighted at GTC was a 27 billion dollar infrastructure deal between Nebius Group and Meta, which analysts interpreted as emblematic of how AI factories are now funded and executed as multi‑year industrial projects rather than incremental IT upgrades.

ALSO READ: 10 Hard Truths from the Cisco AI Summit

AI Factories: From GPUs to Token‑generating Plants

A recurring phrase across the keynote and partner sessions was “AI factory,” describing full‑stack infrastructure whose sole purpose is to generate intelligence and tokens at scale. In this framing, chips are just one layer in a five‑layer “cake” that spans energy, silicon, systems, models, and applications, with NVIDIA aiming to capture value at each level rather than stopping at GPU sales. Rack‑scale Vera Rubin systems, DSX AI Factory blueprints, and Omniverse DSX reference designs exemplify this approach by providing reproducible patterns to build, simulate, and operate AI plants that optimise both performance and energy efficiency.

These factories are already delivering tens of exaFLOPS of AI performance in single deployments, with reports of DGX SuperPOD‑style configurations hitting around 50 exaFLOPS of inference and 35 exaFLOPS of training capacity. The focus on inference and energy optimisation reflects an industry‑wide recognition that the bottleneck is no longer just compute density, but sustainable, operationally manageable AI production capacity that aligns with power and cooling constraints. For AI & data leaders, this pushes the conversation toward throughput, latency, and cost per token as primary KPIs for infrastructure planning.
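
As a rough illustration of what treating cost per token as a primary KPI looks like in practice, the back‑of‑envelope model below folds throughput, utilisation, power, and amortised hardware cost into a single per‑token figure. Every number in it is an illustrative assumption for the sketch, not a figure from the keynote or from NVIDIA.

```python
# Back-of-envelope cost-per-token model for an AI factory rack.
# All figures below are illustrative assumptions, not GTC-announced numbers.

def cost_per_million_tokens(
    tokens_per_second: float,          # sustained decode throughput of the rack
    utilisation: float,                # fraction of wall-clock time serving traffic
    rack_power_kw: float,              # average draw including cooling overhead
    energy_cost_per_kwh: float,        # blended electricity price
    amortised_capex_per_hour: float,   # rack cost spread over its service life
) -> float:
    tokens_per_hour = tokens_per_second * utilisation * 3600
    energy_cost_per_hour = rack_power_kw * energy_cost_per_kwh
    total_cost_per_hour = energy_cost_per_hour + amortised_capex_per_hour
    return total_cost_per_hour / tokens_per_hour * 1_000_000

# Example: 1.5M tokens/s at 60% utilisation, a 120 kW rack, $0.08/kWh,
# and $400/hour of amortised hardware cost.
print(f"${cost_per_million_tokens(1_500_000, 0.6, 120, 0.08, 400):.4f} per 1M tokens")
```

The useful point of a model like this is that throughput and utilisation dominate the result: doubling sustained tokens per second halves the cost per token, which is exactly the lever the inference‑centric hardware designs above are aimed at.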

Agentic AI: OpenClaw, NemoClaw, and the Enterprise OS for Agents

Another major storyline at GTC 2026 was the formalisation of agentic AI as an enterprise‑grade software layer, with OpenClaw and NemoClaw positioned as the operating system and toolkit for building and governing AI agents. OpenClaw is described as an open agentic framework, while NemoClaw provides NVIDIA‑optimised tooling, orchestration, and integrations with the company’s Nemotron family of open models and Agent Toolkit, enabling enterprises to build agents they can own, inspect, and control.

Huang likened this agentic layer to Linux for the AI era, emphasising a modular, open‑source foundation with NVIDIA‑backed enterprise distributions running on top. Partners such as HPE announced new “AI‑Q” blueprints that use these components to help customers design customised AI agents with explicit controls over data residency, observability, and security, bridging the gap between research agents and production‑ready systems in regulated industries. For data leaders, this signals that agent orchestration, policy enforcement, and evaluation will increasingly be first‑class citizens in the AI stack, not just add‑on MLOps concerns.
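
To make the idea of policy enforcement as a first‑class citizen in the agent stack concrete, here is a minimal, hypothetical sketch of a policy gate applied to agent tool calls. The types, field names, and rules are illustrative only and do not reflect OpenClaw’s or NemoClaw’s actual APIs.

```python
# Illustrative sketch: policy checks as an explicit step in an agent loop.
# The Policy/ToolCall types and rules here are hypothetical examples, not
# OpenClaw or NemoClaw interfaces.
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str          # e.g. "sql_query", "web_search"
    data_domain: str   # e.g. "customer_pii", "public"
    region: str        # where the tool would execute

@dataclass
class Policy:
    allowed_tools: set[str]
    restricted_domains: dict[str, str]  # data domain -> required region

    def evaluate(self, call: ToolCall) -> tuple[bool, str]:
        if call.tool not in self.allowed_tools:
            return False, f"tool '{call.tool}' is not on the allow-list"
        required = self.restricted_domains.get(call.data_domain)
        if required and call.region != required:
            return False, f"'{call.data_domain}' data must stay in {required}"
        return True, "allowed"

policy = Policy(
    allowed_tools={"sql_query", "vector_search"},
    restricted_domains={"customer_pii": "eu-sovereign"},
)
ok, reason = policy.evaluate(ToolCall("sql_query", "customer_pii", "us-public"))
print(ok, reason)  # False 'customer_pii' data must stay in eu-sovereign
```

Whatever the eventual framework, the pattern is the same: residency, tool allow‑lists, and observability hooks sit in the agent’s execution path rather than in a review document.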

Structured Data and Governance Move to Centre Stage

A notable theme in the keynote was the elevation of structured enterprise data, metadata, and governance from background plumbing to core enablers of reliable AI. NVIDIA explicitly argued that structured data is the “ground truth” for enterprise AI, highlighting that every successful agentic or generative use case showcased on stage depended on clean, well‑modeled, and access‑controlled data under the hood.

For data teams, this reinforces the idea that investments in catalogs, lineage, quality monitoring, and policy management are not optional; they are prerequisites for leveraging NVIDIA’s infrastructure effectively. Combined with sovereign AI initiatives such as Azure Local and HPE’s private cloud AI factories, GTC 2026 painted a picture where data governance, security, and locality are intertwined with infrastructure decisions, influencing which workloads run in public clouds versus on‑premises or in regulated environments.
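
As a small illustration of what “prerequisite” means in practice, the sketch below shows a hypothetical pre‑flight check that a dataset carries minimum governance metadata before it is wired into an agentic use case. The required fields are illustrative and not tied to any particular catalog product.

```python
# Hypothetical pre-flight governance check for a catalog entry.
# Field names are illustrative examples, not a specific catalog schema.
REQUIRED_FIELDS = {"owner", "lineage", "classification", "freshness_sla"}

def governance_gaps(catalog_entry: dict) -> set[str]:
    """Return the governance metadata still missing for this dataset."""
    return {field for field in REQUIRED_FIELDS if not catalog_entry.get(field)}

entry = {
    "owner": "sales-data-eng",
    "classification": "confidential",
    "lineage": None,          # lineage not yet captured
    "freshness_sla": "24h",
}
missing = governance_gaps(entry)
print(missing or "ready for agent use")  # {'lineage'}
```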

Ecosystem Moves: Clouds, Sovereign AI, and Hyperscale Partners

Cloud providers used GTC 2026 to signal aggressive adoption of Vera Rubin and Blackwell‑class GPUs as they race to attract AI workloads. Google Cloud announced plans to be among the first to offer Vera Rubin NVL72 rack‑scale systems in the second half of 2026, integrating them into its AI Hypercomputer architecture for reasoning and agentic workloads. Microsoft’s Azure Local highlighted expanded support for NVIDIA RTX PRO 6000 and 4500 Blackwell Server Edition GPUs in sovereign AI environments, allowing regulated industries and governments to run models on‑premises while keeping data, inference, and governance under their direct control.

AWS, meanwhile, was cited as committing to deploy more than 1 million NVIDIA GPUs across its global regions in 2026, underscoring how cloud GPU capacity has become a strategic differentiator. Beyond the hyperscalers, infrastructure and networking vendors such as HPE announced expanded “NVIDIA AI computing by HPE” portfolios, including AI factories built on Vera Rubin, updated ProLiant servers, and networking solutions using HPE Juniper routers with coherent optics for distributed AI deployments. Micron and other component vendors also used GTC to highlight high‑bandwidth memory and storage advances tuned for Rubin‑class systems, reinforcing that AI factories are now a multi‑vendor industrial ecosystem rather than a single‑vendor stack.

ALSO READ: Cloud 3.0 and Data Sovereignty: Why Workload Placement Is Now a Strategic Decision

Enterprise Hardware: From DGX Station to RTX PRO Blackwell Fleets

On the enterprise hardware front, NVIDIA refreshed its portfolio from deskside systems to data center‑scale clusters. The new DGX Station was introduced as a deskside supercomputer powered by the GB300 Grace Blackwell Ultra Desktop Superchip, combining a 72‑core Grace CPU with a Blackwell Ultra GPU over NVLink‑C2C to deliver up to 20 petaflops of AI compute and 784 gigabytes of coherent memory. This configuration is pitched as capable of running trillion‑parameter models locally, targeting research labs and enterprises that need frontier‑class experimentation without a full data center build‑out.
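
A quick sanity check (our own arithmetic, not NVIDIA’s) shows why a trillion‑parameter model can plausibly fit in a coherent memory pool of this size once weights are quantised to 4‑bit precision:

```python
# Rough memory estimate for holding a 1-trillion-parameter model at FP4.
# These are our own illustrative figures, not NVIDIA-published numbers.
params = 1_000_000_000_000        # 1T parameters
bytes_per_param_fp4 = 0.5         # 4-bit quantised weights
weights_gb = params * bytes_per_param_fp4 / 1e9
print(f"FP4 weights alone: ~{weights_gb:.0f} GB")   # ~500 GB

# KV cache, activations, and framework overhead consume much of the remaining
# headroom, which is why this class of machine targets local experimentation
# rather than high-concurrency production serving.
```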

For mid‑size organisations, DGX Spark emerged as a compact, up‑to‑four‑node cluster design, providing a more approachable entry point into AI factory‑style deployments. On the workstation side, NVIDIA and partners including Dell, HP, and Lenovo unveiled RTX PRO Blackwell‑based systems, with configurations offering up to 4,000 TOPS of local AI performance and new RTX PRO 4500 Blackwell Server Edition GPUs delivering on the order of 100× speed‑ups for vision AI compared with previous generations. These systems are positioned as day‑zero‑ready for Nemotron and community models, enabling developers and creators to prototype and deploy AI workloads on the same silicon used in production clusters.

Gaming, DLSS 5, and the GeForce Narrative

Although GTC is now dominated by data center and enterprise narratives, Huang took time to remind the audience that “the house that GeForce made” is the foundation for CUDA and NVIDIA’s modern AI dominance. The keynote revisited GeForce’s history and then introduced DLSS 5, a new generation of deep‑learning super sampling that uses 3D‑guided neural rendering to enable real‑time, photoreal 4K performance on local hardware. While not the focus for enterprise buyers, this segment reinforced the company’s claim that advancements in gaming GPUs and rendering pipelines continue to feed directly into AI innovations for simulation, digital twins, and graphics‑rich AI applications.

For data and AI leaders, the GeForce and DLSS story matters less for consumer gaming and more as evidence of NVIDIA’s end‑to‑end control over hardware and software stacks, from consumer devices to AI factories. That breadth gives NVIDIA leverage in developer mindshare and ecosystem lock‑in, as tools and libraries built for gaming often become stepping stones into enterprise simulation and Omniverse‑based workflows.

Physical AI: Robots, Autonomous Vehicles, and Simulation

Physical AI—where models act in the real world through robots, vehicles, and industrial systems—was another high‑visibility theme. Huang declared that “the ChatGPT moment of self‑driving cars has arrived,” signaling confidence that autonomous driving systems have reached an inflection point similar to conversational AI’s 2022–2023 breakout. NVIDIA spotlighted its IGX Thor edge AI platform as generally available, with adopters including Caterpillar, Johnson & Johnson, CERN, and Planet Labs using it for industrial automation, scientific instrumentation, and Earth observation workloads.

In robotics, NVIDIA introduced Isaac Lab 3.0, built on the new Newton physics engine and PhysX SDK, for large‑scale robot learning on DGX‑class infrastructure, with support for multi‑physics simulation and complex dexterous manipulation. This was paired with the unveiling of Isaac GR00T N models (with early access plus an N2 preview) and Cosmos 3, a unified foundational model that combines world generation, vision reasoning, and action simulation to accelerate general robotic intelligence in complex environments. Major partnerships with industrial and humanoid robotics companies—ABB, Agility Robotics, Figure, FANUC, KUKA, Universal Robots, YASKAWA, and others—rounded out the narrative that NVIDIA intends to be the default platform for physical AI development and deployment.

ALSO READ: The Unspoken Prerequisite by AWS: Enterprise AI Must Solve Modernisation First

NVIDIA in Orbit: Space‑1 Vera Rubin

One of the more unexpected storylines at GTC 2026 was NVIDIA’s explicit push into space computing. The company confirmed that future systems like NVIDIA Space‑1 Vera Rubin are being designed as AI data centers in orbit, extending accelerated computing beyond terrestrial data centers for the first time. The idea is to colocate AI inference directly with space‑borne sensors and scientific instruments, reducing downlink requirements and enabling real‑time analysis of observational data.

The naming of the Vera Rubin architecture for the astronomer who helped uncover dark matter, and the future Rosa CPU honoring Rosalind Franklin, was framed as a deliberate link between NVIDIA’s technology roadmap and scientific discovery. For AI & Data Insider’s audience, the takeaway is less about near‑term workloads in orbit and more about the long‑range direction: NVIDIA is positioning accelerated computing as a utility that extends wherever data is generated, whether in factories, hospitals, edge devices, or space platforms.

What it Means for AI and Data Leaders

For enterprise AI and data leaders, GTC 2026 cements several strategic shifts. First, AI infrastructure decisions are no longer about picking a GPU SKU; they are about opting into (or hedging against) a vertically integrated AI factory architecture that spans chips, systems, networking, and agentic software. The Vera Rubin stack, Groq LPUs, and DSX factory blueprints are designed to make NVIDIA’s defaults the easiest path to production, which has profound implications for vendor concentration, negotiation leverage, and long‑term portability.

Second, the agentic AI layer is being productised rapidly, with OpenClaw, NemoClaw, and partner “AI‑Q” blueprints giving enterprises templates for building controllable, observable agents on top of NVIDIA infrastructure. This will increase pressure on data teams to operationalise governance, evaluation, and safety policies as part of the agent lifecycle rather than treating them as bolt‑on reviews.

Third, the prominence of structured data and sovereign AI at GTC 2026 should be read as a strong signal that infrastructure choices cannot be decoupled from data residency, compliance, and governance strategies. As clouds roll out Vera Rubin‑class systems and vendors ship on‑prem AI factories, organisations will need to make explicit decisions about which data and workloads live where, and how to instrument them for observability and control.

Finally, the broader ecosystem—from humanoid robotics to space‑based AI data centers—underscores that NVIDIA is framing AI as a planetary‑scale utility. For AI & Data Insider readers, the critical next step is not just tracking NVIDIA’s product roadmap, but interrogating how this AI factory paradigm reshapes budgets, architectures, and operating models across data, infrastructure, and application teams over the next three to five years.

ALSO READ: Inside IBM’s 11 Billion Dollar Bet: What the Confluent Deal Reveals About AI’s Investment Paradox

Anushka Pandit
Anushka is a Principal Correspondent at AI and Data Insider, with a knack for studying what is shaping the world and presenting it compellingly to her audience. She combines her background in Computer Science with her expertise in media communications to shape contemporary tech journalism.
