NVIDIA’s VP of Solutions Architecture on What It Actually Takes to Build a Sovereign AI Factory

Saudi Arabia is scaling toward 500-megawatt AI campuses. Deutsche Telekom is building a billion-euro Industrial AI Cloud in Munich. But the hardest part isn't the money — it's the architecture. NVIDIA's Marc Hamilton on the first 90 days of a sovereign AI factory build.

Across Europe and the Middle East, “AI factories” are moving from keynote slides to concrete, steel, and megawatt‑scale power contracts. Deutsche Telekom’s new Industrial AI Cloud in Munich, built on nearly 10,000 NVIDIA Blackwell GPUs and positioned as a sovereign “Deutschland Stack” for industry and government, is one early signal of how seriously European incumbents now take AI‑native infrastructure. In parallel, Saudi Arabia’s HUMAIN initiative is planning AI factories with up to 500 megawatts of NVIDIA‑powered capacity, anchored by an 18,000‑GPU Grace Blackwell supercomputer—numbers that would have sounded implausible even a few years ago.

At the same time, NVIDIA is pushing data‑centre design into unfamiliar territory. The forthcoming Vera Rubin NVL72 systems are built for warm‑water, direct liquid cooling at around 45 degrees Celsius, enabling operators to reduce reliance on traditional chiller plants and push rack power densities past 100 kW while still maintaining efficiency. On the design side, the Omniverse DSX blueprint now lets operators build full digital twins of entire AI sites—down to airflow, thermal behaviour, and electrical loading—for gigawatt‑class AI factories before a single pipe or cable is installed.

Few people sit closer to this shift than Marc Hamilton, Vice President of Solutions Architecture and Engineering at NVIDIA, who has helped design and deliver systems ranging from the Cambridge‑1 supercomputer for UK healthcare to next‑generation AI data centres for governments, telecoms, and enterprises worldwide. In this conversation with AI & Data Insider, Hamilton argues that sovereignty should be treated as a design advantage, not a constraint, and explains what CIOs and public‑sector technology leaders get right—and wrong—when they commit billions in capex to their first sovereign AI factory.

“Treat sovereignty as a design advantage, not a constraint”

Marc, you were deeply involved in overseeing the construction of some powerful supercomputers in the UK. We are seeing a massive push from nations and highly regulated multinational enterprises to build localised, sovereign AI grids. For a global CIO or government IT leader tasked with building a Sovereign AI Factory from the ground up today, what are the most critical architectural decisions they must make in the first 90 days to avoid scaling bottlenecks down the road?

Marc Hamilton: The first step is to treat sovereignty as a design advantage, not a constraint. Data residency, regulatory compliance, encryption, and audit‑trail requirements have to be embedded into the architecture from day one rather than bolted on at the edge.

Second, you want a modular reference architecture aligned to validated designs. Choosing a certified reference architecture—such as NVIDIA DGX SuperPOD or validated partner solutions—gives you repeatable, testable building blocks that scale without constant re‑engineering.

Finally, you need to lock down your network fabric and interconnect strategy very early. The single hardest thing to retrofit in a sovereign AI factory is the high‑speed interconnect layer. Decisions around NVLink domains, front‑end versus back‑end network segmentation, and confidential‑computing boundaries at the rack level all need to be made in the first 90 days, because they cascade into every power, cooling, and storage choice that follows.

Designing for 45°C Liquid Cooling and Megawatt‑Class Racks

These localised sovereign deployments demand an incredible amount of power—for example, the AI factories NVIDIA is helping build in Saudi Arabia are projected to reach up to 500 megawatts of capacity. You have also argued that sustainable computing must be addressed as a macro, data centre scale problem rather than just at the server level. How must these sovereign facilities be physically engineered today to handle the extreme power densities and advanced liquid cooling requirements of upcoming platforms like the Vera Rubin architecture?

Hamilton: You need to design for 100% direct liquid cooling from the outset. The NVIDIA Vera Rubin NVL72 racks are designed to be fully liquid‑cooled using 45°C warm water. Power densities are already pushing past 120–130 kW per rack and heading toward megawatt‑class racks, so air‑only designs simply won’t scale.
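
To see why air-only designs stop scaling at these densities, a rough back-of-envelope comparison is useful. The figures below (a 130 kW rack and a 10 K coolant temperature rise) are illustrative assumptions rather than NVIDIA specifications; the physics, not the exact numbers, is the point.

```python
# Back-of-envelope: coolant flow needed to remove the heat from one rack.
# Illustrative assumptions only: 130 kW rack load, 10 K coolant temperature rise.
rack_power_w = 130_000        # assumed rack heat load (W)
delta_t_k = 10.0              # assumed coolant temperature rise (K)

# Water: specific heat ~4186 J/(kg*K), ~1 kg per litre
water_cp = 4186.0
water_kg_per_s = rack_power_w / (water_cp * delta_t_k)
water_l_per_min = water_kg_per_s * 60

# Air: specific heat ~1005 J/(kg*K), density ~1.2 kg/m^3
air_cp = 1005.0
air_density = 1.2
air_kg_per_s = rack_power_w / (air_cp * delta_t_k)
air_m3_per_s = air_kg_per_s / air_density

print(f"Water: ~{water_l_per_min:.0f} L/min per rack")   # ~186 L/min
print(f"Air:   ~{air_m3_per_s:.1f} m^3/s per rack")       # ~10.8 m^3/s (~23,000 CFM)
```

Moving on the order of eleven cubic metres of air per second through every rack is impractical at this scale, which is why direct liquid cooling becomes the default rather than an option.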

At the electrical level, adopting an 800 VDC power‑distribution architecture is key to future‑proofing the backbone. NVIDIA is leading the industry transition to 800 VDC to support 1 MW‑plus IT racks reliably and efficiently.
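
The case for the higher distribution voltage comes down to basic electrical scaling. The sketch below compares 800 VDC with a hypothetical 400 VDC feed for the 1 MW rack figure cited above; the lower voltage is purely an illustrative baseline, not a reference to any specific existing design.

```python
# Why higher distribution voltage matters for megawatt-class racks.
# Feed current scales as I = P / V; resistive loss in a busbar scales as I^2 * R.
rack_power_w = 1_000_000          # 1 MW IT rack, the figure cited in the interview

for voltage_v in (400, 800):      # 400 VDC is a hypothetical baseline for comparison
    amps = rack_power_w / voltage_v
    print(f"{voltage_v} VDC feed: ~{amps:,.0f} A")
# 400 VDC: ~2,500 A    800 VDC: ~1,250 A
# Halving the current cuts resistive loss in the same conductor by a factor of four,
# or allows a substantially smaller copper cross-section for the same loss budget.
```

At these currents the difference shows up in busbar copper, connector count, and heat that never has to be removed by the cooling plant in the first place.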

The third pillar is using digital twins to co‑design the building and the IT stack at the same time. NVIDIA Omniverse DSX integrates architectural, power, mechanical, and electrical engineering teams into a single AI‑factory digital twin, so you can optimise airflow, liquid‑cooling loops, and power delivery before construction begins and continue to refine the design as the hardware roadmap evolves.

“Every byte must be served locally at GPU speed”

You’ve noted before that as AI models grow, the challenge isn’t just about compute, but about the sheer physics of moving massive amounts of data simultaneously. For a sovereign AI infrastructure where proprietary national or enterprise data strictly cannot cross borders, how should a CTO approach the data intelligence and storage layer—such as integrating high-performance solutions like DDN—to ensure the localised GPU clusters aren’t left starving for data?

Hamilton: You need a parallel, AI‑optimised storage platform that can sustain GPU utilisation above 99%. When proprietary national data cannot leave the border, every byte has to be served locally at GPU speed, otherwise you’re wasting some of the most valuable compute on the planet.

Practically, that means a tiered data pipeline: a high‑performance parallel file system for training, an object store for archives, and a caching layer that shields the storage system from burst demand. That combination lets you feed training and inference at line‑rate without over‑provisioning your most expensive storage tier.
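
What that tiering means in practice can be sketched with simple arithmetic. The cluster size below echoes the 18,000-GPU figure mentioned earlier; the per-GPU ingest rate and cache hit ratio are illustrative assumptions that vary widely by workload.

```python
# Rough sizing of the storage tiers needed to keep a GPU cluster fed during training.
# All workload numbers here are illustrative assumptions, not measured or vendor figures.
num_gpus = 18_000                 # cluster size, echoing the figure cited earlier
read_gb_per_gpu = 0.5             # assumed sustained training read per GPU (GB/s)
cache_hit_ratio = 0.9             # assumed share of reads absorbed by the caching tier

aggregate_gb_per_s = num_gpus * read_gb_per_gpu
backend_gb_per_s = aggregate_gb_per_s * (1 - cache_hit_ratio)

print(f"Aggregate read demand:             ~{aggregate_gb_per_s:,.0f} GB/s")  # ~9,000 GB/s
print(f"Served from the caching tier:      ~{aggregate_gb_per_s - backend_gb_per_s:,.0f} GB/s")
print(f"Parallel file system must sustain: ~{backend_gb_per_s:,.0f} GB/s")     # ~900 GB/s
```

The specific numbers matter less than the shape of the argument: a well-placed caching layer lets the most expensive parallel-filesystem tier be sized for a fraction of the raw demand, which is the over-provisioning trade-off Hamilton describes.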

Just as important is unified observability and multi‑tenant isolation within the storage layer itself. Sovereign clouds typically serve multiple government agencies, research institutions, and enterprises simultaneously, so you need clear visibility into who is consuming what, and strong guardrails between tenants, without sacrificing throughput.

Making Upgrades a Maintenance Event, Not a Construction Project

NVIDIA has moved to a highly aggressive one-year hardware rhythm, transitioning rapidly from Blackwell to the upcoming Vera Rubin architecture. Given that nations and enterprises are currently investing billions of dollars in CapEx to stand up these sovereign AI factories, how do you advise CTOs to build ‘future-proof’ AI data centres today so they aren’t forced into complete tear-downs every time a new GPU architecture is released?

Hamilton: The most important thing is to design the facility envelope—power, cooling, and structure—for multi‑generational use. The Vera Rubin NVL72 uses the same 45°C water temperature and similar airflow assumptions as Grace Blackwell, and the modular, cable‑free tray design cuts assembly time from over 1.5 hours to roughly five minutes. If the building shell, liquid‑cooling loops, and 800 VDC backbone are designed to spec, then a GPU‑tray refresh becomes a maintenance event, not a construction project.

Second, you should leverage NVIDIA Omniverse DSX to simulate upgrade paths before you commit capital. The DSX platform was built specifically to enable “multi‑generation” data centres—allowing engineering teams to model how a facility built for Blackwell today will accommodate Vera Rubin, Rubin Ultra, and future racks, under different power and cooling constraints.

Finally, decouple your storage, networking, and software investments from the GPU refresh cycle as much as possible. Storage platforms, high‑speed networking fabrics like InfiniBand and Spectrum‑X, and the NVIDIA software stack—CUDA, NIM, NeMo—are all designed to carry forward across GPU generations, so you’re not ripping and replacing the entire stack every year.

Structuring Sovereign AI Ecosystems for National Value

We are seeing Sovereign AI initiatives increasingly rely on public-private partnerships, where governments collaborate with local telecommunications providers, research institutions, and private enterprises to build sovereign clouds. What is the blueprint for successfully structuring joint AI ecosystems to serve national interests while simultaneously driving local enterprise innovation?

Hamilton: The most successful models anchor the ecosystem around a flagship national use case that clearly demonstrates public value, then open access to a broader community. For example, a national AI compute platform focused on healthcare can be stood up rapidly using a modular supercomputing architecture, with founding partners—hospitals, research institutes, pharmaceutical companies—receiving compute access while retaining all of their IP.

“The most successful sovereign AI ecosystems serve governments, enterprises, and start‑ups on the same infrastructure, with strict isolation but shared acceleration.”

From a partnership standpoint, you want telecom operators to provide sovereign infrastructure while NVIDIA and technology partners deliver the AI platform stack. This is now the dominant pattern in Europe. Deutsche Telekom’s Industrial AI Cloud in Munich is a good illustration: Telekom provides the physical infrastructure, while NVIDIA DGX and Omniverse platforms underpin the AI factory, SAP’s components form the “Deutschland Stack” for sovereignty and compliance, and an expanding ecosystem of partners including Siemens, Agile Robots, and others bring domain‑specific workloads.

The final ingredient is a tiered access model: a sovereign‑government tier, an enterprise tier, and a startup/research tier—all running on the same physical infrastructure with strict multi‑tenant isolation. The most successful sovereign AI ecosystems are the ones that can serve multiple constituencies—public sector, large enterprises, and innovators—simultaneously, rather than building three separate, siloed platforms.

For enterprise and public‑sector leaders steering their own AI roadmaps, Hamilton’s message is both reassuring and demanding: sovereign AI is no longer an experiment, but it will only work if infrastructure, data, and ecosystem design are treated as first‑order strategic choices rather than procurement details. In a world of 500‑megawatt AI factories, warm‑water liquid cooling, and yearly GPU refresh cycles, the winners will be those who design AI data centres as living, multi‑generation platforms—anchored in national priorities, yet flexible enough to absorb the next wave of models, agents, and physical‑AI workloads without tearing everything down and starting again.

Anushka Pandit
Anushka is a Principal Correspondent at AI and Data Insider, with a knack for studying what is shaping the world and presenting it compellingly to readers. She combines her background in Computer Science with her expertise in media communications to shape contemporary tech journalism.
