Nvidia Blackwell Ultra: “Agentic AI” Taken to a New Level

TECH NEWS – Instead of focusing on GeForce RTX 5000 Super cards, Nvidia is focusing on AI.

Since its initial boom in 2022, the AI industry has grown more complex, and we are now witnessing a significant shift toward "agentic" computing, driven by applications and wrappers built on frontier models. At the same time, it has become imperative for infrastructure providers such as Nvidia to deliver enough memory bandwidth and compute performance to satisfy the latency demands of agent frameworks. With Blackwell Ultra, Nvidia appears to have achieved this goal. In a new blog post, Nvidia published Blackwell Ultra results on SemiAnalysis's InferenceMAX benchmark, and the results are impressive.

Nvidia's first infographic highlights "tokens/watt", one of the most important metrics in current hyperscaler designs. The company has focused on both raw performance and throughput optimization. With the GB300 NVL72, Nvidia reports a 50x increase in throughput per megawatt compared to Hopper GPUs. The comparison below shows the best possible deployed state for each architecture.
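To make the metric concrete, here is a minimal sketch of how tokens/watt and throughput-per-megawatt figures are derived. All numbers below are hypothetical placeholders for illustration, not figures from Nvidia's benchmark.

```python
# Illustrative sketch: deriving "tokens/watt"-style efficiency metrics.
# The rack numbers below are made up, not Nvidia's published figures.

def tokens_per_watt(tokens_per_second: float, power_watts: float) -> float:
    """Energy efficiency: sustained token throughput per watt drawn."""
    return tokens_per_second / power_watts

def throughput_per_megawatt(tokens_per_second: float, power_watts: float) -> float:
    """The same ratio scaled to a 1 MW datacenter power budget."""
    return tokens_per_watt(tokens_per_second, power_watts) * 1_000_000

# Hypothetical rack: 1,000,000 tokens/s while drawing 120 kW.
rack_tps = 1_000_000
rack_power_w = 120_000
print(f"{tokens_per_watt(rack_tps, rack_power_w):.2f} tokens/s per watt")
print(f"{throughput_per_megawatt(rack_tps, rack_power_w):,.0f} tokens/s per MW")
```

Hyperscalers are power-constrained rather than GPU-constrained, which is why a per-megawatt view, rather than a per-chip one, is the basis of the 50x claim.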

Nvidia infographic comparing token/watt and throughput for GB300 NVL72 versus Hopper

Nvidia is proud of its NVLink technology, which now scales to Blackwell Ultra's 72-GPU configuration, uniting the GPUs into a single NVLink fabric with 130 TB/s of aggregate bandwidth. Compared with Hopper's 8-GPU NVLink domain, the GB300 benefits from a superior architecture, rack design, and, most importantly, the NVFP4 precision format, which is why it dominates in transfer speed.
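A quick back-of-envelope check shows the 130 TB/s aggregate figure is consistent with NVLink 5's roughly 1.8 TB/s of bandwidth per GPU:

```python
# Sanity check on the NVL72 fabric figure: 130 TB/s aggregate across
# 72 GPUs works out to ~1.8 TB/s per GPU, matching NVLink 5's
# per-GPU bandwidth.

aggregate_tb_s = 130
num_gpus = 72
per_gpu_tb_s = aggregate_tb_s / num_gpus
print(f"{per_gpu_tb_s:.2f} TB/s per GPU")
```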

Nvidia infographic highlighting NVLink scale and Blackwell Ultra (GB300 NVL72) design

In light of the wave of "agentic AI", Nvidia's GB300 NVL72 tests focus on token costs and the aforementioned updates. Nvidia reports a 35x reduction in cost per million tokens, making this the most attractive inference option for frontier labs and hyperscalers. Scaling laws still hold, but the pace of improvement is unprecedented. The main catalysts for these gains are Nvidia's exceptional hardware-software co-design and the so-called Huang's Law.
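Cost per million tokens is simple to derive from GPU pricing and sustained throughput. A minimal sketch, using made-up placeholder numbers rather than anything from Nvidia's benchmark:

```python
# Illustrative sketch of the "cost per million tokens" metric.
# The $/GPU-hour price and throughput below are hypothetical.

def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_second: float) -> float:
    """USD to generate one million tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000

# Hypothetical: $4 per GPU-hour at 2,000 tokens/s per GPU.
print(f"${cost_per_million_tokens(4.0, 2000):.2f} per million tokens")
```

A 35x reduction can come from either side of that ratio: higher throughput per GPU (NVFP4, faster attention) or more throughput per dollar of deployed hardware.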

Given the generational differences between compute nodes and architectures, comparisons with Hopper are somewhat unfair, so Nvidia also compared the GB200 with the GB300 (both NVL72 systems) on long-context workloads. Keeping the state of an entire codebase in context requires aggressive token usage, making context length a significant constraint for agents. With Blackwell Ultra, Nvidia achieves up to 1.5x lower cost and attention processing that is twice as fast, making it well suited for agent workloads.
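To see why long context is such a constraint, consider the KV cache a transformer must hold per sequence. The model shape below is a hypothetical example, not any specific model:

```python
# Rough KV-cache size estimate: why million-token agent contexts are
# expensive. The layer/head counts are hypothetical placeholders.

def kv_cache_gb(seq_len: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_value: int = 2) -> float:
    """GB needed for keys + values across all layers for one sequence.

    Factor of 2 covers keys and values; bytes_per_value=2 assumes
    FP16/BF16 storage.
    """
    total_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value
    return total_bytes / 1e9

# Hypothetical 80-layer model with 8 KV heads of dimension 128,
# holding a 1M-token context:
print(f"{kv_cache_gb(1_000_000, 80, 8, 128):.1f} GB")
```

Hundreds of gigabytes for a single long-lived sequence is exactly where a large unified NVLink memory pool and faster attention pay off.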

Blackwell Ultra is currently undergoing hyperscaler integration, so these are among the first benchmarks for this architecture. It appears that Nvidia has maintained performance scalability in line with modern AI use cases. Even better performance can be expected from Blackwell's successor, Vera Rubin, which is one reason Nvidia currently dominates the infrastructure competition.

Source: WCCFTech, Nvidia

Anikó, our news editor and communication manager, is more interested in the business side of the gaming industry. She worked at banks, and she has a vast knowledge of business life. Still, she likes puzzle and story-oriented games, like Sherlock Holmes: Crimes & Punishments, which is her favourite title. She also played The Sims 3, but after accidentally killing a whole sim family, swore not to play it again. (For our office address, email and phone number check out our IMPRESSUM)