TECH NEWS – Instead of GeForce RTX 5000 Super cards, Nvidia's focus is on AI.
Since its initial boom in 2022, the AI industry has grown in complexity, and we are currently witnessing a significant shift toward “agentic” computing, driven by applications and wrappers built on top of frontier models. Concurrently, it has become imperative for infrastructure providers such as Nvidia to deliver enough memory bandwidth and performance to satisfy the latency demands of agent frameworks. With Blackwell Ultra, Nvidia has achieved this goal: in a new blog post, the company published Blackwell Ultra results on SemiAnalysis's InferenceMAX benchmark, and they are impressive.
Nvidia's first infographic highlights a metric called “tokens per watt,” one of the most important numbers in current hyperscaler designs. The company has focused on both raw performance and throughput optimization: with the GB300 NVL72, Nvidia reports a 50x increase in throughput per megawatt compared to Hopper GPUs. The comparison below shows the best possible deployed configuration for each architecture.
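To make the tokens-per-watt metric concrete, here is a minimal Python sketch of the underlying arithmetic; every input number is a hypothetical placeholder, not one of Nvidia's published figures.

```python
# Illustrative only: how tokens/watt relates to throughput per megawatt.
# All numbers below are hypothetical placeholders, not Nvidia's figures.

def tokens_per_watt(tokens_per_second: float, power_draw_watts: float) -> float:
    """Energy efficiency of an inference deployment."""
    return tokens_per_second / power_draw_watts

def throughput_per_megawatt(tokens_per_second: float, power_draw_watts: float) -> float:
    """Scale the same ratio up to a 1 MW data-center power budget."""
    return tokens_per_watt(tokens_per_second, power_draw_watts) * 1_000_000

# Hypothetical rack: 1,000,000 tokens/s at a 120 kW draw (placeholder values).
rack_tps, rack_watts = 1_000_000, 120_000
print(f"{tokens_per_watt(rack_tps, rack_watts):.2f} tokens/s per watt")
print(f"{throughput_per_megawatt(rack_tps, rack_watts):,.0f} tokens/s per MW")
```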
Nvidia is proud of its NVLink technology. With Blackwell Ultra, it has been extended to a 72-GPU configuration, combining all of the GPUs into a single, unified NVLink fabric with 130 TB/s of aggregate bandwidth. Compared to Hopper's 8-GPU NVLink design, Nvidia's architecture, rack design, and, most importantly, the NVFP4 precision format give the GB300 a decisive advantage in transfer speed.
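As a rough intuition for what a block-scaled 4-bit format buys, here is a toy Python simulation in the spirit of NVFP4 (FP4 E2M1 values sharing one scale per 16-element block). It is a didactic sketch under those assumptions, not Nvidia's actual implementation, and it ignores details such as storing the block scale in FP8.

```python
import numpy as np

# Toy simulation of block-scaled 4-bit quantization in the spirit of NVFP4:
# FP4 E2M1 values with one shared scale per 16-element block. Didactic only;
# real NVFP4 also stores the block scale in FP8 (E4M3) alongside a
# tensor-level FP32 scale, which this sketch ignores.

E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # representable magnitudes

def quantize_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Map a 16-element block onto the signed E2M1 grid with a shared scale."""
    amax = float(np.abs(block).max())
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0  # largest magnitude -> 6.0
    # Snap each scaled magnitude to the nearest grid point, preserving sign.
    idx = np.abs(np.abs(block / scale)[:, None] - E2M1_GRID).argmin(axis=1)
    return np.sign(block) * E2M1_GRID[idx], scale

x = np.random.randn(16).astype(np.float32)
q, scale = quantize_block(x)
reconstructed = q * scale
print(f"mean abs quantization error: {np.abs(x - reconstructed).mean():.4f}")
```

The design point is that 4-bit values halve memory traffic relative to FP8, while the per-block scale keeps quantization error acceptable, which is one reason a low-precision format can translate into higher effective bandwidth.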
In light of the wave of “agentic AI,” Nvidia's GB300 NVL72 tests focus on token costs alongside the updates above. Nvidia reports a 35-fold reduction in cost per million tokens, making this the most attractive inference option for frontier labs and hyperscalers. The scaling laws themselves remain unchanged, yet the pace of improvement is unprecedented; the main catalysts are Nvidia's hardware-software co-design and the so-called Huang's Law.
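For a sense of how a cost-per-million-tokens figure is derived, here is a back-of-the-envelope Python sketch; the hourly cost and throughput values are invented placeholders, and the 35x factor is simply applied to show its effect on the result.

```python
# Back-of-the-envelope cost per million tokens for an inference deployment.
# Every input value here is a hypothetical placeholder, not a published figure.

def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """All-in hourly cost (power + amortized hardware) divided by hourly token output."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical rack: $300/hour all-in, 500,000 tokens/s sustained (placeholders).
baseline = cost_per_million_tokens(300.0, 500_000)
print(f"${baseline:.4f} per million tokens")
# A 35x throughput/efficiency gain at the same hourly cost divides that figure by 35.
print(f"${baseline / 35:.4f} per million tokens")
```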
Given the generational differences between compute nodes and architectures, comparisons with Hopper are somewhat unfair, so Nvidia also compared the GB200 and GB300 NVL72 systems on long-context workloads. Keeping the state of an entire code base in context requires aggressive token usage, making context length a significant constraint for agents. With Blackwell Ultra, Nvidia achieves a cost reduction of up to 1.5x and attention processing that is twice as fast, making it well suited for agent workloads.
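The pressure that long context puts on hardware comes largely from the KV cache, which grows linearly with sequence length and must stay resident in GPU memory for every concurrent request. The following sketch applies the standard KV-cache sizing formula to a hypothetical 70B-class model shape, not any specific deployed model.

```python
# Why long context is expensive: KV cache size scales linearly with tokens.
# The model shape below is a generic hypothetical, not a specific model.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Keys + values per request: 2 tensors * layers * heads * head_dim * tokens."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical 70B-class model: 80 layers, 8 KV heads (GQA), head_dim 128, FP16.
for ctx in (8_192, 131_072):
    gib = kv_cache_bytes(80, 8, 128, ctx) / 2**30
    print(f"{ctx:>7} tokens -> {gib:.1f} GiB of KV cache per request")
```

At the hypothetical 131,072-token context this comes to roughly 40 GiB per request, which illustrates why faster attention and cheaper long-context tokens matter so much for agent workloads.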
Blackwell Ultra is currently being integrated by hyperscalers, so these are among the first benchmarks for this architecture. It appears that Nvidia has kept performance scaling in line with modern AI use cases, and even better results can be expected from Blackwell's successor, Vera Rubin, which is one reason Nvidia currently dominates the infrastructure competition.