TECH NEWS – The GB300 demonstrates significantly improved performance compared to the GB200 AI racks.
Nvidia’s GB300 NVL72 AI racks were tested on DeepSeek’s latest open-source models, and the results look genuinely promising thanks to fine-tuning and optimized inference. With the GB300, Nvidia’s primary goal was to deliver strong long-context performance in order to capitalize on the wave of agentic AI. The Large Model Systems Organization (LMSYS) tested the GB300 NVL72 for long-context inference, and the results appear extremely encouraging. The testing also included infrastructure-level software routing.
Because long-context workloads shift the pressure toward GPU VRAM, the LMSYS team integrated PD (prefill-decode) disaggregation, a widely used mechanism for serving large token contexts. PD disaggregation distributes work across separate hardware nodes to avoid bottlenecks: the prefill phase – put simply, prompt processing – and the decode phase – token generation – have very different compute and memory profiles, so running them on dedicated hardware lets each phase be optimized independently, which substantially improves performance.
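The split described above can be illustrated with a minimal scheduler sketch. This is not LMSYS’s actual implementation – the class, pool names, and round-robin placement are all illustrative assumptions – but it shows the core idea: each request’s prefill runs on one pool, its KV cache is handed off, and decode runs on a separate pool.

```python
# Minimal sketch of PD (prefill-decode) disaggregation. All names here
# (PDScheduler, worker pool labels) are hypothetical, for illustration only.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: int      # work for the prefill (prompt-processing) phase
    max_new_tokens: int     # work for the decode (token-generation) phase
    kv_cache: list = field(default_factory=list)  # handed off between pools

class PDScheduler:
    """Routes each phase to its own worker pool so compute-bound prefill
    and memory-bound decode never contend for the same GPUs."""
    def __init__(self, prefill_workers, decode_workers):
        self.prefill_workers = prefill_workers
        self.decode_workers = decode_workers
        self._rr = 0  # simple round-robin placement counter

    def _pick(self, pool):
        worker = pool[self._rr % len(pool)]
        self._rr += 1
        return worker

    def run(self, req: Request):
        # Phase 1: prefill on a prefill node builds the KV cache.
        prefill_node = self._pick(self.prefill_workers)
        req.kv_cache = [None] * req.prompt_tokens  # stand-in for real KV blocks
        # Phase 2: the cache moves to a decode node, which generates tokens
        # without blocking new prompt-processing work on the prefill pool.
        decode_node = self._pick(self.decode_workers)
        return prefill_node, decode_node, req.prompt_tokens + req.max_new_tokens

sched = PDScheduler(prefill_workers=["prefill-0", "prefill-1"],
                    decode_workers=["decode-0"])
node_p, node_d, total = sched.run(Request(prompt_tokens=4096, max_new_tokens=256))
```

The key design point is that the two pools can be sized and tuned independently – exactly the property that makes disaggregation attractive for long-context serving, where decode-side KV memory, not prefill compute, is the bottleneck.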
The LMSYS team also applied several additional optimization techniques, including dynamic chunking to keep responses fast across long context windows, as well as efficient KV (key-value) cache capacity management. For generational comparisons, the team highlighted three reference points: throughput, capacity, and latency. The GB300 NVL72 delivers 1.53x peak throughput at 226.2 TPS/GPU (tokens per second per GPU), 1.87x per-user speed – a large jump in TPS/user attributed to MTP (Multi-Token Prediction) – and a 1.58x latency advantage over the GB200 NVL72.
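The reported ratios imply a baseline that can be sanity-checked with simple arithmetic. Assuming the 1.53x multiplier is relative to the GB200 NVL72 at the same operating point (an assumption on our part; LMSYS reports the ratios, not the baseline), the implied GB200 figure works out as follows:

```python
# Back-of-the-envelope check on the reported numbers. The baseline
# derivation assumes the 1.53x ratio is GB300 vs. GB200 at the same
# operating point, which is an assumption, not a published figure.
gb300_peak_tps_per_gpu = 226.2   # reported GB300 NVL72 peak throughput
peak_throughput_ratio = 1.53     # GB300 vs. GB200

# Implied GB200 per-GPU throughput at that operating point:
gb200_peak_tps_per_gpu = gb300_peak_tps_per_gpu / peak_throughput_ratio
print(f"implied GB200 baseline: {gb200_peak_tps_per_gpu:.1f} TPS/GPU")
# → implied GB200 baseline: 147.8 TPS/GPU
```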
According to the LMSYS team, the GB300 provides an average 1.4x to 1.5x advantage over the GB200, especially in latency-sensitive cases, and since the focus is on agentic workloads, Blackwell Ultra is well positioned to exploit these gains. While Blackwell Ultra looks clearly dominant in latency and throughput, the total cost of ownership (TCO) figures the industry has been discussing have not yet been published, particularly since deployment costs have risen with the GB300.
It looks like Nvidia is not only pushing architecture forward with each generation, but also addressing industry-specific constraints, and in Blackwell Ultra’s case the latency data shows a meaningful improvement. This is one reason the GB300 is becoming the leading choice for hyperscalers and neoclouds in agentic environments.