DeepSeek V4: Nvidia Is Supporting It with Blackwell Before Everyone Else!

TECH NEWS – Jensen Huang’s company is already pushing as much as 3,500 tokens per second per GPU out of the Chinese firm’s 1.6-trillion-parameter AI model.


DeepSeek V4 has arrived with major optimizations, including model sizes of up to 1.6 trillion parameters, and Nvidia is already offering Day-0 support for it on Blackwell GPUs using NVFP4. The updated AI model uses only 27% of the inference FLOPs per token and just 10% of the KV cache when operating with a one-million-token context window. Two new models have been introduced: a Pro model with 1.6 trillion parameters and a Flash version with 284 billion parameters. Nvidia says Blackwell GPUs provide both the scale and the low-latency performance required to run the long-context, one-million-token inference and trillion-parameter AI models that V4 enables.

“From Nvidia Blackwell data center deployments to managed NIM microservices and fine-tuning workflows, Nvidia offers multiple ways to integrate DeepSeek and other open models across different stages of development and deployment. Nvidia is an active contributor to the open source ecosystem and has released hundreds of projects under open source licenses. Nvidia remains committed to optimizing community software, and open models allow users to share their work on AI safety and resilience far more broadly,” Nvidia wrote.

Nvidia is showing throughput of nearly 3,500 TPS per GPU on the GB300 (Blackwell Ultra), and these are only preliminary figures that are expected to rise further as the shared design layer receives additional optimization. The Nvidia Blackwell stack includes a wide range of technologies built specifically for models like V4, including NVFP4, Dynamo, optimized CUDA kernels, advanced parallelization methods, and more. One of the key elements of DeepSeek V4 is its use of FP4 (MXFP4) quantization to accelerate rollouts and inference runs. With FP4 in play, V4 models reduce both memory traffic and sampling latency.
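To give a feel for why FP4 cuts memory traffic: in the MX format, each value is stored as a 4-bit E2M1 number, and a block of 32 elements shares a single power-of-two scale, so a weight takes roughly a quarter of the space of FP16. The sketch below is a toy, hedged illustration of that block-quantization idea (the helper names are hypothetical, and this is not DeepSeek’s or Nvidia’s actual kernel code):

```python
import math

# The eight non-negative magnitudes representable in FP4 E2M1.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block_mxfp4(block):
    """Quantize a block of floats to a shared power-of-two scale
    plus one (sign, 3-bit magnitude index) code per element."""
    amax = max(abs(x) for x in block) or 1.0
    # Pick a power-of-two scale so the largest magnitude fits under 6.0,
    # the maximum E2M1 value.
    exp = math.ceil(math.log2(amax / 6.0))
    scale = 2.0 ** exp
    codes = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)
        # Round to the nearest representable E2M1 magnitude.
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append((x < 0.0, E2M1_GRID.index(q)))
    return exp, codes

def dequantize_block_mxfp4(exp, codes):
    """Reconstruct approximate float values from the codes."""
    scale = 2.0 ** exp
    return [(-1.0 if neg else 1.0) * E2M1_GRID[idx] * scale
            for neg, idx in codes]
```

Values that happen to land on the grid round-trip exactly; everything else is snapped to the nearest representable point, which is the accuracy-for-bandwidth trade that makes FP4 inference fast.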

It is also worth noting that Huawei’s latest Ascend chips, the Ascend 950PR and Ascend 950DT, both planned for 2026, support MXFP4 instructions as well. That strongly suggests DeepSeek V4 will also be fully compatible with China’s domestic AI chips. Thanks to Nvidia’s ongoing optimizations, future models may end up enjoying a robust ecosystem of support from the very first day.

Source: WCCFTech, Nvidia

[Image: DeepSeek V4 illustration]
[Image: DeepSeek V4 performance chart]
[Image: Huawei Ascend AI chips]

Anikó, our news editor and communication manager, is more interested in the business side of the gaming industry. She worked at banks, and she has a vast knowledge of business life. Still, she likes puzzle and story-oriented games, like Sherlock Holmes: Crimes & Punishments, which is her favourite title. She also played The Sims 3, but after accidentally killing a whole sim family, swore not to play it again.
