TECH NEWS – We’re hearing some pretty bold rumors about DeepSeek’s new AI model!
DeepSeek’s first mainstream model, R1, showed the Western world that China is far from being left behind in developing high-end AI models. It rattled US stock markets and demonstrated that building a frontier model does not require the enormous budgets that companies like OpenAI have made public. Now, Chinese media have begun reporting rumors about DeepSeek’s next model, R2.
It is claimed that R2 will use a hybrid MoE (Mixture of Experts) architecture, reportedly an enhanced version of DeepSeek’s existing MoE implementation that likely pairs advanced gating mechanisms with a mix of sparse expert and dense layers to handle demanding workloads. With this architecture, R2 is rumored to carry 1.2 trillion parameters, roughly double R1’s 671 billion. That figure alone would put R2 in the same league as GPT-4 Turbo and Google Gemini 2.0 Pro, but it is not the only area where DeepSeek aims to make a big impact.
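Nothing about R2’s internals is public, but the “hybrid MoE” idea described above, sparse gated experts running alongside an always-on dense path, can be sketched in a few lines. The PyTorch snippet below is a minimal, illustrative sketch under those assumptions; the class name, dimensions, and top-2 routing are ours, not DeepSeek’s actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridMoELayer(nn.Module):
    """Toy hybrid MoE block (illustrative, not DeepSeek's design):
    a shared dense FFN that every token passes through, plus a
    sparse top-k gated mixture of experts, summed together."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        # Dense path: always active for every token
        self.dense = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        # Expert FFNs: only top_k of them run per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)  # routing scores
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize chosen weights

        dense_out = self.dense(x)
        moe_out = torch.zeros_like(x)
        # Route each token to its selected experts, weighted by the gate
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    moe_out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return dense_out + moe_out

x = torch.randn(10, 64)
layer = HybridMoELayer()
print(layer(x).shape)  # torch.Size([10, 64])
```

The appeal of this hybrid layout is that only top_k expert FFNs run per token while the dense path guarantees a stable baseline, which is how MoE models keep active parameters, and therefore compute, far below the headline parameter count.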
According to the rumors, DeepSeek R2 will be priced at $0.07 per million input tokens and $0.27 per million output tokens, a unit cost said to be 97.4% lower than GPT-4’s. If that pricing holds, R2 would be the most cost-effective model on the market and a clear value pick for enterprises, and its release could prove a defining moment for artificial intelligence and the economy around it. The model is also claimed to achieve 82% utilization on a Huawei Ascend 910B chip cluster, with computing performance measured at 512 PetaFLOPS at FP16, so DeepSeek could indeed be training on domestic, in-house hardware.
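The source does not say which OpenAI price list the 97.4% figure is measured against. As a rough sanity check, the short Python snippet below compares the rumored R2 prices to an assumed baseline of GPT-4o’s list prices ($2.50/$10.00 per million tokens, our assumption, not from the source); the savings land in the same ballpark:

```python
# Rumored R2 prices vs. an ASSUMED OpenAI baseline (GPT-4o list pricing);
# the baseline is our assumption, not stated in the source.
r2 = {"input": 0.07, "output": 0.27}         # USD per 1M tokens (rumored)
baseline = {"input": 2.50, "output": 10.00}  # USD per 1M tokens (assumed)

for kind in ("input", "output"):
    saving = 1 - r2[kind] / baseline[kind]
    print(f"{kind}: {saving:.1%} cheaper")   # ~97.2% / ~97.3%

# Example monthly bill: 10M input + 2M output tokens
cost_r2 = 10 * r2["input"] + 2 * r2["output"]
cost_base = 10 * baseline["input"] + 2 * baseline["output"]
print(f"R2: ${cost_r2:.2f} vs baseline: ${cost_base:.2f}")
```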
We had previously suspected that the Chinese AI company was eyeing domestic chips, and with this move DeepSeek would have “vertically integrated” its AI supply chain. Once the model is released, it could well surprise the public and rattle the US stock market again… but for now, these are just rumors!
Source: WCCFTech, Jiuyangongshe