Chinese AI company DeepSeek releases 'DeepSeek-V3', an AI model comparable to GPT-4o, with an astounding 671 billion parameters



DeepSeek, a Chinese AI company, announced the large-scale language model 'DeepSeek-V3' on December 26, 2024. DeepSeek-V3, which has 671 billion parameters, is comparable to OpenAI's multimodal AI model 'GPT-4o' and is said to outperform it in some cases.

deepseek-ai/DeepSeek-V3-Base · Hugging Face
https://huggingface.co/deepseek-ai/DeepSeek-V3-Base






DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch | VentureBeat
https://venturebeat.com/ai/deepseek-v3-ultra-large-open-source-ai-outperforms-llama-and-qwen-on-launch/

DeepSeek-V3 is Now The Best Open Source AI Model
https://analyticsindiamag.com/ai-news-updates/deepseek-v3-is-the-best-open-source-ai-model/

DeepSeek is preparing Deep Roles and released a new V3 model
https://www.testingcatalog.com/deepseek-preparing-deep-roles-and-dropping-high-performing-v3-model/

The newly announced DeepSeek-V3 is a large-scale language model with 671 billion parameters, trained on 14.8 trillion tokens. According to the overseas media outlet TestingCatalog, DeepSeek-V3 surpasses the 405 billion parameters of Llama 3.1 405B, the previous record holder, making it the largest openly released language model to date.

Training DeepSeek-V3 required approximately 2,788,000 GPU hours on NVIDIA H800 GPUs, at a cost of about $5.57 million (about 870 million yen). Since pre-training a large-scale language model typically costs hundreds of millions of dollars (tens of billions of yen), training DeepSeek-V3 was far cheaper.
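Dividing the reported cost by the reported GPU hours gives the implied rental rate per H800 GPU hour, as the quick check below shows. The roughly $2-per-hour figure is derived from the article's own numbers, not a separately reported price.

# Rough cost check using the figures quoted above.
gpu_hours = 2_788_000        # total H800 GPU hours used for training
total_cost_usd = 5_570_000   # reported training cost in US dollars

cost_per_gpu_hour = total_cost_usd / gpu_hours
print(f"Implied H800 rate: about ${cost_per_gpu_hour:.2f} per GPU hour")  # roughly $2.00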



DeepSeek-V3 is designed by combining multi-head latent attention (MLA) with a Mixture of Experts (MoE) architecture, a method of integrating multiple specialized neural networks ('experts'). By selecting and activating only the roughly 37 billion parameters best suited to each token out of the full 671 billion, it achieves both computational efficiency and high processing performance.
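To illustrate the general idea of expert routing, the sketch below implements a generic top-k MoE layer in PyTorch. This is a simplified illustration, not DeepSeek's actual DeepSeekMoE implementation; the layer sizes, expert count, and top-k value are arbitrary example values.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    # Generic top-k Mixture-of-Experts layer (illustrative only, not DeepSeek's code).
    # Only the experts chosen by the router run for each token, which is how a model
    # can hold a huge total parameter count while activating only a fraction per token.
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)    # produces routing scores per expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)     # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep the top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize gate weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)                           # 4 dummy token representations
print(TopKMoELayer()(tokens).shape)                    # torch.Size([4, 512])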

On top of the MoE architecture, DeepSeek-V3 also employs a load-balancing strategy that dynamically monitors and adjusts the load across experts without compromising the performance of the model as a whole. It additionally implements a technique called 'multi-token prediction' (MTP), which lets the model predict multiple future tokens simultaneously. This enables generation of 60 tokens per second, three times faster than the previous-generation DeepSeek-V2.
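The sketch below conveys the basic idea of multi-token prediction: instead of scoring only the next token, the model also produces predictions for tokens further ahead from the same hidden state. It is a deliberately simplified stand-in, not DeepSeek's actual MTP module, and the dimensions, vocabulary size, and number of predicted tokens are made-up example values.

import torch
import torch.nn as nn

class MultiTokenPredictionHeads(nn.Module):
    # Simplified multi-token prediction: separate output heads score the next
    # n_future tokens from each hidden state (DeepSeek's real MTP module is more
    # elaborate, but the training signal is the same: predict several tokens ahead).
    def __init__(self, d_model=512, vocab_size=32000, n_future=2):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d_model, vocab_size) for _ in range(n_future)])

    def forward(self, hidden):                 # hidden: (batch, seq_len, d_model)
        # logits[k] scores the token at position t + 1 + k given the state at position t
        return [head(hidden) for head in self.heads]

hidden = torch.randn(1, 16, 512)               # dummy hidden states for a 16-token sequence
logits = MultiTokenPredictionHeads()(hidden)
print([tuple(l.shape) for l in logits])        # [(1, 16, 32000), (1, 16, 32000)]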

DeepSeek has published benchmark scores for DeepSeek-V3 that are reported to be comparable to those of 'Qwen2.5 72B', 'Llama 3.1 405B', 'Claude 3.5 Sonnet-1022', and 'GPT-4o 0513'. The scores show DeepSeek-V3 performing particularly strongly against other AI models on programming benchmarks such as HumanEval-Mul, mathematics benchmarks such as CNMO 2024, and Chinese-language benchmarks such as C-Eval.



DeepSeek further stated, 'We have cleverly incorporated DeepSeek-R1's verification and reflection patterns into DeepSeek-V3, significantly improving its inference capabilities.'

In addition, for a limited time until February 8, 2025, API fees for DeepSeek-V3 will remain unchanged from DeepSeek-V2. After the promotional period, input will be priced at $0.27 (approximately 42 yen) per million tokens and output at $1.10 (approximately 173 yen) per million tokens.
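For a sense of scale, the snippet below estimates what a workload would cost at those per-million-token rates; the token counts are made-up example values.

# Cost estimate at the DeepSeek-V3 API prices quoted above.
PRICE_INPUT_PER_M_TOKENS = 0.27    # USD per million input tokens
PRICE_OUTPUT_PER_M_TOKENS = 1.10   # USD per million output tokens

def estimate_cost_usd(input_tokens, output_tokens):
    return (input_tokens * PRICE_INPUT_PER_M_TOKENS
            + output_tokens * PRICE_OUTPUT_PER_M_TOKENS) / 1_000_000

# Hypothetical workload: 5 million input tokens and 1 million output tokens.
print(f"${estimate_cost_usd(5_000_000, 1_000_000):.2f}")   # $2.45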




DeepSeek has open-sourced DeepSeek-V3, and the source code can be downloaded from GitHub.

deepseek-ai/DeepSeek-V3
https://github.com/deepseek-ai/DeepSeek-V3
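As a rough sketch, the published weights in the Hugging Face repository linked above can be fetched with the huggingface_hub library. Note that the full 671-billion-parameter checkpoint occupies hundreds of gigabytes, and actually running the model requires the inference code from the GitHub repository plus multi-GPU hardware.

# Minimal sketch: download the published DeepSeek-V3 base weights from Hugging Face.
# Assumes the huggingface_hub package is installed and that there is enough disk
# space; the full checkpoint is hundreds of gigabytes.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="deepseek-ai/DeepSeek-V3-Base", local_dir="DeepSeek-V3-Base")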

in Software, Posted by log1r_ut