Why is there so much fuss about DeepSeek and what's so great about it?
![](https://i.gzn.jp/img/2025/01/28/deepseek-r1-impressive/00_m.jpg)
On January 20, 2025, Chinese AI company DeepSeek released its reasoning model 'R1,' and commentary has poured in:
DeepSeek FAQ – Stratechery by Ben Thompson
https://stratechery.com/2025/deepseek-faq/
17 Thoughts About the Big DeepSeek Selloff - Bloomberg
https://www.bloomberg.com/news/newsletters/2025-01-27/17-thoughts-about-the-big-deepseek-selloff
'R1' is trained from the base model 'DeepSeek-V3-Base,' which uses a Mixture of Experts (MoE) architecture with 671 billion total parameters and a 128K context length. It is a reasoning model and has been shown to produce higher-quality results, especially in areas such as coding, mathematics, and logic. DeepSeek quickly gained popularity after launch, jumping to the top of the App Store rankings.
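The Mixture of Experts idea mentioned above can be sketched in a few lines: a router scores a set of expert networks for each token and only the top-k experts actually run, which is why only a fraction of the total parameters is active at any one time. The sketch below is a minimal toy illustration, not DeepSeek's actual implementation; every size and name in it is invented.

```python
import math
import random

random.seed(0)

# Toy sizes (illustrative only; DeepSeek-V3's real configuration differs).
D_MODEL, N_EXPERTS, TOP_K = 8, 4, 2

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Each "expert" is a small weight matrix; the router scores experts per token.
experts = [rand_matrix(D_MODEL, D_MODEL) for _ in range(N_EXPERTS)]
router = rand_matrix(N_EXPERTS, D_MODEL)

def moe_forward(token):
    """Route one token vector to its top-k experts and mix their outputs."""
    scores = matvec(router, token)
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i])[-TOP_K:]
    weights = [math.exp(scores[i]) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]          # softmax over chosen experts
    out = [0.0] * D_MODEL
    for w, i in zip(weights, top):                  # only TOP_K experts run
        for j, y in enumerate(matvec(experts[i], token)):
            out[j] += w * y
    return out

token = [random.gauss(0, 1) for _ in range(D_MODEL)]
print(len(moe_forward(token)))  # prints 8
```

The key point is the `top` selection: the other experts' weights never touch this token, so compute per token scales with the active parameters, not the total parameter count.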
One great thing about DeepSeek is that the model is open source and publicly available: you can run it on your own server or locally without paying a provider, unlike its closed-source rival, OpenAI's o1.
Regarding the decision to open source DeepSeek, CEO Liang Wenfeng said: 'Open source is the key to attracting talent. For a technology that changes the times, the walls built by closed source are only temporary; even OpenAI's closed approach could not stop other companies from catching up. Through DeepSeek we will accumulate know-how and build an organization and culture capable of innovation. Open sourcing is a cultural act rather than a commercial one, and contributing to it earns us respect. We will not switch to closed source in the future, and we believe a strong technical ecosystem matters more than anything else.'
Another great thing about DeepSeek is that it represents a breakthrough in AI development. Until now, typical large language models have been trained with 'reinforcement learning from human feedback (RLHF),' in which human preference judgments repeatedly steer the model toward human values and away from harmful output.
DeepSeek's 'R1-Zero,' however, removes the human feedback and trains almost entirely with plain reinforcement learning (RL). According to the developers, they defined reward functions for producing the correct answer and for presenting the thought process in the expected format, then had the model generate several different answers at once and scored the whole group against those reward functions, rather than evaluating step by step or searching every possible answer.
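The recipe described above — sample several answers at once, score each with rule-based rewards for correctness and format, and weight each answer by how much it beats the group average — can be sketched roughly as follows. This is a simplified illustration of group-relative scoring, not DeepSeek's actual code; the reward rules, tag format, and function names are all invented for clarity.

```python
import re

def reward(completion, reference_answer):
    """Rule-based reward: +1 for the correct answer, +0.5 for showing the
    thought process in the expected <think>...</think> format (both invented)."""
    r = 0.0
    if re.search(r"<think>.*</think>", completion, re.DOTALL):
        r += 0.5                                 # "appropriate form" reward
    if completion.strip().endswith(reference_answer):
        r += 1.0                                 # "correct answer" reward
    return r

def group_advantages(completions, reference_answer):
    """Score a whole group of sampled answers at once and express each score
    relative to the group mean -- answers above average get positive weight."""
    rewards = [reward(c, reference_answer) for c in completions]
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

samples = [
    "<think>2+2 is 4</think> 4",    # correct and well-formed -> 1.5
    "4",                            # correct, no thought process -> 1.0
    "<think>maybe 5?</think> 5",    # well-formed but wrong -> 0.5
]
print(group_advantages(samples, "4"))  # [0.5, 0.0, -0.5]
```

Because the rewards are purely rule-based (a string match and a format check), no human labeler or learned reward model sits in the loop; the comparison against the group mean is what tells the model which of its own sampled answers to reinforce.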
'In other words, given enough computing power and data, the AI learns to reason on its own, without humans teaching it how,' Thompson explains. This approach lets the model allocate more time to thinking, and unexpected capabilities have emerged. It could also ease the problem of existing models hitting the limits of scaling as they exhaust conventional training datasets.
![](https://i.gzn.jp/img/2025/01/28/how-deepseek-r1-outpaced-openai-cost/00_m.jpg)
The third advantage is low development cost. While OpenAI and others are said to spend billions of dollars (hundreds of billions of yen) on training, this model reportedly cost less than $6 million (about 930 million yen). It is also noteworthy that it was developed on NVIDIA chips whose performance had been deliberately reduced for export under US semiconductor restrictions.
The fact that DeepSeek built a competitive model on such restricted hardware shook the perceived value of semiconductor company NVIDIA, temporarily knocking down its stock price. Even so, DeepSeek still relied on NVIDIA's products, and Thompson believes NVIDIA retains its advantage.
![](https://i.gzn.jp/img/2025/01/28/deepseek-r1-impressive/02_m.jpg)
Investors, on the other hand, are left scratching their heads. When a cheap, freely available alternative suddenly appears in a field where large companies have poured in enormous sums and pinned their hopes, it disrupts the industry. The fall in NVIDIA shares, and the sell-off of energy companies that had been bought on the thesis that 'AI development needs energy,' reflect that confusion. Axios described the moment as 'an extinction-level event' for venture capitalists who had bet everything on existing companies.
However, DeepSeek's more efficient use of hardware does not mean a small number of chips will suffice, so NVIDIA will remain in demand for now. If, on the other hand, advantages emerge in developing on non-NVIDIA products, its position could be shaken.
'I think DeepSeek has provided a great gift to everyone,' Thompson said. 'The biggest winners are consumers and businesses, and in the long run, everyone who uses AI will be the biggest winner. China is also a big winner, and the success of DeepSeek should unleash even more innovation.'
in Software, Posted by log1p_kr