A method to build an AI model equivalent to OpenAI o1-preview with only 26 minutes of training and a compute cost of less than 1,000 yen has been announced



On January 31, 2025, a research team led by Niklas Muennighoff, who studies large language models at Stanford University, published a paper on the preprint repository arXiv describing a method that reproduces almost the same test-time scaling and performance as OpenAI o1-preview using only a small number of data samples and a simple technique. AI architect and software engineer Tim Kellogg has explained the paper.

[2501.19393] s1: Simple test-time scaling
https://arxiv.org/abs/2501.19393



S1: The $6 R1 Competitor? - Tim Kellogg
https://timkellogg.me/blog/2025/02/03/s1

The paper published by Muennighoff et al. describes 'simple test-time scaling,' which improves the reasoning performance of language models by allocating extra computational resources at test time.

Traditionally, improvements in language model performance have been achieved through large-scale pre-training or by expanding the training dataset. However, recent research has shown that performance can be improved without additional training by increasing the compute a model uses at test time. OpenAI's o1 model is thought to employ this technique, but the specific method has not been made public.

Muennighoff et al. therefore proposed a training method that uses 1,000 samples (s1K) carefully selected from a dataset of tens of thousands of examples for quality, difficulty, and diversity. They report that by performing supervised fine-tuning (SFT) on Alibaba's large language model Qwen2.5 with this s1K dataset, they created a model, s1-32B, whose performance is almost equivalent to OpenAI o1-preview.
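As a rough illustration, supervised fine-tuning on a small curated dataset like s1K can be done with standard open-source tooling. The sketch below uses Hugging Face's trl library; the dataset id, its column names, and the hyperparameters are assumptions for illustration, not the authors' actual training code.

```python
# Hedged sketch of SFT on a small curated dataset, in the spirit of s1K.
# The dataset id and column names are assumptions; consult the authors'
# release for the real artifacts and hyperparameters.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("simplescaling/s1K", split="train")  # assumed id

def to_text(example):
    # Join each question and its long reasoning trace into one training
    # string; "question" and "solution" are hypothetical field names.
    return {"text": example["question"] + "\n" + example["solution"]}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # base model named in the paper
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="s1-32b-sft",
        dataset_text_field="text",
        num_train_epochs=5,              # only 1,000 samples, so a few epochs
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        bf16=True,
    ),
)
trainer.train()
```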



According to Muennighoff et al., the training cost of s1-32B is very low: training completed in just 26 minutes on 16 NVIDIA H100 GPUs, at an estimated cost of just $6 (about 910 yen). This shows that, unlike conventional approaches that rely on large-scale computing resources, high-performance AI models can be built even in inexpensive environments.
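For rough context, 16 GPUs running for 26 minutes works out to 16 × 26 / 60 ≈ 6.9 GPU-hours, so the quoted $6 implies roughly $0.87 per H100 GPU-hour, a rate in line with discounted or spot cloud pricing rather than on-demand rates (this arithmetic is an inference from the quoted figures, not a breakdown from the paper).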

Kellogg also highlights the 'Wait trick,' a simple technique for controlling how long the model reasons at inference time.

Normally, when the model decides it has finished thinking, it stops generating. The Wait trick instead inserts a 'Wait' token at that point, forcing the model to reconsider and continue reasoning, with the aim of improving accuracy. Despite being extremely simple, the technique is effective, and it is highly regarded for improving inference performance at a lower cost than conventional methods.
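A minimal sketch of how such forced continuation might look with Hugging Face transformers is shown below. It assumes a model that writes out its reasoning before answering; the model name, the number of rounds, and the token lengths are illustrative choices, not the paper's actual implementation.

```python
# Minimal sketch of the "Wait" trick: when the model tries to stop, strip
# the end-of-sequence token, append "Wait", and let it keep reasoning.
# Model choice and round count are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # small stand-in; the paper uses 32B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "How many primes are there below 100? Think step by step."
ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

for _ in range(2):  # force two extra rounds of thinking
    ids = model.generate(ids, max_new_tokens=512)
    # Remove a trailing end-of-sequence token so generation can continue.
    if ids[0, -1].item() == tokenizer.eos_token_id:
        ids = ids[:, :-1]
    wait = tokenizer("\nWait,", add_special_tokens=False,
                     return_tensors="pt").input_ids.to(model.device)
    ids = torch.cat([ids, wait], dim=-1)

final = model.generate(ids, max_new_tokens=512)
print(tokenizer.decode(final[0], skip_special_tokens=True))
```

In this framing, the number of forced continuations acts as a knob on test-time compute: more 'Wait' insertions mean more reasoning tokens and, up to a point, higher accuracy.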



Kellogg also considers the impact that approaches like s1 could have on future AI research, both in speeding up AI development and in reducing its cost.

Traditionally, AI development has required large amounts of funding and large data centers, but the s1 results overturn that assumption, paving the way for advanced research with far fewer resources. Kellogg pointed out that this could open the door to AI development much wider, making it easier for many more researchers to get involved.

Kellogg also noted that OpenAI has criticized DeepSeek for developing models by distilling o1, and argued that such methods will be difficult to detect and regulate in the future. Now that it has been shown that a high-performance model can be built with just 1,000 data samples, Kellogg said that even an individual could do the same thing, which could change the way AI is developed.



Kellogg finds it interesting that s1 does not exactly reproduce OpenAI o1 or DeepSeek-R1, but rather achieves similar results by different means. He concluded that there are multiple approaches to the evolution of AI, and that as each method develops further, even greater technological innovation may arrive in 2025.

in Software, Posted by log1i_yk