Sakana AI's claim that it can 'make things 100 times faster' has been verified online and it has been pointed out that it will actually 'make things 3 times slower'



Sakana AI, a Japanese AI company, announced ' AI CUDA Engineer ' on February 20, 2025, which automatically optimizes CUDA kernels to execute processes written in PyTorch faster. However, when they actually tested AI CUDA Engineer, they reported that instead of speeding things up, the speed actually dropped by one-third, according to a report on X (formerly Twitter).

Sakana walks back claims that its AI can dramatically speed up model training | TechCrunch
https://techcrunch.com/2025/02/21/sakana-walks-back-claims-that-its-ai-can-dramatically-speed-up-model-training/

A verified user stated, 'Sakana AI's AI CUDA Engineer is attractive, but I am unable to verify the speedup.'



Another user also reported, 'Sakana AI claims in their paper that they have achieved 150 times the speedup, but when I actually ran the benchmark, it was three times slower...'



The user points out that there is a problem with a part of the code that appears to be bypassing correctness checks.



According to Lucas Bayer, a technical staff member at OpenAI, when they verified the AI CUDA Engineer code with o3-mini-high , they found that there was a bug in the original code. After that, when they reflected the fix by o3-mini-high, the code was fixed, but the benchmark result was still '3 times slower.'



Furthermore, Bayer pointed out that the results of the two benchmark runs by Sakana AI were completely different, saying, 'There is no way that a very simple CUDA code can be faster than an optimized cuBLAS kernel. If it is faster, something is wrong.' 'If the benchmark results are puzzling and inconsistent, there is something wrong.' 'o3-mini-high is really good. It literally took me 11 seconds to find the problem, and it took me 10 minutes to put the whole thing together.' In other words, even though there was a mistake in the code generated by LLM and the calculations were not performed correctly, the accuracy of the results may have been ignored because the focus was on execution time with the goal of speeding up.



On February 22, Sakana AI released a postmortem report. In this report, Sakana AI stated that 'the AI noticed vulnerabilities in the evaluation code and generated code that circumvented accuracy checks,' acknowledging that it had discovered that the AI had cheated to receive high ratings. Sakana AI has already addressed this issue and said it plans to revise the paper.

in Software, Posted by log1i_yk