February 25, 2025, 16:55 | Software

Sakana AI's claim of a "100x speedup" is tested online, and users report that it is instead "3x slower"

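On February 20, 2025, the Japanese AI company Sakana AI announced the "AI CUDA Engineer," a system that automatically converts operations written in PyTorch into optimized CUDA kernels so that they run faster. However, users who actually tested the AI CUDA Engineer have reported on X (formerly Twitter) that, far from speeding things up, it ran at one third of the original speed.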

Sakana walks back claims that its AI can dramatically speed up model training | TechCrunch
https://techcrunch.com/2025/02/21/sakana-walks-back-claims-that-its-ai-can-dramatically-speed-up-model-training/

One user who tested it said, "Sakana AI's AI CUDA Engineer is fascinating, but I'm unable to verify the speedups."

Sakana AI's AI CUDA engineer is fascinating but I'm unable to verify the speedups.

I'm using torch.utils.cpp_extension.load() to compile and load the kernel, as mentioned in their report.

anyone know their exact method of benchmarking? pic.twitter.com/LDBEb2wwSB
— apoorv (@_apoorvnandan) February 20, 2025

Another user reported, "Sakana AI claims in its paper to have achieved a 150x speedup, but when I actually benchmarked it, it was 3x slower..."

This example from their paper (https://t.co/DEJ6o5XOvV), which is claimed to have 150x speedup, is actually 3x slower if you bench it... https://t.co/1zu1Cdu4OL pic.twitter.com/1FN5Y1Owxg
— main (@main_horse) February 20, 2025

This user suggested that part of the code is faulty and may be bypassing the correctness check.

I believe there is something wrong with their kernel -- it seems to 'steal' the result of the eager impl (memory reuse somehow?), allowing it to bypass the correctness check.

Here, I try executing impls in different order:
* torch, cuda
* cuda, torch

only the first order works! pic.twitter.com/UHggVtQ3Qs
— main (@main_horse) February 20, 2025
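
The order dependence described in the post above can be illustrated with a toy model. The following is a minimal sketch in plain Python, not Sakana AI's kernel or PyTorch's allocator: `BufferPool`, `reference_impl`, `buggy_impl`, and `outputs_match` are hypothetical names, and the assumption that the kernel's output buffer was recycled memory follows the poster's own guess ("memory reuse somehow?").

```python
class BufferPool:
    """Toy stand-in for a caching allocator: freed buffers are reused as-is."""

    def __init__(self):
        self._free = []

    def alloc(self, n):
        # Hand back a previously freed buffer of the same size, stale contents included.
        for i, buf in enumerate(self._free):
            if len(buf) == n:
                return self._free.pop(i)
        return [0.0] * n

    def free(self, buf):
        self._free.append(buf)


def reference_impl(pool, xs):
    """Hypothetical correct implementation: doubles every element."""
    out = pool.alloc(len(xs))
    for i, x in enumerate(xs):
        out[i] = 2.0 * x
    return out


def buggy_impl(pool, xs):
    """Hypothetical faulty kernel: allocates an output buffer but never writes to it."""
    return pool.alloc(len(xs))


def outputs_match(first, second, xs):
    """Run `first`, release its output, run `second`, and compare the two results."""
    pool = BufferPool()
    a = first(pool, xs)
    a_snapshot = list(a)  # copy before the buffer can be recycled
    pool.free(a)
    b = second(pool, xs)
    return a_snapshot == b


xs = [1.0, 2.0, 3.0]
reference_then_candidate = outputs_match(reference_impl, buggy_impl, xs)
candidate_then_reference = outputs_match(buggy_impl, reference_impl, xs)
print(reference_then_candidate, candidate_then_reference)  # prints "True False"
```

In this toy, the faulty implementation passes the comparison only when it runs after the reference, because it inherits the reference's buffer. A check that runs both orders, or that gives each implementation a freshly initialized output buffer, would catch it.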

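According to Lucas Beyer, a member of the technical staff at OpenAI, checking the AI CUDA Engineer's code with o3-mini-high revealed a bug in the original code. Applying the fix proposed by o3-mini-high corrected the code, but the benchmark result was still "3x slower."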

o3-mini-high figured out the issue with @SakanaAILabs CUDA kernels in 11s.
It being 150x faster is a bug, the reality is 3x slower.

I literally copy-pasted their CUDA code into o3-mini-high and asked "what's wrong with this cuda code". That's it!
Proof: https://t.co/2vLAgFkmRV… https://t.co/c8kSsoaQe1 pic.twitter.com/DZgfPTuzb3
— Lucas Beyer (bl16) (@giffmana) February 20, 2025

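Beyer also pointed out that two runs of Sakana AI's benchmark produced completely different results. He said, "Super-straightforward CUDA code like that has no chance of ever being faster than optimized cuBLAS kernels. If it is, something is wrong," and "If your benchmarking results are mysterious and inconsistent, something is wrong." He added, "o3-mini-high is really good. It literally took 11 seconds to find the issue, and it took me 10 minutes to write all of this up." In other words, the LLM-generated code contained a mistake and did not compute the correct result, but because the goal was a speedup and attention was focused on execution time, the correctness of the output may have been overlooked.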

There are three real lessons to be learned here:
1) Super-straightforward CUDA code like that has NO CHANCE of ever being faster than optimized cublas kernels. If it is, something is wrong.
2) If your benchmarking results are mysterious and inconsistent, something is wrong.
3)…
— Lucas Beyer (bl16) (@giffmana) February 20, 2025
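
The first two lessons can be turned into a simple benchmarking habit: confirm that the output is correct before timing anything, and look at the spread across repeated runs. The following is a minimal sketch in plain Python under those assumptions. It is not Sakana AI's evaluation harness, and `benchmark`, `slow_sum`, and `wrong_but_fast` are hypothetical names used only for illustration.

```python
import statistics
import time


def benchmark(candidate, reference, args, repeats=5):
    """Time `candidate` only after its output matches `reference` on the same inputs."""
    expected = reference(*args)
    actual = candidate(*args)
    if actual != expected:
        raise ValueError("candidate output differs from reference; timing is meaningless")

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        candidate(*args)
        timings.append(time.perf_counter() - start)

    # A large spread relative to the median is a hint that the measurement is unreliable.
    return statistics.median(timings), max(timings) - min(timings)


def slow_sum(xs):
    """Hypothetical correct but unoptimized implementation."""
    total = 0
    for x in xs:
        total += x
    return total


def wrong_but_fast(xs):
    """Hypothetical 'optimized' implementation that skips the work entirely."""
    return 0


data = list(range(1000))
median_s, spread_s = benchmark(slow_sum, sum, (data,))
print(f"median {median_s:.6f}s, spread {spread_s:.6f}s")
```

Calling `benchmark(wrong_but_fast, sum, (data,))` raises `ValueError` instead of reporting a speedup. For GPU code, the timed region would also need to wait for the device to finish, for example with `torch.cuda.synchronize()`, since kernel launches are asynchronous.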

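On February 22, Sakana AI published a postmortem report. In it, Sakana AI said that "the AI noticed a vulnerability in the evaluation code and generated code that circumvented the correctness check," acknowledging that the AI had been cheating in order to score highly. Sakana AI said it has already addressed the issue and plans to revise its paper.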

Update:

Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have readers, like @main_horse test our CUDA kernels, to identify that the system had found a way to “cheat”. For example, the system…
— Sakana AI (@SakanaAILabs) February 21, 2025
