Anthropic “Claude 3 Opus” achieved the feat of outperforming OpenAI “GPT-4” for the first time in the LLM evaluation index “Chatbot Arena”



Anthropic's large-scale language model (LLM) '

Claude 3 ' announced in March 2024 supports context lengths up to 200,000 tokens, and it is reported that it is possible to design quantum algorithms from just two prompts. is also mentioned. This time, in ' Chatbot Arena ', which is used by AI researchers to measure the relative abilities of large-scale language models, Claude 3's top model 'Claude 3 Opus' is compared to OpenAI's large-scale language model 'GPT-4'. It was reported that the performance exceeded that for the first time.

Anthropic's Claude 3 replaces OpenAI's GPT-4 as most popular user-rated LLM
https://the-decoder.com/anthropics-claude-3-replaces-openais-gpt-4-as-most-popular-user-rated-llm/



Claude takes the top spot in AI chatbot ranking — finally knocking GPT-4 down to second place | Tom's Guide

https://www.tomsguide.com/ai/claude-takes-the-top-spot-in-ai-chatbot-ranking-finally-knocking-gpt-4-down-to-second-place

“The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time | Ars Technica
https://arstechnica.com/information-technology/2024/03/the-king-is-dead-claude-3-surpasses-gpt-4-on-chatbot-arena-for-the-first-time/

Chatbot Arena is a benchmarking platform created by LMSYS Org to compare the performance of large-scale language models. This benchmark involves inviting human users to an open chat, having them converse with two anonymous AI models, then voting on them, and ranking them using the Elo rating used in chess.

GPT-4, which was released on May 3, 2023 and registered on Chatbot Arena around May 10, 2023, has always been at the top of the Chatbot Arena charts since its introduction. However, in an update on March 27, 2024, it was reported that Anthropic's large-scale language model ``Claude 3 Opus'' outperformed GPT-4.

In addition, it was found that the cheapest and most cost-effective Claude 3, ``Haiku'', has performance comparable to some GPT-4 models. LMSYS Org says, 'Claude 3 Haiku has impressed everyone by reaching GPT-4 levels of user preference. Its speed, functionality and length of context are unparalleled on the market.' I admire it.




Software developer Nick Dobos describes this achievement by Claude 3 as ``the king is dead.''




Also, according to Paul Gauthier , developer of ' Aider ', which uses large-scale language models to edit code, as a result of running Aider's code editing benchmark, Claude 3 Opus was lower than GPT-4 and GPT-3.5. It was revealed that the performance exceeded all of OpenAI's large-scale language models, including , and it has been reported that it is the most suitable model for programming using AI.

Below are graphs showing the results of running Aider's code editing benchmark on various large language models. Comparing the correct answer rate, OpenAI's large-scale language model 'gpt-4-0125-preview' had the highest rate of 66%, but 'claude-3-opus-20240229' had a higher correct answer rate of 68%. It can be confirmed that it was shown.



“For the first time, the best models available are from vendors other than OpenAI, such as Opus for advanced tasks and Haiku for cost and efficiency,” said independent AI researcher Simon Wilson. 'These results are very beneficial as we benefit from the diversity of the top vendors in this space.'

In addition, Meta, which focuses on open source AI, is expected to release the next generation large-scale language model 'Llama 3' in 2024, and OpenAI will release the next generation large-scale language model 'GPT-3' around the summer of 2024. 5' may be released.

Report that OpenAI's next-generation large-scale language model 'GPT-5' will be released in summer 2024 - GIGAZINE

in Software, Posted by log1r_ut