OpenAI's AI model 'GPT-4o' tops the rankings by performing more than twice as well as previous models in chess puzzles
On May 14, 2024, OpenAI announced a new AI model called 'GPT-4o.' GPT-4o can process text, voice, and visual information quickly enough to respond in real time, and it also performs well on calculation problems that typical chat AI struggles with. It has now been revealed that GPT-4o also performed more than twice as well as GPT-4 on chess puzzles.
GitHub - kagisearch/llm-chess-puzzles: Benchmark LLM reasoning capability by solving chess puzzles.
In April 2024, a model called 'gpt2-chatbot' was suddenly added to 'Chatbot Arena,' a website that compares and evaluates chatbots by pitting them against each other. gpt2-chatbot could handle problems that the existing GPT-4 model and Claude 3 Opus could not solve, performed well in Japanese as well as English, and could also generate ASCII art. This led to speculation that it might be a new model from OpenAI.
The possibility that the mysterious masked chatbot 'gpt2-chatbot' that was unrivaled in the AI battle arena was a new model of OpenAI suddenly emerged - GIGAZINE
Then on May 14, OpenAI announced a new AI model called 'GPT-4o,' and it was officially revealed that gpt2-chatbot was actually GPT-4o.
OpenAI announces 'GPT-4o', capable of processing text, voice and camera input at the same speed as humans, and can perform a variety of operations such as 'looking around and judging the situation', 'teaching how to solve mathematics', and 'composing music by talking to each other' - GIGAZINE
GPT-4o is a multimodal AI model that can process text, voice, and visual inputs at high speed. It can respond to voice input in as little as 232 milliseconds, which is comparable to human response time in conversation. The voice conversation mode of ChatGPT based on the earlier GPT-4 and GPT-3.5 chained together multiple models, namely 'a model that converts voice to text,' 'a model that generates a reply text based on the input text,' and 'a model that converts the reply text to voice,' but GPT-4o can perform all of these steps with a single model.
GPT-4o also excels in mathematics, an area where typical chat AI struggles. It can explain to humans how to solve mathematical problems, and is capable of a variety of actions, including expressive conversation and singing.
Meanwhile, 'llm-chess-puzzles,' a project that has various large language models (LLMs) solve chess puzzles and publishes the results as a benchmark, has released the results for GPT-4o.
llm-chess-puzzles asks a large language model to solve 1000 puzzles, with each board position given in FEN (Forsyth-Edwards Notation), a compact text format for chess piece placement. The benchmark results include the number of puzzles that the model solved, as well as the number of puzzles where the model made an illegal move, which indicates that the model did not understand the board state or the rules of the game.
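To give a sense of how little text a model receives for each position, the following is a minimal Python sketch that expands the piece-placement field of a FEN string into an 8x8 grid. It is an illustration only and is not taken from the llm-chess-puzzles repository; the function name `parse_fen_board` is hypothetical, and the example uses the standard chess starting position rather than an actual benchmark puzzle.

```python
def parse_fen_board(fen: str) -> list:
    """Expand the piece-placement field of a FEN string into an 8x8 grid.

    Hypothetical helper for illustration; not part of llm-chess-puzzles.
    Uppercase letters are White pieces, lowercase are Black, '.' is empty.
    """
    placement = fen.split()[0]      # first FEN field: piece placement
    ranks = placement.split("/")    # rank 8 comes first, rank 1 last
    if len(ranks) != 8:
        raise ValueError("FEN must describe exactly 8 ranks")

    board = []
    for rank in ranks:
        row = []
        for ch in rank:
            if ch in "12345678":
                # A digit stands for that many consecutive empty squares.
                row.extend(["."] * int(ch))
            elif ch in "prnbqkPRNBQK":
                row.append(ch)
            else:
                raise ValueError(f"unexpected character in FEN: {ch!r}")
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not cover 8 files")
        board.append(row)
    return board


# Standard starting position; the second field says which side moves next.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
board = parse_fen_board(START_FEN)
side_to_move = START_FEN.split()[1]

for row in board:
    print("".join(row))
print("side to move:", side_to_move)
```

The whole board fits in a single short line of text, so a model has to reconstruct the full position from that string before it can propose a legal move.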
According to the results table published by llm-chess-puzzles, GPT-4o's accuracy rate was 50.1%, which is significantly higher than that of competing models such as 'GPT-4-turbo-preview', 'GPT-4', 'Claude 3 Opus' and 'Claude 3 Haiku'.
llm-chess-puzzles said, 'Chess puzzles are difficult for most humans, let alone for large language models that are given only a few characters describing the entire board.' 'It is noteworthy that the large language model not only internalizes the correct state of the board based on the FEN representation, but also uses the rules of the game and chess strategy to find the best move.'
in Software, Web Service, Game, Posted by log1h_ik