May 09, 2024 17:00:00

The mysterious masked chatbot 'gpt2-chatbot', which went unrivaled in the AI battle arena, now appears likely to be a new OpenAI model

As soon as 'gpt2-chatbot' appeared on Chatbot Arena, a website that compares and evaluates the capabilities of chatbots in a battle format, it became a hot topic by defeating strong models such as GPT-4 one after another. It has now emerged that this AI is very likely a new model from OpenAI.

gpt2-chatbot confirmed as OpenAI
https://simonwillison.net/2024/May/8/gpt2-chatbot-confirmed-as-openai/

Mystery chatbot is likely a new OpenAI product
https://www.axios.com/2024/05/02/mystery-chatbot-openai-gpt2

Is this mystery chatbot really GPT-4.5 in disguise? Here's how to see for yourself | ZDNET
https://www.zdnet.com/article/is-this-mystery-chatbot-really-gpt-4-5-in-disguise-heres-how-to-see-for-yourself/

Chatbot Arena is a competitive AI platform that pits multiple large language models (LLMs) against one another and ranks the chatbots according to user votes on which response is better.
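To make the voting idea concrete, here is a minimal sketch of how head-to-head votes could be turned into a ranking. The article does not describe the scoring formula, so the Elo-style update used here is an assumption, as are the starting rating, the K-factor, and the placeholder model names.

```python
from collections import defaultdict

K_FACTOR = 32.0        # assumed step size per vote, not taken from the article
START_RATING = 1000.0  # assumed starting rating for every model


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def apply_vote(ratings, winner, loser):
    """Update both ratings in place after one user vote."""
    gain = K_FACTOR * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


def leaderboard(votes):
    """Replay (winner, loser) votes and return models sorted by rating."""
    ratings = defaultdict(lambda: START_RATING)
    for winner, loser in votes:
        apply_vote(ratings, winner, loser)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model-a", "model-b"),
    ("model-a", "model-c"),
    ("model-c", "model-b"),
]

for name, rating in leaderboard(votes):
    print(f"{name}: {rating:.1f}")
```

Because each vote moves the same number of points from the loser to the winner, a newcomer that keeps beating highly rated models climbs quickly, which matches the kind of rapid rise described for gpt2-chatbot.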

In April 2024, a model called 'gpt2-chatbot' was suddenly added to Chatbot Arena, and it became a hot topic when it climbed the leaderboard, defeating mainstream LLMs such as Gemini, Claude, and GPT-4 one after another. No detailed information about the model was available and its origin was unknown, but its behavior in response to prompts resembled that of OpenAI's models, which led to rumors that OpenAI was secretly testing GPT-4.5 or GPT-5.

Mysterious high-performance AI model 'gpt2-chatbot' appears on Chatbot Arena, and it is said to be GPT-4.5 or GPT-5 - GIGAZINE

Immediately after becoming a hot topic among AI users, gpt2-chatbot disappeared from Chatbot Arena. About a week later, on May 6, 2024, two models that appear to be derivative versions, 'im-a-good-gpt2-chatbot' and 'im-also-a-good-gpt2-chatbot', were added to Chatbot Arena.

The gpt2-chatbot series has once again become the focus of attention, and because an error message revealed a connection to the OpenAI API, the speculation that it is a GPT model is now regarded as all but certain.

pic.twitter.com/KSBMNLxBbD
— nano (@nanulled) May 7, 2024

In addition, the fact that OpenAI CEO Sam Altman posted 'im-a-good-gpt2-chatbot' on X (formerly Twitter) just before the model was added is also seen as supporting the theory that this LLM comes from OpenAI.

im-a-good-gpt2-chatbot
— Sam Altman (@sama) May 5, 2024

While many of Chatbot Arena's models can be selected from a drop-down menu, the successors to gpt2-chatbot can only be encountered by chance in a random match. Even so, users lucky enough to converse with them have been full of praise.

For example, one X user posted, 'im-also-a-good-gpt2-chatbot created a Flappy Bird clone in one shot, from a dead simple prompt.'

Whoa the new gpt2-chatbot just created Flappy Bird clone in one-shot 🤯

And it was a dead simple prompt. 🧵👇 pic.twitter.com/rxwv6sJ5cw
— Min Choi (@minchoi) May 7, 2024

Others reported that when asked the basic physics question 'Which is heavier: 1 ton of feathers or 1 ton of lead?', 'Haiku', one of the three Claude 3 models, claimed that 1 ton of lead was heavier, while im-a-good-gpt2-chatbot answered, '1 ton of feathers and 1 ton of lead have the same weight, which is 1 ton.'

🚨 GPT2-Chatbot is back - WTF Haiku?

Two mysterious new AI models, 'im-a-good-gpt2-chatbot' and 'im-also-a-good-gpt2-chatbot', have emerged, sparking speculation about their origins and capabilities.

Meanwhile, the 'claude-3-haiku' model fails at basic physics, claiming a ton of… pic.twitter.com/zepORPwqno
— Dominik Stosik (@iblamedom) May 7, 2024

Another X user joked, 'im-a-good-gpt2-chatbot is so good that it built me a code interpreter that uses Claude Opus, and I fainted from ontological shock.'

im-a-good-gpt2-chatbot it's so good that it created a code interpreter that uses Claude Opus for me.

Excuse me as I faint in ontological shock. pic.twitter.com/aCO4XFfNCm
— Pietro Schirano (@skirano) May 7, 2024

On the other hand, some users noted that while it 'definitely outperforms open source models and in some cases even outperforms GPT4-turbo,' it is not better than Claude 3 Opus and freezes on certain prompts.

I was skeptical about the GPT2 chatbot, but it is undoubtedly more capable than opensource models and, in some cases, better than GPT4-turbo

But it is not better than my experience with Opus – I’m curious to know what is behind it.

Also, about the gpt2-chatbot:
It does not have a… pic.twitter.com/CWPVrM48Ig
— Denis Shiryaev 💙💛 (@literallydenis) April 29, 2024

According to the news site Axios, Altman said of gpt2-chatbot, 'It's not GPT-4.5,' while speaking at Harvard University on May 1, 2024. In addition, The Information reported that OpenAI had planned to hold an internal demo of a new product on May 9, 2024, but that it had been postponed. It is unclear what was to be announced at this event.

Axios said, 'If gpt2-chatbot is made by OpenAI, the company is likely deploying it in stealth mode to generate excitement or to see how the chatbot works in the wild. Whether it's a test or a prank, we should know more details soon.'

In addition, when we actually encountered im-a-good-gpt2-chatbot and asked it a question in Japanese, it looked like this. You can see that it responds very naturally.

When we asked the im-a-good-gpt2-chatbot a question similar to the physics question above, it responded as follows:

Continued
A researcher from OpenAI revealed that the true identity of 'im-also-a-good-gpt2-chatbot' is 'GPT-4o.'

GPT-4o is our new state-of-the-art frontier model. We've been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot 🙂. Here's how it's been doing. pic.twitter.com/xEE2bYQbRk
— William Fedus (@LiamFedus) May 13, 2024

For more information on GPT-4o, see the following article:

OpenAI announces 'GPT-4o', capable of processing text, voice and camera input at the same speed as humans, and can perform a variety of operations such as 'looking around and judging the situation', 'teaching how to solve mathematics', and 'composing music by talking to each other' - GIGAZINE


May 09, 2024 17:00:00 in Software, Posted by log1l_ks