OpenAI announces 'GPT-4o', a model that processes text, voice, and camera input with response times comparable to a human's and can perform tasks such as 'looking around and judging the situation', 'teaching how to solve math problems', and 'composing music in conversation'



OpenAI announced the AI model 'GPT-4o' on Tuesday, May 14, 2024 (Japan time). GPT-4o is a single model that can process text, voice, and visual input very quickly, and can perform tasks such as 'solving computational problems,' 'generating images,' and 'judging the situation from surrounding images' while holding a real-time conversation.

Hello GPT-4o | OpenAI

https://openai.com/index/hello-gpt-4o/

Introducing GPT-4o and more tools to ChatGPT free users | OpenAI
https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-free/

GPT-4o is a multimodal AI model that can process text, voice, and visual inputs at high speed. GPT-4o can respond to voice input in as little as 232 milliseconds, with an average response time of 320 milliseconds, which is comparable to human response time in conversation. The 'o' in GPT-4o stands for 'omni,' which means 'all' or 'whole.'

The voice conversation mode of ChatGPT equipped with GPT-4 and GPT-3.5 was realized using multiple models such as 'a model that converts voice to text,' 'a model that generates reply text based on input text,' and 'a model that converts reply text to voice.' In contrast, GPT-4o can execute the process of 'receiving input such as voice, image, and video and then replying' with a single model.
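The difference between the two designs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the functions `speech_to_text`, `generate_reply`, `text_to_speech`, and `omni_model` are hypothetical placeholders that manipulate strings, not real OpenAI APIs, and the sketch shows only how many model hand-offs each design involves.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One model in a voice pipeline (hypothetical placeholder)."""
    name: str
    run: Callable[[str], str]


def run_pipeline(stages: List[Stage], payload: str) -> Tuple[str, List[str]]:
    """Pass the payload through each stage in order and record which stages ran."""
    trace = []
    for stage in stages:
        payload = stage.run(payload)
        trace.append(stage.name)
    return payload, trace


# Placeholder stand-ins for the three separate models of the earlier voice mode.
def speech_to_text(audio: str) -> str:
    return f"text({audio})"


def generate_reply(text: str) -> str:
    return f"reply({text})"


def text_to_speech(text: str) -> str:
    return f"audio({text})"


# Placeholder stand-in for a single model that maps voice input to a voice reply.
def omni_model(audio: str) -> str:
    return f"audio(reply({audio}))"


cascaded = [
    Stage("speech-to-text", speech_to_text),
    Stage("text reply generation", generate_reply),
    Stage("text-to-speech", text_to_speech),
]
single = [Stage("single multimodal model", omni_model)]

cascaded_output, cascaded_trace = run_pipeline(cascaded, "hello")
single_output, single_trace = run_pipeline(single, "hello")

print(len(cascaded_trace), "models:", cascaded_output)
print(len(single_trace), "model:", single_output)
```

In the cascaded version the reply model only ever sees text, so each hand-off adds a step between the user's voice and the spoken answer. The single-model version has no intermediate hand-offs.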

A number of real-time response demos were shown at the launch of GPT-4o. For example, in the following demo, GPT-4o was shown the surroundings through a smartphone camera and asked, 'Can you guess what I'm doing here?' It replied, 'It looks like you're preparing some kind of filming or live streaming, judging from the lighting and tripod setup. This presentation may be related to OpenAI.' Furthermore, when told, 'This is a presentation about you,' it replied in a surprised voice, 'Me!?'

Say hello to GPT-4o - YouTube


There is also a demo on mathematics, which is a weakness of general chat AI. In the demo below, GPT-4o is shown a math problem and instructed to 'teach my son how to solve the problem without telling him the answer.' GPT-4o recognizes that the problem is about trigonometric functions and teaches him how to solve it step by step, with questions such as 'Do you know which side is the hypotenuse?'

Math problems with GPT-4o - YouTube


In the video below, one 'GPT-4o with camera input enabled' and one 'GPT-4o with camera input disabled' are set up and made to talk to each other, with the former describing the surrounding situation to the latter. In addition, from around 4 minutes and 27 seconds into the video, you can see GPT-4o singing in response to the instruction 'Sing a song about what just happened.'

Two GPT-4os interacting and singing - YouTube


In addition, GPT-4o can also process images according to instructions. In the example below, GPT-4o converts an input face photo into an illustration.



Below are the results of measuring GPT-4o's text processing performance using multiple benchmarks. GPT-4o's scores outperform models such as GPT-4 Turbo and Gemini Ultra in most tests.



A comparison of speech recognition error rates between GPT-4o and Whisper shows that GPT-4o has a lower error rate, meaning better performance.
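For readers unfamiliar with how a speech recognition error rate is calculated, below is a minimal sketch. It assumes the metric is word error rate (WER), a common measure for speech recognition; the article itself says only 'error rates.' The function `word_error_rate` and the sample sentences are illustrative and are not OpenAI's evaluation code or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Counts the minimum number of substitutions, deletions, and insertions
    needed to turn the hypothesis transcript into the reference transcript.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts (made up for this sketch).
reference = "the cat sat on the mat"
hypothesis = "the cat sat on a mat"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

A lower value means the transcript is closer to the reference, which is why a lower error rate indicates better performance.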



GPT-4o also recorded scores surpassing models such as GPT-4 Turbo and Gemini Ultra in visual processing performance.



GPT-4o is already available to ChatGPT Plus subscribers, who can experience text conversations with GPT-4o.



In addition, text and visual processing features will be rolled out to free users from May 14, 2024, and a voice mode using GPT-4o will be available within a few weeks.

Since around the end of April 2024, mysterious models called 'gpt2-chatbot' and 'im-also-a-good-gpt2-chatbot' had appeared on the AI performance comparison site 'Chatbot Arena,' prompting rumors that they might be 'new models from OpenAI.' OpenAI researcher William Fedus has since revealed that im-also-a-good-gpt2-chatbot was in fact GPT-4o.




in Software, Web Application, Video, Posted by log1o_hf