OpenAI is reportedly developing 'Operator,' an AI agent for ChatGPT that automates complex browser tasks, with a release expected soon
The Information reported that OpenAI plans to release 'Operator,' an AI agent that can perform complex tasks in a web browser on behalf of users, in the fourth week of January 2025. According to the report, Google and Anthropic are also developing similar AI agents.
OpenAI Preps 'Operator' Release For This Week — The Information
OpenAI Reportedly Launching 'Operator' That Can Control Your Computer This Week
https://gizmodo.com/openai-reportedly-launching-operator-that-can-control-your-computer-this-week-2000553513
ChatGPT Operator AI agents could launch this week
https://bgr.com/tech/chatgpt-operator-feature-could-launch-as-soon-as-this-week/
OpenAI reportedly launching ChatGPT's first browser agent 'Operator' this week
https://the-decoder.com/openai-reportedly-launching-chatgpts-first-browser-agent-operator-this-week/
ChatGPT-Maker To Launch Web Automation Tool 'Operator' This Week - Slashdot
https://slashdot.org/story/25/01/22/1624227/chatgpt-maker-to-launch-web-automation-tool-operator-this-week
According to reports, OpenAI is developing Operator as a new feature of ChatGPT. Operator is an AI agent that performs complex tasks in the browser on behalf of the user, and it offers suggested prompts grouped by category, such as travel, dining, and events. For example, if a user asks Operator to find a flight from New York to Maui that does not arrive late at night, Operator searches for flights and presents the options before any ticket is purchased, so the user can book simply by entering their personal information. Likewise, if asked to make a restaurant reservation, Operator asks the user for the necessary details, such as the time and the number of people, and finds a restaurant that meets those conditions. The user can apparently still operate the screen while Operator is running. According to The Information, although Operator cannot control Gmail accounts, 'it can log in to other sites and remain logged in between sessions.'
Tibor Blaho, a software engineer and well-known AI product leaker, also pointed out that OpenAI will soon add Operator to ChatGPT. He reported discovering new options called 'Toggle Operator' and 'Force Quit Operator' in the Mac version of the ChatGPT app. However, the options are hidden at the time of writing.
Confirmed - the ChatGPT macOS desktop app has hidden options to define shortcuts for the desktop launcher to 'Toggle Operator' and 'Force Quit Operator' https://t.co/rSFobi4iPN pic.twitter.com/j19YSlexAS
— Tibor Blaho (@btibor91) January 19, 2025
Anthropic, the developer of the chat AI Claude, has already released a preview version of 'computer use,' a feature that, like Operator, allows an AI to operate a PC. However, early testers have complained that it 'gets stuck in a loop when it doesn't know what to do,' 'forgets its task and starts doing something completely different, such as looking at nature photos on Google Images,' and 'is half-baked.'
Chat AI 'Claude' will have the ability to automatically operate a PC & an improved version of 'Claude 3.5 Sonnet' and a lightweight model 'Claude 3.5 Haiku' will also be released - GIGAZINE
Google is also reportedly developing an AI agent called 'Jarvis' that can 'book flights' and 'purchase products' in a browser.
Google plans to introduce AI feature 'Jarvis' to Chrome that will allow users to 'book flights' and 'purchase products' in the browser - GIGAZINE
According to an anonymous person familiar with OpenAI's Operator development, the company is developing multiple AI agents, and the one closest to completion is a 'general-purpose' agent that can operate a web browser on behalf of the user. Bloomberg first reported in November 2024 that the AI agent OpenAI is developing is named Operator.
As a use case for Operator, technology media outlet Gizmodo describes an elderly person who is not familiar with computers asking Operator to help them send an email. Such help may be unnecessary for tech-savvy people, but for elderly people and others unfamiliar with the Internet, even simple tasks can be difficult. Gizmodo points out that Operator will be a feature for such users. In addition, companies may use Operator to test whether new websites or services work properly.
However, Gizmodo also pointed out that AI agents carry potential risks. Bots that control end-user clients already exist, such as bots that automatically post marketing spam to Reddit, and because they act through the client they can get around API restrictions meant to block automation. Gizmodo therefore notes that OpenAI will need to take some measures to prevent Operator from being abused.
The AI agent essentially works by taking a screenshot of the user's browser and sending the image to OpenAI's servers for analysis. The AI model determines the steps required to complete the assigned task, and commands are sent back to the browser to execute the action using the mouse or keyboard. Gizmodo predicts that Operator will use multimodal AI that can interpret multiple forms of input, such as text and images.
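The loop described above (capture a screenshot, let a model choose the next step, send a mouse or keyboard command back to the browser) can be sketched roughly as follows. This is only an illustrative sketch and not OpenAI's implementation: `Action`, `FakeBrowser`, `scripted_model`, and `run_agent` are hypothetical names invented for this example, the "model" is a hard-coded script rather than a real multimodal AI, and the fake browser merely records actions instead of performing them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One command sent back to the browser (hypothetical format)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeBrowser:
    """Stand-in for a real browser: records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> bytes:
        # A real agent would capture pixels; here the "image" just encodes
        # how many actions have been performed so far.
        return f"screen-after-{len(self.log)}-actions".encode()

    def perform(self, action: Action) -> None:
        self.log.append(action)


def scripted_model(task: str, screenshot: bytes, step: int) -> Action:
    """Hypothetical stand-in for the remote model that analyzes the screenshot.

    It ignores the screenshot and replays a fixed script:
    click a search box, type the task, then report completion.
    """
    script = [
        Action("click", x=120, y=40),
        Action("type", text=task),
        Action("done"),
    ]
    return script[min(step, len(script) - 1)]


def run_agent(
    task: str,
    browser: FakeBrowser,
    decide: Callable[[str, bytes, int], Action],
    max_steps: int = 10,
) -> int:
    """Run the observe / decide / act loop; return the number of actions performed."""
    for step in range(max_steps):
        shot = browser.screenshot()          # 1. observe the current screen
        action = decide(task, shot, step)    # 2. model picks the next step
        if action.kind == "done":
            return step
        browser.perform(action)              # 3. act via mouse or keyboard
    return max_steps                         # step budget exhausted


if __name__ == "__main__":
    browser = FakeBrowser()
    performed = run_agent("flights from New York to Maui", browser, scripted_model)
    print(performed, [a.kind for a in browser.log])
```

The `max_steps` cap is a design choice of this sketch: it gives the loop a hard stop if the model never reports that the task is done, which is one simple guard against the kind of looping that early testers of Anthropic's 'computer use' complained about.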
in Software, Posted by logu_ii