Gemini 3.5 Flash gains a new 'computer use' feature that lets it recognize the screen and perform clicks and text input, making it possible to build agents that control PCs.

Google has announced that it has built a feature called 'computer use' into its AI model 'Gemini 3.5 Flash,' which lets the model perform actions such as clicking and typing while looking at the screen.
Introducing computer use in Gemini 3.5 Flash
Computer Use is a feature where an AI agent understands the screen state based on screenshots and autonomously operates the computer. Computer Use was previously offered as a standalone model called 'Gemini 2.5 computer use model,' but it has now been integrated into Gemini 3.5 Flash.
It is intended for automating multi-step tasks, gathering information across enterprise applications, verifying the functionality of web applications, and conducting accessibility testing. Gemini 3.5 Flash also outputs the intent behind each operation, making it easier for developers to understand why the AI is trying to press a particular button.
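As a rough illustration of the screenshot-driven loop described above, the following Python sketch shows one way such an agent could be structured. It does not use Google's actual API: `Action`, `run_agent`, and the three callables (`take_screenshot`, `propose_action`, `execute`) are hypothetical names chosen for this example, and the model call is left as a function the caller supplies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step proposed by the model (hypothetical structure)."""
    kind: str        # "click", "type", or "done"
    intent: str      # the model's stated reason for this step
    x: int = 0       # screen coordinates, used by "click"
    y: int = 0
    text: str = ""   # text to enter, used by "type"


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    propose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> proposed action -> execution until 'done'.

    All three callables are assumptions supplied by the caller; the model
    call itself lives inside propose_action.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()            # observe the current screen
        action = propose_action(goal, screenshot, history)
        history.append(action)                    # keep the intent for review
        if action.kind == "done":
            break
        execute(action)                           # perform the click or typing
    return history
```

Because every step is stored together with its `intent`, a developer can read the returned history afterwards to see why each click or keystroke was chosen. The `max_steps` cap is a design choice of this sketch so that a confused agent cannot loop forever.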
On the 'OSWorld-Verified' benchmark, which measures how accurately AI can perform operations on an operating system, Gemini 3.5 Flash scored 78.4, a significant improvement over Gemini 3 Flash's 65.1 that also surpasses Gemini 3.1 Pro's 76.2. Sonnet 4.6 tied with Gemini 3.5 Flash at 78.4, while Opus 4.8 achieved the highest score at 83.4. GPT-5.4 mini scored 72.1 and GPT-5.5 scored 78.7. Despite being a lightweight, high-speed Flash-series model, Gemini 3.5 Flash delivers performance that rivals higher-end and competing models, even in agent applications involving PC operations.

On the other hand, in systems where AI interacts with the screen, there is a risk that the AI could be deceived by malicious text on a webpage. Google offers optional protection features for businesses, such as a system that requires user confirmation for operations that are difficult to undo or highly sensitive, and a system that stops tasks if indirect prompt injection is detected.
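The two safeguards mentioned above, asking the user before sensitive operations and stopping when injected instructions are suspected, can be sketched as a simple gate placed in front of each action. This is only an illustration under stated assumptions, not Google's implementation: `allow_action`, `TaskStopped`, `SENSITIVE_KINDS`, and the keyword-based `looks_like_injection` detector are all hypothetical, and a real detector would be far more sophisticated than a phrase match.

```python
from typing import Callable

# Hypothetical set of action kinds treated as sensitive or hard to undo.
SENSITIVE_KINDS = {"delete", "purchase", "send_email"}

# Toy phrase list standing in for a real prompt-injection detector.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "disregard your instructions",
)


class TaskStopped(Exception):
    """Raised when the whole task should be halted."""


def looks_like_injection(page_text: str) -> bool:
    """Naive check: does the page text contain a suspicious phrase?"""
    lowered = page_text.lower()
    return any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)


def allow_action(
    kind: str,
    page_text: str,
    confirm: Callable[[str], bool],
    detector: Callable[[str], bool] = looks_like_injection,
) -> bool:
    """Decide whether an action may run.

    Raises TaskStopped if the page looks like it carries injected
    instructions; otherwise asks the user via confirm() for sensitive
    actions and allows everything else.
    """
    if detector(page_text):
        raise TaskStopped("possible indirect prompt injection on page")
    if kind in SENSITIVE_KINDS:
        return bool(confirm(kind))  # the user has the final say
    return True
```

In this sketch a declined confirmation only skips the single action, while a suspected injection aborts the task through an exception, mirroring the distinction the article draws between the two protections.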
At the time of writing, Google has also made a demo environment, a reference implementation, and documentation available. Google states that with the addition of computer use to Gemini 3.5 Flash, AI will not only be able to provide answers, but will also be easier to use as an agent that looks at the screen and interacts with it.