Jan 28, 2026 12:13:00

Gemini 3 Flash gains 'Agentic Vision,' a highly accurate image understanding feature that executes code and draws bounding boxes on images

Google has announced Agentic Vision, a new Gemini 3 Flash feature for highly accurate image understanding. Agentic Vision examines images actively, zooming in where needed, and can execute code, for example drawing bounding boxes in Python so that it counts objects accurately.

Introducing Agentic Vision in Gemini 3 Flash

https://blog.google/innovation-and-ai/technology/developers-tools/agentic-vision-gemini-3-flash/

According to Google, conventional image recognition models 'look at an image only once and try to understand its contents.' Agentic Vision instead works in an agent-like loop: it reasons about the user's instructions and the image, then zooms into the image or executes code as needed, and repeats these steps until it can answer with high accuracy.
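To make the loop concrete, here is a minimal Python sketch of a think, act, observe cycle. It is not Google's implementation: `Action`, `agentic_loop`, `toy_policy`, and the tool name `"zoom"` are hypothetical, the "image" is a small grid of 0/1 pixels, and a hard-coded policy stands in for the model's reasoning.

```python
from dataclasses import dataclass
from typing import Any, Callable

Image = list[list[int]]  # toy image: rows of 0/1 pixels


@dataclass
class Action:
    """Hypothetical action chosen by the policy: a tool name or "answer"."""
    kind: str
    arg: Any = None


def agentic_loop(image: Image, question: str,
                 policy: Callable[[str, list[Image]], Action],
                 tools: dict[str, Callable[[Image, Any], Image]],
                 max_steps: int = 5) -> Any:
    """Repeat think -> act -> observe until the policy chooses to answer."""
    observations = [image]
    for _ in range(max_steps):
        action = policy(question, observations)                   # think
        if action.kind == "answer":
            return action.arg
        new_view = tools[action.kind](observations[-1], action.arg)  # act
        observations.append(new_view)                              # observe
    raise RuntimeError("no answer within the step budget")


def crop(img: Image, box: tuple[int, int, int, int]) -> Image:
    """Return the half-open region box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]


def toy_policy(question: str, obs: list[Image]) -> Action:
    """Stand-in for the model: zoom into the right half once, then count 1s."""
    if len(obs) == 1:
        height, width = len(obs[0]), len(obs[0][0])
        return Action("zoom", (width // 2, 0, width, height))
    return Action("answer", sum(map(sum, obs[-1])))


if __name__ == "__main__":
    img = [[0, 1, 0, 1],
           [1, 0, 1, 1]]
    print(agentic_loop(img, "How many marks on the right?", toy_policy, {"zoom": crop}))
```

The step budget (`max_steps`) is an assumption added so that a policy which never answers cannot loop forever.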

At first glance, 'image understanding' and 'code execution' seem unrelated, but in Google's benchmark tests, image understanding with code execution enabled scored higher than without it.

You can get a good idea of what 'image understanding with code execution' is like by running the following demo provided by Google.

Gemini Agentic Vision | Google AI Studio

https://aistudio.google.com/apps/bundled/gemini_visual_thinking?e=0&showPreview=true&showAssistant=true&fullscreenApplet=true

Let's try running one of the demo tasks, 'Count the number of fingers.'

The task shows the model an emoji-style illustration of a hand and asks it to count the fingers.

Gemini correctly answered that the hand has six fingers, displaying the image with each finger enclosed in a red box.

Inspecting the generated code shows that the model used Python to draw a red bounding box around each finger. According to Google, drawing bounding boxes with Python prevents miscounting: by executing code and drawing directly on the image, the model reinforces its reasoning and improves the accuracy of its image understanding.
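The following is a minimal pure-Python sketch of the idea, assuming an image stored as rows of `(R, G, B)` tuples and a list of already-detected boxes. The helper names and the six box coordinates are hypothetical; the code Gemini generated in the demo is not reproduced here. Drawing exactly one outline per detected object makes every counted item visible, so a duplicate or a missed finger would show up on the image.

```python
RED = (255, 0, 0)
WHITE = (255, 255, 255)


def blank_image(width, height, color=WHITE):
    """Create a width x height image as a list of rows of RGB tuples."""
    return [[color] * width for _ in range(height)]


def draw_box(img, box, color=RED):
    """Draw the 1-pixel outline of box = (x0, y0, x1, y1), inclusive, in place."""
    x0, y0, x1, y1 = box
    for x in range(x0, x1 + 1):   # top and bottom edges
        img[y0][x] = color
        img[y1][x] = color
    for y in range(y0, y1 + 1):   # left and right edges
        img[y][x0] = color
        img[y][x1] = color


def annotate_and_count(img, boxes):
    """Outline each detection once; the count is the number of boxes drawn."""
    for box in boxes:
        draw_box(img, box)
    return len(boxes)


if __name__ == "__main__":
    img = blank_image(20, 10)
    # Six hypothetical finger detections, side by side.
    fingers = [(1 + 3 * i, 1, 2 + 3 * i, 6) for i in range(6)]
    print(annotate_and_count(img, fingers))
```

A real pipeline would typically use an imaging library for the drawing; the plain lists here only keep the sketch dependency-free.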

Agentic Vision can also perform other operations such as enlarging an image or normalizing the values contained in the image.
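Both operations can be sketched in a few lines of Python, assuming a 2D pixel grid and a list of plain numbers. Zooming is shown as a crop followed by nearest-neighbour enlargement. The article does not say how values are normalized, so min-max scaling to [0, 1] is an assumption chosen only because it is common; all function names are hypothetical.

```python
def crop(img, box):
    """Return the half-open region box = (x0, y0, x1, y1) of a 2D grid."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]


def upscale(img, factor):
    """Nearest-neighbour enlargement: repeat each pixel factor x factor times."""
    return [[p for p in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def zoom(img, box, factor=2):
    """Crop to the region of interest, then enlarge it."""
    return upscale(crop(img, box), factor)


def normalize(values):
    """Min-max scale numbers to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    grid = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
    print(zoom(grid, (1, 1, 3, 3)))
    print(normalize([10, 20, 30]))
```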

Try 👁 Agentic Vision with Gemini 3 Flash in @GoogleAIStudio or Vertex AI. This new capability enables the model to effectively use code and reasoning to improve performance for common vision tasks.

See Agentic Vision in action: https://t.co/z0k9VG1YmQ pic.twitter.com/gO5YpAglK5
— Google AI Developers (@googleaidevs) January 27, 2026

Agentic Vision is now rolling out to the Gemini app, where it runs in Thinking mode, and is also available via the API.


Jan 28, 2026 12:13:00 in AI, Posted by log1o_hf