Alibaba's AI research team has released the visual language model 'Qwen2.5 VL' that can recognize and automatically operate the UI of PCs and smartphones, and can automatically perform airline ticket reservations and other tasks with performance exceeding GPT-4o



Qwen , an AI research team at Alibaba Cloud, has released a visual language model called ' Qwen2.5 VL '. Qwen2.5 VL can not only recognize the type of subject in an image and transcribe text, but also recognize the UI of a PC or smartphone and automatically operate it.

Qwen2.5 VL! Qwen2.5 VL! Qwen2.5 VL! | Qwen
https://qwenlm.github.io/blog/qwen2.5-vl/




Below is an example showing the performance of Qwen2.5 VL. If you show a person an image of four cars and ask them to tell you the name of the car in English and Chinese, they will answer correctly.



It can also handle complex tasks such as labeling the names of two basketball players and the positions of their left and right hands when shown a photo of them.



It is also possible to transcribe vertically written text.



You can also summarize videos that are over an hour long.



In addition, Qwen2.5 VL can recognize the UI of a PC or smartphone and operate it automatically. In the video below, you can see how Qwen2.5 VL executes the task 'Install an extension to Visual Studio Code'.

How to automatically operate a PC with the AI model 'Qwen2.5 VL' - YouTube


You can also book your flight using a ticket booking app on your smartphone.

'Qwen2.5 VL' automatically operates smartphone apps to book airline tickets - YouTube


Qwen2.5 VL is available in three types: 3B, 7B, and 72B. Qwen2.5 VL 72B outperforms Gemini 2.0 Flash and GPT-4o in various benchmarks.



In addition, 'Qwen2.5 VL 7B' shows higher performance than 'GPT-4o mini'.



Qwen2.5 VL is already available for use with Qwen's chat AI,

Qwen Chat .



In addition, three types, 'Qwen2.5-VL-3B-Instruct', 'Qwen2.5-VL-7B-Instruct', and 'Qwen2.5-VL-72B-Instruct', have been released on Hugging Face.

Qwen2.5-VL - a Qwen Collection
https://huggingface.co/collections/Qwen/qwen25-vl-6795ffac22b334a837c0f9a5



in Software,   Video, Posted by log1o_hf