Google announces 'Project Astra,' a GPT-4o-like AI agent that understands video and audio and answers questions



At its developer event 'Google I/O,' held on May 14, 2024 local time, Google announced 'Project Astra' (Astra), an AI agent that understands video and audio and answers questions in real time. Google also released a demo video in which users ask Astra various questions about what they capture with smartphone and smart glasses cameras.

Google Gemini updates: Flash 1.5, Gemma 2 and Project Astra

https://blog.google/technology/ai/google-gemini-update-flash-ai-assistant-io-2024/

Google strikes back at OpenAI with “Project Astra” AI agent prototype | Ars Technica
https://arstechnica.com/information-technology/2024/05/google-strikes-back-at-openai-with-project-astra-ai-agent-prototype/

On May 13 local time, OpenAI announced 'GPT-4o,' a new AI model that can process voice and visual information quickly and respond in real time, which caused a big stir. At Google I/O the following day, Google unveiled Astra as part of its effort to build a universal AI agent that can be helpful in everyday life.

'As part of our mission to build AI responsibly for the benefit of humanity, Google DeepMind has always wanted to develop universal AI agents that can help in everyday life,' said Demis Hassabis, head of Google's AI division. 'Today, we're sharing our progress on the future of AI assistants with Project Astra (advanced seeing and talking responsive agent).'

In the demo video below, the user asks Astra various questions while filming their surroundings with the cameras of a smartphone and a pair of smart glasses.

Project Astra: Our vision for the future of AI assistants - YouTube


With the smartphone's camera and microphone turned on, the user asked Astra to 'let me know if you see anything that makes a sound.'



When the speaker came into the camera's field of view, Astra quickly responded, 'I can see a speaker making sound.'



The user then drew an arrow on the camera image pointing to the top part of the speaker, where the sound comes from, and asked, 'What is this part of the speaker called?'



Astra responded, 'That's a tweeter. It's the part that produces high-frequency sound.'



Next, the user pointed the camera at crayons in a pen holder and asked Astra to 'make a creative alliteration about these.' Astra responded with an alliterative line: 'Creative crayons color cheerfully. They certainly craft colorful creations.'



When asked what the code on a computer screen did, Astra replied that it defined encryption and decryption functions.



When the user showed it the view outside the window and asked, 'Where do you think I am?', Astra replied, 'This appears to be the King's Cross district of London, known for its train station and transport links.'



The user also asked an unexpected question: 'Do you remember where you saw my glasses?'



Although this seemed like a difficult question, Astra replied, 'Yes, I saw it. Your glasses were on the desk, near the red apple.' Sure enough, the smart glasses were lying near the red apple.



Next, the user put on the smart glasses and kept talking to Astra while viewing the surroundings through the glasses' camera.



Pointing at a system diagram drawn on a whiteboard, the user asked, 'What can I add here to make this system faster?' Astra responded, 'Adding a cache between the server and the database could improve speed.'



The user also pointed to a whiteboard drawing of two cat faces with a box marked with a question mark between them and asked, 'What does this remind you of?' It was something of a riddle, but Astra answered, 'Schrödinger's cat.'



When the user placed a stuffed tiger next to a dog and asked Astra to name a band after the pair, it replied, 'Golden Stripes.' Astra responds in real time, much as if you were having a conversation with a human.



Project Astra uses a state-of-the-art voice model to improve voice quality and broaden its range of intonation.

'With technology like this, it's easy to envision a future where people have dedicated AI assistants by their side, whether through their smartphones or smart glasses, and some of these features will be coming to Google services later this year, such as the Gemini app and web experience,' Hassabis said.

in Software, Web Service, Video, Posted by log1h_ik