Google's official video showcasing the performance of its multimodal AI "Gemini" is called out as faked



Gemini, a large language model (LLM) announced by Google on December 6, 2023, has multimodality as a major feature: it can interact with users by understanding not only text but also images and video. A demo video that Google released to show off Gemini's performance became a hot topic. However, it has been pointed out that the Gemini demo shown in this video was faked.

Google's best Gemini demo was faked | TechCrunch
https://techcrunch.com/2023/12/07/googles-best-gemini-demo-was-faked/

The demo video that has been called out as fake is below.

Hands-on with Gemini: Interacting with multimodal AI - YouTube


Bloomberg reporter Parmy Olson pointed out, "According to a spokesperson, the video was made neither in real time nor with voice."



In the video, Gemini appears to respond directly to the video or images it is watching. In reality, Gemini was not making judgments and responding in real time while watching video; it was only shown still frames taken from the footage, and the exchange was carried out through text prompts. Google has, however, disclosed the text prompts that were entered in the following developer blog post.

How it's Made: Interacting with Gemini through multimodal prompting - Google for Developers
https://developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html

Around 2 minutes and 45 seconds into the video, there is a scene where the presenter plays rock-paper-scissors with Gemini.



In this scene, Gemini appears to look at the shape and movement of the hand and say, "It's rock, paper, scissors!" In reality, it was given three still images, showing paper, rock, and scissors, together with the prompt "What do you think I'm doing? Hint: it's a game."



In addition, the part where Gemini talks about the blue rubber duck is not disclosed in the developer blog post, and the IT news site TechCrunch has expressed skepticism about it.



TechCrunch points out that the demo video's title, "Hands-on with Gemini", is misleading because it suggests the video shows Gemini actually operating. The TechCrunch writer added, "We should probably assume that Google's AI demos are exaggerated. I wrote in the headline of the article that this video was 'faked.' At first, I wasn't sure whether this harsh language was justified," and criticized Google for presenting something that is not an actual capability as if it were a hands-on demo.

TechCrunch says, "While similar at first glance, these do not feel like the same interaction. One is an intuitive, wordless evaluation of an abstract idea captured on the fly, and the other is an engineered, heavily hinted interaction with limited functionality. In Gemini's case, it was the latter, not the former."

According to TechCrunch, after the article was published, Oriol Vinyals, vice president of research at Google DeepMind, said, "This video shows what a multimodal user experience built with Gemini could look like. We made it to inspire developers."

in Software, Video, Posted by log1i_yk