Multimodal AI "Gemini" released: it outperforms GPT-4, processes text, audio, and images simultaneously, and converses with remarkable naturalness



On December 6, 2023 (local time), Google DeepMind released the multimodal AI 'Gemini'. It can process text, audio, and images simultaneously, and its top model outperforms GPT-4. The hands-on video released at the same time shows it giving very natural responses.

Gemini - Google DeepMind

https://deepmind.google/technologies/gemini/

Introducing Gemini: Google's most capable AI model yet
https://blog.google/technology/ai/google-gemini-ai/


Gemini was released in three models: 'Ultra', 'Pro' and 'Nano'.



Each model is explained below.

・Gemini Ultra
The largest and most capable model, intended for the most complex tasks. It is scheduled to become available in 2024 or later.

・Gemini Pro
The best model for a wide range of tasks. At the same time as the announcement, Google's chat AI 'Bard' was upgraded to a version based on Gemini Pro, which is already available.

・Gemini Nano
The most efficient model, intended for on-device tasks. It became available on the Pixel 8 Pro at the same time as the announcement.

Among these, the performance of the top model 'Gemini Ultra' is as shown in the figure below. On the general-performance benchmark MMLU it outperformed not only GPT-4 but also human experts, and it outperformed GPT-4 on many other benchmarks as well.



Gemini is a multimodal AI, which means it can process not only text but also images, video, and audio at the same time. It outperformed GPT-4V on all multimodal benchmarks.



A hands-on video that puts Gemini through its paces has been released with Japanese subtitles.

Hands-on with Gemini: Interacting with multimodal AI - YouTube


When asked aloud, "What do you see?", Gemini replied, "I see a squiggly line."



When the presenter drew another line and asked again, Gemini's answer changed to, "It looks like a bird to me."



When the presenter added a water line, Gemini said it was a duck.



When the duck was colored blue, Gemini pointed out that ducks are not normally blue.



So the presenter took out a toy duck and asked, 'Is that true?'



Gemini revised its opinion and said, "Looks like blue ducks are more common than I thought."



When asked to guess the material, Gemini said, "It looks like rubber or plastic." The exchange comes across as quite natural.



Next, the presenter showed a world map and asked Gemini to "think of a game based on what you can see right now."



Gemini suggested a country-guessing game and immediately posed the first question, giving three hints: "Home of kangaroos, koalas, and the Great Barrier Reef."



When the presenter pointed to Australia, Gemini confirmed that the answer was correct.



When the presenter put a ball under a cup, Gemini guessed the intent before anything had been said: "You're trying to get me to find the paper ball under the cup."



The presenter tried to confuse Gemini by quickly shuffling the cups, but Gemini correctly answered, 'It's the cup on the left.'



Gemini also guessed hand gestures correctly.



The presenter showed two pieces of yarn and asked for examples of what could be made with them. Gemini suggested three ideas, including a 'dragon fruit' for green and pink yarn.



Gemini also generated images of example creations that matched the colors of the yarn.



When shown an illustration of a fork in the road and asked, "Which way should we go?", Gemini correctly read the situation and answered, "We should go to the left."



When shown an illustration made up of just dots and numbers, Gemini said, "This is a picture of a crab."



Connecting the dots with lines in numerical order did indeed produce a neat picture of a crab.



Gemini can also guess which of two cars is faster based on their shapes.



It can also answer the question, 'Which roller coaster seems more fun?'



When told to "think of an appropriate line," Gemini came up with a fitting one, something like "Kya!".



When shown an illustration of a guitar, Gemini plays acoustic guitar music.


When the guitar is connected to an amplifier, the music changes to electric guitar music.



Adding an illustration of a palm tree changes it to beachy, ukulele-style music.



Gemini can also answer the question, 'What is this scene trying to recreate?'



When the presenter paused the video and asked, "What's going to happen after this?", Gemini predicted the outcome, saying, "I'm sure I'll get a perfect score of 10!"



The Pro model already powers Bard, and the Nano model is available on the Pixel 8 Pro. The Gemini API for developers is scheduled to become available on December 13, 2023, and will provide access to Gemini Pro. Gemini Ultra is stated to be scheduled for availability in 2024 or later.



・Continued
Bard, which has been significantly enhanced with the AI model 'Gemini Pro', will be available, and 'Bard Advanced' equipped with Gemini Ultra will also be released in 2024 - GIGAZINE



in Software, Video, Posted by log1d_ts