Google releases open source visual language model 'PaliGemma' and announces large language model 'Gemma 2' with performance comparable to Llama 3

Google announced the visual language model (VLM) 'PaliGemma' and the large language model (LLM) 'Gemma 2' on May 15, 2024. PaliGemma has already been released, and a demo that you can easily try is also available.

◆ Visual language model 'PaliGemma'
PaliGemma is a visual language model that can recognize images and perform operations such as 'describe the content of the image,' 'understand the text in the image,' and 'separate objects from the background in the image.'

PaliGemma is available on GitHub, Hugging Face, Kaggle, and Vertex AI Model Garden. NVIDIA is also developing a version of PaliGemma optimized for its own GPUs. A demo page where you can try out PaliGemma's functions is also available at the following link.

PaliGemma Demo - a Hugging Face Space by google

I tried PaliGemma on the demo page: I uploaded an image of a tissue box, entered the text 'What is this?', and clicked 'Run'.

The answer I got was 'A white tissue box sitting on a grey carpet.'
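The same kind of query can also be run locally rather than through the demo page. The following is a minimal sketch, not the demo's actual code. It assumes the Hugging Face transformers library (version 4.41 or later) with its PaliGemmaForConditionalGeneration class, the gated checkpoint 'google/paligemma-3b-mix-224' (whose license must be accepted on Hugging Face first), and the task prefixes documented for the 'mix' checkpoints. The helper names build_prompt and run_paligemma and the file name 'tissue_box.jpg' are hypothetical.

```python
def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a task-prefixed prompt (hypothetical helper).

    Assumption: the prefixes below follow the conventions documented for
    the PaliGemma 'mix' checkpoints; other checkpoints may differ.
    """
    if task == "caption":              # describe the content of the image
        return f"caption {lang}"
    if task == "ocr":                  # read the text in the image
        return "ocr"
    if task == "answer":               # visual question answering
        return f"answer {lang} {text}"
    if task in ("detect", "segment"):  # locate or mask a named object
        return f"{task} {text}"
    raise ValueError(f"unknown task: {task}")


def run_paligemma(image_path: str, prompt: str,
                  model_id: str = "google/paligemma-3b-mix-224",
                  max_new_tokens: int = 32) -> str:
    """Run one image + prompt through PaliGemma and return the decoded text."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    # Drop the prompt tokens so only the newly generated answer is decoded.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)


if __name__ == "__main__":
    # 'tissue_box.jpg' is a placeholder path for your own image.
    print(run_paligemma("tissue_box.jpg", build_prompt("answer", "What is this?")))
```

The answer will vary with the image and the checkpoint, so it will not necessarily match the sentence returned by the demo.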

◆ Large language model 'Gemma 2'
Google released Gemma, an open source LLM that utilizes Gemini research resources, in February 2024. Google has now announced Gemma 2, an enhanced version of Gemma.

Gemma 2 has 27 billion parameters, yet it is said to deliver performance comparable to Llama 3 70B, which has 70 billion parameters. Gemma 2 is also optimized for NVIDIA GPUs and Google's AI platform 'Vertex AI,' and can run on less than half the resources of comparable models.
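As a rough illustration of what the difference in parameter count means, the memory needed just to hold the weights can be estimated from the two figures above. This is a back-of-the-envelope sketch, not Google's methodology. It assumes 16-bit (2-byte) weights and ignores activations, the KV cache, and any quantization; the function name weight_memory_gb is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights alone, in gigabytes (1e9 bytes).

    Assumption: every parameter is stored in 16-bit precision (2 bytes).
    """
    return num_params * bytes_per_param / 1e9


gemma2_gb = weight_memory_gb(27e9)  # Gemma 2: 27 billion parameters
llama3_gb = weight_memory_gb(70e9)  # Llama 3 70B: 70 billion parameters
ratio = gemma2_gb / llama3_gb

print(f"Gemma 2 27B weights: about {gemma2_gb:.0f} GB")
print(f"Llama 3 70B weights: about {llama3_gb:.0f} GB")
print(f"Ratio: {ratio:.2f}")
```

Under these assumptions the weights come to about 54 GB versus about 140 GB, a ratio of roughly 0.39, which fits with the 'less than half' figure, although actual resource use depends on much more than weight storage.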

At the time of writing, Gemma 2 is still in pretraining, but it is already said to outperform Grok on various benchmarks.

Gemma 2 is scheduled to be released in the coming weeks.

