Mistral releases its first multimodal AI model, Pixtral 12B, available via GitHub, Hugging Face, its chatbot Le Chat, and its API platform La Plateforme



French AI startup Mistral has announced its first multimodal model, Pixtral 12B, which can process not only text but also images.

mistral-community/pixtral-12b-240910 · Hugging Face

https://huggingface.co/mistral-community/pixtral-12b-240910

Pixtral 12B is a 12-billion-parameter model that can process text and images simultaneously, enabling tasks such as describing images, identifying objects, and answering questions about an image's contents.
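For readers who want to try these image-grounded tasks locally, serving libraries such as vLLM advertise support for Mistral's multimodal chat format. The sketch below is a minimal example, not Mistral's own recipe: the "mistralai/Pixtral-12B-2409" Hub ID, the `tokenizer_mode="mistral"` setting, and the image URL are assumptions chosen for illustration.

```python
from vllm import LLM
from vllm.sampling_params import SamplingParams

# Assumed model ID and tokenizer mode; adjust to the weights you actually downloaded.
llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")

# OpenAI-style multimodal message: one text part plus one image URL (placeholder).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }
]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```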

It is released under the Apache 2.0 license, meaning anyone can download, use, and modify it freely, including for commercial purposes.

The model can be downloaded using the torrent magnet link shared by Mistral.




This is the same distribution method Mistral used when it released its 8x22B MoE model.

Mistral AI suddenly announces new large language model '8x22B MoE' with a context length of 65K and up to 176 billion parameters - GIGAZINE



The model is also available on GitHub and Hugging Face, but there is no web demo for testing its functionality.

According to Sophia Yang, Mistral's head of developer relations, Pixtral 12B will soon be available for testing on Mistral's chatbot, Le Chat, and its API platform, La Plateforme.
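Once the model is exposed on La Plateforme, querying it should look much like any other Mistral chat model. The sketch below uses the mistralai Python SDK; the "pixtral-12b-2409" model name and the image URL are assumptions made for illustration, since the model was not yet live on the API at the time of the announcement.

```python
import os
from mistralai import Mistral

# Assumes MISTRAL_API_KEY is set in the environment.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="pixtral-12b-2409",  # assumed model name, not confirmed by the announcement
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What objects appear in this image?"},
                {"type": "image_url", "image_url": "https://example.com/photo.jpg"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```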




It is unclear what image data Mistral used to train Pixtral 12B. Training data for AI models can include copyrighted material, which has repeatedly been a source of controversy in the past.



in Software, Posted by log1p_kr