Elon Musk's AI company announces 'Grok-1.5V', a multimodal AI capable of understanding images, with performance comparable to 'GPT-4V' and 'Gemini Pro 1.5'

xAI, an AI company founded by Elon Musk, has announced its first multimodal AI model, Grok-1.5V. The model can understand images and perform tasks such as 'reading a flowchart and writing code from it' and 'calculating calories from a nutrition label.'

Grok-1.5 Vision Preview

Grok-1.5V is a multimodal AI model that handles both text generation and image recognition in a single model. It is scheduled to become available soon to existing Grok users and some early testers.

xAI has published several examples that demonstrate the model's capabilities. In one, the model is shown a flowchart and asked to 'convert this flowchart into Python code,' and it outputs the corresponding Python code.

In another, when shown a nutrition label and asked, 'How many calories are there in five pieces?' the model gave the correct calorie count, explaining its reasoning along the way.
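
The arithmetic behind this kind of question is a simple proportion: take the calories per serving from the label, divide by the number of pieces in a serving, and multiply by the number of pieces asked about. The sketch below illustrates that calculation only. The function name and the label values in the usage line are hypothetical and are not taken from xAI's example.

```python
def calories_for(pieces, serving_pieces, serving_calories):
    """Scale the calories printed for one serving to an arbitrary number of pieces."""
    if serving_pieces <= 0:
        raise ValueError("serving_pieces must be positive")
    calories_per_piece = serving_calories / serving_pieces
    return calories_per_piece * pieces


# Hypothetical label: one serving is 3 pieces and contains 60 calories.
five_pieces = calories_for(pieces=5, serving_pieces=3, serving_calories=60)
```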

The model can also be shown a screenshot of a table and asked to convert it into CSV format.

Alongside Grok-1.5V, xAI also announced RealWorldQA, a benchmark for multimodal AI. The initial version of RealWorldQA contains 765 images, each paired with a question, and is intended to measure how well a multimodal AI understands spatial relationships in the real world.
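
A score on an image-question benchmark of this kind is typically the fraction of questions answered correctly. The following is a minimal sketch of that idea, not xAI's evaluation code: the article does not describe how RealWorldQA answers are scored, so the exact-match comparison, the `Item` record and the `predict` callback are all assumptions made for illustration.

```python
from typing import Callable, NamedTuple, Sequence


class Item(NamedTuple):
    """One hypothetical benchmark entry: an image, a question and the expected answer."""
    image_path: str
    question: str
    answer: str


def normalize(text: str) -> str:
    # Assumption: ignore case, surrounding whitespace and a trailing period.
    return text.strip().lower().rstrip(".")


def accuracy(items: Sequence[Item], predict: Callable[[Item], str]) -> float:
    """Return the fraction of items whose predicted answer matches the expected one."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(normalize(predict(item)) == normalize(item.answer) for item in items)
    return correct / len(items)
```

Here `predict` stands in for whatever call sends the image and question to a model and returns its answer as text.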

The table below lists benchmark results for 'Grok-1.5V', 'GPT-4V', 'Claude 3 Sonnet', 'Claude 3 Opus' and 'Gemini Pro 1.5'. Grok-1.5V scores higher than both GPT-4V and Gemini Pro 1.5 on Mathvista, AI2D, Text VQA and RealWorldQA, and it has the highest reported score of all five models on Mathvista, Text VQA and RealWorldQA.

Benchmark    Grok-1.5V  GPT-4V  Claude 3 Sonnet  Claude 3 Opus  Gemini Pro 1.5
MMMU         53.6%      56.8%   53.1%            59.4%          58.5%
Mathvista    52.8%      49.9%   47.9%            50.5%          52.1%
AI2D         88.3%      78.2%   88.7%            88.1%          80.3%
Text VQA     78.1%      78.0%   -                -              73.5%
ChartQA      76.1%      78.5%   81.1%            80.8%          81.3%
DocVQA       85.6%      88.4%   89.5%            89.3%          86.5%
RealWorldQA  68.7%      61.4%   51.9%            49.8%          67.5%
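
To make the comparison easier to check, the scores above can be loaded into a small script that writes them out as CSV and reports the top-scoring model for each benchmark. This is only an illustrative sketch: the numbers are copied from the table, a missing result ('-') is stored as `None`, and the function names are made up for this example.

```python
import csv
import io

# Scores copied from the table above; None marks a result that was not reported ("-").
MODELS = ["Grok-1.5V", "GPT-4V", "Claude 3 Sonnet", "Claude 3 Opus", "Gemini Pro 1.5"]
SCORES = {
    "MMMU": [53.6, 56.8, 53.1, 59.4, 58.5],
    "Mathvista": [52.8, 49.9, 47.9, 50.5, 52.1],
    "AI2D": [88.3, 78.2, 88.7, 88.1, 80.3],
    "Text VQA": [78.1, 78.0, None, None, 73.5],
    "ChartQA": [76.1, 78.5, 81.1, 80.8, 81.3],
    "DocVQA": [85.6, 88.4, 89.5, 89.3, 86.5],
    "RealWorldQA": [68.7, 61.4, 51.9, 49.8, 67.5],
}


def to_csv(models, scores):
    """Serialize the table as CSV text, leaving unreported cells empty."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["benchmark"] + models)
    for name, row in scores.items():
        writer.writerow([name] + ["" if v is None else f"{v:.1f}" for v in row])
    return buf.getvalue()


def leaders(models, scores):
    """Map each benchmark to the model with the highest reported score."""
    result = {}
    for name, row in scores.items():
        reported = [(v, m) for v, m in zip(row, models) if v is not None]
        result[name] = max(reported)[1]
    return result


if __name__ == "__main__":
    print(to_csv(MODELS, SCORES))
    for benchmark, model in leaders(MODELS, SCORES).items():
        print(f"{benchmark}: {model}")
```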

The RealWorldQA data can be downloaded by clicking 'here (677MB)' in the Grok-1.5V release article.

in Software, Posted by log1o_hf