Mistral AI releases 'Mistral OCR 4,' an OCR model with Japanese support that converts PDFs and table-heavy documents into 'usable data'



Mistral AI has released its document recognition model, 'Mistral OCR 4.' In addition to extracting text from PDFs and Office documents, the model recognizes elements within a document such as tables, formulas, and signatures, and can produce structured output indicating where each element is located.

Mistral OCR 4 : SOTA OCR for Document Intelligence

https://mistral.ai/news/ocr-4/

OCR is used by a wide range of companies to read invoices, contracts, financial statements, manuals, and other documents and make them searchable. However, because most documents are written on the assumption that humans will read them, simply converting them into digital text does not tell a system whether a given string is a number in a table, a heading, or a footnote, or how confident the OCR engine was in reading it. As a result, human verification and manual formatting are often required in later stages.

Mistral OCR 4 goes beyond traditional OCR, which converts documents to plain text, and aims to break down documents into a format that is easier for AI and search systems to handle. According to Mistral AI, in addition to the extracted text, OCR 4 returns bounding boxes indicating the location of characters and blocks, block classifications such as titles, tables, formulas, and signatures, and confidence scores at the page and word levels.

The image below shows a comparison of OCR model performance. Mistral OCR 4 achieved the highest score among the models compared on both the publicly available benchmark OlmOCRBench and Mistral AI's internal evaluation, Crawl Multilingual.



Bounding boxes indicate 'where in a document a piece of information was taken from.' For example, when an AI answers a question about a contract, it can highlight the relevant clause on screen, or show which field the total amount on an invoice was read from. Confidence scores can be used to route only the questionable sections to human verification, reducing the need for a person to review every page.
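The confidence-based routing described above can be illustrated with a minimal Python sketch. The `Word` class, its field names, the 0.0 to 1.0 confidence range, the `(x0, y0, x1, y1)` box layout, and the 0.9 threshold are all assumptions made for illustration; they are not taken from Mistral AI's actual response format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """Hypothetical word-level OCR result (not Mistral's real schema)."""
    text: str
    confidence: float                         # assumed range: 0.0 to 1.0
    bbox: Tuple[float, float, float, float]   # assumed layout: (x0, y0, x1, y1)


def words_needing_review(words: List[Word], threshold: float = 0.9) -> List[Word]:
    """Return only the words whose confidence falls below the threshold."""
    return [w for w in words if w.confidence < threshold]


# Made-up sample data standing in for one line of an invoice.
sample = [
    Word("Total", 0.99, (40.0, 700.0, 90.0, 715.0)),
    Word("1,284.00", 0.72, (400.0, 700.0, 470.0, 715.0)),
]

flagged = words_needing_review(sample)
for w in flagged:
    # The bounding box tells a reviewer where on the page to look.
    print(f"review {w.text!r} at {w.bbox} (confidence {w.confidence:.2f})")
```

In this sketch only the low-confidence amount is flagged, so a reviewer would check a single field rather than the whole page.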

Mistral AI envisions Mistral OCR 4 being used for document analysis, retrieval-augmented generation (RAG), form entry and invoice processing by AI agents, compliance checks, internal search, and knowledge base building.

The image below shows the results of a blind comparison of Mistral OCR 4 and competing products by independent annotators. It shows that in many cases, the output of Mistral OCR 4 was chosen over the output of AWS Textract, Azure Doc Intel, Gemini 3.1 Pro Preview, and others.



Supported formats include common enterprise document formats such as PDF, DOC, PPT, and OpenDocument, and it supports 170 languages divided into 10 language groups. Mistral AI explains that improvements are particularly noticeable in specialized language categories, including Japanese, Hindi, and Greek, as well as low-resource languages, where accuracy tends to suffer in many systems.

The following image compares scores in the specialized language category of Crawl Multilingual. It shows the scores of Mistral OCR 4, Chandra OCR 2, Mineru Pro, and PaddleOCR VL, with Mistral OCR 4 also scoring highest for reading multilingual documents.



There are two ways to use OCR 4: as a standalone API, or with Document AI functionality layered on top of the same API. It can also be used from Mistral Studio. The standalone API is intended for developers who want to integrate OCR directly into their own apps or data processing pipelines, while layering Document AI on top is intended for tasks such as formatting output according to a defined JSON schema, annotating images, or interpreting documents with custom instructions. Mistral AI suggests using OCR 4 directly via the API when raw extraction results are needed, and adding Document AI functionality when the output must be structured around specific business fields.

At the time of writing, the API fees for Mistral OCR 4 are $4 per 1,000 pages (approximately 645 yen), $2 per 1,000 pages (approximately 323 yen) when using the Batch API, and $5 per 1,000 pages (approximately 808 yen) for Document AI. In addition to the API and Mistral Studio, it can also be used via Amazon SageMaker and Microsoft Foundry, and support for Snowflake Parse Document is planned. Mistral AI states that it will also offer a self-hosted option for enterprise customers, so that organizations unable to send confidential documents outside the company can run the model on their own infrastructure.
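As a rough guide to what these rates mean for a given workload, the following Python sketch estimates cost from a page count. The per-1,000-page prices are the ones quoted above; the function name, the mode labels, and the assumption that billing scales linearly with page count (with no minimums or rounding) are illustrative and not confirmed by Mistral AI.

```python
# Prices in USD per 1,000 pages, as quoted in this article at the time of writing.
PRICE_PER_1000_PAGES_USD = {
    "api": 4.0,          # standalone OCR API
    "batch": 2.0,        # Batch API
    "document_ai": 5.0,  # Document AI
}


def estimate_cost_usd(pages: int, mode: str = "api") -> float:
    """Estimate cost, assuming billing is strictly proportional to pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if mode not in PRICE_PER_1000_PAGES_USD:
        raise ValueError(f"unknown mode: {mode!r}")
    return pages / 1000 * PRICE_PER_1000_PAGES_USD[mode]


# Hypothetical workload: 250,000 pages.
for mode in PRICE_PER_1000_PAGES_USD:
    print(f"{mode}: ${estimate_cost_usd(250_000, mode):,.2f}")
```

Under these assumptions, 250,000 pages would come to $1,000 via the standard API, $500 via the Batch API, and $1,250 with Document AI.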

in AI, Posted by log1d_ts