Oct 02, 2024 11:35:00

OpenAI's GPT-4o adds 'image fine-tuning' function, improving task performance with just 100 images

The AI model '

GPT-4o ' released by OpenAI in May 2024 will have a 'function to fine-tune with images' added from October 1, 2024. The function to fine-tune GPT-4o with text was already available, but the ability to fine-tune with images will enable improvements to functions such as visual search and object detection.

🖼 We're adding support for vision fine-tuning. You can now fine-tune GPT-4o with images, in addition to text. Free training till October 31, up to 1M tokens a day. https://t.co/Nqi7DYYiNC pic.twitter.com/g8N68EIOTi
— OpenAI Developers (@OpenAIDevs) October 1, 2024

Introducing vision to the fine-tuning API | OpenAI
https://openai.com/index/introducing-vision-to-the-fine-tuning-api/

On October 1, OpenAI released the ability to fine-tune GPT-4o with images. OpenAI said, 'Since we first introduced fine-tuning in GPT-4o, hundreds of thousands of developers have customized models with text-only datasets to improve performance for specific tasks. However, in many cases, fine-tuning a model with text alone does not produce the expected performance gains.'

To fine-tune GPT-4o, first access the fine-tuning dashboard and click the “+Create” button on the screen.

Select 'gpt-4o-2024-08-06' from the base model list.

Then select the image dataset file and drag and drop it into the upload form. The image dataset must be in

the correct format beforehand. OpenAI claims that it can improve the performance of GPT-4o's image recognition tasks with just 100 images.

OpenAI is already working with several partners to use GPT-4o's image fine-tuning capabilities in practice.

Grab , a ride-hailing app based in Southeast Asia, creates mapping data using road images collected from drivers. Grab fine-tuned GPT-4o with just 100 images to accurately identify the location of traffic signs and lane dividers.

As a result of fine-tuning, the accuracy of counting lanes improved by 20% and the accuracy of estimating the location of speed limit signs improved by 13% compared to the baseline model of GPT-4o, enabling the machine learning team to better automate mapping tasks that were previously manual.

Automat , a company that automates document processing and UI-based business processes, fine-tuned GPT-4o on a screenshot dataset. It improved the performance of identifying on-screen UI elements given natural language descriptions, increasing the success rate of its automated agents from 16.6% to 61.67%. They also fine-tuned GPT-4o on 200 unstructured insurance-related documents, improving the score on an information extraction task by 7%.

Coframe , which develops AI engineering assistants to optimize the creation and testing of websites and UIs, offers a service that autonomously generates new branded sections based on already existing website elements.

By fine-tuning GPT-4o using images and code, Coframe was able to improve the model's ability to generate websites with a consistent visual style and correct layout by 26% compared to the base GPT-4o.

The image-based fine-tuning function of GPT-4o is supported by the latest model 'gpt-4o-2024-08-06' and is available to all developers with paid account tiers . Until October 31, 1 million training tokens are provided free of charge per day, and after November 1, fine-tuning will cost $25 (approx. 3,600 yen) per million tokens, $3.75 (approx. 540 yen) per million tokens for input, and $15 (approx. 2,200 yen) per million tokens for output.

Related Posts:

Oct 02, 2024 11:35:00 in Software, Web Service, Video, Posted by log1h_ik