Google releases Gemini 1.5 Flash-8B, which is smaller and faster than previous models, with a 50% lower price and reduced latency



Google announced the high-speed, high-performance, lightweight AI model 'Gemini 1.5 Flash' in May 2024. The product version of this even lighter and more affordable version 'Gemini 1.5 Flash-8B' was released on October 3, 2024.

Gemini 1.5 Flash-8B is now production ready - Google Developers Blog

https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-generally-available-for-use/



Gemini 1.5 Flash-8B is now production ready
https://simonwillison.net/2024/Oct/3/gemini-15-flash-8b/

The Gemini 1.5 Flash, an AI model announced by Google, boasts performance comparable to the Gemini 1.5 Pro in benchmark tests, but is significantly cheaper to use than the Gemini 1.5 Pro.

Google announces 'Gemini Flash', a high-speed, high-performance lightweight AI model, with performance equivalent to that of Gemini Pro at one-tenth the price - GIGAZINE



Over the next few months, Google improved Gemini 1.5 Flash and developed a smaller, faster version called Gemini 1.5 Flash-8B, which was released as an experimental version in September and a production version in October.

Although the Gemini 1.5 Flash-8B is a lightweight model, it is said to perform on par with the Gemini 1.5 Flash in many benchmarks.

Gemini 1.5 Flash-8B is the lowest priced Gemini series, and Google claims it is '50% cheaper than Gemini 1.5 Flash.' The specific costs are '$0.0375 (about 5.5 yen) per 1 million tokens of prompt input,' '$0.15 (about 22 yen) per 1 million tokens of prompt output,' and '$0.01 (about 1.5 yen) per 1 million tokens of prompt cache,' and the amount doubles for any prompts exceeding 12,800 tokens.



For developers on paid plans , billing will begin on October 14. Google said, 'This new pricing, along with the work we've done to reduce costs for developers with Gemini 1.5 Flash and Gemini 1.5 Pro, underscores our commitment to empower developers to freely build products and services that move the world forward.'

Gemini 1.5 Flash-8B also boasts lower latency for small prompts than Gemini 1.5 Flash, and the rate limit has been doubled.

Developers can access Gemini 1.5 Flash-8B via Google AI Studio and the Gemini API.

Untitled prompt | Google AI Studio
https://aistudio.google.com/app/prompts/new_chat?model=gemini-1.5-flash-8b-002

Gemini models | Gemini API | Google AI for Developers
https://ai.google.dev/gemini-api/docs/models/gemini

in Software,   Web Service, Posted by log1h_ik