The world's top open source large-scale language model 'Reflection 70B' has been released, surpassing GPT-4o in all benchmarks, and reflection tuning is applied based on Llama 3.1 70B Instruct



Recently, Google has announced the launch of Reflection 70B , an open source large-scale language model (LLM) trained using

reflection tuning , a training technique developed to enable large-scale language models (LLMs) to correct their own mistakes.

HyperWrite debuts Reflection 70B, most powerful open source LLM | VentureBeat
https://venturebeat.com/ai/meet-the-new-most-powerful-open-source-ai-model-in-the-world-hyperwrites-reflection-70b/



New Open Source AI Model Can Check Itself and Avoid Hallucinations | Inc.com
https://www.inc.com/kit-eaton/new-open-source-ai-model-can-check-itself-avoid-hallucinations.html

Startup aims to open source the world's most capable AI model
https://the-decoder.com/startup-aims-to-open-source-the-worlds-most-capable-ai-model/

Reflection 70B is a new AI model built by Matt Shumer , CEO of OthersideAI , the developer of the AI personal assistant HyperWrite , in collaboration with AI company Glaive AI .




When comparing its performance with LLMs such as Claude 3.5 Sonnet, GPT-4o, Gemini, and Llama 3.1 405B, Reflection 70B achieved top results in MMLU , a benchmark that measures multimodal language comprehension, MATH , a benchmark that measures mathematical problem solving, IFEval , a benchmark that evaluates instruction following, and GSM8K , a benchmark that evaluates mathematical ability. In addition, GPQA , a benchmark that evaluates advanced reasoning ability, and HumanEval , a benchmark that evaluates program synthesis ability, were also conducted, and it seems that they performed better than GPT-4o in all of these. In response to these benchmark results, Reflection 70B is being promoted as the 'world's top open source LLM.'



Reflection 70B is based on Meta's open source Llama 3.1 70B Instruct and utilizes the same code pipeline as other Llama models, but incorporates reflection tuning, a technique that allows the LLM to recognize mistakes and correct itself before finalizing the answer. Reflection 70B introduces new special tokens for inference and error correction, making it easier for users to interact with the model in a more structured way. During inference, Reflection 70B outputs inferences in special tags, so that if an error is detected, it can be corrected in real time.

In addition, by separating the 'planning' phase from the answer generation stage, Reflection 70B appears to embrace the effectiveness of the Chain-of-Thought approach while making the output simpler and more concise for end users.




Reflection 70B also uses LMSys ' LLM Decontaminator to check for data contamination.

Reflection 70B is available from:

mattshumer/Reflection-Llama-3.1-70B · Hugging Face
https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B



A demo version of the Reflection 70B is available on the website , but due to heavy traffic, the page was temporarily down and unavailable at the time of writing.

According to Shumer, the Reflection 405B, which has more parameters than the Reflection 70B, is also planned to be released, and the Reflection 70B is also scheduled to be integrated into HyperWrite.

Reflection 70B was trained using a dataset created by Glaive AI, a company focused on solving one of the biggest bottlenecks in AI development: the availability of high-quality, task-specific data. By creating synthetic datasets tailored to specific needs, Glaive AI helps companies fine-tune their AI models quickly and at low cost.

in Software, Posted by logu_ii