Introducing 'ether0', a chemistry-specific AI reasoning model capable of scientifically correct molecular design, built by applying reinforcement learning to an existing language model to exceed the performance of humans and existing AI

ether0: a scientific reasoning model for chemistry | FutureHouse
https://www.futurehouse.org/research-announcements/ether0-a-scientific-reasoning-model-for-chemistry
TRAINING A SCIENTIFIC REASONING MODEL FOR CHEMISTRY
(PDF file) https://storage.googleapis.com/aviary-public/ether0_preprint.pdf
Reasoning models such as OpenAI o3 and Claude Opus 4 can score higher than humans on benchmark tests that include chemistry questions. However, they struggle with practical questions about molecules and sometimes respond with scientifically impossible molecular structures. ether0 was developed to address this problem.
ether0 was built by applying fine-tuning and reinforcement learning to Mistral AI's 'Mistral-Small-24B-Instruct-2501' model, and can answer molecular questions with high accuracy. Below are the results when ether0 (left), OpenAI o3 (center), and Claude Opus 4 (right) were asked to output a structure with the molecular formula C27H37N3O4. OpenAI o3 and Claude Opus 4 output incorrect structures, while ether0 produced a scientifically correct one.
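A first sanity check for a task like this is whether a proposed structure's atom counts match the requested formula at all. As a minimal illustration (not FutureHouse's actual verification code), a formula string like C27H37N3O4 can be parsed into element counts that any candidate structure must reproduce exactly:

```python
import re
from collections import Counter

def parse_formula(formula: str) -> Counter:
    """Parse a molecular formula like 'C27H37N3O4' into element counts."""
    counts = Counter()
    # Each match is an element symbol (e.g. 'C', 'Cl') followed by an
    # optional count; a missing count means one atom of that element.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

target = parse_formula("C27H37N3O4")
print(dict(target))  # {'C': 27, 'H': 37, 'N': 3, 'O': 4}
```

A full correctness check would additionally require validating valences and bonding (e.g. with a cheminformatics toolkit such as RDKit), which is what distinguishes a merely formula-matching answer from a scientifically possible molecule.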

The graph below shows the accuracy achieved when multiple AI models, including ether0, and humans were asked to solve chemistry problems. The left side of the graph shows accuracy on free-form questions, and the right side shows accuracy on multiple-choice questions. As the graph shows, ether0 outperforms both humans and the other AI models on free-form questions.

In addition, ether0 can output the reasoning leading up to its final answer, and that reasoning is scientifically convincing.
Finally, the individual reasoning traces that come out of the model are also very compelling. Here it is backing out the chemical structure associated with a natural product. This is a hard task for humans. 4/n
pic.twitter.com/3AzDeGfgUv — Sam Rodriques (@SGRodriques) June 5, 2025
Sam Rodriques, CEO of FutureHouse, points out that ether0 is notable for learning more efficiently than specialized models. The graph below plots training progress on the horizontal axis and accuracy on the vertical axis, showing that ether0 reaches high accuracy from an early stage of training.
However, ether0 is still a prototype, with issues such as degraded performance on problems other than molecular ones and, owing to its training data, incorrect answers on some tasks. Nevertheless, Rodriques argues, 'This research can be seen as a proof of concept showing that, with the right training data, language models can achieve superhuman performance on scientific problems very efficiently.'
This model is a prototype. It beats frontier models by a large margin on some tasks, but is relatively narrowly useful in those domains. It is trained for molecular design tasks, so it can only answer questions for which the answer is a molecule. Also, it is based on a 24B model,…
— Sam Rodriques (@SGRodriques) June 5, 2025
The model weights for ether0 are distributed at the following link:
futurehouse/ether0 · Hugging Face
https://huggingface.co/futurehouse/ether0

You can also try running ether0 on the following page:
ether0 Chemical Reasoner
https://ether0.platform.futurehouse.org/
