Google claims that its Gemini 3 Deep Think-based agent, Aletheia, has successfully completed autonomous mathematical research

Google DeepMind announced that it has developed a mathematical research agent called 'Aletheia,' which uses Gemini's advanced reasoning mode, 'Deep Think,' and has produced results in specialized mathematical research autonomously. The agent can generate, verify, and revise answers end-to-end in natural language. Its reported milestones range from solving difficult International Mathematical Olympiad (IMO)-level problems to doctoral-level exercises and even open problems in actual academic research.
Gemini Deep Think: Redefining the Future of Scientific Research — Google DeepMind
superhuman/aletheia/Aletheia.pdf at main · google-deepmind/superhuman · GitHub
https://github.com/google-deepmind/superhuman/blob/main/aletheia/Aletheia.pdf
Aletheia is based on Gemini 3 Deep Think, which was developed to solve extremely difficult reasoning problems. The system works through the interaction of three subagents: a 'Generator' that produces a candidate answer, a 'Verifier' that judges whether the answer is correct, and a 'Reviser' that makes minor corrections.
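
The interaction of the three subagents can be pictured as a simple loop. The following is only a minimal sketch, not Aletheia's actual implementation: the function name `solve`, the `Verdict` type, the `max_rounds` limit, and the exact control flow are all assumptions made for illustration, and the three toy functions stand in for what would be model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Problem = Tuple[int, int]  # toy problem: "what is a + b?" (hypothetical)


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""


def solve(
    problem: Problem,
    generate: Callable[[Problem], int],
    verify: Callable[[Problem, int], Verdict],
    revise: Callable[[Problem, int, str], int],
    max_rounds: int = 3,  # assumed budget, not from the article
) -> Optional[int]:
    """Return an answer the verifier accepts, or None if none is found."""
    answer = generate(problem)
    for _ in range(max_rounds):
        verdict = verify(problem, answer)
        if verdict.ok:
            return answer
        # The verifier rejected the candidate: ask the reviser for a fix.
        answer = revise(problem, answer, verdict.feedback)
    # Out of revision rounds: accept only if the last candidate verifies.
    return answer if verify(problem, answer).ok else None


# Toy stand-ins for the three subagents (hypothetical).
def toy_generate(problem: Problem) -> int:
    a, b = problem
    return a + b + 1  # deliberately off by one


def toy_verify(problem: Problem, answer: int) -> Verdict:
    a, b = problem
    if answer == a + b:
        return Verdict(ok=True)
    return Verdict(ok=False, feedback="too large" if answer > a + b else "too small")


def toy_revise(problem: Problem, answer: int, feedback: str) -> int:
    return answer - 1 if feedback == "too large" else answer + 1


result = solve((2, 3), toy_generate, toy_verify, toy_revise)
no_budget = solve((2, 3), toy_generate, toy_verify, toy_revise, max_rounds=0)
```

In this sketch an answer is returned only after the verifier accepts it, so a run that exhausts its budget yields `None` rather than an unchecked answer.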

Conventional large language models are prone to hallucination on specialized topics and often output inaccurate information. Aletheia significantly reduces unfounded citations and calculation errors by using tools such as Google Search to navigate the literature. Furthermore, the inference-time 'scaling law,' under which accuracy improves as more computation is allocated at inference time, was shown to hold not only for competition mathematics but also for doctoral-level mathematical exercises.
In benchmark evaluations, Aletheia achieved the highest accuracy of 95.1% on

Aletheia achieved groundbreaking results by calculating eigenweights, a structural constant in arithmetic geometry, without human intervention and generating
In a study proving bounds on independent sets, Aletheia proposed the 'big picture' strategy and a human worked out the details. Furthermore, in a large-scale evaluation on 700 open problems from the Erdős Conjecture database, Aletheia autonomously solved four open problems, one of which was further generalized and led to an independent paper.
Google DeepMind has also proposed a classification system called 'Mathematical Research Autonomy Levels,' modeled on the autonomy levels used for self-driving cars, to appropriately evaluate mathematical results generated by AI. The framework sorts the degree of AI contribution into three categories: 'Human with Secondary AI Input,' 'Human-AI Collaboration,' and 'Essentially Autonomous.' It also grades mathematical significance on five levels, from Level 0 (Negligible Novelty) to Level 4 (Landmark Breakthrough), with the aim of promoting highly transparent information sharing. Google DeepMind has classified the aforementioned research results as at most Level 2 (Publishable Research) and has already submitted them for peer review.
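
The two axes of this classification can be written down as a small data structure. This is an illustrative sketch only: the class and member names are hypothetical, the names of Levels 1 and 3 are not given in this article and are left as placeholders, and the example label at the end is not a claim about how any particular result was rated.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Autonomy(Enum):
    """Degree of AI contribution (three categories named in the article)."""
    HUMAN_WITH_SECONDARY_AI_INPUT = "Human with Secondary AI Input"
    HUMAN_AI_COLLABORATION = "Human-AI Collaboration"
    ESSENTIALLY_AUTONOMOUS = "Essentially Autonomous"


class Significance(IntEnum):
    """Mathematical significance, Level 0 to Level 4."""
    NEGLIGIBLE_NOVELTY = 0
    LEVEL_1 = 1  # placeholder: name not given in the article
    PUBLISHABLE_RESEARCH = 2
    LEVEL_3 = 3  # placeholder: name not given in the article
    LANDMARK_BREAKTHROUGH = 4


@dataclass(frozen=True)
class ResultLabel:
    autonomy: Autonomy
    significance: Significance

    def describe(self) -> str:
        return f"{self.autonomy.value}, Level {int(self.significance)}"


# Hypothetical example label, for illustration only.
label = ResultLabel(Autonomy.ESSENTIALLY_AUTONOMOUS, Significance.PUBLISHABLE_RESEARCH)
summary = label.describe()
```

Keeping the two axes as separate fields reflects that the framework rates how much the AI contributed independently of how significant the result is.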

According to Google DeepMind, the applications of Gemini Deep Think extend beyond mathematics to solving difficult problems in physics and computer science. These results suggest that AI can be a powerful companion to human scientists by integrating vast amounts of knowledge and bridging different academic fields.
The prompts and outputs used to generate each paper with Aletheia are available on GitHub.
superhuman/aletheia at main · google-deepmind/superhuman · GitHub
https://github.com/google-deepmind/superhuman/tree/main/aletheia