The head of Google DeepMind's AlphaFold team explains how AI will revolutionize scientific discovery



AlphaFold, an AI developed by Google DeepMind in 2018 to predict the three-dimensional structure of proteins from amino acid sequence information, was open-sourced in 2021 with the aim of accelerating research in many important fields. AlphaFold 3, an AI model capable of predicting the structures and interactions of a wider range of biological molecules, was released and open-sourced in 2024. John Jumper, who led the AlphaFold team, spoke at the AI Startup School about what makes AlphaFold so effective and how it will revolutionize scientific discovery.

Nobel Laureate John Jumper: AI is Revolutionizing Scientific Discovery - YouTube


'I truly believe that AI systems, technologies and ideas can be used to change the world in very concrete ways, accelerating scientific progress and enabling new discoveries,' Jumper said during a speech at the AI Startup School.



Jumper studied physics and began a PhD, but he lost interest in the field and went to work for a computational biology company. Computational biology is a field that applies computer science, applied mathematics, and statistics to solving biological problems and analyzing data. Jumper later left that job and returned to university to study biophysics, and then joined Google DeepMind.

Jumper cited the complexity of cells as a challenge in biological research. Proteins, for example, are involved in almost all biological processes, such as muscle contraction, blood transport, light detection, and the conversion of food into energy. However, for most of the more than 200 million proteins discovered so far, the structures are so complex that only the amino acid sequence is known. Predicting three-dimensional structure from an amino acid sequence is known as the 'protein folding problem,' and it has been a major challenge in biology for many years.



Furthermore, even if a protein structure is somehow identified, the process of registering it in a databank is complicated and time-consuming. 'There are datasets representing almost all of the academically correct answers regarding protein structures that the community has solved, and we need to collect them in an easily accessible way,' Jumper said. 'Although the amount of data solved continues to grow every year, it is still far from enough for research.'

AlphaFold is playing an active role in this field. It predicts the three-dimensional structures of over 100 million proteins per year, which is roughly half of all known proteins. Since AlphaFold became open source, it has been so influential that the number of citations of papers using AlphaFold has skyrocketed.

How is the AI 'AlphaFold' that predicts protein three-dimensional structures changing the world of biology? - GIGAZINE



AlphaFold's contributions extend beyond enriching the database. Protein structure research relies on both computational analysis of data and experimental structure determination. AlphaFold's structure predictions provide useful approximations for interpreting data obtained through experimental methods, which makes it possible to conduct experiments with fewer people and accelerates research.



Jumper points out that the key to building a good AI system is to 'use external benchmarks.' New AI models are often announced with claims of high benchmark scores, but Jumper notes that 'we often choose benchmarks that are overly suited to our ideas.' Because the problems encountered in practice are often harder than those seen in training, Jumper says that building a good AI system requires proper measurement both 'during development' and 'when users decide to use the system.'

Jumper then cited two crucial factors that made AlphaFold even better. The first was open-sourcing the AlphaFold code. According to Jumper, releasing the code for expert use versus making it widely available in database format had a significantly different impact on society. 'Creating the right tools to solve the right problems has a significant impact on the lives and work of other researchers,' Jumper said. 'Even without Google DeepMind's direct involvement, releasing tools can be a catalyst for scientific innovation, leading to some amazing discoveries we never anticipated. Making the code widely accessible in database format has a greater sociological and scientific impact than simply releasing the code for experts.'

The second was AlphaFold's wide range of applications. AlphaFold has been widely used not only for its primary purpose of analyzing protein structures, but also for problems that were not originally anticipated, such as the interaction of multiple proteins and the design of new proteins.

In his lecture, Jumper quoted a post by X user Yoshitaka Moriwaki, who wrote, 'I thought I could use a little bug technique using AlphaFold2.' According to Moriwaki, the idea of using 'coevolutionary information' to predict which parts stick together when proteins A and B combine to form the AB complex was previously unknown to researchers. However, he thought that if AlphaFold2 was as accurate as reported, it might be able to predict the parts that stick together. He tried it, and although the result was not a perfect match, it was possible to predict the complex. Jumper states that 'the tool's ability to apply the knowledge it has learned to other uses' is what makes AlphaFold more powerful.



'I believe AlphaFold has advanced the entire field of structural biology by 5 to 10 percent,' Jumper said of AlphaFold's contribution. 'It's immeasurable how important this is to the world, and I believe many more discoveries will be made in the future.'

in AI, Software, Science, Posted by log1e_dh