Machine learning may be able to identify which laboratory produced a piece of genetically engineered DNA
Machine learning is known to be applicable to a wide variety of fields, with uses such as "judging whether a face is a Johnny's face or not" and "identifying an individual from anonymous source code". Researchers have now announced that machine learning can also be used to find out in which laboratory a piece of genetically engineered DNA was edited.
Deep learning to predict the lab-of-origin of engineered DNA | Nature Communications
CRISPR and other gene editing technologies have developed rapidly in recent years, and the scale of research is growing at an accelerating pace. Meanwhile, as the field advances dramatically, there is concern that malicious actors with access to newly announced techniques could use them for illegal drug manufacturing, malicious gene editing, or infringement of intellectual property rights.
Cases in which gene sequences became the key to solving a crime include the Salmonella attack carried out in the United States by the Rajneeshees, a new religious group, and the anthrax case in which a researcher at a U.S. Army infectious disease research institute was identified as the perpetrator. In the former, the characteristics of the Salmonella genes matched those of Salmonella cultures seized from the religious group. In the latter, the genetic variations of the Bacillus anthracis detected in the anthrax letters matched the variations found in anthrax in a flask at the laboratory where the perpetrator worked, and this evidence led to the perpetrator's identification.
Similar genetic features appear in genes edited by researchers. In gene editing, new genes are added to the original DNA, or the gene sequence is changed, so that the DNA works as desired. In the process, researchers tend to edit the same genetic locations as before and rely on approaches that succeeded previously, and unintended mutations unique to a given laboratory can also arise. Together, these are said to form a "signature" tied to an individual or a laboratory. However, even a skilled researcher finds it difficult to pick out such a signature pattern from a long gene sequence.
Therefore, the researchers obtained the sequences of plasmids, which are often used for genetic modification, from Addgene, a large-scale repository. They then used a convolutional neural network, a machine learning model often used for image recognition and similar tasks, to try to identify the laboratory of origin from the features of the engineered genes.
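To give a rough idea of how a convolutional model can read a gene sequence, the sketch below one-hot encodes a DNA string and slides a single filter along it, keeping the strongest response. This is a minimal illustration under assumptions: the function names, the hand-written filter, and the example sequences are hypothetical and are not taken from the paper, whose network learns many filters from data.

```python
# Illustrative only: the encoding and the filter are assumptions,
# not the architecture used in the paper.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values; unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_max(encoded, kernel):
    """Slide one filter along the sequence and return its maximum response."""
    width = len(kernel)
    best = float("-inf")
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        best = max(best, score)
    return best


# A hand-written filter that responds most strongly to the motif "GATTACA".
motif_filter = one_hot("GATTACA")

print(conv1d_max(one_hot("CCGATTACAGG"), motif_filter))  # 7.0 (motif present)
print(conv1d_max(one_hot("CCCCCCCCCCC"), motif_filter))  # 1.0 (only the C matches)
```

In a trained network the filter weights are learned rather than written by hand, so the model itself discovers which short patterns are characteristic of a laboratory.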
First, the researchers obtained a data set of 42,364 DNA sequences, each associated with laboratory information, that had been engineered at 2,230 laboratories worldwide. Next, to secure a minimum amount of sequence data per laboratory and improve the precision of the machine learning, they discarded the laboratories for which only nine or fewer DNA sequences were available, together with their DNA data. The remaining 36,764 DNA sequences were used for machine learning to investigate whether the laboratory of origin can be identified from DNA.
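The filtering step described here can be sketched as follows. This is a minimal sketch, assuming the data is a list of (laboratory, sequence) pairs; the lab names, the toy sequences, and the `min_per_lab` parameter are hypothetical, and the cutoff is left as a parameter rather than fixed to the value used in the study.

```python
from collections import Counter


def filter_by_lab_size(records, min_per_lab):
    """Keep only records from labs that contributed at least min_per_lab sequences."""
    counts = Counter(lab for lab, _ in records)
    return [(lab, seq) for lab, seq in records if counts[lab] >= min_per_lab]


# Toy data with hypothetical lab names; real data would hold thousands of plasmids.
records = [
    ("lab_a", "ACGT"),
    ("lab_a", "ACGG"),
    ("lab_a", "ACGA"),
    ("lab_b", "TTGA"),
    ("lab_c", "GGCC"),
    ("lab_c", "GGCA"),
]

kept = filter_by_lab_size(records, min_per_lab=2)
print(len(kept))                          # 5
print(sorted({lab for lab, _ in kept}))   # ['lab_a', 'lab_c']
```

Dropping sparsely represented laboratories trades coverage for reliability: a model has little chance of learning a signature from only a handful of examples.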
As a result, the trained model correctly identified the laboratory of origin from the gene sequence with an accuracy of 48%, and the correct laboratory was included among the "top 10 laboratories most likely to have edited a given gene" 70% of the time. In some cases a genetic mutation peculiar to a laboratory was confirmed, and in those cases the mutation functions as "a signature unique to the laboratory". Unusual mutations are strong evidence for identifying a laboratory from a gene sequence, but laboratory-specific mutations were not found in every gene.
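The two figures are the same kind of measurement at different cutoffs: how often the correct laboratory appears among the model's k highest-ranked candidates, with k = 1 and k = 10. The sketch below shows the calculation on made-up scores; the function name, the lab names, and the numbers are hypothetical and do not come from the study.

```python
def top_k_accuracy(scores, true_labs, k):
    """Fraction of sequences whose true lab is among the k highest-scoring labs."""
    hits = 0
    for lab_scores, truth in zip(scores, true_labs):
        ranked = sorted(lab_scores, key=lab_scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labs)


# Made-up model outputs for four sequences and three hypothetical labs.
scores = [
    {"lab_a": 0.7, "lab_b": 0.2, "lab_c": 0.1},
    {"lab_a": 0.5, "lab_b": 0.4, "lab_c": 0.1},
    {"lab_a": 0.1, "lab_b": 0.3, "lab_c": 0.6},
    {"lab_a": 0.6, "lab_b": 0.3, "lab_c": 0.1},
]
true_labs = ["lab_a", "lab_b", "lab_c", "lab_c"]

print(top_k_accuracy(scores, true_labs, k=1))  # 0.5
print(top_k_accuracy(scores, true_labs, k=2))  # 0.75
```

Widening the cutoff can only keep the score the same or raise it, which is why the top-10 figure is higher than the single-best-guess accuracy.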
At the time of writing, the model that identifies laboratories from gene sequences using machine learning cannot be said to be highly accurate. In addition, even if the DNA of bacteria used in an incident is found to come from a specific laboratory, it is not possible to tell whether a researcher at that laboratory caused the incident or whether a third party used DNA that had been stored there. Nonetheless, future research may further improve the accuracy of identifying laboratories from gene sequences, and it may not be long before the technique can be used in criminal investigations.