Research reports that Google's medical-interview AI 'AMIE' makes more accurate diagnoses than human doctors and leaves a better impression on patients



When a patient becomes ill or injured, a doctor at a hospital or clinic conducts a 'medical interview,' an exchange of information in which the doctor listens to the patient's symptoms and medical history. Articulate Medical Intelligence Explorer (AMIE), a new AI model developed by Google, is specialized for these interviews, and according to Google's own research, AMIE not only made more accurate diagnoses than human doctors but also left a better impression on patients.

[2401.05654] Towards Conversational Diagnostic AI
https://arxiv.org/abs/2401.05654

AMIE: A research AI system for diagnostic medical reasoning and conversations – Google Research Blog
https://blog.research.google/2024/01/amie-research-ai-system-for-diagnostic_12.html



Google AI has better bedside manner than human doctors — and makes better diagnoses

https://www.nature.com/articles/d41586-024-00099-4

In a medical interview, appropriate dialogue with the patient serves two purposes at once: it allows the doctor to understand the patient's medical history and decide on suitable treatment, and it lets the doctor respond empathetically to the patient's emotions, supporting the patient's mental well-being.

However, while conventional large language models have been able to accurately perform tasks such as summarizing medical papers and answering medical questions, until now very few AI systems had been developed specifically for medical interviews.

Developed jointly by research teams at Google Research and Google DeepMind, AMIE is a conversational medical AI optimized for medical interviews. According to the research team, AMIE was trained from both the clinician's and the patient's perspective.

Furthermore, to address the shortage of real-world medical conversations that can be used as training data, the research team built a self-play-based simulated dialogue environment and equipped AMIE with an automated feedback mechanism. This let AMIE train across a large number of medical conditions, specialties, and scenarios, and through repeated dialogue and feedback its responses were gradually refined to give patients accurate, evidence-based answers.
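Neither the paper nor the blog post publishes code for this loop, but the general shape of such a self-play setup can be sketched as follows. This is a minimal illustration only: `model.generate`, `model.fine_tune`, `critic.score`, and the scenario object are all hypothetical stand-ins for the sketch, not any real AMIE API.

```python
# Hypothetical sketch of a self-play loop with automated feedback.
# Every method and field here is an assumed placeholder, not real AMIE code.

def self_play_episode(model, scenario, max_turns=10):
    """One simulated interview: the same model plays doctor and patient."""
    dialogue = []
    for _ in range(max_turns):
        doctor_turn = model.generate(role="doctor", history=dialogue)
        dialogue.append(("doctor", doctor_turn))
        patient_turn = model.generate(role="patient", history=dialogue,
                                      condition=scenario)
        dialogue.append(("patient", patient_turn))
    return dialogue

def refine_by_self_play(model, scenarios, critic, iterations=3):
    """Generate dialogues, score them with an automated critic, and
    fine-tune on the highest-rated ones, repeating to refine responses."""
    for _ in range(iterations):
        episodes = [self_play_episode(model, s) for s in scenarios]
        scored = [(critic.score(d), d) for d in episodes]  # automated feedback
        keep = [d for score, d in scored if score >= critic.threshold]
        model = model.fine_tune(keep)  # refine on high-quality dialogues
    return model
```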



To develop AMIE, a base large language model was first fine-tuned on real-world datasets such as electronic health records and transcribed medical interviews. The research team then repeatedly trained the model on instructions to take a hypothetical patient's medical history and arrive at a diagnosis.
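As a rough illustration of this two-stage pipeline, the sequence could be organized as below. The function names, dataset arguments, and `fine_tune` interface are assumptions made for the sketch, not Google's published training code.

```python
# Illustrative two-stage training pipeline; all names are placeholders.

def build_amie(base_model, real_world_corpora, interview_tasks, rounds=5):
    # Stage 1: supervised fine-tuning on real-world medical text,
    # e.g. electronic health records and transcribed interviews.
    model = base_model.fine_tune(real_world_corpora)

    # Stage 2: repeated instruction training on hypothetical patients:
    # prompt the model to elicit a history and propose a diagnosis,
    # then update it against the reference answers.
    for _ in range(rounds):
        examples = [(task.instruction_prompt, task.reference_diagnosis)
                    for task in interview_tasks]
        model = model.fine_tune(examples)
    return model
```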

Finally, the research team ran an experiment in which 20 participants underwent medical interviews via online text chat, without being told whether they were talking to AMIE or to a human doctor. The participants worked through 149 interview scenarios and rated their interviews afterward.

The experiment found that across all six medical specialty areas examined, AMIE matched or exceeded human doctors on measures including 'accuracy of diagnosis,' 'trust in the treatment plan,' 'doctor honesty,' 'empathy shown by the doctor,' 'accuracy of instructions,' and 'patient health management.'



Additionally, AMIE was reported to outperform doctors on 24 of 26 criteria of conversation quality, including 'politeness,' 'explanation of conditions and treatments,' 'honesty,' and 'consideration for the patient.' The graph below shows the results: AMIE (red) was rated higher than the doctors (blue) on most criteria.



In addition, human doctors' diagnostic accuracy improved significantly when they consulted AMIE during the medical interview. The graph below compares diagnostic accuracy with and without assistance: relative to no assistance (blue), accuracy improved noticeably when doctors also used an internet search (green) or AMIE (yellow). Diagnostic accuracy in this experiment was highest when AMIE conducted the interview itself (red).



On the other hand, Alan Karthikesalingam of the research team cautioned, 'These results do not show that AMIE is in any way superior to doctors.' According to Karthikesalingam, the doctors who participated in the study were not used to interviewing patients through text-based chat, which may have hurt their performance.

Still, AMIE 'has the advantage of quickly generating consistent answers and, being a tireless AI, can be unfailingly compassionate toward every patient.'

As a next step, the research team cites 'investigating the ethical requirements for testing AMIE with patients who actually have medical conditions.' Daniel Ting, a clinician and AI scientist at Duke-NUS Medical School, said: 'Privacy is another important aspect to consider for patients who will use AMIE. A problem with current large language models is that it is often unclear where the data they handle is stored or how it is analyzed.'

in Software, Science, Posted by log1r_ut