Microsoft's AI predicts pauses in conversation, enabling more natural communication


by ian dooley

Microsoft has announced that it has developed a method for simultaneously analyzing the voices of multiple speakers in the speech recognition system it is building with artificial intelligence (AI). Because the system predicts what a person will say next, it can recognize pauses and interruptions in a conversation, which makes more sophisticated communication possible.

Microsoft's AI lets bots predict pauses and interrupt conversations | VentureBeat
https://venturebeat.com/2018/04/04/microsofts-ai-lets-bots-predict-pauses-and-interrupt-conversations/

Major voice assistants such as Alexa and Google Assistant aim to produce more human-sounding voices, and are even trying to understand human emotions by analyzing voices in more detail. Companies such as Google, Amazon, and Apple are investing heavily in improving their voice assistants. At the time of writing, however, users can only request specific, patterned actions after saying an activation phrase such as "Hey Siri" or "Alexa", which makes for rather flat communication.

Meanwhile, Microsoft says it will evolve its chatbots "Xiaoice" and "Rinna" into voice assistants and aims to build them into smart devices such as Xiaomi's Yeelight. Technology media outlet VentureBeat interviewed Ying Wang, the development director of Microsoft's AI chatbot "Zo", who said that Microsoft plans to extend the voice assistant to devices within the next six months.

Rinna, the chatbot Microsoft provides for Japan, has official accounts on LINE, Twitter, and Instagram, where you can chat with the AI.


Normally, when asking a voice assistant to do something, you first have to activate it by saying a key phrase. For example, with Amazon Echo you need to say "Alexa" before giving an instruction. To give several instructions, you have to say "Alexa" each time.

I tried music playback on Amazon Echo with speech recognition - YouTube


By contrast, the method developed by Microsoft is meant to make conversation with a voice assistant more natural. Once the assistant has been activated, the AI itself judges when the conversation pauses and picks up only the instructions. Microsoft's new speech recognition system is called "Full Duplex Voice Sense", and the following video shows how much more natural the communication becomes. The user first says "Hello Xiaoice" to activate the assistant, but after that the conversation proceeds naturally, without any further activation phrase.
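The difference between the two activation styles can be sketched in a few lines of Python. This is only a toy illustration working on text transcripts, not Microsoft's implementation: the function names and the simple prefix matching are assumptions made for the example, and the real system judges conversational timing with an AI model rather than string checks.

```python
# Toy comparison of two activation styles (hypothetical helper names).
WAKE_WORD = "alexa"              # must precede every command in per-command mode
SESSION_WAKE = "hello xiaoice"   # said once to open a session


def per_command_mode(utterances):
    """Accept only utterances that start with the wake word."""
    accepted = []
    for text in utterances:
        lowered = text.lower().strip()
        if lowered.startswith(WAKE_WORD):
            command = lowered[len(WAKE_WORD):].strip(" ,")
            if command:
                accepted.append(command)
    return accepted


def session_mode(utterances):
    """Accept every utterance spoken after the session has been opened once."""
    accepted = []
    active = False
    for text in utterances:
        lowered = text.lower().strip()
        if not active:
            if lowered.startswith(SESSION_WAKE):
                active = True
                rest = lowered[len(SESSION_WAKE):].strip(" ,")
                if rest:
                    accepted.append(rest)
            continue
        accepted.append(lowered)
    return accepted
```

In per-command mode, an instruction spoken without the wake word is dropped. In session mode, everything after the single "Hello Xiaoice" is treated as part of the conversation.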

Microsoft Full Duplex Voice Sense - YouTube


"Full Duplex Voice Sense" was reportedly developed using data from conversations between people around the world and Microsoft's chatbots such as Zo and Xiaoice. About 200 million people have tried chatting with these bots.

According to Mr. Wang, while Xiaoice is speaking, it will not be interrupted by noise or snippets of conversation unless the user gives an explicit instruction. Similarly, while Xiaoice is carrying out an IoT task such as charging a robot vacuum cleaner, utterances that are not explicit instructions, such as "Well" or "Ah", are ignored. In other words, the system can distinguish explicit instructions from everything else.
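The behavior Mr. Wang describes can be pictured as a filter that decides whether an utterance should interrupt a busy assistant. The sketch below is a deliberately simple stand-in: the filler list, the command-verb list, and the function names are all hypothetical, and the actual system presumably relies on a trained model rather than keyword sets.

```python
# Toy interrupt filter (hypothetical word lists, not Microsoft's model).
FILLERS = {"well", "ah", "um", "uh"}
COMMAND_VERBS = {"stop", "pause", "play", "turn", "start"}


def is_explicit_instruction(utterance):
    """Return True if the utterance contains a known command verb."""
    words = utterance.lower().replace(",", " ").replace("~", " ").split()
    if not words or all(w in FILLERS for w in words):
        return False  # silence or pure filler is never an instruction
    return any(w in COMMAND_VERBS for w in words)


def should_interrupt(assistant_busy, utterance):
    """While the assistant is speaking or running a task, yield only to
    explicit instructions; otherwise handle any non-empty utterance."""
    if not assistant_busy:
        return bool(utterance.strip())
    return is_explicit_instruction(utterance)
```

With these lists, `should_interrupt(True, "Ah~")` is false, so the ongoing task continues, while `should_interrupt(True, "Well, stop the vacuum")` is true.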

In addition, Microsoft's pursuit of natural communication between computers and humans is apparently aimed not only at making tasks easier to carry out, but also at encouraging more casual conversation with voice assistants.

in Software, Video, Posted by logu_ii