"WaveNet," which uses deep learning to speak in a natural, human-like voice, arrives on new hardware with Google Assistant



DeepMind, Google's artificial intelligence development division, has been developing "WaveNet," an artificial neural network for generating synthetic speech, and it has now been built into Google's voice assistant, "Google Assistant." This makes natural-sounding synthetic speech available in two languages, English and Japanese.

WaveNet launches in the Google Assistant | DeepMind
https://deepmind.com/blog/wavenet-launches-google-assistant/

The Google Assistant, powering our new family of hardware
https://www.blog.google/products/assistant/google-assistant-powering-our-new-family-hardware/

WaveNet, which DeepMind has been developing, is a technology that raises synthetic speech generation up a notch. The sample collection below presents audio from the existing TTS voices alongside audio generated by WaveNet, so you can compare old and new for three English voices and one Japanese voice. (There are four comparisons in total. On a smartphone, scroll to see them all.)




The biggest difference between WaveNet and existing synthetic speech lies in how the speech is generated. The text-to-speech (TTS) technology used for existing synthetic voices basically prepares a large database of speech chopped into short fragments and then joins those fragments together. WaveNet instead uses deep learning with a convolutional neural network: it analyzes sampled human voice waveforms in fine detail and generates speech that is close to a natural utterance.
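To make the convolutional approach a little more concrete, here is a minimal sketch of the dilated causal convolution described in the WaveNet paper linked at the end of this article. It is a toy illustration, not DeepMind's implementation: the function names, the all-ones weights, and the four-layer stack are assumptions chosen for readability, and the gated activations, residual connections, and training are all omitted. "Causal" means that each output sample depends only on the current and earlier input samples, and doubling the dilation at each layer lets a small stack cover a long stretch of the waveform.

```python
def causal_dilated_conv(signal, weights, dilation):
    """1-D causal convolution: out[t] uses signal[t], signal[t - d], signal[t - 2d], ..."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start are treated as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical toy stack: kernel size 2, dilation doubling at each layer.
DILATIONS = [1, 2, 4, 8]
TOY_WEIGHTS = [1.0, 1.0]  # illustrative only, not trained values


def run_stack(signal):
    x = signal
    for d in DILATIONS:
        x = causal_dilated_conv(x, TOY_WEIGHTS, d)
    return x


# Feed a single impulse and see how far its influence spreads.
impulse = [1.0] + [0.0] * 19
response = run_stack(impulse)
print(receptive_field(2, DILATIONS))         # 16
print(sum(1 for v in response if v != 0.0))  # 16 samples are affected by the impulse
```

With only four layers, each output sample already sees 16 input samples. The paper extends the same doubling pattern much further so the network can take thousands of past samples into account when predicting the next one.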


WaveNet achieves natural pronunciation, accent, and sentence-level intonation by reproducing sound much closer to a human voice. However, because it used the latest neural network technology, the original WaveNet needed about 1 second to generate just 0.02 seconds of speech. In other words, preparing the speech took about 50 times as long as speaking it, which was hardly practical. So DeepMind spent about 12 months developing a new speech synthesis model. As a result, it can now synthesize 20 seconds of sound in 1 second, which is 1000 times faster than the initial version.
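The speed figures can be checked with a little arithmetic. The sketch below starts only from the two numbers quoted for the new model (20 seconds of sound synthesized in 1 second, and a 1000-fold speedup) and derives what they imply for both versions. The function and variable names are illustrative, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def compute_seconds_per_audio_second(audio_seconds, compute_seconds):
    """Compute time needed to produce one second of speech."""
    return Fraction(compute_seconds) / Fraction(audio_seconds)


SPEEDUP = 1000  # "1000 times faster than in the initial stage"

# New model: 20 seconds of audio per 1 second of compute.
improved_cost = compute_seconds_per_audio_second(20, 1)

# Initial model implied by the speedup factor.
initial_cost = improved_cost * SPEEDUP
initial_audio_per_compute_second = 1 / initial_cost

print(f"new model: {float(improved_cost * 1000):.0f} ms of compute per second of speech")
print(f"initial model: {float(initial_cost):.0f} s of compute per second of speech")
print(f"initial model: {float(initial_audio_per_compute_second):.2f} s of speech per second of compute")
```

A cost below 1 second of compute per second of speech means the model runs faster than real time, which is the threshold a voice assistant needs to clear before it can answer without a noticeable delay.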


The naturalness of the speech is also improving. On the "mean opinion score" (MOS), one of the methods by which human listeners rate the quality of media such as speech on a scale of 1 to 5, the WaveNet voices record scores of around 4.3, exceeding the scores of conventional technology. For reference, speech actually uttered by a human scores 4.667.
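For readers unfamiliar with the metric, a mean opinion score is simply the average of the ratings that listeners give, each on a scale from 1 (bad) to 5 (excellent). The sketch below shows the calculation. The ratings in it are made up for illustration and are not data from DeepMind's evaluation, and the function name is hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings, each an integer from 1 (bad) to 5 (excellent)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating out of range: {r}")
    return mean(ratings)


# Hypothetical ratings from eight listeners for one synthesized sentence.
hypothetical_ratings = [5, 4, 4, 5, 4, 3, 5, 4]
print(mean_opinion_score(hypothetical_ratings))  # 4.25
```

Because the score is an average of subjective judgments, even recordings of real people do not reach a perfect 5.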


Synthetic speech generated with WaveNet is used on hardware equipped with Google Assistant, such as Google Home Mini, Google Home Max, Pixel phones, Pixelbook / Pixelbook Pen, and Pixel Buds. It is expected to spread further to Android smartphones in the future. DeepMind's research paper can be viewed at the following link.

WaveNet: A Generative Model for Raw Audio (arXiv:1609.03499)
https://arxiv.org/pdf/1609.03499.pdf

in Software,   Web Service, Posted by darkhorse_log