Facebook engineers develop AI that speaks like Bill Gates


by Gratisography

AI technology is advancing at a rapid pace, and technology that lets you edit the audio and video of a film just by editing its text is even under development. Now Facebook engineers have developed an AI that, given samples of a person's voice, generates speech that sounds like that person. The published examples include audio that sounds as if Microsoft founder Bill Gates is speaking.

MelNet: A Generative Model for Audio in the Frequency Domain-1906.01083.pdf
(PDF file) https://arxiv.org/pdf/1906.01083.pdf

MelNet-Audio Samples
https://audio-samples.github.io/

Listen to this AI voice clone of Bill Gates created by Facebook's engineers-The Verge
https://www.theverge.com/2019/6/10/186985987/ai-voice-clone-bill-gates-facebook-melnet-speech-generation



Every audio file embedded below sounds as if Bill Gates is speaking. One example is the phrase 'He said the same phrase thirty times'.


The phrase 'Two plus seven is less than ten' can also be heard in Bill Gates's voice.


This is a video of Bill Gates speaking at TED. Even in a direct comparison, the clips above sound like Bill Gates's own voice.

Bill Gates: Teachers need real feedback | TED Talk


The Bill Gates voice was generated by 'MelNet', an AI developed by Facebook engineers, using voice samples taken from this TED talk. Because MelNet analyzes spectrograms of the audio samples, it is good at capturing the 'high-level structure' of audio.
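To give a rough idea of what a spectrogram is, here is a minimal Python sketch that turns a waveform into a time-frequency grid using a short-time Fourier transform. It is not MelNet's code: the function name `spectrogram`, the frame length, the hop size, and the 1 kHz test tone are all assumptions chosen for illustration, and the mel-scale frequency warping that MelNet's title refers to is left out.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform (illustrative)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only non-negative frequencies: frame_len // 2 + 1 bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical test signal: one second of a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)                    # rows are time frames, columns are frequency bins
peak_bin = int(spec.mean(axis=0).argmax())  # strongest frequency bin on average
peak_hz = peak_bin * sample_rate / 256      # convert the bin index back to Hz
print(spec.shape, peak_hz)
```

Each row of `spec` describes which frequencies are present in a short slice of time, so patterns that stretch over many samples in the raw waveform become compact shapes in this grid.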

For example, in the following audio sample, the first 5 seconds are a real person's voice from the dataset and the last 5 seconds are audio generated by MelNet. Even when you actually listen to it, it is very hard to notice the switch from human to AI between the first and second halves.






MelNet uses TED talks as training data. On the official site, you can listen to audio generated from the voice samples of not only Bill Gates but also Daphne Koller, Fei-Fei Li, George Takei, Jane Goodall, Sal Khan, Stephen Wolfram, and Stephen Hawking.



For example, this is the TED talk that served as George Takei's voice sample.

George Takei: Why I love a country that once betrayed me | TED Talk



When you listen to the audio generated by MelNet, you can hardly tell the difference.


Also, the audio modeled on Hawking, who spoke and gave talks through a speech synthesizer, itself sounds like synthesized speech.


MelNet's limitation is that it cannot capture the changes that occur when a person speaks for a long time. It cannot vary its tone or add emotion at specific points according to the paragraph or the text, and it keeps the speech consistent only at a superficial level.

MelNet can also generate music, but the samples I listened to sounded like rather avant-garde performances, and it seems it cannot yet generate music as well as it generates human voices.




in Software,   Science,   Video, Posted by log1h_ik