Deep learning-based voice conversion technology developed that lets anyone easily speak in the voice of 'Yuzuki Yukari'



VOICEROID, a text-to-speech software from AH-Software, is often used for video narration and robot speech because it can read aloud whatever text you give it. However, getting VOICEROID's output to sound right takes some effort and time. Hiho, an engineer at Dwango Media Village, has now announced on Nico Nico Douga a voice conversion technology that lets anyone easily convert voice recorded with a microphone into the voice of VOICEROID's 'Yuzuki Yukari'.

I tried making a voice conversion technology that lets anyone become Yukari's voice with the power of deep learning - Nico Nico Douga



Mr. Hiho has developed a technology that can convert anyone's voice into Yukari's voice with high quality.



In the video, you can actually hear Hiho's voice converted into Yukari's.



'Voice conversion' means transforming input voice data into different voice data, and these days the mainstream approach appears to be conversion with deep learning algorithms.



Conventional voice conversion, however, transforms the waveform of the input voice so that it approaches the target voice data. To do that, the algorithm has to learn a separate conversion for each input voice, which is not efficient.



Hiho's technology, by contrast, first decomposes the voice into pitch (how high or low the sound is) and phonemes (units of pronunciation), converts only the pitch, and then resynthesizes the voice. With the original VOICEROID, which generates speech from text, adding intonation and emotion required careful manual adjustment, but with this method any voice can be converted without a huge amount of training data.
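The article does not say how Hiho's system converts the pitch. As a rough illustration of what "converting only the pitch" can mean, below is a minimal sketch of a common, simple baseline from the voice conversion field (not Hiho's method): a linear transformation of log F0 that maps the source speaker's average pitch and pitch range onto the target speaker's. It assumes the F0 contour has already been extracted frame by frame by some pitch tracker (not shown), with unvoiced frames marked as None; the function names `log_f0_stats` and `convert_log_f0` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

F0Contour = List[Optional[float]]  # Hz per frame; None = unvoiced frame


def log_f0_stats(f0_hz: F0Contour) -> Tuple[float, float]:
    """Mean and standard deviation of log(F0) over voiced frames only."""
    logs = [math.log(f) for f in f0_hz if f is not None and f > 0]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    return mean, math.sqrt(var)


def convert_log_f0(
    f0_hz: F0Contour,
    src_mean: float, src_std: float,
    tgt_mean: float, tgt_std: float,
) -> F0Contour:
    """Shift and scale a pitch contour into the target speaker's range.

    Each voiced frame is normalized with the source log-F0 statistics and
    de-normalized with the target ones. Unvoiced frames stay None, so the
    timing (and, in a full system, the phoneme content) is left untouched.
    """
    converted: F0Contour = []
    for f0 in f0_hz:
        if f0 is None or f0 <= 0:
            converted.append(None)
            continue
        # Guard against a flat source contour (zero spread).
        z = (math.log(f0) - src_mean) / src_std if src_std > 0 else 0.0
        converted.append(math.exp(tgt_mean + z * tgt_std))
    return converted


if __name__ == "__main__":
    source = [120.0, 130.0, None, 110.0]
    m, s = log_f0_stats(source)
    # Hypothetical target: same pitch range, average pitch one octave higher.
    print(convert_log_f0(source, m, s, m + math.log(2.0), s))
```

In this usage example the target mean is set one octave (a factor of 2) above the source, so each voiced frame comes out at roughly double its input frequency while the unvoiced frame stays None. In a real system the target statistics would be measured from recordings of the target voice, and the converted contour would be fed, together with the phoneme information, into a synthesizer.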



The video also shows Hiho converting his live voice into Yukari's while playing the popular battle royale shooter 'Apex Legends'.



Mr. Hiho says the technology could be applied to independent animation production, VTuber activities, TRPG replays, commentary videos, and more. However, at the time of writing, there are no plans to distribute this voice conversion system.



Voice conversion inherently involves a trade-off between real-time performance and voice quality, and because this technology prioritizes quality, it is weak at converting input voice immediately in real time. Even so, making VOICEROID sound natural takes some experience and know-how, so the fact that anyone can easily get Yukari's voice is a major attraction.

Mr. Hiho is also developing 'Seiren Voice', an AI voice changer built on a similar system. Seiren Voice likewise extracts phonemes from the input voice and then synthesizes speech with an algorithm, and its distinguishing feature is that it requires neither a huge amount of training data nor retraining of a deep learning model.

Here's what it's like to use an AI voice changer that can convert anyone's voice into 100 types of voices - GIGAZINE



in Software, Video, Posted by log1i_yk