I tried an AI voice changer that lets anyone convert their voice into 100 different voices, and this is what it was like



Dwango Media Village, which carries out machine-learning research and development and applies it to services, has announced a voice conversion system that can convert anyone's voice into a variety of other voices.

Voice conversion system that can convert anyone's voice into 100 voices --Dwango Media Village (dmv)
https://dmv.nico/ja/articles/seiren_voice/

'Seiren Voice', a demo version of the voice conversion system announced by Dwango Media Village, is available at the following site.

Seiren Voice (AI voice changer)
https://seiren-voice.dmv.nico/

This time I accessed the site with Firefox. You can record your own voice by clicking 'Record'.



A pop-up will appear asking for permission to use the microphone. Click 'Allow' and recording starts immediately. This time, I read aloud Matsuo Basho's famous haiku, 'Stillness: the voice of the cicada seeping into the rocks.' Note that the demo version could not convert very long audio, and it seemed best to keep recordings within about 5 seconds.



When the recording is finished, the waveform of the recorded voice appears to the right of the record/play button.



Then enter the text you read aloud. According to the site, accuracy improves if you add commas to the text where they match the pauses in your recording.



There are 100 voices, ranging from high to low, and the input can be converted into eight voices at the same time. You can freely choose any of the 100, but this time I went with the eight preset voices. Click 'Start Conversion'.



When the conversion starts, the analysis result is displayed first. For the input voice, the phoneme conversion result and the intonation detection result are shown as a figure.



After about 40 seconds, a video appears under the conversion results. Play it on the spot and you can hear your own voice followed by the 8 converted voices.



You can hear the conversion result in the following video. The quality varies with the voice, but the converted voices are very smooth, with almost none of the warbling 'kerokero' sound that is common in real-time voice changers.

What it looks like when you actually convert your voice with an AI voice changer that lets anyone convert to 100 different voices - YouTube


The video can be downloaded in MP4 format.



Also, if you click 'Display conversion results individually', a playback bar is displayed for each converted voice so that each one can be played on its own.



Voice changers involve a trade-off between real-time performance and conversion quality. In general, many voice changers emphasize real-time performance and few prioritize quality. Dwango Media Village set out to develop a voice changer that can 'convert anyone's voice into the voices of various people.'

The voice conversion system developed by Dwango Media Village is not a voice changer of the kind that needs to run in real time, but an algorithm that converts the input voice after the fact. Rather than performing voice conversion directly with a deep learning model, it reportedly uses a method of 'extracting phonemes and pitch from the input voice, changing the pitch, and then synthesizing the voice from the phonemes and pitch.' As a result, the deep learning involved does not require a huge amount of training data or retraining.
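To make the middle step of that description concrete, here is a minimal Python sketch of 'changing the pitch' on an already-extracted sequence of phonemes and pitch values. It is only an illustration under stated assumptions, not Dwango Media Village's implementation: the `Frame` type and the functions `shift_pitch` and `match_target_pitch` are hypothetical names invented for this example, and the phoneme/pitch extraction and the final speech synthesis are left out entirely.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """One analysis frame (hypothetical structure for this sketch)."""
    phoneme: str              # phoneme label, e.g. "a" or "sil"
    f0_hz: Optional[float]    # pitch in Hz, or None for unvoiced frames


def shift_pitch(frames: List[Frame], semitones: float) -> List[Frame]:
    """Return a copy of the frames with the pitch moved by `semitones`.

    Phoneme labels are left untouched; unvoiced frames stay unvoiced.
    """
    ratio = 2.0 ** (semitones / 12.0)  # 12 semitones = one octave = 2x
    return [
        Frame(f.phoneme, None if f.f0_hz is None else f.f0_hz * ratio)
        for f in frames
    ]


def match_target_pitch(frames: List[Frame], target_mean_hz: float) -> List[Frame]:
    """Scale the contour so its geometric-mean pitch equals `target_mean_hz`.

    Using the mean of log-pitch keeps the intonation shape (the relative
    rises and falls) while moving the overall register.
    """
    voiced = [f.f0_hz for f in frames if f.f0_hz is not None]
    if not voiced:
        return list(frames)  # nothing voiced, nothing to shift
    log_mean = sum(math.log(v) for v in voiced) / len(voiced)
    ratio = target_mean_hz / math.exp(log_mean)
    return [
        Frame(f.phoneme, None if f.f0_hz is None else f.f0_hz * ratio)
        for f in frames
    ]


if __name__ == "__main__":
    # Made-up frames for illustration only.
    source = [Frame("sil", None), Frame("a", 110.0), Frame("i", 130.0)]
    for frame in match_target_pitch(source, target_mean_hz=240.0):
        print(frame)
```

In a design like this the phoneme labels carry what was said and the pitch contour carries how it was intoned, so moving only the contour changes the register without touching the content, which fits the article's point that sounds not expressible as phonemes are hard to handle.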

However, this voice conversion system appears to have difficulty converting sounds that cannot be expressed as phonemes, such as laughter. Dwango Media Village says it will continue its research while considering what kinds of entertainment applications the system could be used for.

in Review,   Web Application,   Video, Posted by log1i_yk