A voice changer that converts your voice into a beautiful girl's voice or a handsome voice is handy when you live stream or post videos. However, the voices a voice changer can produce are fixed, and it is hard to find one that can convert to the voice you actually want. Mr. Tennozu Isle explains how to solve this problem with AI and has released 'MMVC', a voice changer that uses AI to convert your voice to a favorite voice in real time.

With the advent of VRChat, anyone can become a favorite character by using an avatar. There are also many voice changers that can convert your own voice into a beautiful girl's voice or a handsome voice. However, existing voice changers have problems such as 'they are unsuitable for conversation and live streaming because they cannot convert in real time' and 'they can only convert to specific voices prepared in advance'.

Some existing voice changers do achieve low-latency real-time conversion, but they have problems of their own, such as 'your voice ends up sounding like other users of the same voice changer', 'the voice sounds mechanical', and 'harsh noise is generated'.

AI-based voice changers that can 'convert to a favorite voice instead of a prepared one' have also been developed, but they have problems such as 'training the AI takes time', 'the quality does not reach a practical level', and 'a large amount of voice training data is required'. Mr. Tennozu Isle introduces the speech synthesis technology 'VITS' as a way to solve these problems.

VITS is a speech synthesis method announced in June 2021. According to Mr. Tennozu Isle, existing speech synthesis methods cannot be used for voice conversion, but VITS can. It also has the features needed to develop a real-time voice changer: 'quality almost comparable to a real voice', 'training with a small amount of data', and 'about 41 seconds of audio can be converted per second of processing, even for high-quality 48,000 Hz audio'.

Mr. Tennozu Isle is developing a voice changer using VITS. Because the audio data is divided into chunks of 8192 samples (about 0.34 seconds) for processing, a delay of about 0.34 seconds occurs.
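The relationship between chunk size and delay can be checked with a little arithmetic. The following is a minimal sketch, not MMVC's actual code: the sampling rate of 24,000 Hz is an assumption inferred from the stated figures (8192 samples taking about 0.34 seconds), and the function names are hypothetical.

```python
CHUNK_SAMPLES = 8192             # chunk size stated in the article
ASSUMED_SAMPLE_RATE_HZ = 24_000  # assumption: inferred from 8192 samples being about 0.34 s


def chunk_delay_seconds(chunk_samples: int, sample_rate_hz: int) -> float:
    """Time needed to record one full chunk before it can be converted."""
    return chunk_samples / sample_rate_hz


def split_into_chunks(samples: list, chunk_samples: int) -> list:
    """Split audio samples into consecutive fixed-size chunks (the last may be shorter)."""
    return [samples[i:i + chunk_samples] for i in range(0, len(samples), chunk_samples)]


delay = chunk_delay_seconds(CHUNK_SAMPLES, ASSUMED_SAMPLE_RATE_HZ)
print(f"Buffering delay per chunk: {delay:.2f} s")

# One second of (silent) audio at the assumed rate, split into chunks.
chunks = split_into_chunks([0] * ASSUMED_SAMPLE_RATE_HZ, CHUNK_SAMPLES)
print(f"Chunks in one second of audio: {len(chunks)}")
```

Under this assumption, a smaller chunk would shorten the delay proportionally, since a whole chunk has to be recorded before it can be converted.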

The voice changer developed with this technology is 'MMVC'. A demo of Mr. Tennozu Isle live streaming with MMVC starts at about 4 minutes 32 seconds into the following movie.

MMVC is distributed in two parts: a 'collection of files needed for machine learning' and 'client software that actually converts the voice'. To convert a voice, you first need to download the 'collection of files needed for machine learning' from the following page and have the AI learn the voice by following the steps provided.

GitHub - isletennos/MMVC_Trainer: Real-time voice changer with AI (Trainer)

Machine learning may sound as if it requires specialized equipment and expertise, but MMVC is distributed as a notebook that lays out the steps for using Google's AI platform 'Colaboratory', so anyone can run the training just by following the steps. Training requires 'audio data containing the voice you want to learn' and 'transcripts of that audio data'. You can use datasets that include both audio and transcripts and are available for free on the Internet, such as the 'JVS (Japanese versatile speech) corpus', the 'ITA corpus multimodal database', and the 'Tsukuyomi-chan corpus'.

On the following page, you can compare the original audio (jvs001) and the audio converted by MMVC (target). In addition, you can listen to the conversion results of MMVC that learned the voices of 'Zundamon' and 'Tsukuyomi-chan'.


Questions about MMVC are accepted on the official Discord channel and on Mr. Tennozu Isle's Twitter.

