I tried 'VOICEVOX', free software that automatically synthesizes read-aloud voices from text

Mr. Hiho, an engineer at Dwango Media Village who developed an 'AI voice changer that lets anyone convert their voice into 100 kinds of voices' and a 'technology that lets anyone easily take on the voice of Yuzuki Yukari through deep learning', has released 'VOICEVOX', open source software that automatically synthesizes read-aloud speech from entered text, so I actually tried using it.


Access the VOICEVOX distribution page and click 'Download'.

VOICEVOX is distributed as a ZIP file via Google Drive. If Google Drive's download restrictions prevent you from downloading it, you can get the ZIP file from the link under 'If you cannot download'. The ZIP file is about 3.26GB, and the unzipped folder takes up about 5.5GB.

Unzip the downloaded 'VOICEVOX-0.1.1-win.zip' with Explzh or the standard Windows extraction function, and start 'VOICEVOX.exe' inside.
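If you would rather script the extraction than use Explzh or Explorer, the step can be sketched with Python's standard zipfile module. This is only a minimal sketch: the function name extract_and_find_exe and any paths you pass in are my own assumptions, and because the article does not say how deeply 'VOICEVOX.exe' is nested, the code searches the extracted folder for it.

```python
import zipfile
from pathlib import Path
from typing import Optional


def extract_and_find_exe(zip_path: Path, dest_dir: Path) -> Optional[Path]:
    """Extract zip_path into dest_dir and return the path to VOICEVOX.exe.

    Returns None if no file with that name exists after extraction.
    The function name and arguments are hypothetical, chosen for this sketch.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # The exact folder layout inside the ZIP is not stated in the article,
    # so search recursively instead of hard-coding a path.
    matches = sorted(dest_dir.rglob("VOICEVOX.exe"))
    return matches[0] if matches else None
```

Calling extract_and_find_exe(Path("VOICEVOX-0.1.1-win.zip"), Path("voicevox")) would then give you the path to launch, provided about 5.5GB of free space is available for the unzipped folder.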

When you start it for the first time, you are asked whether to start the engine in CPU mode or GPU mode. According to Mr. Hiho, GPU mode runs much more smoothly, but it requires an NVIDIA GPU with 3GB or more of memory. This time I started it in GPU mode.
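If you are not sure whether your PC meets the GPU mode requirement, a rough check can be sketched as follows. This is not part of VOICEVOX; it simply asks NVIDIA's nvidia-smi command-line tool for the total memory of each GPU and compares it with the 3GB figure quoted above. The function names are my own, and treating '3GB' as 3 * 1024 MiB is an assumption (a card sold as 3GB may report slightly less, so adjust the threshold if needed).

```python
import subprocess
from typing import List

# Assumption: "3GB or more" is read here as 3 * 1024 MiB.
MIN_GPU_MEMORY_MIB = 3 * 1024


def parse_memory_totals(nvidia_smi_output: str) -> List[int]:
    """Parse one integer (MiB) per line from nvidia-smi's CSV output."""
    totals = []
    for line in nvidia_smi_output.splitlines():
        line = line.strip()
        if line.isdigit():
            totals.append(int(line))
    return totals


def query_gpu_memory_mib() -> List[int]:
    """Return total memory in MiB for each NVIDIA GPU, or [] if unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # No NVIDIA driver or tool found: treat as "no usable GPU".
        return []
    return parse_memory_totals(result.stdout)


def suggest_mode(memory_totals_mib: List[int]) -> str:
    """Suggest 'GPU' if any GPU meets the memory threshold, else 'CPU'."""
    if any(total >= MIN_GPU_MEMORY_MIB for total in memory_totals_mib):
        return "GPU"
    return "CPU"
```

Running suggest_mode(query_gpu_memory_mib()) gives a hint about which mode to pick in the startup dialog. It is only a hint; the dialog in VOICEVOX itself is still where the choice is made.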

The screen looks like this. At the time of writing, two voices are provided: Shikoku Metan and Zundamon. Enter the text you want read aloud in the field next to the character's face.

Then, the accent will be displayed like this. The text I entered was a mixture of kanji and numbers, but the reading was recognized without any problems.

In 'Intonation', you can fine-tune the delivery by adjusting the pitch of each sound. Since the pitch can be adjusted freely for each character, it is possible, with considerable effort, to reproduce not only standard Japanese but also intonations peculiar to dialects such as the Tohoku and Kansai dialects. Looking at the intonation, you can see that the vowel is dropped from the final 'su' of 'Good morning' (ohayou gozaimasu) so that only the consonant is pronounced, which brings the synthesized voice closer to natural speech.

In addition, the slide bar on the right column allows you to adjust the speaking speed, pitch, and intonation.

The following movie shows the entered text read aloud without any adjustment to the accent or intonation. You can hear that the pronunciation is quite natural, although a few small details stand out.

I tried 'VOICEVOX', free software that automatically synthesizes read-aloud voices from text - YouTube

Click the + icon at the bottom right to enter another voice. If you click on the face of Shikoku Metan next to the voice ...

Characters can be switched. This time, I will ask 'Zundamon' to speak.

The following is what Shikoku Metan and Zundamon actually read aloud. The intonation has been adjusted a little, but most of the sound is as it was determined automatically. There is a slight metallic overtone, but the reading is so natural that it is hard to believe it is being synthesized on the spot.

This is what it sounds like when Shikoku Metan and Zundamon speak using 'VOICEVOX', free software that automatically synthesizes voices from text - YouTube

Click 'Export' at the top of the window to export the audio in WAV format to any folder. Since the audio is output separately for each input line, it also looks useful when you need a large amount of synthesized character speech, as in 'Yukkuri' commentary videos or the biim system.
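Because each line is exported as its own WAV file, you may sometimes want to join them back into a single track. The following is a minimal sketch using only Python's standard wave module. The function name concatenate_wavs is my own, and the sketch assumes every file shares the same channel count, sample width, and sample rate, which it checks rather than takes for granted.

```python
import wave
from pathlib import Path
from typing import Iterable


def concatenate_wavs(inputs: Iterable[Path], output: Path) -> int:
    """Join WAV files in the given order and return the number of frames written.

    Raises ValueError if no inputs are given or if the audio formats differ.
    """
    paths = list(inputs)
    if not paths:
        raise ValueError("no input files given")

    # Read the format of the first file; all others must match it.
    with wave.open(str(paths[0]), "rb") as first:
        fmt = (first.getnchannels(), first.getsampwidth(), first.getframerate())

    total_frames = 0
    with wave.open(str(output), "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for path in paths:
            with wave.open(str(path), "rb") as src:
                current = (src.getnchannels(), src.getsampwidth(),
                           src.getframerate())
                if current != fmt:
                    raise ValueError(f"{path} has a different audio format")
                frame_count = src.getnframes()
                out.writeframes(src.readframes(frame_count))
                total_frames += frame_count
    return total_frames
```

For example, passing the exported files sorted by name, such as sorted(Path("export").glob("*.wav")), would join them in file-name order. The folder name here is hypothetical, and the actual file names VOICEVOX writes are not described in this article.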

According to Mr. Hiho, the characters available in VOICEVOX can also be used for commercial purposes, and more characters may be added in the future. However, please note that the terms of use may differ depending on the character. Details are explained in the following movie.

[VOICEVOX] I tried to make text-to-speech software with the power of deep learning - Nico Nico Douga

The source code of VOICEVOX is available on GitHub.

GitHub - Hiroshiba/voicevox

in Review,   Software,   Video, Posted by log1i_yk