Dec 03, 2024 20:30:00

'AivisSpeech' review that allows anyone to easily synthesize emotional voices completely free of charge

AivisSpeech, provided by the Aivis Project, a speech synthesis AI project developed and operated by JPChain, is free speech synthesis software that anyone can use easily. It can synthesize emotional, high-quality speech that is hard to distinguish from a human voice, and it can be used freely without credit for personal, corporate, or commercial purposes, so I tried it out.

Aivis Project | Why not try out AivisSpeech, which allows you to easily synthesize emotive speech?

https://aivis-project.com/

The Aivis Project's mission is to create emotive speech synthesis technology that anyone can use easily.

We are offering the service completely free of charge because we want everyone to know about this amazing technology.

We will be expanding the features at a rapid pace! Give it a try!
— Hayato Omori / Aivis Project (@hayato_omr) November 19, 2024

Visit the Aivis Project official website and click 'Download AivisSpeech.'

This time, I downloaded the Windows installer version. The installer is in EXE format and the file size is 1.25MB.

Run the downloaded installer and the setup wizard will start. Click 'Next'.

Click Next.

Select the installation location. Click 'Next'.

Click Install.

When the installation is complete, click 'Finish'.

AivisSpeech has started. It takes about 30 seconds for the speech synthesis engine to start up.

The license information will be displayed, so click 'Agree and Start' in the upper right.

The privacy policy is displayed and you are asked to cooperate with data collection. In this case, select 'Allow.'

The AivisSpeech screen looks like this. Click on the input field.

Then, the following UI was displayed. Just enter the text you want to be read out in the input field.

This time, I pasted in the opening of Natsume Soseki's 'I Am a Cat.' The text consists of multiple sentences, but it is split into separate lines at punctuation marks.
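The line splitting described here can be sketched in a few lines of Python. This is only an illustration, not AivisSpeech's actual logic: the article does not say which punctuation marks trigger a split, so the set used below (Japanese and Western sentence-ending marks) and the function name `split_into_lines` are assumptions.

```python
import re

# Assumption: a "line" ends at a run of sentence-ending punctuation.
# The real rule used by AivisSpeech is not documented in this article.
_LINE_PATTERN = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_into_lines(text: str) -> list[str]:
    """Split pasted text into lines, keeping the closing punctuation."""
    pieces = _LINE_PATTERN.findall(text)
    # Drop surrounding whitespace and skip pieces that are empty after trimming.
    return [piece.strip() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    for line in split_into_lines("吾輩は猫である。名前はまだ無い。"):
        print(line)
```

Each returned string corresponds to one row in the editor, which is why emotions and speakers can later be set per line.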

The following video shows the text being read aloud as is. Although I made no adjustments to the accent or pitch, the reading is smooth, with almost no unnatural accents.

'AivisSpeech', which allows anyone to easily synthesize emotional voices, read out 'I am a cat' - YouTube

On the right side are sliders for adjusting the speech rate, style strength, tempo, pitch, volume, and the amount of silence at the beginning and end.

Depending on the voice model, multiple emotions are available. Clicking the icon next to a line lets you set the emotion for that line without any complicated configuration.

At the bottom of the screen, you can edit the Japanese pronunciation and accent of the audio.

The synthesized audio can be exported in WAVE format by selecting 'Export Audio' under 'File.'

If you select 'Export Audio', each line of dialogue will be exported as a separate audio file.

If you want an audio file with all the lines connected in order, select 'Connect and export audio.'
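If you have already exported each line as a separate WAVE file, the same joined result can be approximated outside the application. The following is a minimal sketch using Python's standard `wave` module. The function name `concat_wavs` is hypothetical, and the sketch assumes all input files are uncompressed PCM with the same channel count, sample width, and sample rate.

```python
import wave


def concat_wavs(inputs: list[str], output: str) -> None:
    """Join several WAV files, in order, into a single WAV file."""
    if not inputs:
        raise ValueError("at least one input file is required")

    params = None  # (channels, sample width in bytes, frame rate)
    chunks = []
    for path in inputs:
        with wave.open(path, "rb") as src:
            current = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if params is None:
                params = current
            elif current != params:
                # Mixing formats would produce garbled audio, so stop early.
                raise ValueError(f"{path} has a different audio format: {current}")
            chunks.append(src.readframes(src.getnframes()))

    with wave.open(output, "wb") as dst:
        dst.setnchannels(params[0])
        dst.setsampwidth(params[1])
        dst.setframerate(params[2])
        for chunk in chunks:
            dst.writeframes(chunk)
```

For example, `concat_wavs(["001.wav", "002.wav"], "joined.wav")` would write the two lines back to back. The file names here are placeholders, not the names AivisSpeech uses when exporting.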

The opening of 'I Am a Cat' was saved as a single audio file, read in one go.

To change the voice model that reads aloud, select 'Speaker List' from 'Settings'.

The Aivis Project proposes the AIVM (Aivis Voice Model)/AIVMX format as an open file format for AI speech synthesis models that combines trained models, hyperparameters, style vectors, and speaker metadata into a single file, and AivisSpeech can install models in this AIVM format. By default, 'Anneli' is installed as a speech synthesis model in AivisSpeech, and you can click 'Speech Synthesis Models' to search for other voice models.

The browser will open and a list of AIVM format speech synthesis models will be displayed. This time, we will install 'Rotejin (Elder Voice)'.

Click 'Download' on the right to download the speech synthesis model in AIVM format (file size: 252.93MB).

Return to the AivisSpeech window and click 'Install/Update' in the upper right corner.

Select 'Install from File', select the downloaded speech synthesis model in the input field below, and click 'Install/Update'.

When the installation is complete, click 'Close'.

By clicking the icon displayed to the left of the audio, you can change the speaker to 'Rotejin (Elder Voice).'

The following video shows Anneli and Rotejin taking turns reading 'I Am a Cat'.

Using the voice synthesis software 'AivisSpeech', a beautiful girl and an elder read 'I am a cat' alternately - YouTube

When I actually tried using AivisSpeech, I felt that the quality of the synthesized voice was quite high, and it was comparable to commercial software, even though it was completely free. The UI is very simple and easy to use, so even people who have never used voice synthesis software can easily use it.

The AIVM format is an open file format, and at the time of writing, the Aivis Project is developing 'AivisBuilder', a free tool for creating AIVM format speech synthesis models. The AIVM Generator, which lets you easily create and edit AIVM files in your browser, is also available.

Dec 03, 2024 20:30:00 in Review, Software, Video, Posted by log1i_yk