'VoiceCraft' can synthesize voice from just a few seconds of voice data

A research team led by the University of Texas at Austin has announced 'VoiceCraft,' an AI capable of zero-shot audio editing and voice synthesis, performing tasks not included in the training data.


The newly announced 'VoiceCraft' is a neural codec language model inspired by multimodal models of text and images. It supports zero-shot text-to-speech and zero-shot speech editing.

VoiceCraft allows you to edit voices in a very natural way. First, here is the original voice saying, 'but the renaissance broke their monopoly on knowledge, one of the most important bastions of the church.'

Next, the audio edited with VoiceCraft is as follows. The content of the audio is 'But the renaissance broke their monopoly on knowledge, with its free movement of research and endless scientific inquiry, one of the most important bastions of the church.' The phrase 'with its free movement of research and endless scientific inquiry' is the part added by VoiceCraft.
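To make the edit concrete, the two transcripts can be compared word by word to recover exactly which span was inserted. The following is only an illustrative sketch using Python's standard difflib module. It is not part of VoiceCraft, and the helper name inserted_words is hypothetical.

```python
import difflib
import string


def inserted_words(original: str, edited: str) -> list[str]:
    """Return the words that appear in `edited` but not in `original`, in order.

    Hypothetical helper for illustration only; VoiceCraft itself does not
    expose this function.
    """
    # Normalize case and strip surrounding punctuation so that
    # "knowledge," and "knowledge" compare as the same word.
    def tokens(text: str) -> list[str]:
        return [w.strip(string.punctuation).lower() for w in text.split()]

    a, b = tokens(original), tokens(edited)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)

    added: list[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" opcodes mark words new to the edited text.
        if tag in ("insert", "replace"):
            added.extend(b[j1:j2])
    return added


original = (
    "but the renaissance broke their monopoly on knowledge, "
    "one of the most important bastions of the church."
)
edited = (
    "But the renaissance broke their monopoly on knowledge, "
    "with its free movement of research and endless scientific inquiry, "
    "one of the most important bastions of the church."
)

print(" ".join(inserted_words(original, edited)))
```

Run on the two sentences quoted above, this reports the ten-word phrase beginning with 'with its free movement' as the inserted span, which is the portion of audio VoiceCraft had to generate in the speaker's voice.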

VoiceCraft is available on GitHub and Hugging Face, so you can actually try it out for yourself.

GitHub - jasonppy/VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

VoiceCraft - a Hugging Face Space by pyp1

So I decided to try out 'VoiceCraft' available on Hugging Face. When you click on the URL above to access it, you will see the following screen.

Although the demo audio has already been input, I decided to use another demo audio published in the GitHub repository. To do so, just click the button in the red frame below and upload the audio file.

Click 'Transcribe' to transcribe the audio.

Next, enter the text you want to read in the 'Text' field and click 'Run.' For the prompt, I used a passage from the famous speech by Martin Luther King.

The resulting audio can be played or downloaded using the buttons in the red frame below.

Let's compare the two. First, here is the original audio that was uploaded:

Next, here is the audio created by VoiceCraft:

in Review, Software, Posted by log1l_ks