Apr 16, 2024 07:00:00

'VoiceCraft' can synthesize voice from just a few seconds of voice data

A research team led by the University of Texas at Austin has announced 'VoiceCraft,' an AI capable of zero-shot audio editing and voice synthesis, performing tasks not included in the training data.

VoiceCraft
https://jasonppy.github.io/VoiceCraft_web/

The newly announced 'VoiceCraft' is a neural codec language model, inspired by multimodal models of text and images, that supports zero-shot text-to-speech synthesis and speech editing.

VoiceCraft allows you to edit voices in a very natural way. First, here is the original voice saying, 'but the renaissance broke their monopoly on knowledge, one of the most important bastions of the church.'

Next, here is the audio edited with VoiceCraft. It says, 'But the renaissance broke their monopoly on knowledge, with its free movement of research and endless scientific inquiry, one of the most important bastions of the church.' The phrase 'with its free movement of research and endless scientific inquiry' is the part added by VoiceCraft.
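To make the edit concrete, the inserted phrase can be recovered by comparing the two transcripts word by word. The snippet below is only a rough sketch using Python's standard difflib module. It is not VoiceCraft's own code, and the helper name inserted_words is made up for this illustration. It works purely on text and does not touch the audio.

```python
import difflib


def inserted_words(original: str, edited: str) -> list[str]:
    """Return the words that appear in `edited` but not in `original`.

    Hypothetical helper for illustration: a plain word-level diff of
    the two transcripts, ignoring capitalization.
    """
    a = original.lower().split()
    b = edited.lower().split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both contribute new words from the edited text
        if tag in ("insert", "replace"):
            added.extend(b[j1:j2])
    return added


original = (
    "but the renaissance broke their monopoly on knowledge, "
    "one of the most important bastions of the church."
)
edited = (
    "But the renaissance broke their monopoly on knowledge, "
    "with its free movement of research and endless scientific inquiry, "
    "one of the most important bastions of the church."
)

print(" ".join(inserted_words(original, edited)))
```

Run on the two sentences above, the sketch prints the added phrase, which is the span a speech editing model would need to generate in the original speaker's voice while leaving the rest of the recording unchanged.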

VoiceCraft is available on GitHub and Hugging Face, so you can actually try it out for yourself.

GitHub - jasonppy/VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
https://github.com/jasonppy/VoiceCraft

VoiceCraft - a Hugging Face Space by pyp1
https://huggingface.co/spaces/pyp1/VoiceCraft_gradio

So I decided to try out 'VoiceCraft' available on Hugging Face. When you click on the URL above to access it, you will see the following screen.