OpenAI releases 'Voice Engine,' an AI model that can generate cloned voices from just 15 seconds of audio



OpenAI now offers limited access to Voice Engine, an AI model that can create synthetic speech from an audio sample just 15 seconds long. Voice Engine is a text-to-speech tool that can read text input aloud in multiple languages, not only the language of the voice sample used to create the synthesized voice.

Navigating the Challenges and Opportunities of Synthetic Voices
https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices



OpenAI built a voice cloning tool, but you can't use it… yet | TechCrunch
https://techcrunch.com/2024/03/29/openai-custom-voice-engine-preview/

OpenAI details Voice Engine speech generation AI - SiliconANGLE
https://siliconangle.com/2024/03/29/openai-details-voice-engine-speech-generation-ai/

OpenAI's voice cloning AI model only needs a 15-second sample to work - The Verge
https://www.theverge.com/2024/3/29/24115701/openai-voice-generation-ai-model

Below is an example of a 15-second voice sample that is given to Voice Engine as input.


The following are synthetic voice clips that Voice Engine created from this sample, each reading a different text aloud.

"Some of the most amazing habitats on Earth are found in the rainforest. A rainforest is a place with a lot of precipitation, and it has many kinds of animals, trees, and other plants. Tropical rainforests are usually not too far from the equator and are warm all year round."


"This story has been told and retold for thousands of years. What is the central message that it is teaching?"


"Salt also makes sure we stay hydrated, which means there is enough water in our body for it to properly function."


"Let's make the parts the same by adding one to three!"


"Have you ever wondered why a soccer ball soars through the air the way it does, or how a skateboarder manages to stay on their board while flipping it? It's all about the science of how objects move, called physics. First, the push you give off the ground is the force that gets you going. Then, as you speed up, gravity, another natural force, pulls you down the hill. Finally, when you brake, the force of friction between the bike's brake pads and the tires slows you down."


According to OpenAI, the company began developing Voice Engine in late 2022, and the technology already powers the preset voices offered in its text-to-speech API and in ChatGPT's text-to-speech functionality.

In an interview with the technology news outlet TechCrunch, Jeff Harris, a member of OpenAI's Voice Engine product team, said that the model was trained on a combination of licensed data and publicly available data.

At the time of writing, the companies with access to Voice Engine include educational technology company Age of Learning, AI video creation tool HeyGen, healthcare software maker Dimagi, AI communication app developer Livox, and health system Lifespan. Access is limited to a small group of such companies, and OpenAI told TechCrunch that only about 10 developers have access to Voice Engine.

OpenAI explained why access to Voice Engine is limited to a few companies: "Due to the potential for malicious use of synthesized voices, we are taking a cautious and informed approach towards widespread release."

In January 2024, robocalls using an AI-generated fake voice of President Joe Biden were placed to voters, and the U.S. government has since moved to curb unethical uses of AI voice technology. The Federal Communications Commission (FCC) has declared robocalls that use AI-generated voices to be illegal.

FCC declares "Use of AI voice for robocalls is illegal" - GIGAZINE



According to OpenAI, partner companies with access to Voice Engine must agree to usage policies that prohibit impersonating individuals or organizations without their consent. OpenAI also requires partners to obtain the explicit, informed consent of the original speaker, does not allow them to build ways for individual users to create their own voices, and requires them to disclose to their audience that the voices are generated by AI.

In addition, to reduce the risks arising from AI voice tools, OpenAI recommends measures such as phasing out voice-based authentication for accessing bank accounts, establishing policies to protect the use of individuals' voices in AI, strengthening public education on AI deepfakes, and developing systems for tracking AI-generated voices.

in Software, Posted by logu_ii