OpenAI Previews New Voice Mimicry and Text Reading Tool

The company had originally intended to give the tool to up to 100 developers through an application procedure.

Last updated: April 1, 2024 12:49 PM

Published March 30, 2024 9:41 PM

OpenAI is unveiling initial findings from a test of a feature that can convincingly narrate text in a human-like voice. The preview highlights a new AI frontier while also raising concerns about deepfake risks.

According to a spokesperson, the company has so far shared early demos and use cases from Voice Engine, a small-scale preview of its text-to-speech technology, with roughly ten developers.

An OpenAI representative said the company decided to scale back the release after getting input from stakeholders including legislators, business leaders, educators, and creatives. According to an earlier press event, the company had originally intended to give the tool to up to 100 developers through an application process.

The company said in a blog post on Friday, “We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year. We are engaging with US and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”

Other AI technology has already been used to fake voices, as in a convincing call impersonating President Joe Biden in January, raising concerns ahead of elections around the world.

Unlike prior attempts, Voice Engine by OpenAI can generate speech resembling specific individuals, replicating their cadence and intonations with just 15 seconds of recorded audio.

“If you have the right audio setup, it’s basically a human-caliber voice. It’s a pretty impressive technical quality,” OpenAI product lead Jeff Harris said. But, Harris noted, “There’s obviously a lot of safety delicacy around the ability to really accurately mimic human speech.”

A current OpenAI developer partner, the Norman Prince Neurosciences Institute at Lifespan, uses the tool to help patients recover their voices. For instance, it helped restore the voice of a young patient whose speech was impaired by a brain tumor, using her earlier recordings.

OpenAI’s speech model can also translate generated audio into various languages, which benefits companies like Spotify. Spotify has used it to pilot podcast translations for hosts like Lex Fridman. The technology also offers diverse voices for children’s educational content.

In the testing program, OpenAI partners must adhere to usage policies, obtain consent from original speakers, disclose AI-generated voices to listeners, and implement an inaudible audio watermark for identification.

Before potential widespread release, OpenAI seeks feedback from external experts to ensure a global understanding of the technology’s direction. Additionally, OpenAI stated that it believes the software preview “motivates the need to bolster societal resilience” against the difficulties posed by increasingly sophisticated AI technology.