OpenAI, the company behind ChatGPT, announced on Friday (the 29th) a voice cloning tool that it plans to keep under strict control until safeguards are in place to prevent synthetic audio from being used to deceive listeners.
The model, called Voice Engine, can essentially replicate someone's voice from a 15-second audio sample, according to an OpenAI blog post sharing the results of a small-scale test.
“We recognize that generating speech that resembles people's voices presents serious risks, which are especially salient in an election year,” the San Francisco-based company said.
“We are working to engage domestic and international partners from government, media, entertainment, education, civil society and other sectors to ensure their feedback is incorporated as we build,” the company added.
Disinformation researchers fear widespread misuse of AI-powered applications in a crucial election year, as voice cloning tools become ever cheaper, easier to use and harder to trace.
Acknowledging these issues, OpenAI said it is “taking a cautious and informed approach to a broader rollout due to the potential for misuse of synthetic voices.”
The cautious announcement comes just months after a political consultant working for the presidential campaign of a long-shot Democratic rival of Joe Biden claimed responsibility for a robocall impersonating the US president.
The AI-generated call, the brainchild of Democratic Congressman Dean Phillips' advisor, included what appeared to be the voice of Biden urging people not to vote in the New Hampshire primary in January.
The incident alarmed experts who fear a flood of AI-powered deepfake disinformation in the 2024 US presidential race, as well as in other major elections around the world.
OpenAI said the partners testing Voice Engine have agreed to rules that include obtaining the explicit, informed consent of anyone whose voice is replicated.
The company added that audiences must also be made aware when the voices they are hearing are AI-generated.
“We have implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it is being used,” the company said.