ChatGPT Clones User’s Voice During Test, Highlighting AI Risks
OpenAI recently released the system card for its new GPT-4o AI model, which outlines the model’s limitations and safety testing procedures. Among the findings, the document reveals a surprising incident where the model’s Advanced Voice Mode unintentionally mimicked a user’s voice during testing. While OpenAI has safeguards in place to prevent this from happening again, the incident highlights the growing complexity of safely managing AI systems capable of voice imitation.
Understanding Advanced Voice Mode
Advanced Voice Mode is a feature within ChatGPT that allows users to hold spoken conversations with the AI assistant. It relies on the model's ability to generate speech in a given voice, which it normally does by imitating an authorized voice sample provided by OpenAI.
The Incident: Unintended Voice Imitation
A section of the system card titled "Unauthorized voice generation" describes an episode during testing in which noisy input from a user caused the model to suddenly mimic that user's voice. In this rare incident, the model's voice generation was inadvertently triggered, leading it to output a voice similar to that of the tester, known as a "red teamer" (a person hired to conduct adversarial testing).
Such an occurrence, where an AI unexpectedly starts speaking in a user’s own voice, could be unsettling. OpenAI has emphasized that it has robust safeguards to prevent this type of unauthorized voice generation, and the incident occurred under specific test conditions before these measures were fully implemented. The example even prompted BuzzFeed data scientist Max Woolf to tweet, "OpenAI just leaked the plot of Black Mirror's next season."
How Voice Imitation Occurred
The incident likely stemmed from the model's capability to synthesize a wide range of sounds, including voices, based on its training data. GPT-4o can imitate essentially any voice from a short audio clip; in normal operation, it imitates an authorized sample embedded in the system prompt. The incident suggests that noise in the user's audio may have acted as an unintended voice prompt, causing the model to generate speech in the user's voice instead.
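OpenAI has not published the model's internals, so the following is only a minimal sketch of the failure mode, under the hypothetical assumption that an audio-native model conditions its output voice on the most recent voice-like sample in its context. Every name here is illustrative, not OpenAI's actual architecture:

```python
# Toy illustration (hypothetical): the model speaks in whatever voice
# appears in the most recent audio segment of its context window.

def choose_output_voice(context_segments):
    """Return the voice the model will imitate: the speaker of the
    last audio segment in context, or a default if none exists."""
    voice_like = [seg for seg in context_segments if seg["type"] == "audio"]
    return voice_like[-1]["speaker"] if voice_like else "default"

# Intended case: the authorized sample sits in the system prompt.
context = [
    {"type": "audio", "speaker": "authorized_voice"},  # system-prompt sample
    {"type": "text", "content": "Hello, assistant."},
]
print(choose_output_voice(context))  # -> authorized_voice

# Failure case: noisy user audio enters the context and effectively
# acts as a new, unintended voice prompt.
context.append({"type": "audio", "speaker": "user_voice"})  # stray noise
print(choose_output_voice(context))  # -> user_voice
```

If conditioning works anything like this, the fix is not to remove the capability but to check the output after the fact, which is where OpenAI's classifier comes in.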
Safeguards and the Future of AI Voice Synthesis
OpenAI has implemented safeguards, including an output classifier that detects unauthorized voice generation and ensures the model uses only pre-selected voices. According to OpenAI, this classifier currently catches 100% of meaningful deviations from the system's authorized voice, minimizing the risk of unauthorized voice imitation.
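OpenAI has not described how the classifier works. A common approach to this kind of check is speaker verification: embed the generated speech and compare it against the approved reference voice. The sketch below uses simulated embeddings in place of a real speaker-verification encoder; the function names, embedding size, and threshold are all hypothetical:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def passes_voice_check(output_emb: np.ndarray,
                       authorized_emb: np.ndarray,
                       threshold: float = 0.75) -> bool:
    """Allow a response only if the generated speech matches the
    pre-approved voice closely enough; block it otherwise."""
    return cosine(output_emb, authorized_emb) >= threshold

# Simulated speaker embeddings (stand-ins for a real encoder's output).
rng = np.random.default_rng(0)
authorized = rng.standard_normal(192)
same_voice = authorized + 0.1 * rng.standard_normal(192)  # near-identical voice
other_voice = rng.standard_normal(192)                    # a different speaker

print(passes_voice_check(same_voice, authorized))   # True: allowed through
print(passes_voice_check(other_voice, authorized))  # False: blocked
```

In a deployed system the threshold would be tuned against real speaker data, trading off false blocks of the legitimate voice against missed imitations.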
Independent AI researcher Simon Willison, who coined the term "prompt injection" in 2022, noted that OpenAI's robust safeguards make it unlikely that the model could be tricked into using an unapproved voice. While OpenAI currently restricts the full voice-synthesis capabilities of GPT-4o, the technology continues to advance, with other companies such as ElevenLabs already offering voice-cloning capabilities.
As AI-driven voice synthesis technology evolves, similar capabilities may soon be available to end users, raising both excitement and concerns about the ethical use of such tools.
A Look Ahead
The incident underscores the importance of continuous testing and refinement of AI models, particularly those with the ability to replicate human voices. While OpenAI has implemented strong protections, the broader implications of AI voice imitation will continue to be a topic of discussion as the technology becomes more widely accessible.