OpenAI has begun rolling out a new advanced voice mode for ChatGPT, aiming to make conversations with its artificial intelligence (AI) more lifelike and interactive. The release follows significant backlash and subsequent delays, and the feature is initially available only to a select group of ChatGPT Plus users. Advanced voice mode delivers hyperrealistic audio responses from GPT-4o, a substantial step up from ChatGPT's earlier voice capabilities.
The voice mode initially drew heavy criticism because one of its voices closely resembled that of Hollywood actor Scarlett Johansson, particularly her performance as Samantha, the AI assistant in the movie “Her.” Amid the controversy, the launch was delayed, pushing the rollout back to late July. For now, the alpha version is available to a small group of ChatGPT Plus users, with a broader rollout to all Plus users expected in the fall of 2024.
Earlier versions of ChatGPT's voice mode transcribed spoken questions into text before generating a response; the new mode instead relies on GPT-4o to process and understand audio input directly. Skipping the intermediate text conversion makes voice interactions faster and more seamless. According to an official video posted on Instagram, the model can distinguish between multiple speakers, pick up emotional nuances in their tone of voice, and adjust its replies accordingly, creating a more human-like and empathetic interaction.
The new voice mode will initially offer four preset voices (Juniper, Breeze, Cove, and Ember), developed in collaboration with professional voice actors. Sky, the most controversial voice and the one that drew comparisons to Scarlett Johansson, appears to have been removed. OpenAI spokesperson Lindsay McCallum said that ChatGPT cannot impersonate other people's voices, whether private individuals or public figures, and will block outputs that differ from these preset voices.
The gradual rollout lets OpenAI closely monitor how the new voice feature is used and address safety concerns as they arise. Users in the alpha group will receive an alert in the ChatGPT app, followed by an email with instructions on how to use the feature. Before the release, OpenAI tested GPT-4o's voice capabilities with more than 100 external experts across 45 languages to identify potential safety risks and areas for improvement. A detailed report on these efforts is scheduled for release in early August.
To avoid copyright infringement, OpenAI has added filters that prevent GPT-4o from generating copyrighted audio, including music. This safeguard matters because AI-generated music has already drawn legal challenges from record labels, and GPT-4o's audio capabilities could attract similar scrutiny.
The rollout of advanced voice mode follows months of controversy. In May, OpenAI's unveiling of GPT-4o's voice feature stunned audiences with its human-like tone and rapid responses, but many listeners found one of its voices, Sky, eerily similar to Scarlett Johansson's in “Her.” The resemblance prompted the actor to retain legal counsel; OpenAI denied using Johansson's voice and eventually removed Sky from its demo. In June, OpenAI announced it was delaying the release of advanced voice mode to strengthen its safety measures.
OpenAI’s proactive approach in addressing these issues and its commitment to improving the safety and functionality of its AI products underline its dedication to advancing AI technology responsibly. As the new voice mode rolls out, users can look forward to a more interactive and human-like experience with ChatGPT, bringing the technology one step closer to making sci-fi a reality.
