OpenAI Launches Advanced Voice Mode… With a Catch
OpenAI has launched the new Voice Mode for a small group of ChatGPT Plus users. Learn about the features available and how to access Voice Mode in alpha.
- OpenAI has launched the much-anticipated Advanced Voice Mode (AVM). However, it is available only to a small group of ChatGPT Plus users.
- The new version promises lower latency and more natural conversations than the earlier Voice Mode. However, only a limited set of features will be available in alpha.
At its Spring Update event in May, OpenAI, the creator of ChatGPT, demonstrated a new Voice Mode for ChatGPT that leverages GPT-4o’s audio and video capabilities. The artificial intelligence (AI) research company has now launched this much-anticipated Advanced Voice Mode. However, it is available only to a small group of users.
The company announced in a post on X that it was releasing Voice Mode in alpha to a small group of ChatGPT Plus users, offering them a smarter voice assistant that can respond to emotions and be interrupted mid-reply.
We’re starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users. Advanced Voice Mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions. pic.x.com/64O94EhhXK
— OpenAI (@OpenAI) July 30, 2024
What Is Voice Mode?
Voice Mode is a smart voice assistant that lets users hold back-and-forth spoken conversations with ChatGPT. The voice capability is powered by a text-to-speech model that generates a human-sounding voice. However, the earlier Voice Mode came under fire when actress Scarlett Johansson threatened legal action over its Sky voice, which she said resembled her own and was used without her consent.
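ChatGPT’s built-in Voice Mode is a consumer feature rather than something developers call directly, but OpenAI exposes a comparable text-to-speech model through its API, which gives a sense of how a human-sounding voice is generated from text. Below is a minimal sketch using that developer-facing endpoint; the output filename is illustrative, and the snippet assumes an `OPENAI_API_KEY` environment variable is set.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Generate spoken audio from text with OpenAI's developer-facing TTS model.
response = client.audio.speech.create(
    model="tts-1",   # OpenAI's text-to-speech model
    voice="alloy",   # one of the API's preset voices
    input="Hello! This is a synthesized, human-sounding voice.",
)

# Write the MP3 bytes to disk (the filename is illustrative).
response.stream_to_file("reply.mp3")
```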
The new Voice Mode is built on GPT-4o’s audio and video capabilities and is expected to deliver markedly better performance. The earlier version had average latencies of 2.8 seconds (with GPT-3.5) and 5.4 seconds (with GPT-4) because it chained three separate models: one to transcribe the user’s speech to text, a text model to generate a reply, and a third to convert that reply back to speech.
The new version uses a single model end-to-end across audio, vision, and text, meaning all inputs and outputs are processed by the same neural network. The company said the voice assistant will ship with four preset voices and that it has paused use of the Sky voice. Moreover, the company has added new filters so the software can refuse requests to generate music or other forms of copyrighted content.
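To make the architectural difference concrete, here is a purely illustrative Python sketch contrasting the two designs. The function names are hypothetical stand-ins, not OpenAI’s actual components; the point is that the old design pays a latency cost at every handoff and discards non-textual cues, while the new one keeps everything inside a single model.

```python
# Illustrative stubs only; names are hypothetical, not OpenAI's internals.

def transcribe_audio(audio: bytes) -> str:
    """Stage 1 (old pipeline): a speech-to-text model turns audio into text.
    Tone, emotion, and background sound are lost at this step."""
    return "What's the weather like?"

def generate_reply(prompt: str) -> str:
    """Stage 2 (old pipeline): a text-only model (GPT-3.5 or GPT-4) writes a reply."""
    return "I can't check live weather, but I can explain how forecasts work."

def synthesize_speech(text: str) -> bytes:
    """Stage 3 (old pipeline): a text-to-speech model voices the reply."""
    return text.encode()

def old_voice_mode(audio: bytes) -> bytes:
    # Three models chained together: each handoff adds latency,
    # which is how the 2.8s / 5.4s averages accumulated.
    return synthesize_speech(generate_reply(transcribe_audio(audio)))

def new_voice_mode(audio: bytes) -> bytes:
    """One end-to-end multimodal network consumes audio directly and emits
    audio, so there are no handoffs and vocal cues like tone survive."""
    return b"spoken reply with preserved intonation"

if __name__ == "__main__":
    fake_audio = b"\x00\x01"  # stand-in for a recorded question
    print(old_voice_mode(fake_audio))
    print(new_voice_mode(fake_audio))
```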
The new Voice Mode will eventually be able to assist with content on users’ screens and respond using the phone camera as context. That said, the alpha will not include these features; according to the company, screen- and video-sharing capabilities will launch later.
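While the camera- and screen-aware features are not in the alpha, GPT-4o’s developer API already accepts images alongside text, which hints at how visual context could work. Below is a minimal sketch using the chat completions endpoint; the screenshot URL is a placeholder, and an `OPENAI_API_KEY` environment variable is assumed.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Send an image alongside a text question; GPT-4o accepts both in one request.
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown on this screen?"},
                {
                    "type": "image_url",
                    # Placeholder URL; in practice this could be a screenshot
                    # or a frame captured from the phone camera.
                    "image_url": {"url": "https://example.com/screenshot.png"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
```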
OpenAI said it will use user feedback to improve the model further. It also plans to share a detailed report on GPT-4o in August, covering its performance, safety evaluations, and limitations.
How to Access Voice Mode Alpha
As OpenAI noted in its post, the alpha is available to only a small group of ChatGPT Plus users. You can become a ChatGPT Plus subscriber for $20 a month. If you are selected for the alpha, you will receive an email with instructions and a message in the mobile app. If you haven’t received a notification, there is no need to worry; the company will continue adding users on a rolling basis.
OpenAI plans to make the real-time Voice Mode widely available to ChatGPT Plus users in the fall.