OpenAI Launches Advanced Voice Mode… With a Catch
OpenAI has launched the new Voice Mode for a small group of ChatGPT Plus users. Learn about the features available and how to access Voice Mode in alpha.
- OpenAI has launched the much-anticipated Advanced Voice Mode (AVM). However, it is available only to a small group of ChatGPT Plus users.
- The new version promises lower latency and more natural conversations than the earlier Voice Mode. However, only a limited set of features will be available in alpha.
At its Spring Update event in May, OpenAI, the creator of ChatGPT, demonstrated a new Voice Mode for ChatGPT that leverages GPT-4o’s audio and video capabilities. The artificial intelligence (AI) research company has now launched this much-anticipated Advanced Voice Mode. However, it is available only to a small group of users.
The company announced in a post on X that it was releasing Voice Mode in alpha to a small group of ChatGPT Plus users, offering them a smarter voice assistant that can respond to emotions and be interrupted mid-reply.
We’re starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users. Advanced Voice Mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions. pic.x.com/64O94EhhXK
— OpenAI (@OpenAI) July 30, 2024
What Is Voice Mode?
Voice Mode is a smart voice assistant that lets users hold back-and-forth spoken conversations with ChatGPT. The voice capability is powered by a text-to-speech model that generates a human-sounding voice. However, the earlier Voice Mode came under fire when actress Scarlett Johansson threatened legal action over its Sky voice, which she said resembled her own and was used without her consent.
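ChatGPT’s built-in Voice Mode is a consumer feature rather than something developers call directly, but OpenAI exposes a comparable text-to-speech model through its API, which gives a sense of how a human-sounding voice is generated from text. Below is a minimal sketch using that developer-facing endpoint; the output filename is illustrative, and the snippet assumes an `OPENAI_API_KEY` environment variable is set.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Generate spoken audio from text with OpenAI's developer-facing TTS model.
response = client.audio.speech.create(
    model="tts-1",   # OpenAI's text-to-speech model
    voice="alloy",   # one of the API's preset voices
    input="Hello! This is a synthesized, human-sounding voice.",
)

# Write the MP3 bytes to disk (the filename is illustrative).
response.stream_to_file("reply.mp3")
```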
The new Voice Mode is built on GPT-4o’s audio and video capabilities and is expected to deliver markedly better performance. The earlier version had average latencies of 2.8 seconds (with GPT-3.5) and 5.4 seconds (with GPT-4) because it chained three separate models: one to transcribe the user’s speech to text, a text model to generate a reply, and a third to convert that reply back to speech.
The new version uses a single model end-to-end across audio, vision, and text, meaning all inputs and outputs are processed by the same neural network. The company said the voice assistant will ship with four preset voices and that it has paused use of the Sky voice. Moreover, the company has added new filters so the software can refuse requests to generate music or other forms of copyrighted content.
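To make the architectural difference concrete, here is a purely illustrative Python sketch contrasting the two designs. The function names are hypothetical stand-ins, not OpenAI’s actual components; the point is that the old design pays a latency cost at every handoff and discards non-textual cues, while the new one keeps everything inside a single model.

```python
# Illustrative stubs only; names are hypothetical, not OpenAI's internals.

def transcribe_audio(audio: bytes) -> str:
    """Stage 1 (old pipeline): a speech-to-text model turns audio into text.
    Tone, emotion, and background sound are lost at this step."""
    return "What's the weather like?"

def generate_reply(prompt: str) -> str:
    """Stage 2 (old pipeline): a text-only model (GPT-3.5 or GPT-4) writes a reply."""
    return "I can't check live weather, but I can explain how forecasts work."

def synthesize_speech(text: str) -> bytes:
    """Stage 3 (old pipeline): a text-to-speech model voices the reply."""
    return text.encode()

def old_voice_mode(audio: bytes) -> bytes:
    # Three models chained together: each handoff adds latency,
    # which is how the 2.8s / 5.4s averages accumulated.
    return synthesize_speech(generate_reply(transcribe_audio(audio)))

def new_voice_mode(audio: bytes) -> bytes:
    """One end-to-end multimodal network consumes audio directly and emits
    audio, so there are no handoffs and vocal cues like tone survive."""
    return b"spoken reply with preserved intonation"

if __name__ == "__main__":
    fake_audio = b"\x00\x01"  # stand-in for a recorded question
    print(old_voice_mode(fake_audio))
    print(new_voice_mode(fake_audio))
```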
The new Voice Mode will eventually be able to assist with content on users’ screens and respond using the phone camera as context. That said, the alpha will not include these features; according to the company, screen- and video-sharing capabilities will launch later.
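While the camera- and screen-aware features are not in the alpha, GPT-4o’s developer API already accepts images alongside text, which hints at how visual context could work. Below is a minimal sketch using the chat completions endpoint; the screenshot URL is a placeholder, and an `OPENAI_API_KEY` environment variable is assumed.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Send an image alongside a text question; GPT-4o accepts both in one request.
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown on this screen?"},
                {
                    "type": "image_url",
                    # Placeholder URL; in practice this could be a screenshot
                    # or a frame captured from the phone camera.
                    "image_url": {"url": "https://example.com/screenshot.png"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
```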
OpenAI said it will use user feedback to improve the model further. It also plans to share a detailed report on GPT-4o in August, covering its performance, safety evaluations, and limitations.
How to Access Voice Mode Alpha
As OpenAI noted in its post, the alpha is available to only a small group of ChatGPT Plus users. You can become a ChatGPT Plus subscriber for $20 a month. If you are selected for the alpha, you will receive an email with instructions and a message in the mobile app. If you haven’t received a notification, there is no need to worry; the company will continue adding users on a rolling basis.
OpenAI plans to make the real-time Voice Mode widely available to ChatGPT Plus users in the fall.