OpenAI rolls out voice mode after delaying it for safety reasons

SAN FRANCISCO — ChatGPT maker OpenAI said Tuesday it would begin rolling out its new voice mode to customers, a month after delaying the launch to do more safety testing on the tool.

OpenAI in May showed off the conversational voice mode, which can detect different tones of voice and respond to being interrupted, much like a human. But some researchers quickly criticized the company for showing off an artificial intelligence product that hewed to sexist stereotypes about female assistants being flirty and compliant. Actor Scarlett Johansson alleged the company had copied her voice from the movie “Her,” in which an AI bot develops a romantic relationship with a man.

OpenAI’s records show it worked with a completely different actor, and it pulled the voice, called Sky, from its product. In June, it said it would delay the launch of voice mode to conduct more safety testing. The new voice mode launching Tuesday does not include the Sky voice, an OpenAI spokesperson confirmed.

Tech companies have worked to make conversational AI chatbots for years. Amazon’s Alexa and Apple’s Siri are ubiquitous and used by millions of people to set timers and look up the weather, but they aren’t capable enough for complex tasks. Now, OpenAI, Google, Microsoft, Apple and a host of other tech companies are trying to use breakthroughs in generative AI to finally build the kind of assistant that has been a fixture of science fiction for decades.

OpenAI’s fans and customers have clamored for the voice mode, with some complaining online when the company delayed the launch in June. The new feature will be available to a small number of users at first, and the company will gradually open it up to all of OpenAI’s paying customers by the fall.

Previous versions of ChatGPT have had the ability to listen to spoken questions and respond with audio by transcribing the questions into text, running them through its AI algorithm, and then reading its text response out loud. But the new voice features are built on OpenAI’s latest AI model, which directly processes audio without needing to convert it to text first. That allows the bot to listen to multiple voices at once and determine a person’s tone of voice, responding differently based on what it thinks the person’s emotions are.

That opens up a whole new set of questions, such as how cultural differences come into play, or whether people might develop relationships with bots that are trained to respond to their emotions in specific ways. OpenAI said it worked with people representing 45 languages and 29 “geographies” to improve the AI model’s capabilities.

Only four unique voices will be available to use, and the tool will block attempts to get the bot to generate voices of real people, the company said.
