OpenAI’s records show it worked with a completely different actor, and it pulled the voice, called Sky, from its product. In June, it said it would delay the launch of voice mode to conduct more safety testing. The new voice mode launching Tuesday does not include the Sky voice, an OpenAI spokesperson confirmed.
Tech companies have worked to make conversational AI chatbots for years. Amazon’s Alexa and Apple’s Siri are ubiquitous and used by millions of people to set timers and look up the weather, but they aren’t capable enough for complex tasks. Now, OpenAI, Google, Microsoft, Apple and a host of other tech companies are trying to use breakthroughs in generative AI to finally build the kind of assistant that has been a fixture of science fiction for decades.
OpenAI’s fans and customers have clamored for the voice mode, with some complaining online when the company delayed the launch in June. The new feature will be available to a small number of users at first, and the company will gradually open it up to all of OpenAI’s paying customers by the fall.
Previous versions of ChatGPT could listen to spoken questions and respond with audio, but they did so by transcribing the questions into text, running the text through its AI algorithm, and then reading the text response out loud. The new voice features are built on OpenAI’s latest AI model, which directly processes audio without needing to convert it to text first. That allows the bot to listen to multiple voices at once and determine a person’s tone of voice, responding differently based on what it thinks the person’s emotions are.
That opens up a whole new set of questions, such as how cultural differences come into play, or whether people might develop relationships with bots that are trained to respond to their emotions in specific ways. OpenAI said it worked with people representing 45 languages and 29 “geographies” to improve the AI model’s capabilities.
Only four voices will be available, and the tool will block attempts to get the bot to generate the voices of real people, the company said.