TECH NEWS – ChatGPT, OpenAI's hugely popular AI chatbot, was previously text-only. Now it's learning new ways to understand your questions.
Most of the changes OpenAI has made to ChatGPT concern what the AI-powered bot knows: the questions it can answer, the information it can access, and the underlying models it runs on. This time, however, the company is also changing how ChatGPT itself is used. It is launching a new version of the service that lets you interact with the bot not only by typing sentences into a text box, but also by speaking aloud or simply uploading a picture.
The new features will roll out over the next two weeks to those who pay for ChatGPT, with everyone else getting them “soon”, according to OpenAI.
The voice chat part is pretty familiar. You tap a button and say your question. The software converts your speech to text and feeds it into the large language model. It gets back an answer, converts that back into speech, and reads the answer aloud. It feels like talking to Alexa or Google Assistant, but OpenAI hopes the answers will be better thanks to improved core technology. Most virtual assistants seem to be getting rebuilt around LLMs (large language models) – OpenAI is just ahead of the rest.
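The round-trip described above can be sketched in a few lines. This is a minimal illustration of the pipeline only – the function bodies are hypothetical stand-ins, not OpenAI's actual API:

```python
# Sketch of a voice-chat turn: speech -> text -> LLM -> text -> speech.
# All three helpers below are placeholders for illustration; a real
# implementation would call a speech-recognition model (e.g. Whisper),
# an LLM, and a text-to-speech model respectively.

def speech_to_text(audio: bytes) -> str:
    """Placeholder for a speech-recognition model."""
    return "What is the tallest mountain on Earth?"

def query_llm(prompt: str) -> str:
    """Placeholder for the large-language-model call."""
    return "Mount Everest, at 8,849 metres above sea level."

def text_to_speech(text: str) -> bytes:
    """Placeholder for a text-to-speech model."""
    return text.encode("utf-8")

def voice_chat_turn(audio_in: bytes) -> bytes:
    # 1. Convert the spoken question to text.
    question = speech_to_text(audio_in)
    # 2. Feed the text into the language model.
    answer = query_llm(question)
    # 3. Convert the answer back into audio and return it.
    return text_to_speech(answer)
```

The point of the sketch is the shape of the loop: the voice interface is a thin speech layer wrapped around the same text-in, text-out model.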
OpenAI’s excellent Whisper model handles much of the speech-to-text conversion. The company is also introducing a new text-to-speech model that it claims can “generate human-like audio from just text and a few seconds of sample speech.” ChatGPT will offer five voices to choose from, but OpenAI seems to think the model has far more potential than that. It is working with Spotify, for example, to translate podcasts into other languages while preserving the podcaster’s voice. Synthesised audio has many exciting uses, and OpenAI could become an essential player in this industry.
Concerns and achievements in the development of ChatGPT
But the fact that it is possible to create a synthetic voice from a few seconds of audio opens the door to all sorts of problematic use cases. “These capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud,” the company says in a blog post announcing the new features.
According to OpenAI, this is why the model will not be available for widespread use; instead, it will be controlled and limited to specific use cases and partnerships.
Image search, meanwhile, is a bit like Google Lens. You take a picture of anything you’re interested in, and ChatGPT tries to work out what you’re asking about and answers accordingly. You can use the app’s drawing tool to help clarify your question, or speak or type questions about the picture. This is where ChatGPT’s back-and-forth nature comes in handy: rather than searching, getting the wrong answer, and then starting another search, you can ask the bot to refine its answer on the fly. (This is very similar to what Google does with multimodal search.)
Obviously, image search has its own potential problems. One is what can happen when you ask a chatbot about a person. OpenAI says it has deliberately limited ChatGPT’s “ability to analyse and make direct statements about people” for both accuracy and privacy reasons. This means that one of the most mysterious sci-fi visions of artificial intelligence – the ability to look at someone and ask, “Who is that?” – won’t be happening anytime soon. Which is probably a good thing.
Source: X
Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate.
— OpenAI (@OpenAI) September 25, 2023