ChatGPT rolls out voice and image capabilities

US-based artificial intelligence (AI) research firm OpenAI announced on September 25 that it is adding new voice and image capabilities to ChatGPT to make the chat platform more versatile and user-friendly.

“ChatGPT can now see, hear and speak,” the company stated in a blog post, adding that the new features offer a more intuitive interface that lets users hold voice conversations with ChatGPT or show it what they are talking about.

Voice input

One of the notable additions is voice input, allowing users to interact with ChatGPT through spoken questions. The AI then processes the speech, provides a response, and even reads the answer aloud, similar to popular virtual assistants like Alexa or Google Assistant.

OpenAI said these improvements will be powered by enhanced underlying technology, which it expects to deliver more accurate and insightful responses.

OpenAI uses its Whisper model to convert speech to text and has introduced a new text-to-speech model capable of generating human-like audio from plain text and a short sample of speech. Users can choose from five distinct voices for ChatGPT.

OpenAI is also partnering with Spotify to translate podcasts into other languages while preserving the original podcaster’s voice, an example of how synthetic voices can be applied beyond chat.

Image recognition

Another noteworthy addition is image recognition, akin to Google Lens. Users can now snap a photo and prompt ChatGPT to analyze the image and respond accordingly. This feature includes a drawing tool to refine queries and options to speak or type questions alongside the image.

Unlike traditional search engines, ChatGPT offers a back-and-forth interaction, enabling users to refine their queries iteratively for more accurate results. OpenAI says it is expanding ChatGPT’s capabilities without compromising on ethics and safety.

As more users embrace voice commands and image search, maintaining those ethical and safety boundaries will become increasingly complex, but OpenAI says it is committed to addressing the task as the technology progresses.

Potential risks attached

These advancements, however, come with potential challenges. OpenAI acknowledged the risks associated with synthetic voices, such as the potential for malicious actors to impersonate public figures or commit fraud.

To limit this risk, the company said it is “using this technology to power a specific use case”, namely voice chat.

OpenAI exercised caution with image recognition too, intentionally restricting ChatGPT’s ability to analyze and make direct statements about individuals for reasons of accuracy and privacy. Although AI systems that can identify people from images already exist, OpenAI sees this limitation as a responsible step forward.
