Monday, July 1, 2024

ChatGPT rolls out voice and image capabilities

US-based artificial intelligence (AI) research firm OpenAI announced on September 25 that it is adding new voice and image capabilities to ChatGPT to make the chat platform more versatile and user-friendly.

“ChatGPT can now see, hear and speak,” the company stated in a blog post, adding that the new features offer a more intuitive interface that lets users hold voice conversations with ChatGPT or show it what they are talking about.

Voice input

One of the notable additions is voice input, allowing users to interact with ChatGPT through spoken questions. The AI then processes the speech, provides a response, and even reads the answer aloud, similar to popular virtual assistants like Alexa or Google Assistant. 

OpenAI said it expects these improvements to be powered by enhanced underlying technology, which it says will deliver more accurate and insightful responses.

OpenAI uses its Whisper model for speech-to-text conversion and has introduced a new text-to-speech model capable of generating remarkably human-like audio from plain text and a short sample of speech. Users will be able to choose from five distinct voices for ChatGPT.

OpenAI is also partnering with Spotify to translate podcasts into other languages while preserving the original podcaster’s voice, an example of how synthetic voices can be applied beyond chat.

Image recognition

Another noteworthy addition is image recognition, akin to Google Lens. Users can now snap a photo and prompt ChatGPT to analyze the image and respond accordingly. This feature includes a drawing tool to refine queries and options to speak or type questions alongside the image.

Unlike traditional search engines, ChatGPT offers a back-and-forth interaction, enabling users to iteratively refine their queries for more accurate results. OpenAI says it aims to expand ChatGPT’s capabilities without compromising on ethics and safety.

As more users embrace voice commands and image search, maintaining these ethical and safety boundaries will become an increasingly complex task, but one that OpenAI says it is committed to addressing as the technology progresses.

Potential risks attached

These advancements, however, come with potential challenges. OpenAI acknowledged the risks associated with synthetic voices, such as the potential for malicious actors to impersonate public figures or commit fraud. 

To limit this risk, the company said it is “using this technology to power a specific use case”, namely voice chat.

OpenAI has been cautious with image recognition too, intentionally restricting ChatGPT’s ability to analyze and make direct statements about individuals for reasons of accuracy and privacy. Although AI systems that identify people from images already exist, OpenAI sees this limitation as a responsible step forward.
