Not so long ago, generative AI could only communicate with human users via text. Now it's increasingly being given the power of speech, and this ability is improving by the day.
On Thursday, AI voice platform ElevenLabs launched v3, described on the company’s website as “the most expressive text-to-speech model ever.” The new model can exhibit a wide range of emotions and subtle communicative quirks, like sighs, laughter, and whispering, making its speech more humanlike than the company’s previous models.
In a demo shared on X, v3 was shown generating the voices of two characters, one male and the other female, who were having a lighthearted conversation about their newfound ability to speak in more humanlike voices.
Introducing Eleven v3 (alpha) – the most expressive Text to Speech model ever.
Supporting 70+ languages, multi-speaker dialogue, and audio tags such as [excited], [sighs], [laughing], and [whispers].
Now in public alpha and 80% off in June. pic.twitter.com/n56BersdUc
ElevenLabs (@elevenlabsio), June 5, 2025
There's definitely none of the Alexa-esque flatness of tone, but the v3-generated voices tend to be almost excessively animated, to the point that their laughter is more creepy than charming. Take a listen yourself.
The model can also speak more than 70 languages, compared to the 29-language limit of its predecessor, v2. It's available now in public alpha, and its price tag has been slashed by 80% until the end of June.
The future of AI interaction
AI-generated voice has become a major focus of innovation as tech developers look toward the future of human-machine interaction.
Automated assistants like Siri and Alexa have long been able to speak, of course, but as anyone who routinely uses these systems can attest, their voices are very mechanical, with a rather narrow range of emotional cadence and tones. They’re useful for handling quick and easy tasks, like playing a song or setting an alarm, but they don't make great conversation partners.
Some of the latest text-to-speech (TTS) AI tools, on the other hand, have been engineered to speak in voices that are maximally lifelike and engaging.
Users can prompt v3, for example, to speak in voices that are easily customizable through the use of “audio tags.” Think of these as stylistic filters that modify the output, and which can be inserted directly into text prompts: “Excited,” “Loudly,” “Sings,” “Laughing,” “Angry,” and so on.
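To make the idea of inline audio tags concrete, here is a minimal Python sketch that assembles a tagged text prompt. It assumes the square-bracket syntax shown in ElevenLabs' announcement (for example, [excited] or [whispers]). The helper names `with_tag` and `build_prompt` are hypothetical, the lowercase formatting is an assumption, and the code only builds a string; it does not call any ElevenLabs API.

```python
def with_tag(tag: str, text: str) -> str:
    """Prefix one line of dialogue with a bracketed audio tag.

    The bracket style mirrors the tags quoted in the announcement;
    lowercasing the tag is an assumption made for consistency.
    """
    return f"[{tag.strip().lower()}] {text.strip()}"


def build_prompt(lines: list[tuple[str, str]]) -> str:
    """Join (tag, text) pairs into a single multi-line text prompt."""
    return "\n".join(with_tag(tag, text) for tag, text in lines)


# Illustrative dialogue only; the wording is invented for this example.
prompt = build_prompt([
    ("excited", "We can finally talk like people do!"),
    ("whispers", "Don't tell anyone, but I've been practicing."),
    ("laughing", "That was almost convincing."),
])
print(prompt)
```

The resulting string is what a user would paste or send as the text prompt, with each tag sitting directly in front of the words it is meant to color.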
ElevenLabs isn't the only company racing to build more lifelike TTS models, which big tech companies are selling as a more intuitive and accessible way to interact with AI.
In late May, ElevenLabs competitor Hume AI unveiled its Empathic Voice Interface (EVI) 3 model, which allows users to generate custom voices by describing them in natural language. Similarly nuanced conversational abilities are also now on offer through Google’s Gemini 2.5 Pro Flash model.