How AI-driven speech-to-text can handle different dialects more accurately, and what goes into the machine learning that makes this possible


In the first episode of the new series of the Broadcast Tech Talks Podcast, Jake Bickerton talks to Trevor Back from Speechmatics and Mira Pelovska from Broadteam about large language models and the future of multilingual automatic speech recognition.

The episode focuses on how AI-driven speech-to-text engines can handle different dialects and accents more accurately and effectively, and on the machine learning training that makes this possible.

Trevor and Mira explain in straightforward terms what’s currently possible for applications such as live subtitling that rely on speech recognition, and how Speechmatics’ highly trained AI copes with different voices from around the world to produce highly accurate transcriptions in a matter of seconds.
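For readers who want a concrete picture of what a speech-to-text call looks like, the sketch below uses the open-source openai-whisper package as a stand-in. It is purely illustrative, not Speechmatics’ engine or API, and the audio file name interview.wav is a hypothetical example.

```python
# Illustrative sketch only: a minimal multilingual speech-to-text call using
# the open-source openai-whisper package, not Speechmatics' engine or API.
import whisper

# Load a multilingual model; larger models generally cope better with
# accents and dialects, at the cost of speed.
model = whisper.load_model("base")

# "interview.wav" is a hypothetical local audio file for this example.
result = model.transcribe("interview.wav")

print(result["text"])  # the recognised speech as plain text
```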

The podcast also covers what the future holds for AI-driven transcription and translation services as their accuracy and capabilities continue to grow, especially as generative AI enables a move from speech transcription to speech comprehension.

In practice, this means Speechmatics will in future be able to understand what was said, how it was said and the context in which it was said, opening up a wide range of benefits and potential uses within the broadcast industry.
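As a rough illustration of that transcription-to-comprehension step, the sketch below passes a finished transcript to a general-purpose LLM via the OpenAI Python SDK and asks it to describe what was said, how it was said and in what context. The model choice, prompt and workflow are assumptions for illustration only, not a description of how Speechmatics implements this.

```python
# Illustrative sketch only: using a general-purpose LLM to move from
# transcription to comprehension. Not Speechmatics' implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

transcript = "..."  # a speech-to-text transcript produced upstream

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice for this example
    messages=[
        {"role": "system",
         "content": "Summarise what was said, how it was said (tone), "
                    "and the context in which it was said."},
        {"role": "user", "content": transcript},
    ],
)

print(response.choices[0].message.content)
```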