Bangkok, April 2026. The air here usually hums with the symphony of tuk-tuks, street food vendors calling out their wares, and the incessant chatter of a city that never quite sleeps. But lately there’s a new sound creeping into the digital ether, a familiar yet uncanny echo. It’s the sound of AI voices, and they’re getting eerily good. So good, in fact, that a company called ElevenLabs, co-founded by Matiur Rahman, has managed to turn this once-niche technology into a reported billion-dollar enterprise. And if you ask me, the real story isn’t just about the tech; it’s about the people behind it, and how their global vision might just be changing the soundscape of places like Thailand.
I’ve always been a bit skeptical of the hype around AI. We’ve seen enough digital ghosts and promises of Skynet to make a sensible person wary. But then you hear an ElevenLabs-generated voice, maybe narrating an audiobook in perfect Thai, and you have to admit, it’s a game-changer. It’s not just mimicking; it’s capturing nuance, emotion, the very jai or ‘heart’ of a language. And that, my friends, is where Matiur Rahman, the soft-spoken co-founder, enters the stage.
The Defining Moment: A Whisper Becomes a Roar
Imagine a world where every voice, every story, every piece of knowledge could be instantly translated and spoken in any language, by a voice that sounds utterly natural. For Rahman, this wasn’t just a futuristic fantasy; it was a deeply personal mission. He once recounted in an interview how his own experiences with language barriers, particularly during a childhood spent moving between countries, sparked an early fascination with communication. He saw how easily messages could be lost, or worse, misinterpreted, when the human element of voice was stripped away or poorly rendered. This wasn’t just about text-to-speech; it was about emotion-to-speech, about preserving the soul of communication. This deep-seated understanding of the human need for connection, even across linguistic divides, became the bedrock of ElevenLabs.
The Origin Story: From Poland to the Global Stage
Matiur Rahman’s journey began far from the bustling tech hubs of Silicon Valley. Born in Bangladesh, he spent his formative years in Poland, a country with a rich linguistic heritage. This early exposure to diverse languages and cultures, coupled with a natural aptitude for mathematics and computer science, laid the groundwork for his future endeavors. He pursued his higher education at the University of Cambridge, a place renowned for its intellectual rigor, where he delved into computer science. It was there, amidst the ancient spires and cutting-edge research, that his fascination with artificial intelligence truly blossomed. He wasn't just learning to code; he was learning to think about how machines could understand and replicate the most human of traits: communication.
After Cambridge, Rahman’s path led him through various roles, including a stint at Google, where he honed his skills in machine learning and large-scale systems. These experiences were crucial, giving him a front-row seat to the advancements in AI and the immense potential it held. But like many true innovators, he wasn't content to simply work within established frameworks. He saw a gap, a profound need for something more than just functional speech synthesis. He envisioned a technology that could capture the very essence of a voice, its intonation, its rhythm, its unique character.
Meeting the Co-Founder: A Shared Vision
The spark for ElevenLabs truly ignited when Rahman met his co-founder, Piotr Dabkowski. They shared a common background, both having worked at Google, and a mutual frustration with the limitations of existing speech synthesis technology. Dabkowski, with his background in machine learning and deep learning, complemented Rahman's vision perfectly. Their conversations, often stretching late into the night, revolved around the idea of building a truly expressive and natural-sounding AI voice. It wasn't about creating robotic voices that could read text; it was about creating digital clones that could convey emotion, personality, and even subtle humor. They realized that the technology wasn't just a novelty; it had the potential to revolutionize everything from audiobook narration to film dubbing, and even accessibility for those with speech impairments. It was a grand ambition, a bit like trying to catch lightning in a bottle, but they were determined.
The Breakthrough: More Than Just Words
The real breakthrough for ElevenLabs wasn't just in making voices sound human; it was in making them sound like specific humans. Their proprietary deep learning models could clone a voice from just a few minutes of audio, then generate new speech in that voice, complete with emotional inflections. This was a significant leap beyond what was available. Imagine a Thai voice actor, whose voice is beloved for narrating historical dramas, suddenly being able to narrate hundreds of hours of new content without ever stepping into a studio again. Or a local teacher, whose voice is comforting to their students, having their lessons automatically translated into multiple languages, all spoken in their own familiar tone. The possibilities, especially in a linguistically diverse region like Southeast Asia, are immense.
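For the developers among you wondering what using such a cloned voice actually looks like in practice, here is a minimal sketch of composing a request to ElevenLabs’ public text-to-speech REST API. The endpoint shape, the `xi-api-key` header, and fields like `model_id` and `voice_settings` reflect the public API documentation as I understand it, but treat them as assumptions and check the current docs; the voice ID and API key below are placeholders, not real values.

```python
# Hedged sketch: building (not sending) a text-to-speech request for the
# ElevenLabs v1 REST API. Field names follow the public docs as of writing;
# VOICE_ID and the API key are placeholders you would replace with your own.
import json

API_BASE = "https://api.elevenlabs.io/v1"
VOICE_ID = "your-cloned-voice-id"  # placeholder: the id of a cloned voice

def build_tts_request(text: str, model_id: str = "eleven_multilingual_v2"):
    """Return the URL, headers, and JSON body for a text-to-speech call."""
    url = f"{API_BASE}/text-to-speech/{VOICE_ID}"
    headers = {
        "xi-api-key": "YOUR_API_KEY",  # placeholder credential
        "Content-Type": "application/json",
    }
    body = {
        "text": text,
        "model_id": model_id,          # multilingual model family covers Thai
        "voice_settings": {
            "stability": 0.5,          # lower tends toward more expressive delivery
            "similarity_boost": 0.75,  # how closely to track the cloned voice
        },
    }
    return url, headers, body

url, headers, body = build_tts_request("สวัสดีครับ ยินดีต้อนรับ")  # Thai: "Hello, welcome"
print(url)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Sending that body with any HTTP client (and a real key and voice ID) would return audio of the cloned voice speaking the Thai text, which is precisely the narrate-without-a-studio scenario described above.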