ElevenLabs co-founder and CEO Mati Staniszewski says voice is becoming the next great interface for AI – the way people will increasingly interact with machines as models move beyond text and screens.
Speaking at Web Summit in Doha, Staniszewski told TechCrunch that voice models like those developed by ElevenLabs have recently moved beyond simply mimicking human speech – including emotion and intonation – to working in tandem with the reasoning capabilities of large language models. The result, he argued, is a shift in how people interact with technology.
In the years ahead, he said, “hopefully all our phones will go back in our pockets, and we can immerse ourselves in the real world around us, with voice as the system that controls technology.”
That vision fueled ElevenLabs’s $500 million raise this week at an $11 billion valuation, and it is increasingly shared across the AI industry. OpenAI and Google have both made voice a central focus of their next-generation models, while Apple appears to be quietly building voice-adjacent, always-on technologies through acquisitions like Q.ai. As AI spreads into wearables, cars, and other new hardware, control is becoming less about tapping screens and more about speaking, making voice a key battleground for the next phase of AI development.
Iconiq Capital general partner Seth Pierrepont echoed that view onstage at Web Summit, arguing that while screens will continue to matter for gaming and entertainment, traditional input methods like keyboards are starting to feel “outdated.”
And as AI systems become more agentic, Pierrepont said, the interaction itself will also change, with models gaining the guardrails, integrations, and context needed to respond with less explicit prompting from users.
Staniszewski pointed to that agentic shift as one of the biggest changes underway. Rather than spelling out each instruction, he said, future voice systems will increasingly rely on persistent memory and context built up over time, making interactions feel more natural and requiring less effort from users.
That evolution, he added, will influence how voice models are deployed. While high-quality audio models have mostly lived in the cloud, Staniszewski said ElevenLabs is moving toward a hybrid approach that blends cloud and on-device processing – a move aimed at supporting new hardware, including headphones and other wearables, where voice becomes a constant companion rather than a feature you decide when to engage with.
ElevenLabs is already partnering with Meta to bring its voice technology to products including Instagram and Horizon Worlds, the company’s virtual reality platform. Staniszewski said he would also be open to working with Meta on its Ray-Ban smart glasses as voice-driven interfaces expand into new form factors.
But as voice becomes more persistent and embedded in everyday hardware, it opens the door to serious concerns about privacy, surveillance, and how much personal data voice-based systems will store as they move closer to users’ daily lives – something companies like Google have already been accused of abusing.