The Market Map of AI Voice & Call Center Agents

Insights
Marisa Krummrich
May 7, 2025

Voice is a deeply human way to interact: intuitive, fast, and often richer than text. As voice interfaces improve, they’re starting to reshape how we work. We now talk to our machines: across our team, we regularly use voice to speak with tools like ChatGPT, DeepL, and TextCortex. Even some developers, who traditionally preferred precise keyboard input, are experimenting with voice-driven coding instructions. This shift doesn’t just change workflows; it’s starting to change physical office setups too, as speaking out loud doesn’t fit well in open-plan spaces. Voice is no longer a novelty - it’s becoming embedded, and that raises big questions about where things go next.

As a team that spends every day thinking about AI innovation and its promise, we did a deep dive on the biggest opportunities in voice AI - specifically in the B2B context, since this is our fund’s core playing field. We call the category “call center” AI agents, though as you will see below it goes beyond the typical customer support call center and extends to any calling function that AI can take over for a business.

So, what does it take to build an enduring company in AI voice and call centers?

Market Pulse: AI Voice Agents

First, a brief recap of where the market stands today.

  • Model quality is largely a solved problem: Incumbents and well-funded startups have released the underlying technology for virtual humans with astounding quality and low latency. They tend either to focus on specific components such as voice synthesis (e.g., OpenAI Realtime API, ElevenLabs, Cartesia Sonic) or to provide “synthetic humans” with speech and video (e.g., Canopy Labs, Synthesia). 2024 brought notable advances in controlling nuanced aspects of synthetic speech, from emotional tone and pacing to precise pronunciation, which makes synthetic output increasingly human-like. However, models can still struggle with edge cases like background noise. And while the top layer of speech and video generation is sophisticated, real-world workflows often hit bottlenecks in the backend systems that handle dialogue and task execution - as we will explore below.
  • Agents have evolved from Supporters to Actors: The first wave of call center AI solutions focused on automating, guiding and analyzing conversation flow in call centers. As GenAI technology matures, the second wave of solutions can understand what to do with conversations and act on them. Conversational AI and emotionally intelligent chatbots are already taking over the full job of the call center agent for many tasks.

Since the tech foundation for AI voice has reached incredible quality with nuanced emotion and low latency, it is already becoming a commodity. A major indicator is that “generic” call agents are already a very crowded market (see our map below), with large funding rounds even at early stages (voice model platform Cartesia raised a $27m seed in 2024; no-code voice agent builder Synthflow raised a $7.4m seed in 2024).

How the Market is Responding to AI Voice

  • Trend #1: Measurably Better and Cheaper Customer Engagement. Across digital channels, customers have rising expectations for service quality. AI enables 24/7 responsiveness and resolution (as of now, mostly within a confined problem space), automates routine inquiries and tasks, and reduces wait times and operational costs. The explosion of customer data also provides rich fuel for AI analytics, enabling better service quality. As a result, companies can see measurable outcomes within weeks of deploying AI solutions for customer engagement.
  • Trend #2: Voice goes Mainstream with Growth in Number of Companies. Tech advancement is making it very accessible to build applications with “synthetic humans” (see also a tweet here by Olivia Moore (a16z), who built an AI voice agent within 30 minutes). Companies building with voice represented 22% of the most recent YC class, per Cartesia. On a micro level, this means it becomes harder for a new start-up to differentiate, which shifts the focus to GTM speed and execution.
  • Trend #3: Monetization under Pressure. As with the LLM stack in other modalities, the underlying tech is becoming commoditized and more affordable: in December 2024, OpenAI cut GPT-4o Realtime API audio prices by 60% (to $40/1M input tokens and $80/1M output tokens) and cached audio input prices by 87.5% (to $2.50/1M tokens). GPT-4o mini is also now available via the Realtime API. As a result, cost-based per-minute pricing is no longer necessarily the best model, and companies are shifting towards platform SaaS pricing and Call Center as a Service.
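
To make the size of these cuts tangible, here is a minimal arithmetic sketch. It only inverts a percentage reduction to recover the implied earlier price; the helper name `price_before_cut` is our own and not part of any API, and the inputs are the two price points quoted in Trend #3 (a 60% cut landing at $40 and an 87.5% cut landing at $2.50, both per 1M tokens).

```python
def price_before_cut(new_price: float, cut_percent: float) -> float:
    """Invert a percentage price cut to recover the implied earlier price."""
    if not 0 <= cut_percent < 100:
        raise ValueError("cut_percent must be in [0, 100)")
    return new_price / (1 - cut_percent / 100)


# Price points quoted above, in USD per 1M tokens: (new price, cut in percent).
for new_price, cut in [(40.0, 60.0), (2.50, 87.5)]:
    old_price = price_before_cut(new_price, cut)
    print(f"{cut}% cut to ${new_price:.2f} implies an earlier price of ${old_price:.2f}")
```

A vendor that bills per minute at cost plus a margin sees its revenue per call shrink with every such cut, which is one way to read the shift towards platform pricing.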

Market Map: AI Call Center Agents

Note: Some companies would fit in several boxes; the placement is indicative. This map is also biased towards the European ecosystem.

As we pointed out above, the “general” call center space is seeing the most action. This is not a big surprise, since a typical customer service call is much less sophisticated and critical than a sales call or even a full-blown negotiation in the supply chain (“I am unhappy with my electric toothbrush” versus “Can you commit to 500 units at a fixed price, delivered by end of month, and if not what are the alternatives?”).

Some notable companies from the map include:

  • Parloa (2017; Berlin, Germany; raised USD 218M over 4 rounds): an enterprise-grade contact center AI platform automating customer service.
  • Sierra (2023; San Francisco, USA; raised USD 285M over 2 rounds): helps businesses deploy customisable AI agents for customer support that reflect the brand’s tone.
  • Happy Robot (2022; San Francisco, USA; raised USD 16.7M over 3 rounds): a voice AI tool that automates phone operations in logistics and fleet management.

The Emerging Frontier: How to Build an Enduring Company in AI Voice & Call Centers

A few concrete factors often make the difference between short-term success and a truly defensible company. For AI voice and call centers, we see three strong signals:

Signal 1: Complex, Vertical-Specific Workflows across Organizations

As AI voice agents become more ubiquitous, differentiation will come from solving complex, multi-step tasks that are highly industry- and function-specific. Deep integration into complex vertical workflows becomes particularly interesting when it happens across organizations, especially where structured integrations are impractical - for instance, when communicating with smaller suppliers or handling negotiations and exception cases that require flexible, human-like interaction. In these contexts, voice becomes the most efficient and natural medium. More complex integration also creates higher stickiness.

For example, an AI agent handling airline customer service end-to-end must instantly retrieve passenger records, check flight availability, rebook tickets, and apply airline policies - often navigating exceptions and special requests that require flexible handling. Another example, beyond traditional call centers, is AI in supply chain logistics: These agents must coordinate suppliers, manage dependencies across the supply chain, communicate across multiple channels (voice, email, SMS), and integrate with ERPs in real time, including managing negotiations and unexpected changes.
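
The airline example can be sketched as a chain of dependent steps, each of which can send the call down a different path. This is a minimal illustration under loud assumptions: the in-memory dictionaries stand in for real backend systems, and the names `Booking`, `BOOKINGS`, `SEATS_LEFT`, `rebook`, and the "flex fare only" policy rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Booking:
    passenger: str
    flight: str
    fare_class: str  # hypothetical values: "flex" or "basic"


# Hypothetical in-memory stand-ins for the airline's backend systems.
BOOKINGS = {
    "ABC123": Booking("J. Doe", "XY100", "flex"),
    "DEF456": Booking("A. Roe", "XY100", "basic"),
}
SEATS_LEFT = {"XY100": 0, "XY200": 3}


def rebook(record_id: str, new_flight: str) -> str:
    """Run the multi-step rebooking flow and return the agent's next action."""
    booking = BOOKINGS.get(record_id)        # 1. retrieve the passenger record
    if booking is None:
        return "escalate: booking not found"
    if SEATS_LEFT.get(new_flight, 0) < 1:    # 2. check flight availability
        return f"offer alternatives: no seats on {new_flight}"
    if booking.fare_class != "flex":         # 3. apply airline policy (illustrative rule)
        return "escalate: fare rules need an exception decision"
    SEATS_LEFT[new_flight] -= 1              # 4. rebook the ticket
    booking.flight = new_flight
    return f"rebooked {booking.passenger} onto {new_flight}"


print(rebook("ABC123", "XY200"))
print(rebook("DEF456", "XY200"))
```

Even in this toy version, three of the four branches are exceptions rather than the happy path, which is where vertical-specific knowledge and deep integration start to matter.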

Signal 2: Achieve 100% AI Replacement - rather than AI Assistance - by finding a good “Trusted Wedge”

Enterprises rarely hand over human responsibility fully to an AI agent from day one - which is why many companies opt for co-pilot models, where AI assists rather than replaces the human. This hesitation is driven not only by trust issues, but also by technical limitations, ingrained user habits, potential regulatory requirements for human oversight, the challenge of handling edge cases, etc. A co-pilot model unfortunately limits ROI, as efficiency gains at scale happen when humans can be fully taken out of the loop.

This first “full handover”, where an AI agent handles the task end-to-end, typically happens in lower-stakes scenarios. Examples are after-hours calls or routine back-office interactions such as prescription refill calls from healthcare providers to pharmacies. When these initial cases deliver measurable success, the AI solution gains trust within the company. This is the basis for expansion into more complex workflows.

The winning company will likely be the one that finds one (or more) really good initial “trusted wedge(s)” - a use case that earns trust while minimizing operational and legal risks, enabling faster scaling to broader deployments.
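
One way to picture a “trusted wedge” is as a routing rule that hands a narrow, low-stakes slice of calls fully to the AI agent and keeps everything else with a human. The sketch below is illustrative only: the business hours, the intent labels, and the function name `route_call` are assumptions of ours, not a description of any vendor's product.

```python
from datetime import time

# Illustrative assumptions: business hours and the set of low-stakes intents.
OPEN, CLOSE = time(9, 0), time(17, 0)
WEDGE_INTENTS = {"prescription_refill", "opening_hours"}


def route_call(intent: str, local_time: time) -> str:
    """Return who handles the call: the AI end-to-end, or a human with AI assistance."""
    after_hours = not (OPEN <= local_time < CLOSE)
    if after_hours or intent in WEDGE_INTENTS:
        return "ai_full_handover"
    return "human_with_copilot"


print(route_call("prescription_refill", time(11, 0)))
print(route_call("contract_negotiation", time(11, 0)))
print(route_call("contract_negotiation", time(22, 0)))
```

Expanding the deployment then amounts to growing `WEDGE_INTENTS` as each use case proves itself, rather than renegotiating trust from scratch.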

Signal 3: Out-of-the-Box Agents - UX is King

Call centers are critical for customer satisfaction but not the core product companies want to build or optimize. Instead, they seek a solution that works out of the box - much like traditional call center offshoring, but with automation. A company that builds self-serve deployment lets customers bring on an AI agent much like hiring a human agent, one that is productive on day one. Over time, these solutions could “land and expand” into deeper integrations, automating more complex workflows (see previous points).

Traction is the signal for this, especially if it shows not just initial adoption by innovators but the potential to serve broader, more traditional customers over time.

Conclusion

In voice AI, defensibility no longer comes from technology alone. As capabilities commoditize, the real challenge lies in delivering clear value to customers - especially in complex, real-world workflows.

Certain signals suggest a company is on the path to building a lasting advantage: deep industry understanding, flexible architectures, strong switching costs, and a roadmap toward adaptive, learning-based systems. These are not recipes - we know the hard work of actually making this real falls on founders. When these elements start to show up early, they often point to something worth watching.

If you want to read further into the state of voice AI and AI voice agents, we recommend the article by a16z here and a blog by Cartesia here.

Looking ahead, it’s also worth asking: Will voice remain the dominant interface for commercial interaction - or is it just a temporary medium for communication, destined to eventually be replaced by AI agents quietly coordinating with company systems in the background (autonomous or pre-programmed - read more thoughts on that topic from our partner Andreas here)?

If you're building a company that understands the challenges of finding defensibility and is deeply ingrained in a specific industry, we'd love to hear from you.
