Startup Finance Guide

India's voice AI infrastructure shift: what founders building collections and fintech apps need to know

By Sandilya M · 6 min read · 5 sources

India's $1.8B voice AI market is bottlenecked by decade-old telecom infrastructure. Fintech and collections founders must audit telephony dependencies before scaling voice agents.

This article is for informational purposes only and does not constitute financial, tax, or legal advice. Consult a qualified professional for guidance specific to your situation.

Editorial note: Reviewed for accuracy by the Startup Finance Guide editorial team. Our editors cross-reference all claims against platform documentation, regulatory publications, and vendor disclosures. Last reviewed: 2026-06-18.


India's voice AI market is projected to reach $1.8 billion by 2030, according to industry estimates cited at the Inc42 AI Summit 2026, yet the telecom infrastructure carrying those voice calls still runs on systems built more than a decade ago for human-to-human traffic. For founders deploying AI voice agents in collections, merchant onboarding, or fraud verification, that gap is not a future problem. It is a present one.

The tension surfaced publicly during a panel at the Inc42 AI Summit 2026 in June, moderated by Inc42 senior editor Nikhil Subramaniam. Three panelists described the same operational ceiling: AI models are improving faster than the telecom layer beneath them can keep up. They were Suman Gandham, cofounder and CEO of Vobiz.ai (a Bengaluru-based AI-first telephony infrastructure startup that has raised over $1 million in seed funding); Maitreya Wagh, founder of Bolna (a multilingual voice AI orchestration platform); and Nitin Pulyani, SVP and head of product at Cashfree Payments (a Bengaluru-based payment infrastructure company).

What changed

The problem is not new, but the scale is. Vobiz.ai reported that its daily call volume grew from 100,000 to 3 million within a single month, and the company is targeting 30 million daily calls by year-end. That trajectory exposes a structural mismatch that smaller deployments could previously absorb.

Legacy public switched telephone network (PSTN) infrastructure operates at 8 kHz audio sampling, a rate set for intelligible human speech, not for the acoustic precision that real-time speech-to-text models need to perform reliably. Add high and variable latency, no native noise suppression, and limited programmatic call controls, and the result is that even a well-tuned AI model degrades noticeably once the call hits the carrier layer.
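The effect of the sampling rate can be made concrete with the Nyquist limit: a sampled signal can only represent frequencies up to half its sampling rate. The sketch below is purely illustrative; the function name is ours and does not come from any vendor's API.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return sample_rate_hz / 2


# Compare the legacy PSTN rate with the 16 kHz wideband rate.
for label, rate in [("narrowband PSTN", 8_000), ("wideband / HD voice", 16_000)]:
    limit = nyquist_limit_hz(rate)
    print(f"{label}: {rate} Hz sampling -> content up to {limit:.0f} Hz")
```

At 8 kHz, nothing above 4 kHz survives. Fricative consonants such as "s" and "f" carry much of their energy above that point, which is one reason narrowband audio is harder for a speech-to-text model to transcribe.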

Gandham put it directly at the summit: "The AI models change every day, but the telco layer is still sitting on old infrastructure made for human-to-human calls. Legacy players are retrofitting themselves for AI voice, but they are not building ground-up. You need an AI-first telco infra layer to handle deeper call controls, low latency, and structural noise suppression."

Cashfree Payments' Pulyani described concrete fintech use cases where this matters: replacing paper-heavy merchant onboarding flows with voice agents, and running automated fraud verification calls. Both require the voice agent to respond in near-real time and to handle unexpected audio conditions. Neither works reliably on a retrofitted PSTN stack under traffic surges.

The Telecom Regulatory Authority of India (TRAI) has not yet issued specific technical standards for AI-generated voice calls, and the Reserve Bank of India (RBI) has published guidelines on digital lending and customer communication but has not addressed AI voice agent infrastructure requirements directly. That regulatory gap means founders are currently making infrastructure choices without a compliance baseline to build against.

What this means for founders

If you are building a voice agent for collections, loan servicing, or merchant onboarding in India, the telephony vendor you choose is as consequential as the language model you use. Here is what to pressure-test before signing a contract.

Audio quality floor. Ask vendors whether their infrastructure supports wideband or HD voice (16 kHz or higher sampling). An 8 kHz connection will degrade transcription accuracy, particularly for regional language speakers, which is the exact population the next wave of Indian fintech products targets.

Latency guarantees. Conversational AI requires end-to-end round-trip latency below roughly 300 milliseconds to feel natural. Get this in writing, with SLA penalties. Vendors that retrofit AI capabilities onto legacy PSTN routing rarely publish latency figures because they cannot guarantee them.
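One way to hold a vendor to a latency figure during a pilot is to check a high percentile of measured round-trip times rather than the average. The sketch below assumes you have already collected per-turn latencies in milliseconds; the 95th-percentile target and all names are our assumptions, and only the roughly 300 ms budget comes from the text above.

```python
import math

LATENCY_BUDGET_MS = 300  # rough conversational budget cited above


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def meets_budget(samples_ms: list[float], pct: float = 95,
                 budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True if the chosen percentile of latencies is within the budget."""
    return percentile(samples_ms, pct) <= budget_ms


# Hypothetical pilot measurements: the mean is 290 ms, but one slow turn
# pushes the 95th percentile well past the budget.
pilot_ms = [180, 210, 220, 240, 250, 260, 270, 280, 290, 700]
print(sum(pilot_ms) / len(pilot_ms), percentile(pilot_ms, 95), meets_budget(pilot_ms))
```

The design point is that an average can sit under the budget while a meaningful share of conversational turns still feel broken, so an SLA written against a percentile is harder to satisfy on paper alone.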

Call control depth. Collections and fraud verification workflows require mid-call branching: transferring to a human agent, injecting a compliance disclosure, or terminating a call based on a real-time signal. Legacy systems expose limited programmatic controls. AI-first telephony platforms, including Vobiz.ai and competitors such as Floatbot (an Ahmedabad-based conversational AI platform), Vodex (a voice AI sales automation platform), and Retell AI (a US-based voice AI infrastructure provider), are building these controls natively.
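The mid-call branching described above can be sketched as a small decision function. Everything here is hypothetical: the signal names, the action names, and the priority order are our assumptions for illustration, not the API of Vobiz.ai or any other vendor named in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    INJECT_DISCLOSURE = "inject_disclosure"
    TRANSFER_TO_HUMAN = "transfer_to_human"
    TERMINATE = "terminate"


@dataclass
class CallSignals:
    """Real-time signals a telephony layer would need to expose mid-call."""
    fraud_flagged: bool = False
    caller_requested_human: bool = False
    disclosure_played: bool = False


def next_action(signals: CallSignals) -> Action:
    """Pick the next call-control action (priority order is an assumption)."""
    if signals.fraud_flagged:
        return Action.TERMINATE
    if signals.caller_requested_human:
        return Action.TRANSFER_TO_HUMAN
    if not signals.disclosure_played:
        return Action.INJECT_DISCLOSURE
    return Action.CONTINUE


print(next_action(CallSignals(caller_requested_human=True)).value)
```

A useful vendor question follows from the sketch: can each of these actions be triggered programmatically while the call is live, and how long does each take to execute?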

Traffic surge handling. The Vobiz.ai data point, growth from 100,000 to 3 million daily calls within a month, describes a surge most legacy carriers were not designed for. Ask vendors for documented peak-load performance, not average-load figures.
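The arithmetic behind the surge question can be sketched as follows, assuming the Vobiz.ai figures are daily call counts, as the company's 30 million daily target suggests. The average call duration and the peak factor are invented placeholders; substitute your own pilot data. The concurrency estimate uses Little's law (average concurrency equals arrival rate times average duration).

```python
SECONDS_PER_DAY = 86_400


def growth_multiple(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


def average_concurrent_calls(daily_calls: float, avg_duration_s: float) -> float:
    """Little's law: arrival rate (calls/s) times average call duration (s)."""
    return daily_calls / SECONDS_PER_DAY * avg_duration_s


# Figures reported in the article.
print(growth_multiple(100_000, 3_000_000))     # growth within one month
print(growth_multiple(3_000_000, 30_000_000))  # further growth to hit the target

# Hypothetical assumptions: 120-second calls, busiest hour at 3x the average.
avg = average_concurrent_calls(3_000_000, avg_duration_s=120)
peak = avg * 3
print(round(avg), round(peak))
```

The gap between the average and the peak estimate is the reason to ask for peak-load documentation: a carrier sized for the average would be under-provisioned during the busiest hour by whatever the real peak factor turns out to be.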

Regulatory audit trail. The RBI's digital lending guidelines require lenders to maintain records of customer interactions. Confirm that your telephony vendor produces call logs and recordings in a format your compliance team can actually use, and that data residency meets Indian data localisation norms.

For US-incorporated startups with Indian operations, there is an additional layer. Voice-based debt collection in the US falls under Regulation F, the Consumer Financial Protection Bureau (CFPB) Debt Collection Rule in force since 30 November 2021, which imposes contact-frequency limits and disclosure requirements on third-party debt collectors. If your voice agent calls Indian borrowers from a US entity, or if you are building for a US lender, Regulation F requirements on call content and frequency may apply regardless of which telephony infrastructure carries the call; confirm the scope with counsel.
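As an illustration of what a contact-frequency limit means for a dialer, the sketch below implements a simplified version of one Regulation F test: the rule presumes a violation when a collector places more than seven calls about a particular debt within seven consecutive days. The code deliberately ignores the rule's other prong (calls within seven days after a telephone conversation), its exclusions, and any state-level rules, and the exact window boundaries here are our assumption. It is a sketch for discussion with counsel, not a compliance implementation.

```python
from datetime import datetime, timedelta

MAX_CALLS = 7               # calls per debt per window under the simplified test
WINDOW = timedelta(days=7)  # rolling seven-day window


def calls_in_window(attempts: list[datetime], now: datetime) -> int:
    """Count prior call attempts for one debt inside the rolling window."""
    start = now - WINDOW
    return sum(1 for t in attempts if start < t <= now)


def may_place_call(attempts: list[datetime], now: datetime) -> bool:
    """True if one more call keeps the total at or under MAX_CALLS."""
    return calls_in_window(attempts, now) < MAX_CALLS


# Hypothetical history: seven attempts over the last three and a half days.
now = datetime(2026, 6, 18, 12, 0)
history = [now - timedelta(hours=12 * i) for i in range(1, 8)]
print(calls_in_window(history, now), may_place_call(history, now))
```

The check has to live in the agent's scheduling logic, per debt and per borrower, because the telephony layer carrying the call has no view of how many attempts have already been made.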

Limitations and open questions

Several things are genuinely unsettled.

TRAI has not published technical standards for AI voice agents on Indian carrier networks. That means there is no regulatory definition of what "AI-first" telephony infrastructure must provide, and vendor claims in this space are currently self-certified.

The RBI has not clarified whether automated voice agents conducting collections or loan servicing calls must meet the same disclosure standards as human agents. The digital lending guidelines address communication broadly, but the specific obligations for AI-generated voice are not spelled out. Founders should not assume that a compliant human-agent script, read by an AI, satisfies the spirit of those guidelines without legal review.

The $1.8 billion market projection for Indian voice AI by 2030 comes from industry estimates, not from a government statistical body or an independent research firm with a published methodology. Treat it as directional, not precise.

Finally, the competitive dynamics among AI-first telephony vendors in India are early. Vobiz.ai and Bolna are both seed-stage companies. Floatbot, Vodex, and Retell AI are at different stages of maturity and geographic focus. None has been stress-tested at the scale that a large NBFC or payments network would require. Founders should run pilots with real traffic before committing infrastructure budgets.

The infrastructure gap is real and documented. The regulatory framework to govern it is not yet written.



Updated 18 June 2026