"Smallest.ai is the only voice AI company in India that owns the full inference loop — ASR, LLM, and TTS — under one roof. This is simultaneously their strongest moat and their greatest execution risk."
Unlike Bolna's orchestration play, Smallest.ai's thesis is vertical integration — owning every layer of the voice pipeline. Their model family now covers the full stack, with Hydra (speech-to-speech) announced as a next-generation architecture.
This is the defining difference between Smallest and every other voice AI company in India: proprietary ownership at every stack layer (Lightning for TTS, Electron for the LLM, Pulse for ASR). That ownership is what the "not a wrapper" claim rests on.
These companies are not competing on the same strategy. Bolna is betting on distribution + orchestration. Smallest is betting on model ownership + vertical integration. One will look smarter in 3 years depending on how the AI model commodity curve plays out.
The same vertical integration that creates Smallest's moat also creates a specific class of risk: everything is their problem. When Bolna's TTS degrades, it's ElevenLabs' problem. When Smallest's does, it's theirs. Here are the real gaps a PM would flag.
| Gap | Current State | Risk to Business | Severity |
|---|---|---|---|
| Customer Traction Disclosure | "Millions of calls/month" — no customer count, no named case studies equivalent to Bolna's 1,050 customers. Stealth go-to-market. | Harder to win enterprise deals without social proof. Sales cycle drags. Harder to raise Series A on ARR story. | HIGH |
| Indian Language Depth vs. Bolna | Pulse supports 36 languages, but India-specific depth (Hinglish, 50+ accents, TRAI compliance, DND) lags Bolna's 2+ year head start. | Loses India enterprise deals in tier-2/3 cities and vernacular-heavy sectors (rural BFSI, agri-fintech). | HIGH |
| Integration Ecosystem | No native CRM connectors. No Zapier/Make listed. API-only for most enterprise integrations. | Non-technical buyers cannot deploy without developer support. Blocks SMB/mid-market. | HIGH |
| Hydra Execution Risk | Full-duplex speech-to-speech announced but not production-ready. Speech-to-speech is a hard ML problem — GPT-4o Voice has shown its own limitations. | If Hydra is delayed, the roadmap premium embedded in their valuation deflates. | HIGH |
| Outbound Campaign Infrastructure | No mention of batch calling campaigns, concurrent call tiers, or campaign management UI. Bolna does 200K calls/day with campaign tools. | Misses the large SMB outbound use case (sales, collections, reminders) — biggest call volume segment in India. | MED |
| Model Maintenance Burden | 3 proprietary models (Lightning, Electron, Pulse) to maintain + Hydra in development. Each needs continuous retraining as data and use cases evolve. | Engineering bandwidth gets consumed by model upkeep rather than product features. Bolna can ship faster at the application layer. | MED |
| Multi-Channel (Email, Chat, Social) | Website mentions "voice, email, chat, social" but actual product is voice-first. Channel breadth appears aspirational. | Credibility gap if enterprise evaluators test omnichannel claims and find voice-only depth. | LOW |
This is where Smallest diverges sharply from Bolna. You cannot replicate Smallest by reading a GitHub repo. The proprietary models require ML research talent, compute, and curated speech data that take years — not months — to accumulate.
| Layer | Time Est. | Hardest Part |
|---|---|---|
| Basic agent pipeline (ASR + LLM + TTS, 3rd party) | 2 wks | Nothing hard |
| Atoms-style graph (multi-node orchestration) | 5–8 wks | Edge case handling |
| TTS model matching Lightning V2 quality | 6–12 mo (100K+ hrs speech data) | Training data + VRAM optimization |
| SLM matching Electron V2 (voice-optimized reasoning) | 9–18 mo (PhD-level work) | Benchmarking + hallucination control |
| ASR at Pulse quality (36-language streaming) | 8–14 mo (multilingual data) | Code-switching training data |
| On-prem + compliance (ISO 27001, HIPAA, air-gap) | 4–6 mo | Audit + certification cost |
| Speech-to-speech, Hydra-class (full-duplex multimodal) | 18–36 mo (frontier ML research) | No proven blueprint exists yet |
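The two-week baseline in the first row is essentially glue code over rented models. A minimal sketch of that cascaded pipeline, with hypothetical provider interfaces and stubs standing in for the third-party APIs (none of these classes are a real vendor SDK):

```python
from dataclasses import dataclass
from typing import Protocol


class ASR(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoiceAgent:
    """Cascaded pipeline: each stage blocks on the previous one, so
    end-to-end latency is the sum of all three providers' latencies,
    which is the chain a speech-to-speech model aims to collapse."""
    asr: ASR
    llm: LLM
    tts: TTS

    def handle_turn(self, caller_audio: bytes) -> bytes:
        text = self.asr.transcribe(caller_audio)   # speech -> text
        reply = self.llm.complete(f"Caller said: {text}\nAgent reply:")
        return self.tts.synthesize(reply)          # text -> speech


# Stub providers stand in for whichever third-party APIs a real build rents.
class StubASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode()


class StubLLM:
    def complete(self, prompt: str) -> str:
        return "Thanks for calling."


class StubTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode()


agent = VoiceAgent(StubASR(), StubLLM(), StubTTS())
print(agent.handle_turn(b"hello"))
```

The point of the sketch: every stage is a swappable commodity, which is exactly why this layer is easy to replicate, and why latency stacks additively across the three hops.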
These estimates assume a well-funded five-person team that includes two ML researchers. There is no open-source shortcut: the models ARE the product, and the moat is measured in time.
It also runs deeper than Bolna's 3-year window. If Hydra ships and Electron fine-tuning compounds per enterprise customer, this becomes a 5–7 year structural advantage.
Smallest.ai is building the infrastructure layer of Indian voice AI — not the application. They own TTS, ASR, and LLM simultaneously, which means their cost structure at scale is fundamentally better than every orchestration-based competitor. If Hydra ships, they leapfrog the entire ASR→LLM→TTS latency chain and become the only real-time speech-to-speech platform in production. The 20× cost reduction they've already demonstrated (from $0.20 to $0.01/min) is the template for what happens next.
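The 20× figure is simple arithmetic worth making explicit. A back-of-envelope sketch using the stated per-minute prices; the 2M call-minutes/month volume is an illustrative assumption for scale, not a disclosed number:

```python
old_per_min = 0.20   # orchestration-stack cost, $/min (stated)
new_per_min = 0.01   # Smallest's demonstrated cost, $/min (stated)
minutes_per_month = 2_000_000  # illustrative volume, NOT a disclosed figure

reduction = old_per_min / new_per_min
monthly_saving = (old_per_min - new_per_min) * minutes_per_month

print(f"{reduction:.0f}x cheaper")            # 20x cheaper
print(f"${monthly_saving:,.0f}/month saved")  # $380,000/month saved
```

At that spread, unit economics alone can fund the model-maintenance burden described below, which is the whole bet behind vertical integration.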
Full-stack ownership means full-stack maintenance burden. Three proprietary models to continuously retrain, benchmark, and defend — while Bolna ships product features twice as fast by composing APIs. The traction gap is real: Bolna has 1,050 named customers and 200K calls/day; Smallest says "millions of calls/month" with no customer names. In enterprise sales, social proof is a product feature. The clock is ticking: OpenAI Realtime, Gemini Live, and similar products are making the SLM advantage narrower every quarter.