In one of the recent issue of The Batch, Andrew Ng argues that voice UIs will become as ubiquitous as the mouse and touchscreen - and highlights Vocal Bridge, an agentic voice platform, as an example of the infrastructure making this possible. To demonstrate, he added a Vocal Bridge voice agent to a math-quiz app he built for his daughter in under an hour.
Reading his article motivated me to build a quick prototype on Databricks using Vocal Bridge, Databricks Apps, Foundation Models, and Lakebase.
This post describes VoiceInsight - a reference implementation I built that closes that gap by integrating Vocal Bridge, Databricks Apps, and Lakebase into a unified voice-to-insight pipeline.
Architecture overview:
Vocal Bridge handles real-time audio capture and transcription via a LiveKit WebRTC data channel, abstracting STT infrastructure entirely
A FastAPI backend deployed as a Databricks App orchestrates token exchange, LLM inference, and session persistence
Gemma 3 12B, served via Databricks Foundation Model API, performs structured analysis across four modes: summarization, Q&A, entity extraction, and content generation
Every session is written to Lakebase (managed PostgreSQL on Databricks) - queryable, auditable, and Unity Catalog governed
The entire implementation was developed using Claude Code as the coding agent, paired with the Databricks ai-dev-kit - reducing time-to-deployment to just a few hours.
Applicable domains include meeting intelligence, field data collection, customer support analytics, regulatory compliance, and conversational data exploration.
Full architecture, implementation notes, and lessons learned on Medium: https://lnkd.in/gsf3yyAU