Databricks Community

vedanthv · 4 hours ago

Hi Everyone!

This is my official submission for the DAIS 2026 Community Virtual Contest!

Deal2Delivery: How I Built an End-to-End AI Sales Intelligence Platform on Databricks

Every sales team has the same nightmare: a deal closes, and then nobody knows if the product can actually ship
on time. Sales lives in Salesforce. Supply lives in SAP. And the gap between them? That's where revenue quietly
leaks.

I built Deal2Delivery to close that gap - a full Lakehouse AI platform that takes raw enterprise data from SAP
HANA and Salesforce and turns it into actionable intelligence: which customers are about to churn, which
products are running short, and where the next demand spike is coming from. Here's how I built it, layer by
layer.

Live Demo: https://deal2delivery.vercel.app/

The Data Foundation: Bronze to Gold in Three Layers

I started with a medallion architecture on Databricks Unity Catalog - three catalogs for dev, staging, and
prod, each with clean schema separation.

Bronze lands five raw tables straight from SAP HANA and Salesforce: customer master (KNA1), sales orders (VBAK, VBAP), customer interactions (ZCUST_INTERACTIONS), and live inventory positions (MARD_STOCK). No
transformations, no assumptions - just the raw truth from the source systems.

Silver is where the business logic lives. I used Lakeflow Declarative Pipelines (DLT) on serverless compute to
build five clean, joined datasets: dim_customer_unified, fact_sap_orders, fact_customer_interactions,
fact_opportunity, and fact_case. DLT handles schema evolution, data quality expectations, and lineage
automatically - I write the transformation logic, Databricks handles the rest.
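
To make the data quality expectations idea concrete, here is a plain-Python illustration of a drop-on-violation rule. It is not the pipeline's actual code: a real Lakeflow pipeline would declare such rules with the `dlt` expectation decorators (for example `@dlt.expect_or_drop`), and the column names `order_id` and `net_value` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Expectation:
    name: str
    check: Callable[[Row], bool]


def apply_expectations(rows, expectations):
    """Keep rows that pass every expectation; count violations per rule."""
    kept = []
    violations = {e.name: 0 for e in expectations}
    for row in rows:
        failed = [e.name for e in expectations if not e.check(row)]
        for name in failed:
            violations[name] += 1
        if not failed:
            kept.append(row)
    return kept, violations


# Hypothetical rules for fact_sap_orders; the columns are illustrative only.
ORDER_EXPECTATIONS = [
    Expectation("order_id_present", lambda r: r.get("order_id") is not None),
    Expectation(
        "net_value_non_negative",
        lambda r: isinstance(r.get("net_value"), (int, float)) and r["net_value"] >= 0,
    ),
]

raw_orders = [
    {"order_id": "0001", "net_value": 1200.0},
    {"order_id": None, "net_value": 300.0},
    {"order_id": "0003", "net_value": -50.0},
]
clean_orders, violation_counts = apply_expectations(raw_orders, ORDER_EXPECTATIONS)
```

Only the first row survives, and the per-rule counts play the role of the quality metrics a pipeline run would report.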

Gold is the presentation layer - eight analytical views purpose-built for business consumption:
- gold_customer_360 - a single view of every customer's orders, interactions, and health score
- gold_sales_to_fulfillment_pipeline - maps every open deal to its supply chain status
- gold_demand_vs_supply_gap - the crown jewel: surfaces exactly where demand will outpace inventory
- gold_product_demand_forecast, gold_customer_engagement_360, and three metrics views for sales performance, product trends, and customer health
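
The logic behind a view like gold_demand_vs_supply_gap can be sketched in a few lines of Python. This is a simplified illustration under assumed field names (`sku`, `region`, `forecast_units`, `on_hand_units`), not the view's real SQL: it sums forecast demand and on-hand stock per SKU and region and keeps only the combinations where demand is higher.

```python
from collections import defaultdict


def demand_supply_gap(forecast_rows, inventory_rows):
    """Return {(sku, region): shortfall} where forecast demand exceeds stock."""
    demand = defaultdict(float)
    for row in forecast_rows:
        demand[(row["sku"], row["region"])] += row["forecast_units"]

    stock = defaultdict(float)
    for row in inventory_rows:
        stock[(row["sku"], row["region"])] += row["on_hand_units"]

    gaps = {}
    for key, units in demand.items():
        shortfall = units - stock.get(key, 0.0)  # missing stock counts as zero
        if shortfall > 0:
            gaps[key] = shortfall
    return gaps


# Made-up sample data for illustration.
forecast = [
    {"sku": "A", "region": "EMEA", "forecast_units": 100},
    {"sku": "A", "region": "EMEA", "forecast_units": 120},
    {"sku": "B", "region": "APAC", "forecast_units": 50},
    {"sku": "C", "region": "EMEA", "forecast_units": 30},
]
inventory = [
    {"sku": "A", "region": "EMEA", "on_hand_units": 150},
    {"sku": "B", "region": "APAC", "on_hand_units": 80},
]
gaps = demand_supply_gap(forecast, inventory)
```

SKU A in EMEA comes out 70 units short, SKU C has no stock at all, and SKU B is covered, so it does not appear in the result.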

Three ML Models, All in Production

This is where it gets interesting. I didn't train one model and call it a day - I shipped three, all tracked in
MLflow, all registered in Unity Catalog under a @Champion alias so the serving layer always pulls the latest
validated version.
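
For readers who have not used registry aliases, a minimal sketch of how a consumer might resolve the alias follows. The `models:/<name>@<alias>` URI form and the `databricks-uc` registry URI are standard MLflow; the catalog, schema, and model names (`prod`, `ml`, `churn_xgb`) are placeholders, since the post does not give the real ones.

```python
def champion_uri(catalog: str, schema: str, model: str, alias: str = "Champion") -> str:
    """Build an MLflow model URI that resolves a Unity Catalog alias."""
    return f"models:/{catalog}.{schema}.{model}@{alias}"


def load_champion(catalog: str, schema: str, model: str):
    """Load whichever version currently carries the alias.

    Needs mlflow installed and access to a Databricks workspace.
    """
    import mlflow  # imported lazily so the URI helper works without mlflow

    mlflow.set_registry_uri("databricks-uc")
    return mlflow.pyfunc.load_model(champion_uri(catalog, schema, model))


# Placeholder names; the real three-level model name is not given in the post.
uri = champion_uri("prod", "ml", "churn_xgb")
```

Because the caller only references the alias, promoting a new version is a matter of moving the alias in the registry. The consuming code stays unchanged.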

XGBoost Churn Model (v2): I moved beyond simple recency-based churn labels. The model uses a composite
behavioral label built from order frequency drops, interaction decay, and case escalation patterns. Optuna runs
15-trial hyperparameter tuning with 5-fold cross-validation. The predictions land in a churn_predictions table
that feeds the Customer Risk page in real time.
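
One way a composite behavioral label could be assembled is sketched below. The three signals come from the description above, but the field names, the 50% drop threshold, and the "at least two signals" rule are assumptions made for illustration. The post does not specify how the signals are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerActivity:
    orders_prev_period: int
    orders_recent_period: int
    interactions_prev_period: int
    interactions_recent_period: int
    escalated_cases_recent: int


def relative_drop(previous: int, recent: int) -> float:
    """Fractional decline from previous to recent; 0.0 if there was no prior activity."""
    if previous <= 0:
        return 0.0
    return max(0.0, (previous - recent) / previous)


def churn_label(a: CustomerActivity, drop_threshold: float = 0.5, min_signals: int = 2) -> int:
    """Return 1 when at least min_signals of the three behavioral signals fire."""
    signals = [
        relative_drop(a.orders_prev_period, a.orders_recent_period) >= drop_threshold,
        relative_drop(a.interactions_prev_period, a.interactions_recent_period) >= drop_threshold,
        a.escalated_cases_recent > 0,
    ]
    return int(sum(signals) >= min_signals)


# Orders fell 60% and interactions fell 75%: two signals fire.
fading = churn_label(CustomerActivity(10, 4, 8, 2, 0))
# Activity is stable; a single escalated case alone is not enough.
stable = churn_label(CustomerActivity(10, 9, 8, 7, 1))
```

Requiring more than one signal is what separates this kind of label from a purely recency-based one: a single quiet month or one escalation does not mark a customer as churned.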

XGBoost Demand Forecast: Lag features across 3, 6, and 12-month windows give the model memory of seasonality
and growth trends. It projects six months forward per SKU per region and writes to demand_forecast_predictions
- the same table powering the supply gap analysis in Gold.
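
A minimal sketch of lag-feature construction for one SKU and region, assuming a plain list of monthly unit totals ordered oldest to newest. The real pipeline may also use rolling aggregates over those windows; this shows only the simplest reading, where each training row carries the values from 3, 6, and 12 months earlier.

```python
def lag_features(monthly_units, lags=(3, 6, 12)):
    """Return one feature row per month that has every requested lag available."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(monthly_units)):
        row = {f"lag_{k}": monthly_units[t - k] for k in lags}
        row["target"] = monthly_units[t]
        rows.append(row)
    return rows


# 14 months of made-up history: 100, 101, ..., 113.
history = list(range(100, 114))
features = lag_features(history)
```

With a 12-month lag, the first 12 months of history produce no training rows, so 14 months of data yield only two rows. That is worth keeping in mind when deciding how much history to land in Bronze.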

K-Means Customer Segmentation: I ran RFM (Recency, Frequency, Monetary) clustering and landed on five segments
- Champions, Loyal, At-Risk, Hibernating, and Prospects - with silhouette score tracked as a model quality
metric. Every customer now has a segment label that unlocks personalized recommendations in the dashboard.
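
The RFM inputs to the clustering step can be derived from order history as sketched here. The tuple layout `(customer_id, order_date, amount)` and the sample data are assumptions. The clustering itself is left out, but the resulting three-value vectors are what a K-Means implementation (for instance scikit-learn's `KMeans`, scored with `silhouette_score`) would consume after scaling.

```python
from datetime import date


def rfm_features(orders, as_of):
    """Aggregate (customer_id, order_date, amount) tuples into RFM values per customer."""
    rfm = {}
    for customer_id, order_date, amount in orders:
        days_since = (as_of - order_date).days
        entry = rfm.setdefault(
            customer_id,
            {"recency_days": days_since, "frequency": 0, "monetary": 0.0},
        )
        # Recency is the gap to the most recent order, so keep the smallest value.
        entry["recency_days"] = min(entry["recency_days"], days_since)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return rfm


# Made-up orders for illustration.
orders = [
    ("C1", date(2025, 1, 10), 500.0),
    ("C1", date(2025, 3, 1), 250.0),
    ("C2", date(2024, 6, 15), 90.0),
]
rfm = rfm_features(orders, as_of=date(2025, 3, 31))
```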

Genie AI/BI: Business Users Ask, Databricks Answers

I connected all eight Gold views to a Genie Space so business users can query their data in plain English - no
SQL, no analyst bottleneck. But I didn't just wire it up and hope for the best. I built an LLM-as-a-Judge
evaluation loop with seven scorers to measure answer quality, and used Claude Opus to remediate low-scoring
responses and improve the Genie instructions iteratively. The result is a Genie space that actually answers
business questions reliably.
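
The triage step of such an evaluation loop might look like the sketch below. The post does not name the seven scorers, so the three scorer names, the 0 to 1 scale, and the 0.7 threshold are hypothetical. The sketch only shows how per-scorer results could be averaged and low-scoring answers queued for remediation along with their weakest dimension.

```python
from statistics import mean


def flag_for_remediation(evaluations, threshold=0.7):
    """Return (question, mean_score, weakest_scorer) for answers below threshold."""
    flagged = []
    for item in evaluations:
        scores = item["scores"]
        avg = mean(scores.values())
        if avg < threshold:
            weakest = min(scores, key=scores.get)
            flagged.append((item["question"], round(avg, 3), weakest))
    return flagged


# Hypothetical scorer names and scores, for illustration only.
evaluations = [
    {
        "question": "What was revenue by region last quarter?",
        "scores": {"correctness": 1.0, "relevance": 0.75, "sql_validity": 1.0},
    },
    {
        "question": "Which SKUs are short in EMEA?",
        "scores": {"correctness": 0.5, "relevance": 0.75, "sql_validity": 0.25},
    },
]
flagged = flag_for_remediation(evaluations)
```

Reporting the weakest scorer alongside the average gives the remediation step something specific to target when rewriting the Genie instructions.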

The Product: Six Pages, Built for Action

The front-end is a Next.js 14 app deployed on Vercel, talking to Databricks SQL via eight API routes. I use ISR
(Incremental Static Regeneration) at 300 seconds combined with Databricks SQL Result Cache at 24 hours - so
the app feels instant without hammering the warehouse.
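
The effect of a 300-second revalidation window can be illustrated with a small time-to-live cache. This is a Python analogy rather than the app's Next.js code, and it differs from ISR in one respect: ISR keeps serving the stale page while it regenerates in the background, whereas this sketch refreshes synchronously on the first request after expiry. The `run_query` loader and the injected clock are stand-ins for a warehouse query and wall-clock time.

```python
import time


class TTLCache:
    """Serve a cached value until it is older than ttl_seconds, then reload it."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


# A fake clock makes the behavior deterministic for the example.
fake_now = [0.0]
calls = []


def run_query():
    """Stand-in for an expensive warehouse query."""
    calls.append(fake_now[0])
    return f"result@{fake_now[0]:.0f}"


page = TTLCache(run_query, ttl_seconds=300, clock=lambda: fake_now[0])
first = page.get()       # cold cache: runs the query
fake_now[0] = 120.0
second = page.get()      # within 300 s: served from cache
fake_now[0] = 300.0
third = page.get()       # window elapsed: runs the query again
```

Three page requests result in two queries, which is the point of the layering: the warehouse sees at most one request per route per window, no matter how many users load the page.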

- Demand Forecast - 6-month forward projections with product and region filters
- Simulator - adjust sales assumptions and instantly see the impact on supply gaps
- Inventory - live stock positions mapped against forecast demand
- Customer Risk - churn probability, segment label, and an OpenAI GPT-4o generated explainer per customer
- Dashboard - KPIs, revenue trends, and sales pipeline health at a glance
- About - architecture overview for stakeholders

Every customer risk card includes a GPT-4o insight strip that explains why that customer is at risk in plain
English - not just a score, but a narrative a sales rep can act on immediately.

CI/CD and Governance

I deployed everything through Databricks Asset Bundles with a full three-environment CI/CD pipeline on GitHub
Actions:

- Push to develop -> auto-deploys to dev
- Merge to main -> auto-deploys to staging
- Production -> manual trigger with required reviewer approval

Every environment gets its own Unity Catalog, its own pipeline, and its own job configuration - parameterized
through the DAB target system. No copy-paste deployments, no environment drift.
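
The branch-to-environment rules above can be written down as a small lookup, sketched here in Python rather than as workflow YAML. The target names `dev`, `staging`, and `prod` are assumed to match the bundle's targets, and `resolve_deployment` is a hypothetical helper. `databricks bundle deploy -t <target>` is the standard CLI command for deploying a bundle target.

```python
DEPLOY_RULES = {
    "refs/heads/develop": {"target": "dev", "needs_approval": False},
    "refs/heads/main": {"target": "staging", "needs_approval": False},
}
PROD_RULE = {"target": "prod", "needs_approval": True}


def resolve_deployment(git_ref, manual_prod=False):
    """Return the deploy rule and CLI command for a git ref, or None if nothing deploys."""
    rule = PROD_RULE if manual_prod else DEPLOY_RULES.get(git_ref)
    if rule is None:
        return None
    return {**rule, "command": ["databricks", "bundle", "deploy", "-t", rule["target"]]}


dev_deploy = resolve_deployment("refs/heads/develop")
feature_deploy = resolve_deployment("refs/heads/feature/new-view")
prod_deploy = resolve_deployment("refs/heads/main", manual_prod=True)
```

Feature branches resolve to `None`, so nothing is deployed from them, and production is reachable only through the manual path that carries the approval flag.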

Deal2Delivery isn't a proof-of-concept - it's a blueprint for how modern enterprises should think about data.
SAP and Salesforce hold the raw truth of your business. Databricks turns that truth into predictions. And a
well-designed front-end puts those predictions in the hands of the people who can act on them.

Tech Stack

Ingestion
- SAP HANA + Salesforce - source systems for orders, customers, inventory, and interactions

Pipeline
- Bronze - 5 raw Unity Catalog tables, no transformations
- Silver - Lakeflow Declarative Pipelines (DLT) on serverless compute
- Gold - 8 Databricks SQL analytical views

Machine Learning
- XGBoost Churn Model v2 - Optuna 15-trial HPO, 5-fold CV, composite behavioral label
- XGBoost Demand Forecast - lag features, 6-month forward projections per SKU
- K-Means RFM Segmentation - 5 customer tiers, silhouette score tracked

Model Governance
- MLflow experiment tracking + Unity Catalog Model Registry
- @Champion alias on every model for safe promotion

AI/BI
- Reporting Dashboards
- Databricks Genie Space - natural language queries over all 8 Gold views
- LLM-as-a-Judge evaluation loop - 7 scorers, Claude Opus remediation

Frontend & API
- Next.js 14 on Vercel - 6 pages (Dashboard, Forecast, Simulator, Inventory, Customer Risk, About)
- 8 Databricks SQL API routes with ISR (300s) + SQL Result Cache (24h)
- OpenAI GPT-4o - per-customer churn explainers + KPI insight strips

CI/CD
- Databricks Asset Bundles - dev, staging, prod environments
- GitHub Actions - develop->dev auto, main->staging auto, prod manual with approval

The gap between deal and delivery is a data problem. And data problems are exactly what Databricks was built to
solve.

I'd love to receive feedback on other features I can add or improve!

Thanks,

Vedanth