As organizations become more data-driven, there’s growing pressure to bring powerful data and AI capabilities directly to the people making business decisions. This means enabling non-technical users to interact with curated data, run complex logic, and trigger workflows through intuitive, purpose-built applications. 

Common examples include: 

  • Audit and compliance workflows that let users review large-scale transactional data, apply business rules, and manage mapping tables (e.g., account classifications, thresholds). Editable dimensions, approvals, and audit trails ensure regulatory compliance and integrity.
  • Scenario modeling for credit and market risk that enables teams to tweak assumptions and compare outputs across massive datasets, supporting what-if analysis without needing to write SQL or touch raw data.
  • Document intake and GenAI processing pipelines that process user-submitted documents in real time, apply validation rules, extract structured data, and trigger LLM-based summarization or redaction—all while maintaining audit trails.

Traditional BI tools fall short in these cases. They don’t allow users to modify dimension tables, launch jobs, or coordinate multi-user workflows. What’s needed is a scalable, flexible application framework that integrates deeply with the Databricks Lakehouse, which offers the performance, governance, and openness required for enterprise-grade solutions.

In this post, we’ll walk through a modern 3-tier architecture built on:

  • React for a responsive, user-friendly frontend
  • A FastAPI backend as an asynchronous intermediary for handling business logic and APIs
  • The Databricks platform for secure, scalable data processing and workflow management, with built-in intelligence for governance, observability, and automation

[Figure: high-level view of the 3-tier architecture with the React frontend, FastAPI backend, and Databricks platform]

We’ll explore how these components work together to deliver secure, interactive, and scalable data applications, bringing the full power of Databricks to business users who drive real outcomes.

Why 3-Tier Architecture?

This 3-tier architecture cleanly separates responsibilities across the frontend, backend, and data platform—enabling each layer to scale independently, be maintained in isolation, and contribute to a more secure, performant, and cost-efficient solution.

| Factor | React | FastAPI | Databricks |
|---|---|---|---|
| Separation of Concerns | Handles user input, triggers jobs, and displays results through interactive UIs | Acts as the API layer: validates requests, translates them into Spark/SQL via Databricks Connect, formats responses, and manages integrations | Executes complex workflows and data transformations with low management overhead |
| Tech | React with component-based design; state management via useState, useContext, Redux, etc. | FastAPI: async-ready with Pydantic validation and auto-generated OpenAPI docs | Databricks Serverless, Delta Lake, Unity Catalog, and Databricks Connect for secure communication with the backend |
| Scalability | Static frontend assets served via CDN | Scales horizontally with load | Auto-scales compute up or down (even to zero) based on workload demand; ideal for bursty or interactive workloads |
| Enhanced Security & Governance | Minimal exposure; no direct access to data | Databricks tokens represent user identity; acts as a secure intermediary | Unity Catalog enforces fine-grained access control and governs data usage via Databricks Connect |
| Maintainability & Flexibility | UI updates isolated from backend and compute layers | Modular endpoints; decouples frontend from data logic | Logic is centralized at the data layer using PySpark or SQL; easily reused across workflows |
| Cost Efficiency | Low hosting cost for static files | Lightweight server footprint, low infra overhead | Pay-per-use Serverless compute scales to zero, eliminating the need for persistent clusters and ensuring charges only for execution time |

Sample Application and Component Walkthrough

An example application that demonstrates all the components described above is provided in this GitHub project: db-connect-webapp

This simple application provides a React.js web UI that lets a user pick a Delta table in a Databricks Unity Catalog schema ("database") and interactively query and filter its results.

 

This example demonstrates the concepts of: 

  • A front-end application that connects to the FastAPI service through API calls (frontend/app/lib/api.ts)
  • Custom application logic coded in the FastAPI service (backend/app/main.py)
  • A FastAPI service that uses Databricks Connect to authenticate to the Databricks data platform (backend/app/DataSource.py); a minimal sketch of this pattern follows the list
  • Unit tests in backend/app/tests that cover the transformation logic and data operations provided by the FastAPI application
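
To make the pattern concrete, here is a minimal, hypothetical sketch of a FastAPI endpoint that uses Databricks Connect to query a Unity Catalog table. The endpoint path, parameter names, and table reference are illustrative assumptions and differ from the actual code in backend/app/main.py and backend/app/DataSource.py:

from fastapi import FastAPI, Query
from databricks.connect import DatabricksSession

app = FastAPI()

def get_spark():
    # Databricks Connect reads DATABRICKS_HOST / DATABRICKS_TOKEN (or a
    # configuration profile) from the environment to open the remote Spark session.
    return DatabricksSession.builder.getOrCreate()

@app.get("/tables/{table}/rows")
def read_rows(table: str, limit: int = Query(100, le=1000)):
    spark = get_spark()
    # Illustrative catalog and schema; the real app reads these from configuration.
    df = spark.table(f"samples.nyctaxi.{table}").limit(limit)
    return {"rows": [row.asDict() for row in df.collect()]}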

Running the example application

  • Backend Prerequisites
    • Access to a Databricks workspace is required, along with authentication via an M2M Service Principal or a user PAT (personal access token).
    • The pre-installed Databricks samples.nyctaxi catalog and schema can be used for testing; alternatively, grant the Service Principal access to a different dataset.
  • Start the Backend services.  
    • Check the Backend Configure and Run section in the main README file. It provides instructions on setting up environment variables that specify the Databricks authentication details.
    • Use the backend/run.sh script to start the FastAPI service with a backend connection to Databricks
  • Frontend Prerequisites 
    • Check the Frontend Configure and Run section in the main README.md file.  This provides instructions on installing the web server dependencies (npm or yarn) 
    • Follow the README.md instructions for setting the environment variables that specify the Backend API service URL and the Databricks data Catalog and Schema (e.g., samples and nyctaxi)
  • Start the Frontend services.  
    • Use the frontend/run.sh script to start the Node.js web services.

The default configuration starts a web-server process listening to the local host address on port 3000 (http://localhost:3000).
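
Once both services are up, a quick way to confirm the backend is reachable is to hit FastAPI's auto-generated OpenAPI docs page. The port below is uvicorn's default and may differ from what backend/run.sh configures, so treat it as an assumption:

import requests

# Hypothetical smoke test; adjust the port to match the backend configuration
response = requests.get("http://localhost:8000/docs")
print(response.status_code)  # 200 means the FastAPI backend is serving requests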

[Screenshot: the example application UI running at http://localhost:3000]

Extending the application

The example web application can be extended through its customizable middle API tier, which offers significant flexibility beyond a simple web query interface. Within this API layer, the key is to interact with the Databricks data platform through the Spark DataFrame API. This lets developers implement complex data transformations, dynamic business logic, and robust unit testing, and it directly unlocks more sophisticated operations such as invoking registered machine learning models and integrating with Large Language Models (LLMs), as sketched below.
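
As a rough illustration of that LLM integration, a DataFrame transformation can call a Databricks Model Serving endpoint through the ai_query function. The endpoint name and column names below are assumptions for illustration and are not part of the example repository:

from pyspark.sql import functions as F

# documents_df is assumed to hold user-submitted documents with a "text" column;
# "doc-summarizer" is a hypothetical Model Serving endpoint name.
summarized_df = documents_df.withColumn(
    "summary", F.expr("ai_query('doc-summarizer', text)")
)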

 

Advantages of Spark DataFrame API with Databricks Connect in UI-Driven Applications

When building interactive UI-driven data applications, leveraging the Spark DataFrame API via Databricks Connect provides distinct advantages over traditional SQL interfaces, especially for complex logic, dynamic transformations, and robust testability.

Dynamic Transformations Based on User Inputs

In user-centric applications, backend queries often need to adapt dynamically to user-selected filters, columns, or metrics. Consider a scenario where new columns need to be dynamically added based on specific user selections, such as flagging certain data elements. The DataFrame API handles these cases gracefully:

# Dynamically add flag columns based on the columns the user selected in the UI
# (selected_columns would typically arrive in the API request payload)
for column_name in selected_columns:
    df = df.withColumn(f"{column_name}_flag", df[column_name] > 0)

Such dynamic manipulations are cumbersome to implement and maintain with traditional SQL queries.
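
Extending the same idea, user-selected filters can be applied in a loop. A hedged sketch, where the filter dictionary stands in for whatever the API request might carry:

from pyspark.sql import functions as F

# Hypothetical thresholds chosen by the user in the UI
user_filters = {"trip_distance": 2.0, "fare_amount": 10.0}

# Keep only rows where each selected column exceeds its threshold
for column_name, threshold in user_filters.items():
    df = df.filter(F.col(column_name) > threshold)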

Expressing Complex Business Logic Clearly

Many business rules involve conditional logic, window functions, or complex joins. DataFrame API simplifies these implementations significantly:

from pyspark.sql.functions import when, col
df = df.withColumn("category", when(col("score") > 90, "Gold").otherwise("Standard"))

Equivalent SQL queries often become nested and challenging to debug or understand clearly, reducing maintainability.
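
Window functions, also mentioned above, follow the same compositional pattern. A generic sketch (the column names are assumptions):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Rank rows within each category by descending score
ranking_window = Window.partitionBy("category").orderBy(F.col("score").desc())
df = df.withColumn("rank_in_category", F.row_number().over(ranking_window))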

Enhanced Testability and Validation

Writing DataFrame transformations as modules allows for robust unit testing of logic through PySpark testing suites and local Spark sessions:

from pyspark.sql import SparkSession
from pyspark.sql.functions import when

def add_category_column(df):
    return df.withColumn("category", when(df["score"] > 90, "Gold").otherwise("Standard"))

# Unit test with a local Spark session and a small DataFrame
spark = SparkSession.builder.master("local[1]").getOrCreate()
test_df = spark.createDataFrame([(1, 95), (2, 70)], ["id", "score"])
result_df = add_category_column(test_df)

SQL-based transformations, by contrast, typically require extensive integration testing with full execution contexts, complicating validation and compliance certification.

Deployment Options

Follow your organization's established deployment practices and infrastructure guidelines when deploying applications. For quick deployment and end-to-end testing, deploy the application to Databricks Apps using the steps below.

  • To prepare and deploy, run databricks_app_build.sh in the root folder. This script performs the following actions:
    • Builds the frontend assets and organizes the deployment directory with all necessary components
    • Deploys the application to Databricks Apps
    • Sets the environment variables required by the application
    • Starts the application using Uvicorn with four worker processes

For a comprehensive understanding of Databricks Apps and their capabilities, including features, benefits, and integration with the Lakehouse Architecture, refer to the official Databricks blog post on Databricks Apps.

Conclusion

Combining a dynamic React frontend, an efficient FastAPI backend, and the power of Databricks Serverless Compute via Databricks Connect creates a modern, scalable, and cost-effective architecture for interactive data applications. This 3-tier setup provides excellent separation of concerns, robust security, and leverages the auto-scaling, pay-per-use benefits of Serverless compute, reducing management overhead significantly.

Check out the code repository here, experiment, and see how this architecture can accelerate your data application development!

References

  1. Introduction to Databricks Connect - https://www.databricks.com/blog/2019/06/14/databricks-connect-bringing-the-capabilities-of-hosted-ap...
  2. Databricks Connect - https://docs.databricks.com/aws/en/dev-tools/databricks-connect/
  3. Databricks Apps - https://docs.databricks.com/aws/en/dev-tools/databricks-apps