As organizations become more data-driven, there’s growing pressure to bring powerful data and AI capabilities directly to the people making business decisions. This means enabling non-technical users to interact with curated data, run complex logic, and trigger workflows through intuitive, purpose-built applications. 

Common examples include: 

  • Audit and compliance workflows that let users review large-scale transactional data, apply business rules, and manage mapping tables (e.g., account classifications, thresholds). Editable dimensions, approvals, and audit trails ensure regulatory compliance and integrity.
  • Scenario modeling for credit and market risk that enables teams to tweak assumptions and compare outputs across massive datasets, supporting what-if analysis without needing to write SQL or touch raw data.
  • Document intake and GenAI processing pipelines that process user-submitted documents in real time, apply validation rules, extract structured data, and trigger LLM-based summarization or redaction—all while maintaining audit trails.

Traditional BI tools fall short in these cases. They don’t allow users to modify dimension tables, launch jobs, or coordinate multi-user workflows. What’s needed is a scalable, flexible application framework that integrates deeply with the Databricks Lakehouse, which offers the performance, governance, and openness required for enterprise-grade solutions.

In this post, we’ll walk through a modern 3-tier architecture built on:

  • React for a responsive, user-friendly frontend
  • A FastAPI backend as an asynchronous intermediary for handling business logic and APIs
  • The Databricks platform for secure, scalable data processing and workflow management, with built-in intelligence for governance, observability, and automation

[Figure: high-level view of the 3-tier architecture with the React frontend, FastAPI backend, and Databricks platform]

We’ll explore how these components work together to deliver secure, interactive, and scalable data applications, bringing the full power of Databricks to business users who drive real outcomes.

Why 3-Tier Architecture?

This 3-tier architecture cleanly separates responsibilities across the frontend, backend, and data platform—enabling each layer to scale independently, be maintained in isolation, and contribute to a more secure, performant, and cost-efficient solution.

| Factor | React | FastAPI | Databricks |
|---|---|---|---|
| Separation of Concerns | Handles user input, triggers jobs, and displays results through interactive UIs | Acts as the API layer: validates requests, translates them into Spark/SQL via Databricks Connect, formats responses, and manages integrations | Executes complex workflows and data transformations with low management overhead |
| Tech | React with component-based design; state management via useState, useContext, Redux, etc. | FastAPI: async-ready with Pydantic validation and auto-generated OpenAPI docs | Databricks Serverless, Delta Lake, Unity Catalog, and Databricks Connect for secure communication with the backend |
| Scalability | Static frontend assets served via CDN | Scales horizontally with load | Auto-scales compute up or down (even to zero) based on workload demand; ideal for bursty or interactive workloads |
| Enhanced Security & Governance | Minimal exposure; no direct access to data | Databricks tokens represent user identity; acts as a secure intermediary | Unity Catalog enforces fine-grained access control and governs data usage via Databricks Connect |
| Maintainability & Flexibility | UI updates isolated from backend and compute layers | Modular endpoints; decouples frontend from data logic | Logic is centralized at the data layer using PySpark or SQL; easily reused across workflows |
| Cost Efficiency | Low hosting cost for static files | Lightweight server footprint, low infra overhead | Pay-per-use Serverless compute scales to zero, eliminating the need for persistent clusters and ensuring charges only for execution time |

Sample Application and Component Walkthrough

An example application that demonstrates all the components described above is provided in this GitHub project: db-connect-webapp

This simple application provides a React.js web UI that lets a user pick a Delta table in a Databricks Unity Catalog schema ("database") and interactively query and filter its results.

 

This example demonstrates the concepts of: 

  • A front-end application that connects to the FastAPI service through API calls (frontend/app/lib/api.ts)
  • Custom application logic coded in the FastAPI service (backend/app/main.py)
  • A FastAPI service that uses Databricks Connect to authenticate to the Databricks data platform (backend/app/DataSource.py); a minimal sketch of this pattern follows the list
  • Unit tests in backend/app/tests that cover the transformation logic and data operations provided by the FastAPI application
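
To make the pattern concrete, here is a minimal, hypothetical sketch of a FastAPI endpoint that uses Databricks Connect to query a Unity Catalog table. The endpoint path, parameter names, and table reference are illustrative assumptions and differ from the actual code in backend/app/main.py and backend/app/DataSource.py:

from fastapi import FastAPI, Query
from databricks.connect import DatabricksSession

app = FastAPI()

def get_spark():
    # Databricks Connect reads DATABRICKS_HOST / DATABRICKS_TOKEN (or a
    # configuration profile) from the environment to open the remote Spark session.
    return DatabricksSession.builder.getOrCreate()

@app.get("/tables/{table}/rows")
def read_rows(table: str, limit: int = Query(100, le=1000)):
    spark = get_spark()
    # Illustrative catalog and schema; the real app reads these from configuration.
    df = spark.table(f"samples.nyctaxi.{table}").limit(limit)
    return {"rows": [row.asDict() for row in df.collect()]}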

Running the example application

  • Backend Prerequisites
    • Access to a Databricks workspace is required, along with authentication via an M2M Service Principal or a user PAT (personal access token).
    • The pre-installed Databricks samples.nyctaxi catalog and schema can be used for testing; alternatively, grant the Service Principal access to a different dataset.
  • Start the Backend services.  
    • Check the Backend Configure and Run section in the main README file. It provides instructions on setting up environment variables that specify the Databricks authentication details.
    • Use the backend/run.sh script to start the FastAPI service with a backend connection to Databricks
  • Frontend Prerequisites 
    • Check the Frontend Configure and Run section in the main README.md file.  This provides instructions on installing the web server dependencies (npm or yarn) 
    • Follow the README.md instructions for setting the environment variables that specify the Backend API service URL and the Databricks data Catalog and Schema (e.g., samples and nyctaxi)
  • Start the Frontend services.  
    • Use the frontend/run.sh script to start the Node.js web services.

The default configuration starts a web-server process listening to the local host address on port 3000 (http://localhost:3000).
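
Once both services are up, a quick way to confirm the backend is reachable is to hit FastAPI's auto-generated OpenAPI docs page. The port below is uvicorn's default and may differ from what backend/run.sh configures, so treat it as an assumption:

import requests

# Hypothetical smoke test; adjust the port to match the backend configuration
response = requests.get("http://localhost:8000/docs")
print(response.status_code)  # 200 means the FastAPI backend is serving requests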

[Screenshot: the example application UI running at http://localhost:3000]

Extending the application

The example web application can be extended through its customizable middle API tier, which offers significant flexibility beyond a simple web query interface. Within this API layer, the key is to interact with the Databricks data platform through the Spark DataFrame API. This lets developers implement complex data transformations, dynamic business logic, and robust unit testing, and it directly unlocks more sophisticated operations such as invoking registered machine learning models and integrating with Large Language Models (LLMs), as sketched below.
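
As a rough illustration of that LLM integration, a DataFrame transformation can call a Databricks Model Serving endpoint through the ai_query function. The endpoint name and column names below are assumptions for illustration and are not part of the example repository:

from pyspark.sql import functions as F

# documents_df is assumed to hold user-submitted documents with a "text" column;
# "doc-summarizer" is a hypothetical Model Serving endpoint name.
summarized_df = documents_df.withColumn(
    "summary", F.expr("ai_query('doc-summarizer', text)")
)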

 

Advantages of Spark DataFrame API with Databricks Connect in UI-Driven Applications

When building interactive UI-driven data applications, leveraging the Spark DataFrame API via Databricks Connect provides distinct advantages over traditional SQL interfaces, especially for complex logic, dynamic transformations, and robust testability.

Dynamic Transformations Based on User Inputs

In user-centric applications, backend queries often need to adapt dynamically to user-selected filters, columns, or metrics. Consider a scenario where new columns need to be dynamically added based on specific user selections, such as flagging certain data elements. The DataFrame API handles these cases gracefully:

# Dynamically add flag columns based on the columns the user selected in the UI
# (selected_columns would typically arrive in the API request payload)
for column_name in selected_columns:
    df = df.withColumn(f"{column_name}_flag", df[column_name] > 0)

Such dynamic manipulations are cumbersome to implement and maintain with traditional SQL queries.
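
Extending the same idea, user-selected filters can be applied in a loop. A hedged sketch, where the filter dictionary stands in for whatever the API request might carry:

from pyspark.sql import functions as F

# Hypothetical thresholds chosen by the user in the UI
user_filters = {"trip_distance": 2.0, "fare_amount": 10.0}

# Keep only rows where each selected column exceeds its threshold
for column_name, threshold in user_filters.items():
    df = df.filter(F.col(column_name) > threshold)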

Expressing Complex Business Logic Clearly

Many business rules involve conditional logic, window functions, or complex joins. DataFrame API simplifies these implementations significantly:

from pyspark.sql.functions import when, col
df = df.withColumn("category", when(col("score") > 90, "Gold").otherwise("Standard"))

Equivalent SQL queries often become nested and challenging to debug or understand clearly, reducing maintainability.
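
Window functions, also mentioned above, follow the same compositional pattern. A generic sketch (the column names are assumptions):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Rank rows within each category by descending score
ranking_window = Window.partitionBy("category").orderBy(F.col("score").desc())
df = df.withColumn("rank_in_category", F.row_number().over(ranking_window))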

Enhanced Testability and Validation

Writing DataFrame transformations as modules allows for robust unit testing of logic through PySpark testing suites and local Spark sessions:

from pyspark.sql import SparkSession
from pyspark.sql.functions import when

def add_category_column(df):
    return df.withColumn("category", when(df["score"] > 90, "Gold").otherwise("Standard"))

# Unit test with a local Spark session and a small DataFrame
spark = SparkSession.builder.master("local[1]").getOrCreate()
test_df = spark.createDataFrame([(1, 95), (2, 70)], ["id", "score"])
result_df = add_category_column(test_df)

SQL-based transformations, by contrast, typically require extensive integration testing with full execution contexts, complicating validation and compliance certification.

Deployment Options

Follow your organization's established deployment practices and infrastructure guidelines when deploying applications. For quick deployment and end-to-end testing, deploy the application to Databricks Apps using the steps below.

  • To prepare and deploy, run databricks_app_build.sh in the root folder. This script performs the following actions:
    • Builds the frontend assets and organizes the deployment directory with all necessary components
    • Deploys the application to Databricks Apps
    • Sets the environment variables required by the application
    • Starts the application using Uvicorn with four worker processes

For a comprehensive understanding of Databricks Apps and their capabilities, including features, benefits, and integration with the Lakehouse Architecture, refer to the official Databricks blog post on Databricks Apps.

Conclusion

Combining a dynamic React frontend, an efficient FastAPI backend, and the power of Databricks Serverless Compute via Databricks Connect creates a modern, scalable, and cost-effective architecture for interactive data applications. This 3-tier setup provides excellent separation of concerns, robust security, and leverages the auto-scaling, pay-per-use benefits of Serverless compute, reducing management overhead significantly.

Check out the code repository here, experiment, and see how this architecture can accelerate your data application development!

References

  1. Introduction to Databricks Connect - https://www.databricks.com/blog/2019/06/14/databricks-connect-bringing-the-capabilities-of-hosted-ap...
  2. Databricks Connect - https://docs.databricks.com/aws/en/dev-tools/databricks-connect/
  3. Databricks Apps - https://docs.databricks.com/aws/en/dev-tools/databricks-apps