Technical Blog
mtilgner
Databricks Employee

Django and Databricks Apps

Did you know that you can run Django on Databricks Apps? If you’re a Django developer looking for a simple way to host your apps and bring them closer to your data, you've found the right post. Developers are already running Flask, Streamlit, Dash, Gradio, and other Python frameworks on Databricks Apps. So why not Django? After all, it’s one of the most popular and well-established Python web frameworks.

Before we get into the how, though, let’s first tackle the why. Databricks Apps is a great place to host Django apps for several reasons. It:

  • provides security and governance out of the box
  • integrates seamlessly with the Databricks Platform, especially Lakebase, Databricks’ managed Postgres offering
  • lets you deploy apps in seconds

In this blog, we show you how to deploy a Django application on Databricks Apps and connect it to a Lakebase PostgreSQL database. We use a simple to-do management app as an example.

Complementing this blog, we provide an extensible template for deploying Django on Databricks Apps with Lakebase on GitHub.

Django and Lakebase: the dream team

One aspect that makes Django and Databricks Apps a great fit is the combination of Django’s object-relational mapper (ORM) and Lakebase. Django’s ORM is one of its most powerful features. As a developer, it lets you create, modify, and query databases in a Pythonic way.

Lakebase, on the other hand, is the managed PostgreSQL offering of Databricks. Its innovative architecture separates storage from compute, which lets you start up and scale faster than ever before. Plus, it has advanced capabilities such as scale-to-zero, branching, and instant restore. Learn more about Lakebase here.

A brief aside: When it comes to testing Django migrations, Lakebase’s copy-on-write branching is a game changer, letting you branch off from your Lakebase project’s main branch in seconds. Each branch is an isolated, disposable Postgres environment that lets you test Django model and schema changes against real data.

The upshot? Breaking changes won’t blow up your staging or production environment because you can catch them during development. Just create a branch, run your code against it, and discard it when done. This simplifies the app development lifecycle by making experimentation safe and simple.

Securely connecting to Lakebase

With all this talk about Lakebase, let’s get into how we actually set it up as our app’s database backend. In the snippet below, we configure the lakebase engine, which extends Django's built-in PostgreSQL backend:

# config/settings.py
import os

DATABASES = {
    "default": {
        "ENGINE": "lakebase",
        "NAME": os.environ.get("PGDATABASE", ""),
        "USER": os.environ.get("PGUSER", ""),
        "HOST": os.environ.get("PGHOST", ""),
        "PORT": os.environ.get("PGPORT", ""),
        "OPTIONS": {
            "sslmode": os.environ.get("PGSSLMODE", "require"),
            "options": f"-c search_path={get_schema_name()}",
        },
    }
}
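The get_schema_name() helper referenced in the search_path option lives in our template on GitHub. As a minimal illustrative sketch (the PGSCHEMA variable name is an assumption here; the template may resolve the schema differently), it could simply read the schema from the environment:

```python
import os

def get_schema_name(default="public"):
    """Resolve the Postgres schema for the connection's search_path.

    PGSCHEMA is an assumed environment variable for this sketch;
    the actual template may derive the schema another way.
    """
    return os.environ.get("PGSCHEMA", default)
```

Setting the search_path this way means the app's tables live in their own schema rather than cluttering public.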

We connect to Lakebase using our Databricks App’s dedicated service principal. This is known as app authorization. With the help of the Databricks SDK’s WorkspaceClient, app authorization is just a single line of code; there’s no need to manage service principal client IDs or secrets.

How it works is simple: we use the SDK to mint a short-lived, down-scoped OAuth token, which we inject as a password into our connection parameters. This lets us securely connect to our Lakebase backend. The token itself is cached and refreshed when it’s within 30 seconds of expiring.

# lakebase/base.py
def _get_token(endpoint):
    """Return a cached OAuth token, refreshing only when near expiry."""
    global _cached_token, _token_expires_at

    # ...cache check omitted for brevity...

    credential = _get_workspace_client().postgres.generate_database_credential(
        endpoint=endpoint
    )
    _cached_token = credential.token
    # expire_time is a protobuf Timestamp; .seconds is Unix epoch seconds.
    _token_expires_at = float(credential.expire_time.seconds)
    return _cached_token

class DatabaseWrapper(pg_base.DatabaseWrapper):
    """PostgreSQL backend that authenticates to Lakebase via OAuth."""

    def __init__(self, settings_dict, alias=DEFAULT_DB_ALIAS):
        super().__init__(settings_dict, alias)
        self._endpoint = os.environ.get("PGENDPOINT", "")

    def get_connection_params(self):
        """Inject a cached OAuth token as the connection password."""
        params = super().get_connection_params()
        params["password"] = _get_token(self._endpoint)
        return params

Using the Django ORM with Lakebase

Having wired up the database to our app, we can let the ORM work its magic. Suppose we add a new task to our list. A task is just a Python class with a task text field, a completed boolean, and a created_at datetime. When a user submits a form to add a task, the corresponding view validates the form and calls form.save().

Django translates this call into an INSERT INTO…VALUES… statement and sends it to Lakebase using the engine we configured in the previous section. The lines of SQL written to accomplish this? Zero. The same applies to querying, updating, and deleting tasks in our database.

# todos/views.py
@require_POST
def add(request):
    """Add a new todo."""
    form = TodoForm(request.POST)
    if form.is_valid():
        form.save()
        messages.success(request, "Todo added successfully!")
    else:
        messages.error(request, "Please enter a task.")
    return redirect("todos:index")

Deploying the app with Gunicorn

Hold on! We still need to deploy our app to Databricks Apps. To do this, we use a short “entrypoint.sh” script. Databricks Apps normally asks for a single command to start an app: the command array in our “app.yaml”.

But Django needs a few more commands before it can start handling requests, e.g., collecting static files, ensuring the database schema exists, and running migrations. Rather than squeezing everything into “app.yaml”, we collect all commands in “entrypoint.sh” and run this script on startup.

#!/bin/bash
set -e
python manage.py collectstatic --noinput
python manage.py ensure_schema
python manage.py migrate --noinput
PORT="${DATABRICKS_APP_PORT:-8000}"
exec gunicorn config.wsgi:application --bind "0.0.0.0:$PORT"
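With the script in place, the command array in “app.yaml” reduces to a single entry. A minimal sketch, assuming “entrypoint.sh” sits at the app’s root and is executable:

```yaml
# app.yaml
command: ["./entrypoint.sh"]
```

Everything Django needs at startup then lives in one readable script instead of a crowded command array.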

To shorten app startup time and follow the principle of least privilege, you could also run some of these commands, e.g., migrations, with a different service principal in your CI/CD pipeline. Then, you can give your dedicated app service principal a restricted set of permissions on your database.

Note that we are using gunicorn with synchronous workers for this template, a standard Django deployment model. We chose gunicorn because all our views are synchronous, so there isn’t much benefit to an async model. Moreover, it makes the behavior of the connection pool predictable.

To start the app, we bind to host 0.0.0.0 and listen on the DATABRICKS_APP_PORT, which is set by the Databricks runtime. 

Static files and templating

Before we wrap up, let’s briefly discuss how we deliver the app’s frontend. Our app does this in two ways: static file serving and server-side rendering of HTML templates. Django’s built-in admin portal, for instance, comes with its own CSS and JS as static files. 

The collectstatic command gathers these files and places them in the staticfiles/ directory at the project root. The WhiteNoise middleware then allows our app to serve these files directly from the directory, routing them through gunicorn.
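Enabling WhiteNoise is a small settings change. Here is a sketch of the relevant pieces, following the standard setup from the WhiteNoise documentation (the middleware goes right after SecurityMiddleware; the rest of the middleware stack is elided):

```python
# config/settings.py (excerpt)
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # WhiteNoise must come right after SecurityMiddleware.
    "whitenoise.middleware.WhiteNoiseMiddleware",
    # ...the rest of the middleware stack...
]

# collectstatic writes here; WhiteNoise serves from here.
STATIC_ROOT = BASE_DIR / "staticfiles"
STATIC_URL = "static/"
```

With this in place, gunicorn alone can serve both the app and its static assets, with no separate web server or CDN required for a simple deployment.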

For the todo app, which we kept deliberately simple, there are no static files to manage. Every action is just a standard HTML <form method="POST"> submission. That means when a user sends a request to the todo app, Django loads the template todos/templates/todos/index.html, renders it with the required data, and returns the resulting HTML as the HTTP response body. Moreover, since the todo app inlines its CSS in the HTML template, Django delivers the app to the client’s browser as a single response.

Conclusion

That’s our quick primer on deploying a Django application on Databricks Apps with Lakebase! Now, it’s your turn: go ahead and give this template a spin and let us know if there’s anything else you’d like us to cover.

We hope you’ll see how easy it is to get started with your favorite frameworks on Databricks Apps, bringing your applications closer to your data. With Lakebase, you also get a fully managed, autoscaling Postgres database built for the data and AI era so that you can build and ship apps faster than ever before.

We can’t wait to see what you build!