[Written and tested by James Broadhead, Engineering Manager; Edited and published by Evan Pandya, Product Marketing Manager] 

Intro

Let's say your team has analytics data in a SQL Warehouse and leadership wants more than a dashboard. They want an app where ops reps can look up bookings, flag issues, add notes, and ask questions in plain English. You could wire up Express, a React app, SQL connectors, a Postgres pool, caching, SSE, and auth from scratch. Or you could use AppKit with your favorite coding agent and have it running by lunch.

This post builds a real operations app using `samples.wanderbricks` - a vacation rental marketplace dataset that ships with every Databricks workspace. It has 16 tables covering bookings, users, properties, payments, reviews, destinations, clickstream, and more.

All code snippets live in the git repo. Note: you don't need to clone the repo - we will curl individual files as we go.

What we're building

A single-page app with four capabilities:

  • Revenue by destination — analytics chart and table, queried from a SQL Warehouse
  • Booking manager — ops reps look up a booking and see guest details, powered by a SQL Warehouse query
  • User actions — reps can flag bookings for review and add notes. These actions are persisted in Lakebase.
  • Conversational AI — a Genie chat panel where users ask questions about the data in plain English

This goes beyond what a standard dashboard can do. Dashboards display data. This app lets people act on it — flag a suspicious booking, leave a note for the next shift, ask a follow-up question in natural language.

Architecture

[Architecture diagram: SQL Warehouse handles reads, Lakebase handles writes, Genie provides conversational AI over the same warehouse data]

Since your analytics data already lives in the warehouse, you don't need to copy it anywhere. However, apps need to write too - flags, notes, status changes. That's where Lakebase comes in: not as a mirror of your warehouse data, but as a fully managed, serverless Postgres database that stores your app's state.

  • SQL Warehouse handles all reads - revenue aggregations, booking lookups, everything that queries `samples.wanderbricks`
  • Lakebase handles writes - only the data the app creates: booking flags and notes. The server auto-creates these tables on first startup.
  • Genie provides conversational AI over the same warehouse data

The fast path: hand it to a coding agent

If you use an AI coding assistant (Cursor, Copilot, Claude Code, Genie Code, etc.), you can skip the manual walkthrough entirely:

```bash
# Install the AI skills (this command will drop the `experimental` prefix in a future CLI release)
databricks experimental aitools install

curl -sL https://github.com/databricks/devhub/blob/main/examples/vacation-rentals/blog-post-snippets/setup/ag... -o agent_prompt.md

claude> "Follow the instructions in agent_prompt.md"   # this is the last step! Grab a coffee!
```

The agent will detect your workspace, find or create a SQL Warehouse, set up a Genie space, scaffold the project, wire up all the plugins, deploy to Databricks, and print the URL of the running app. No manual steps required. View the prompt here.

The rest of this post walks through the same app by hand, which will be useful if you want to understand what each layer does or if you want to customize the agent's output.

Before you start

Before running through this walkthrough, confirm your local tools and Databricks workspace are ready. The fastest way:
```bash
curl -sL https://github.com/databricks/devhub/blob/main/examples/vacation-rentals/blog-post-snippets/setup/ve... | bash
```

This script probes everything that's pre-flight-checkable via the CLI and prints a green/red checklist. 

What it expects:

  • Databricks CLI is installed and logged in
  • jq, curl, bash
  • Node.js 20 or newer + npm

Workspace access

  • Permission to read samples.wanderbricks (the dataset must exist and be UC-shared to you)
  • Genie, Databricks Apps, and Lakebase enabled in your workspace
  • Permission to list SQL warehouses
  • If you have multiple workspaces configured with the CLI, the script will print instructions on how to select the right one. 

Note: the CLI can't pre-check every permission; it has no "can I create X?" probe. The walkthrough creates a SQL warehouse (if you have none), a Genie space, and an App deployment, and those will fail at create time if you lack the right entitlements. If that happens, the setup scripts print contextual error messages explaining the most common causes, and the troubleshooting section at the bottom of this post covers the rest.

Step 1: Scaffold the project

```bash
databricks apps init
```

The CLI walks you through naming your app and selecting plugins. Pick Analytics, Genie, and Lakebase when prompted.

```bash
cd wanderbricks-ops
npm install
```

You'll see this structure:

```
wanderbricks-ops/
├── server/
│   └── server.ts
├── client/
│   ├── src/
│   │   ├── main.tsx
│   │   └── App.tsx
│   └── vite.config.ts
├── config/
│   └── queries/
├── .env
└── package.json
```

Step 2: Configure your workspace connection

This script detects your host, finds (or creates) a SQL Warehouse, creates a Genie space, provisions a Lakebase Autoscaling Postgres project, and writes everything to .env:

```bash
curl -sL https://github.com/databricks/devhub/blob/main/examples/vacation-rentals/blog-post-snippets/setup/co... | bash
```

Here's what it does:

  1. Reads DATABRICKS_HOST from your CLI auth config
  2. Picks the first Pro serverless warehouse it can find, preferring one that's already running; if none is eligible, it creates a serverless appkit-dev warehouse
  3. Creates a Genie space called "Wanderbricks" backed by the sample bookings, properties, destinations, and reviews tables
  4. Provisions a Lakebase Autoscaling project (appkit-dev) and discovers its default branch, primary endpoint, and database
  5. Writes everything to .env

After it runs, your .env will look like:

```
# Workspace identity
DATABRICKS_HOST=https://my-workspace.cloud.databricks.com

# Resource IDs the runtime reads
DATABRICKS_WAREHOUSE_ID=abc123def456
DATABRICKS_GENIE_SPACE_ID=01ef234567890abc
LAKEBASE_ENDPOINT=projects/appkit-dev/branches/production/endpoints/primary
PGHOST=ep-xxxx.database.us-west-2.cloud.databricks.com
PGDATABASE=databricks_postgres

# Used by 'databricks apps init --set' in Step 3 (not by the running app)
LAKEBASE_BRANCH=projects/appkit-dev/branches/production
LAKEBASE_DATABASE=projects/appkit-dev/branches/production/databases/db-xxxx
```

Note: if you set DATABRICKS_CONFIG_PROFILE to pick a workspace earlier, the script writes that line to .env as well.

That's the entire environment setup. No tables to copy, no pipelines to create, no connection strings to find.

Step 3: Set up the server

Pull down the server file:

```bash
curl -sL https://github.com/databricks/devhub/blob/main/examples/vacation-rentals/blog-post-snippets/server/s... \
  -o server/server.ts
```

It does three things: 1) registers the AppKit plugins, 2) auto-creates the booking_flags and booking_notes Lakebase tables on first start, and 3) adds five custom routes on top of the plugin-provided ones:

 

| Route | Purpose |
| --- | --- |
| `POST /api/bookings/:id/flag` | Flag a booking for review |
| `DELETE /api/bookings/:id/flag` | Unflag |
| `GET /api/bookings/:id/flag` | Check flag status |
| `POST /api/bookings/:id/notes` | Add a note |
| `GET /api/bookings/:id/notes` | List notes (newest first) |

The Lakebase queries use standard pg.Pool API with $1 placeholders. AppKit handles the Databricks-specific parts: OAuth token rotation, connection pooling, and OpenTelemetry instrumentation.
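
For flavor, here's roughly what one of those routes looks like. This is a minimal sketch, not the file you just downloaded: the Express app setup, the plain `pg.Pool`, and the `booking_flags` schema are stand-ins for AppKit's wiring, which hands you a pool with the Databricks pieces already handled.

```typescript
// Minimal sketch of the flag route. Assumptions: an Express app and a plain
// pg.Pool stand in for the AppKit-provided server and Lakebase pool, and the
// booking_flags schema here is illustrative rather than the repo's exact DDL.
import express from "express";
import { Pool } from "pg";

const app = express();
const pool = new Pool(); // reads PGHOST/PGDATABASE/etc. from the environment

// Auto-create the write-side table on first start (idempotent).
await pool.query(`
  CREATE TABLE IF NOT EXISTS booking_flags (
    booking_id TEXT PRIMARY KEY,
    flagged_at TIMESTAMPTZ NOT NULL DEFAULT now()
  )
`);

// POST /api/bookings/:id/flag: flag a booking. Note the $1 placeholder,
// never string interpolation.
app.post("/api/bookings/:id/flag", async (req, res) => {
  await pool.query(
    `INSERT INTO booking_flags (booking_id) VALUES ($1)
     ON CONFLICT (booking_id) DO NOTHING`,
    [req.params.id]
  );
  res.status(204).end();
});
```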

Step 4: Add the analytics queries

AppKit uses file-based SQL queries. Drop a `.sql` file in `config/queries/` and it becomes an API endpoint automatically.

```bash
curl -sL https://github.com/databricks/devhub/blob/main/examples/vacation-rentals/blog-post-snippets/config/q... \
  -o config/queries/revenue_by_destination.sql

curl -sL https://github.com/databricks/devhub/blob/main/examples/vacation-rentals/blog-post-snippets/config/q... \
  -o config/queries/booking_detail.sql
```

Both query `samples.wanderbricks` directly — no copying or syncing needed. A few things to notice:

  • `:paramName` placeholders are parameterized — no SQL injection risk
  • `-- @Param` annotations declare the parameter types (`DATE`, `NUMERIC`, `STRING`, etc.)
  • The filename is the query key — `booking_detail.sql` maps to `"booking_detail"` in your frontend code

Service principal vs. user context. The filename controls who runs the query:

 

| File | Executes as | Cache scope |
| --- | --- | --- |
| `revenue_by_destination.sql` | Service principal | Shared across all users |
| `revenue_by_destination.obo.sql` | Logged-in user | Per-user cache |

For a shared dashboard, service principal queries (the default) are what you want — one cache entry serves everyone.
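
To make the filename-to-key mapping concrete, here's roughly how the frontend consumes `booking_detail.sql` via the `useAnalyticsQuery` hook you'll meet in Step 5. Treat it as a sketch: the import path, the options shape, and the return shape are assumptions for illustration, not a verbatim AppKit API reference.

```tsx
// Sketch of consuming a file-based query from React. Assumed: the import
// path, the { parameters } option shape, and the { data, loading } return
// shape. The query key "booking_detail" does come from the filename.
import { useAnalyticsQuery } from "@databricks/appkit/react";

function BookingSummary({ bookingId }: { bookingId: string }) {
  const { data, loading } = useAnalyticsQuery("booking_detail", {
    parameters: { booking_id: bookingId }, // name assumed; matches a -- @Param annotation in the .sql file
  });

  if (loading) return <p>Loading…</p>;
  return <pre>{JSON.stringify(data, null, 2)}</pre>;
}
```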

Step 5: Build the React frontend

Fetch all five client components:

```bash
for file in RevenueByDestination.tsx RevenueChart.tsx BookingManager.tsx WanderbricksChat.tsx App.tsx; do
  curl -sL "https://github.com/databricks/devhub/tree/main/examples/vacation-rentals/blog-post-snippets/client/s...${file}" \
    -o "client/src/${file}"
done
```

The five files together make up the UI:

| Component | Purpose | Notable hook |
| --- | --- | --- |
| `RevenueByDestination.tsx` | Table of revenue by destination | `useAnalyticsQuery("revenue_by_destination", …)` |
| `RevenueChart.tsx` | Bar chart over the same query | – |
| `BookingManager.tsx` | Look up a booking, flag it, add notes | `useAnalyticsQuery("booking_detail", …)` + `fetch()` to Lakebase routes |
| `WanderbricksChat.tsx` | Genie chat panel | – |
| `App.tsx` | Two-column grid of the four panels (replaces the scaffolded App.tsx) | – |
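
The write-side actions in BookingManager.tsx are ordinary `fetch()` calls against the Step 3 routes. Here's a simplified sketch of the flag toggle; state handling and error reporting are trimmed relative to the real component:

```tsx
// Simplified flag toggle against the Step 3 routes. The real
// BookingManager.tsx also loads the initial flag state and surfaces errors.
import { useState } from "react";

function FlagButton({ bookingId }: { bookingId: string }) {
  const [flagged, setFlagged] = useState(false);

  async function toggleFlag() {
    // POST flags, DELETE unflags; both routes are keyed by booking id.
    const res = await fetch(`/api/bookings/${bookingId}/flag`, {
      method: flagged ? "DELETE" : "POST",
    });
    if (res.ok) setFlagged(!flagged);
  }

  return (
    <button onClick={toggleFlag}>
      {flagged ? "Unflag" : "Flag for review"}
    </button>
  );
}
```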

Step 6: Run it

```bash
npm run dev
```

This starts the AppKit backend (with hot-reload via `tsx watch`) and the Vite dev server with HMR. Open `http://localhost:8000` and you'll see your app querying live data from your SQL Warehouse, with flags and notes persisted in Lakebase.

Step 7: Deploy to Databricks

From inside wanderbricks-ops/:

```bash
databricks apps deploy
```

This runs the full pipeline: validates the project (build, typecheck, lint), uploads it to your workspace, and starts the app. 

Once the deploy finishes, fetch the running URL:

```bash
databricks apps get wanderbricks-ops --output json | jq -r '.url'
```

In production, AppKit serves the built React app as static files and connects to your workspace using the service principal configured in your app deployment. The platform auto-injects PGHOST, PGUSER, PGDATABASE, PGPORT, and PGSSLMODE from your postgres Apps resource — only LAKEBASE_ENDPOINT carries over from local .env.

Summary

In roughly 200 lines of application code, you built an app that:

  • Queries a SQL Warehouse for analytics with parameterized, cached SQL
  • Looks up individual bookings with guest and property details
  • Lets users flag bookings and add notes, persisted in Lakebase
  • Renders interactive charts from live data
  • Gives users a conversational AI interface to explore the data
  • Streams results over SSE (no polling, no WebSockets to manage)
  • Auto-creates its own database tables on first startup
  • Runs locally with hot reload and deploys to Databricks with one command

AppKit's plugin system handled what you'd otherwise write yourself: the Express route handlers for SSE, a SQL connector, a caching layer, a Postgres connection manager with token rotation, and a streaming implementation.

Where to go next

  • More queries — drop `.sql` files in `config/queries/` and they're immediately available via `useAnalyticsQuery`
  • Per-user queries — rename a file to `.obo.sql` and it executes as the logged-in user instead of the service principal
  • Multiple Genie spaces — add more aliases in the `genie({ spaces: { ... } })` config (sketched just after this list) to let users explore different data domains
  • Synced tables — for sub-10ms lookups on high-traffic data, sync warehouse tables into Lakebase
  • File attachments — add the `files()` plugin to upload and serve documents from Unity Catalog Volumes
  • Type generation — run `npx @databricks/appkit generate-types` to get full TypeScript autocomplete on query parameters and results
  • Custom plugins — when the built-in plugins aren't enough, scaffold your own with `npx @databricks/appkit plugin create`
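
For the multi-space option, the config might look something like this. The `genie({ spaces })` shape comes from the plugin config mentioned above; the import path, alias names, and second space ID are placeholders:

```typescript
// Hypothetical multi-space Genie config. The spaces map follows the
// genie({ spaces: { ... } }) shape referenced above; the import path,
// alias names, and second space ID are placeholders.
import { genie } from "@databricks/appkit";

genie({
  spaces: {
    wanderbricks: process.env.DATABRICKS_GENIE_SPACE_ID!, // the space from Step 2
    revenue: "01ef00000000000000000000000000ab",          // placeholder second space
  },
});
```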

AppKit is an open-source Node.js + React SDK for Databricks Apps. GitHub · Docs

Cleanup

A note on cost

Three of the resources you just created are billed while they exist: the SQL Warehouse, the Lakebase Autoscaling project, and the deployed App. Idle costs are minimal — the warehouse auto-stops after a few minutes of no queries, and Lakebase Autoscaling suspends to zero compute when there's no load. If you're done experimenting, the commands below remove all three.

Run from inside wanderbricks-ops/:

1. Tear down the deployed app via the bundle

```bash
databricks bundle destroy
```

2. Trash the Genie space

```bash
set -a; source .env; set +a
databricks genie trash-space "$DATABRICKS_GENIE_SPACE_ID"
```

3. Delete the Lakebase project (also removes the branch, endpoint, and database)

```bash
databricks postgres delete-project appkit-dev
```

4. Optional: delete the SQL warehouse if configure_env.sh auto-created it (skip this if you reused an existing warehouse — the script only creates a new one when no Pro serverless warehouse exists)

```bash
databricks warehouses delete "$DATABRICKS_WAREHOUSE_ID"
```

Deleting the local wanderbricks-ops/ directory doesn't remove any of these resources; they are tracked separately and stay billed until removed via the commands above.

Troubleshooting

The scripts print contextual error blocks at the points they're most likely to fail. If something else goes wrong, here are the fixes, in roughly the order you'd hit them:

`bash configure_env.sh` fails to create a resource. Read the printed error block — it lists common causes for the specific step that failed (warehouse, Genie space, Lakebase project, or endpoint). Most failures are workspace-level: the feature isn't enabled, or you don't have create permission. Fix the underlying issue (often an admin ask) and re-run.

`databricks apps init` exits with `missing required resource ... use --set ...`. Step 2 didn't populate everything Step 3 expects. Check that `.env` has `DATABRICKS_WAREHOUSE_ID`, `DATABRICKS_GENIE_SPACE_ID`, `LAKEBASE_BRANCH`, and `LAKEBASE_DATABASE`. Re-run configure_env.sh if any are missing.

`npm run dev` errors with `default auth: cannot configure default credentials`. The Databricks SDK can't find a working profile. Run `databricks auth login`, or export `DATABRICKS_CONFIG_PROFILE=<name>` if you have multiple profiles, then restart the dev server.

`npm run dev` errors with missing `PGUSER` or `DATABRICKS_CLIENT_ID`. The Lakebase plugin can't figure out the Postgres user. Set `DATABRICKS_CLIENT_ID` (your service principal client ID) or `PGUSER` (your Postgres role) in `.env`. In a production deploy, the platform auto-injects this — only local dev needs it explicitly.

Browser shows a blank page or `Cannot GET /`. Vite is on a different port, or the backend crashed. Check the terminal — AppKit logs `listening on http://localhost:8000` when ready. If port 8000 is taken, set `DATABRICKS_APP_PORT=8001` in `.env`.

`databricks apps deploy` fails on validation. AppKit projects fail fast on type or build errors before uploading. Run `npm run build` and `npm run typecheck` to see which one breaks.

`databricks apps get wanderbricks-ops` returns no URL. The deploy is still in progress. Wait ~30 seconds and re-run — the URL is populated once the first start completes.