Nivethan_Venkat
Contributor III

Introduction

In Parts 1 and 2 we explored what Databricks Lakebase is and how it performs. Now we’ll make Lakebase costs fully transparent—and operational. This article shows exactly how to calculate spend from system tables, standardize the math as reusable SQL views, and publish a shareable AI/BI dashboard that your team can deploy in minutes.

Why this matters:

Lakebase is always-on OLTP. Unlike auto-stoppable OLAP compute, its cost profile hinges on continuous readiness. Getting cost wrong (or opaque) risks budget surprises; getting it right unlocks predictable operations, fair back-charging, and faster decisions.

Sneak peek inside:

  • Cost model from first principles: Join usage (DBU/DSU) with effective list prices over time windows.
  • Reusable SQL views under your preferred catalog.schema that mirror the working logic (storage ×10 factor, database_instance_id, user, and monthly cumulative totals) in a reproducible repo.
  • AI/BI dashboard you can import from Lakebase Cost Overview.lvdash.json or deploy via Databricks Asset Bundles—complete with filters, trends, monthly rollups, price bands, and a 7-day anomaly lens.
  • Automation path: A tiny Labs LSQL runner + bundle so you can clone → deploy → reuse the same dashboard.
  • Practical cost controls (right-sizing, commitments, hygiene, alerts).

Prerequisites:

  • Access to system.billing.usage, system.billing.list_prices, and system.access.workspaces_latest
  • A SQL Warehouse ID and permission to create views & dashboards.

Lakebase Pricing refresh:

  • Compute is billed in DBUs via Lakebase SKUs.
  • Storage is billed in DSUs / GB-month.
  • Prices can change over time, so you must time-band list prices and join them to the exact usage hour/day they applied to.
  • This guide implements that join and adds daily totals + a monthly cumulative window for “how much so far this month?”.
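
Before building the views, it helps to confirm which Lakebase SKUs and price bands actually apply to your account. The check below is a minimal sketch against system.billing.list_prices; the LIKE patterns are an assumption based on the US East SKU names used later, so verify the exact names for your region.

-- Which Lakebase SKUs and price bands exist in this account?
-- The LIKE patterns are assumptions; confirm the exact SKU names for your region.
SELECT
  sku_name,
  usage_unit,
  currency_code,
  pricing.effective_list['default'] AS list_price,
  price_start_time,
  price_end_time
FROM system.billing.list_prices
WHERE sku_name LIKE 'PREMIUM_DATABASE_SERVERLESS_COMPUTE_%'
   OR sku_name LIKE 'PREMIUM_DATABRICKS_STORAGE_%'
ORDER BY sku_name, price_start_time;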

Reusable SQL Views:

1. Creating a home for cost:

USE CATALOG catalog_name; -- TODO: Replace with your catalog name
CREATE SCHEMA IF NOT EXISTS schema_name; -- TODO: Replace with your schema name
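
If the views and dashboard are meant to be shared, you can also grant read access on the new schema. A minimal sketch, assuming a hypothetical finops_readers group (replace with your own group and adjust the privileges to your governance model):

-- Hypothetical group name; replace with your own. Requires sufficient privileges on the catalog/schema.
GRANT USE CATALOG ON CATALOG catalog_name TO `finops_readers`;
GRANT USE SCHEMA ON SCHEMA catalog_name.schema_name TO `finops_readers`;
GRANT SELECT ON SCHEMA catalog_name.schema_name TO `finops_readers`;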

2. Usage by day / user / database_instance (comp_qty logic):

CREATE OR REPLACE VIEW catalog_name.schema_name.usage_daily_user_instance AS
SELECT
  SUM(usage_quantity) AS compute_qty,
  CAST(usage_date AS DATE) AS usage_date,
  sku_name,
  usage_unit,
  usage_metadata.database_instance_id AS database_instance_id,
  COALESCE(identity_metadata.run_as,
           identity_metadata.created_by,
           identity_metadata.owned_by) AS user,
  workspace_id
FROM system.billing.usage
WHERE sku_name IN (
  'PREMIUM_DATABASE_SERVERLESS_COMPUTE_US_EAST',
  'PREMIUM_DATABRICKS_STORAGE_US_EAST'
)
GROUP BY
  CAST(usage_date AS DATE),
  sku_name,
  usage_unit,
  usage_metadata.database_instance_id,
  COALESCE(identity_metadata.run_as, identity_metadata.created_by, identity_metadata.owned_by),
  workspace_id;
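
A quick way to validate the view before wiring it into anything else is to look at the most recent days of usage per instance, for example:

-- Sanity check: last 7 days of Lakebase usage per database instance and unit.
SELECT
  usage_date,
  database_instance_id,
  usage_unit,
  SUM(compute_qty) AS total_qty
FROM catalog_name.schema_name.usage_daily_user_instance
WHERE usage_date >= date_sub(current_date(), 7)
GROUP BY usage_date, database_instance_id, usage_unit
ORDER BY usage_date DESC, total_qty DESC;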

Dashboard visuals: Daily Cost by Unit; Cost Share by Database Instance

3. Effective rates:

CREATE OR REPLACE VIEW catalog_name.schema_name.effective_rates AS
SELECT
  -- Both Lakebase SKUs currently resolve to the default list price;
  -- keep the CASE if a SKU-specific adjustment is needed later.
  CASE
    WHEN sku_name = 'PREMIUM_DATABRICKS_STORAGE_US_EAST'
      THEN pricing.effective_list['default']
    ELSE pricing.effective_list['default']
  END AS effective_rate,
  usage_unit,
  sku_name,
  account_id,
  currency_code,
  price_start_time,
  COALESCE(price_end_time, TIMESTAMP '2999-12-31 23:59:59Z') AS price_end_time
FROM system.billing.list_prices
WHERE sku_name IN (
  'PREMIUM_DATABASE_SERVERLESS_COMPUTE_US_EAST',
  'PREMIUM_DATABRICKS_STORAGE_US_EAST'
);
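
To confirm the time-banding behaves as expected, list the price bands per SKU; more than one row per SKU means the rate changed at some point, and the usage join below will pick the band in force on each day.

-- Inspect price bands per SKU; multiple rows per SKU indicate a rate change over time.
SELECT
  sku_name,
  usage_unit,
  currency_code,
  effective_rate,
  price_start_time,
  price_end_time
FROM catalog_name.schema_name.effective_rates
ORDER BY sku_name, price_start_time;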

Dashboard visual: Effective Price over Time (per SKU)

4. Workspace names:

CREATE OR REPLACE VIEW catalog_name.schema_name.workspaces AS
SELECT DISTINCT workspace_id, workspace_name
FROM system.access.workspaces_latest;

5. Daily cost + monthly cumulative window:

CREATE OR REPLACE VIEW catalog_name.schema_name.cost_lakebase_daily_accum AS
WITH comp_qty AS (
  SELECT * FROM catalog_name.schema_name.usage_daily_user_instance
),
rate AS (
  SELECT * FROM catalog_name.schema_name.effective_rates
),
workspace AS (
  SELECT * FROM catalog_name.schema_name.workspaces
)
SELECT
  w.workspace_name,
  c.compute_qty * r.effective_rate AS cost,
  c.usage_date,
  TRUNC(c.usage_date, 'MONTH') AS month,
  c.usage_unit,
  r.currency_code,
  c.database_instance_id,
  c.user,
  SUM(c.compute_qty * r.effective_rate) OVER (
    PARTITION BY w.workspace_name, c.user, c.database_instance_id, r.currency_code, c.usage_unit,
                 TRUNC(c.usage_date, 'MONTH')
    ORDER BY c.usage_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS accum_cost
FROM comp_qty c
JOIN rate r
  ON  c.sku_name = r.sku_name
  AND c.usage_unit = r.usage_unit
  AND TIMESTAMP(c.usage_date) >= r.price_start_time
  AND TIMESTAMP(c.usage_date) <  r.price_end_time
JOIN workspace w
  ON w.workspace_id = c.workspace_id;
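
The dashboard's 7-day anomaly lens can be reproduced straight from this view. A minimal sketch of the logic (the 1.25× threshold mirrors the widget below and can be tuned):

-- Flag days whose cost exceeds 1.25x the trailing 7-day moving average per workspace.
WITH daily AS (
  SELECT workspace_name, usage_date, SUM(cost) AS daily_cost
  FROM catalog_name.schema_name.cost_lakebase_daily_accum
  GROUP BY workspace_name, usage_date
),
ma AS (
  SELECT
    *,
    AVG(daily_cost) OVER (
      PARTITION BY workspace_name
      ORDER BY usage_date
      ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS ma7
  FROM daily
)
SELECT workspace_name, usage_date, daily_cost, ma7
FROM ma
WHERE daily_cost > 1.25 * ma7
ORDER BY usage_date DESC;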

Dashboard visuals: Potential Spikes (> 1.25× MA7); Daily Cost vs 7-Day Moving Average; Cumulative Cost (All Units) - Monthly Window

6. Monthly roll-up by workspace:

CREATE OR REPLACE VIEW catalog_name.schema_name.cost_lakebase_by_workspace_monthly AS
SELECT
  month,
  workspace_name,
  usage_unit,
  currency_code,
  SUM(cost)       AS monthly_cost,
  MAX(accum_cost) AS month_end_accum_cost
FROM catalog_name.schema_name.cost_lakebase_daily_accum
GROUP BY ALL;
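
For back-charging, the monthly roll-up can be consumed directly, for example to rank workspaces by spend in the current month:

-- Current-month Lakebase spend per workspace, highest first.
SELECT workspace_name, usage_unit, currency_code, monthly_cost
FROM catalog_name.schema_name.cost_lakebase_by_workspace_monthly
WHERE month = TRUNC(current_date(), 'MONTH')
ORDER BY monthly_cost DESC;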

Dashboard visual: Monthly Cost by Workspace

Region-agnostic option: replace the two sku_name IN (...) filters with an OR of the following patterns (see the sketch after this list):

  • sku_name LIKE 'PREMIUM_DATABASE_SERVERLESS_COMPUTE_%'
  • sku_name LIKE 'PREMIUM_DATABRICKS_STORAGE_%'
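
As a sketch, the same usage filter written region-agnostically looks like this (verify the SKU prefixes actually billed in your region before relying on them):

-- Region-agnostic variant of the SKU filter used in the usage view.
SELECT sku_name, usage_unit, SUM(usage_quantity) AS qty
FROM system.billing.usage
WHERE sku_name LIKE 'PREMIUM_DATABASE_SERVERLESS_COMPUTE_%'
   OR sku_name LIKE 'PREMIUM_DATABRICKS_STORAGE_%'
GROUP BY sku_name, usage_unit;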

Kickstart with the repo for a reusable dashboard and deployment:

Quick start (Workspace):

  • Workspace → Create → Git folder. Clone the following repo: https://github.com/dediggibyte/diggi_lakebase.git
  • For Lakebase benchmarking, refer to the benchmark folder inside the repo.
  • For Lakebase cost and AI/BI dashboards, navigate to the lakebase_cost/lakebase-cost-observability section of the repo, as highlighted below:
    Databricks Lakebase cost repo
  • After the repository is cloned and the prerequisites are in place, you can validate the bundle and deploy the dashboard as highlighted below (see the commands after this list):
    Lakebase cost Dashboard Deployment bundle
  • Once the bundle is validated and deployed, you can navigate to the Dashboards section to see the deployed dashboard:
    Lakebase Cost Dashboard
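
The validate/deploy step from a terminal looks roughly like the sketch below, assuming the Databricks CLI is configured and you run it from the bundle folder; the dev target name is an assumption, so check the repo's databricks.yml for the actual targets.

# Run from the lakebase-cost-observability bundle folder.
# The "dev" target name is an assumption; check databricks.yml for the targets defined in the repo.
databricks bundle validate
databricks bundle deploy -t dev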

To be considered: point the dashboard datasets at the views above if you changed the catalog/schema. If you iterate visually, export the dashboard again and commit the updated JSON to your repo so the bundle stays the source of truth.

Best Practices for Cost Optimisation:

  • Monitor usage: Use system tables and Databricks cost dashboards to track DBU and DSU consumption (the alert sketch after this list is one way to operationalise this).
  • Choose the right instance types: Match your workload to the appropriate compute resources.
  • Leverage committed-use discounts: If you have predictable workloads, consider committing to usage for lower rates.
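
The monitoring bullet is straightforward to operationalise. The query below is a minimal sketch that could back a Databricks SQL alert, firing when yesterday's Lakebase spend for any workspace crosses a threshold (the 50-unit threshold is an arbitrary example):

-- Candidate query for a SQL alert: workspaces whose Lakebase cost yesterday exceeded a threshold.
SELECT
  workspace_name,
  currency_code,
  SUM(cost) AS yesterday_cost
FROM catalog_name.schema_name.cost_lakebase_daily_accum
WHERE usage_date = date_sub(current_date(), 1)
GROUP BY workspace_name, currency_code
HAVING SUM(cost) > 50  -- arbitrary example threshold; tune to your budget
ORDER BY yesterday_cost DESC;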

Conclusion:

Databricks Lakebase offers a flexible, pay-as-you-go pricing model that scales with your needs. By understanding the core components (compute in DBUs, storage in DSUs, and other costs), you can accurately estimate and optimise your spend. Always refer to the official Databricks pricing page and use the pricing calculator for the most up-to-date rates and personalised estimates.

References:

  1. https://www.databricks.com/product/pricing
  2. https://learn.microsoft.com/en-us/azure/databricks/oltp/
  3. https://www.databricks.com/product/pricing/managed-services
  4. https://www.databricks.com/product/pricing/lakebase
  5. https://www.databricks.com/product/lakebase
  6. Part-1: Introduction to Databricks Lakebase
  7. Part-2: Benchmarking of Databricks Lakebase
1 Comment
gjsau
New Contributor II

There is contradictory information about Lakebase and its "serverless" nature. This article confirms my own experience: it is not serverless, i.e., to be ready to accept a query, it has to be started and is charged at a minimum of 1 DBU/hour while it is running.

This seems to directly contradict the product description here. Emphasis has been added below to highlight the contradiction:

  • Serverless: Lakebases are lightweight, and can scale elastically instantly, up and down, all the way to zero. At zero, the cost of the lakebase is just the cost of storing the data on cheap data lakes.