chris_koester
Databricks Employee

Introduction

Unity Catalog Metric Views provide the ability to build a semantic layer in Databricks using YAML and SQL. This shifts the semantic layer left into your catalog, moving it closer to your data, and avoids siloing it in a BI tool where a proprietary language is needed to develop and maintain it.

After a Metric View has been created, we need a simple and repeatable way to deploy it to higher environments such as test and production. This post will demonstrate how to deploy a dimensional model, UC Metric Views, and an AI/BI dashboard with Databricks Asset Bundles (DABs). This shows not only how to deploy Metric Views with DABs, but how to use them in an end-to-end analytics solution. For this example, we’ll create metric views for Databricks SQL usage and query history, using data from System Tables. A Databricks administrator can use this pattern to explore DBSQL trends with their own data.

Key Components

The solution follows the typical pattern of using a dimensional model, semantic layer, and BI tooling. A Databricks job orchestrates the entire solution, and also conveniently shows how the key components fit together.

Databricks job orchestrating end-to-end analytics solution with Metric Views

  1. Fact and dimension tables: A simple dimensional model sourced mostly from Databricks System Tables, with a focus on DBSQL usage and query history.
  2. UC Metric Views: The semantic layer, where metrics and model relationships are defined.
  3. AI/BI dashboard: Used to visualize and explore metrics.

1. Fact and Dimension Tables

A dimensional model is the foundation of our analytical solution. A calendar dimension allows us to filter and aggregate by time periods such as date, week, month, etc. The compute dimension contains warehouse attributes such as type, size, scaling settings, etc. The workspace dimension provides us with a friendly workspace name so we aren’t limited to IDs.

The usage and query history fact tables contain the corresponding metrics: usage in DBUs, and a rich set of query attributes (duration, status, source, etc.). The list of tables, their sources, and a diagram showing their relationships are below.

Dimensional model for UC Metric Views

Deploying Tables with DABs

Tables can be deployed to different environments by using a notebook or SQL job task that contains the create statement and sets the catalog and schema with parameter markers. Because the catalog and schema are set with use statements and parameters, identical DDL statements can easily be run in different environments. With DABs, the values for catalog and schema are specified in the target definitions or overridden at deployment time.

use catalog identifier(:catalog);
use schema identifier(:schema);

create or replace table dim_workspace (
  workspace_key string comment 'The primary key (PK) of the table. This is currently the same as the workspace_id, but enables using a surrogate key in the future without changing relationships.',
  account_id string comment 'The ID of the parent account of this workspace.',
  workspace_id string comment 'The ID of the workspace.',
  workspace_name string comment 'The human-readable name of the workspace.',
  workspace_url string comment 'URL of the Databricks workspace.',
  create_time timestamp comment 'Timestamp of when the workspace was created.',
  status string comment 'Lifecycle status of the workspace: NOT_PROVISIONED | PROVISIONING | RUNNING | FAILED | BANNED',
  primary key (workspace_key) rely
)
comment 'Workspace dimension sourced from system.access.workspaces_latest.'
cluster by (workspace_id);
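In the bundle, a DDL script like this can be run as a SQL file task in a Databricks job, with the catalog and schema passed in as task parameters. The sketch below shows one way to wire this up; the job name, task key, variable names, and file path are illustrative and would need to match your own bundle.

# Hypothetical job resource; names, paths, and variables are illustrative.
resources:
  jobs:
    dbsql_metrics_job:
      name: dbsql_metrics_job
      tasks:
        - task_key: create_dim_workspace
          sql_task:
            warehouse_id: ${var.warehouse_id}
            parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}
            file:
              path: ../src/dim_workspace.sql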

Query optimization using primary key constraints

Dimensional models are meant to be fast, so let’s add primary key (PK) constraints with RELY to our dimension tables and foreign key (FK) relationships to the fact tables. This allows the optimizer to take advantage of the constraints in some common types of queries, and also provides the important benefit of documenting the relationships (see the example query after the constraint definitions below).

Primary Key with RELY

primary key (workspace_key) rely

Define PK/FK Relationships in Fact Table

constraint fk_fct_usage_dim_calendar foreign key (calendar_key) references dim_calendar(calendar_key),
constraint fk_fct_usage_dim_compute foreign key (compute_key) references dim_compute(compute_key),
constraint fk_fct_usage_dim_workspace foreign key (workspace_key) references dim_workspace(workspace_key)
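Because the constraints are declared with RELY, the optimizer is allowed to trust them. One common benefit is join elimination: if a query joins the fact table to a dimension on its RELY primary key but never references any dimension columns, the join can be removed from the plan. A minimal, hypothetical example follows; whether the join is actually eliminated depends on the query shape and runtime.

-- Only fact columns are referenced, so the left join to dim_workspace is
-- redundant: the RELY primary key guarantees at most one match per key, and a
-- left join cannot drop rows, so the optimizer may remove the join entirely.
select
  f.workspace_key,
  sum(f.usage_quantity) as total_dbus
from fct_usage f
  left join dim_workspace w
    on f.workspace_key = w.workspace_key
group by f.workspace_key;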

With table relationships defined, we can view them in Unity Catalog by clicking “View relationships” from the FK side of the relationship.

View table relationships in Unity Catalog

Table relationships in Unity Catalog

2. UC Metric Views

Metric Views allow us to build a centralized semantic layer in Unity Catalog. This is where we'll define dimensions, measures, and table relationships for consumption in BI and AI tooling. Since metrics are defined independently of dimensions, users can aggregate them across any dimension at runtime.
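For example, measures defined in a metric view can be queried with the MEASURE aggregate function and grouped by any of its dimensions at query time. A minimal sketch against the metv_usage view defined below (dimension and measure names as declared there):

-- Aggregate usage by workspace for the last 30 days using the metric view.
select
  `Workspace Name`,
  measure(`Sum of Usage`) as total_usage
from metv_usage
where `Date` >= current_date() - interval 30 days
group by all
order by total_usage desc;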

Metric View in Unity Catalog

Metric View in Unity Catalog

Deploying Metric Views with DABs

Metric Views are not currently a supported resource type in DABs, so they need to be deployed with a SQL statement instead. As with tables, we’ll use parameter markers for the catalog and schema. However, this only sets the location of the metric view; in the YAML definition, fully qualified names are required for the source and joined tables. This means the SQL statement needs to be built dynamically and then executed with EXECUTE IMMEDIATE.

-- In the YAML definition, fully qualified names are required for the source and joined tables
use catalog identifier(:catalog);
use schema identifier(:schema);

declare or replace qry_str string;

set var qry_str =
"create or replace view metv_usage
(
  Date comment 'Date of the usage record, this field can be used for faster aggregation by date.',
  `Account ID` comment 'The ID of the Databricks account.',
  `Workspace Name` comment 'The name of the workspace.',
  `Warehouse Name` comment 'The name of the SQL warehouse.',
  `Usage Unit` comment 'Unit this usage is measured in. Possible values include DBUs.',
  `Usage Type` comment 'The type of usage attributed to the product or workload for billing purposes. Possible values are COMPUTE_TIME, STORAGE_SPACE, NETWORK_BYTES, API_CALLS, TOKEN, or GPU_TIME.',
  `Billing Origin Product` comment 'The product that originated the usage. Some products can be billed as different SKUs. For possible values, see [View information about the product associated with the usage](https://docs.databricks.com/en/admin/system-tables/billing.html#features).',
  `SKU Name` comment 'Name of the SKU',
  `Sum of Usage` comment 'Number of units consumed for this record.'
)
with metrics
language yaml
as $$
version: 0.1

source: " || :catalog || '.' || :schema || ".vw_fct_usage

joins:
  - name: calendar
    source: " || :catalog || '.' || :schema || ".vw_dim_calendar
    on: calendar.calendar_key = source.calendar_key

  - name: compute
    source: " || :catalog || '.' || :schema || ".vw_dim_compute
    on: compute.compute_key = source.compute_key

  - name: workspace
    source: " || :catalog || '.' || :schema || ".vw_dim_workspace
    on: workspace.workspace_key = source.workspace_key

dimensions:
  - name: Date
    expr: calendar.calendar_date
  - name: Account ID
    expr: workspace.account_id
  - name: Workspace Name
    expr: workspace.workspace_name
  - name: Warehouse Name
    expr: compute.warehouse_name
  - name: Usage Unit
    expr: usage_unit
  - name: Usage Type
    expr: usage_type
  - name: Billing Origin Product
    expr: billing_origin_product
  - name: SKU Name
    expr: sku_name

measures:
  - name: Sum of Usage
    expr: SUM(usage_quantity)
$$";

execute immediate qry_str;

3. AI/BI Dashboard

With the Metric View in place, the AI/BI dashboard can now be deployed. Metric Views can also be queried from other tools such as Genie spaces and alerts.

Deploying AI/BI Dashboards with DABs

Dashboards are deployed from a .lvdash.json file. When viewing a dashboard in draft mode, the JSON file can be exported by clicking File actions -> Export.

Export AI/BI dashboard

This file is then referenced for the dashboard resource in the bundle.

resources:
  dashboards:
    dbsql_metrics_dashboard:
      display_name: 'DBSQL Metrics'
      file_path: ../src/dbsql_metrics.lvdash.json
      warehouse_id: ${var.warehouse_id}
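The ${var.warehouse_id} reference assumes a bundle variable with a value per target. A minimal sketch of how that variable could be declared in databricks.yml (the warehouse IDs shown are placeholders):

# Hypothetical variable and target definitions; IDs are placeholders.
variables:
  warehouse_id:
    description: SQL warehouse used by the dashboard and SQL tasks

targets:
  dev:
    mode: development
    variables:
      warehouse_id: 1234567890abcdef
  prod:
    mode: production
    variables:
      warehouse_id: fedcba0987654321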

With the dashboard and other resources in place, the DAB can now be deployed.

$ databricks bundle deploy --target dev --profile DEFAULT

# OR for PROD

$ databricks bundle deploy --target prod --profile PROD
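As noted earlier, variable values such as the warehouse ID can also be overridden at deployment time. With a recent Databricks CLI, one way to do this is the --var flag, shown here with a placeholder value:

$ databricks bundle deploy --target prod --profile PROD --var="warehouse_id=<your-warehouse-id>"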

Below is a sample dashboard showing DBSQL usage per warehouse (with warehouse names excluded) for the past 60 days. By using a date range filter, we can automatically default to showing the past N days.

Databricks SQL usage visualization in AI/BI

With the query history metric view, we can explore metrics such as query duration percentiles.

Databricks SQL query duration percentiles visualization in AI/BI

Conclusion

In this post, we demonstrated how to deploy UC Metric Views and related resources with DABs. With this pattern, we can deploy end-to-end analytics solutions on Databricks easily and consistently, accelerating the path to production.

The solution described in this post can be found in the https://github.com/databricks-solutions/databricks-dab-examples repository.