Unity Catalog Metric Views provide the ability to build a semantic layer in Databricks using YAML and SQL. This shifts the semantic layer left into your catalog, moving it closer to your data, and avoids siloing the semantic layer in a BI tool where a proprietary language is needed to develop and maintain it.
After a Metric View has been created, we need a simple and repeatable way to deploy it to higher environments such as test and production. This post will demonstrate how to deploy a dimensional model, UC Metric Views, and an AI/BI dashboard with Databricks Asset Bundles (DABs). This shows not only how to deploy Metric Views with DABs, but also how to use them in an end-to-end analytics solution. For this example, we'll create Metric Views for Databricks SQL usage and query history, using data from system tables. A Databricks administrator can use this pattern to explore DBSQL trends with their own data.
The solution follows the typical pattern of using a dimensional model, semantic layer, and BI tooling. A Databricks job orchestrates the entire solution, and also conveniently shows how the key components fit together.
Databricks job orchestrating end-to-end analytics solution with Metric Views
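In a bundle, this job is itself declared as a resource. Below is a minimal sketch of how such a job definition might look, with SQL file tasks that build a dimension table and then the Metric Views; the resource key, task keys, file paths, and variable names are illustrative rather than copied from the repository.

resources:
  jobs:
    dbsql_metrics_job:
      name: dbsql-metrics-analytics
      tasks:
        # Build a dimension table from a parameterized SQL file (illustrative path)
        - task_key: create_dim_workspace
          sql_task:
            warehouse_id: ${var.warehouse_id}
            file:
              path: ../src/dim_workspace.sql
            parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}
        # Create the Metric Views once the dimensional model is in place
        - task_key: create_metric_views
          depends_on:
            - task_key: create_dim_workspace
          sql_task:
            warehouse_id: ${var.warehouse_id}
            file:
              path: ../src/metv_usage.sql
            parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}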
A dimensional model is the foundation of our analytical solution. A calendar dimension allows us to filter and aggregate by time periods such as date, week, month, etc. The compute dimension contains warehouse attributes such as type, size, scaling settings, etc. The workspace dimension provides us with a friendly workspace name so we aren’t limited to IDs.
The usage and query history fact tables contain the corresponding metrics: usage in DBUs, and a rich set of query metrics (duration, status, source, etc.). The list of tables, their sources, and a diagram showing their relationships are below.
Dimensional model for UC Metric Views
Tables can be deployed to different environments by using a notebook or SQL job task containing a CREATE statement, with the catalog and schema passed in as parameter markers. Because the catalog and schema are set with USE statements and parameters, identical DDL statements can be run in any environment. With DABs, values for catalog and schema are specified in DAB target definitions or overridden at deployment time (an illustrative target definition follows the table DDL below).
use catalog identifier(:catalog);
use schema identifier(:schema);
create or replace table dim_workspace (
  workspace_key string comment 'The primary key (PK) of the table. This is currently the same as the workspace_id, but enables using a surrogate key in the future without changing relationships.',
  account_id string comment 'The ID of the parent account of this workspace.',
  workspace_id string comment 'The ID of the workspace.',
  workspace_name string comment 'The human-readable name of the workspace.',
  workspace_url string comment 'URL of the Databricks workspace.',
  create_time timestamp comment 'Timestamp of when the workspace was created.',
  status string comment 'Lifecycle status of the workspace: NOT_PROVISIONED | PROVISIONING | RUNNING | FAILED | BANNED',
  primary key (workspace_key) rely
)
comment 'Workspace dimension sourced from system.access.workspaces_latest.'
cluster by (workspace_id);
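The catalog and schema values referenced by these parameter markers are ultimately supplied by the bundle. A minimal sketch of how they might be declared as bundle variables and overridden per target in databricks.yml is shown below; the variable names, defaults, and target layout are illustrative.

variables:
  catalog:
    description: Catalog for the dimensional model and Metric Views
    default: main
  schema:
    description: Schema for the dimensional model and Metric Views
    default: dbsql_metrics_dev
  warehouse_id:
    description: SQL warehouse used by the job tasks and dashboard

targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production
    variables:
      schema: dbsql_metrics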
Dimensional models are meant to be fast, so let's add primary key (PK) and foreign key (FK) relationships and use RELY on our dimension tables. This lets the optimizer improve some common types of queries, and also provides the important benefit of documenting the relationships.
Primary Key with RELY
primary key (workspace_key) rely
Define PK/FK Relationships in Fact Table
constraint fk_fct_usage_dim_calendar foreign key (calendar_key) references dim_calendar(calendar_key),
constraint fk_fct_usage_dim_compute foreign key (compute_key) references dim_compute(compute_key),
constraint fk_fct_usage_dim_workspace foreign key (workspace_key) references dim_workspace(workspace_key)
With table relationships defined, we can view them in Unity Catalog by clicking “View relationships” from the FK side of the relationship.
View table relationships in Unity Catalog
Table relationships in Unity Catalog
Metric Views allow us to build a centralized semantic layer in Unity Catalog. This is where we'll define dimensions, measures, and table relationships for consumption in BI and AI tooling. Since metrics are defined independently of dimensions, users can aggregate them across any dimension at runtime.
Metric View in Unity Catalog
Metric Views are not a supported resource with DABs, so they need to be deployed using a SQL statement. As with tables, we'll use parameter markers for catalog and schema. However, this only sets the location of the metric view itself; in the YAML definition, fully qualified names are required for the source and joined tables. This means the SQL statement needs to be built dynamically and then run with EXECUTE IMMEDIATE.
-- In the YAML definition, fully qualified names are required for the source and joined tables
use catalog identifier(:catalog);
use schema identifier(:schema);
declare or replace qry_str string;
set var qry_str =
"create or replace view metv_usage
(
Date comment 'Date of the usage record, this field can be used for faster aggregation by date.',
`Account ID` comment 'The ID of the Databricks account.',
`Workspace Name` comment 'The name of the workspace.',
`Warehouse Name` comment 'The name of the SQL warehouse.',
`Usage Unit` comment 'Unit this usage is measured in. Possible values include DBUs.',
`Usage Type` comment 'The type of usage attributed to the product or workload for billing purposes. Possible values are COMPUTE_TIME, STORAGE_SPACE, NETWORK_BYTES, API_CALLS, TOKEN, or GPU_TIME.',
`Billing Origin Product` comment 'The product that originated the usage. Some products can be billed as different SKUs. For possible values, see [View information about the product associated with the usage](https://docs.databricks.com/en/admin/system-tables/billing.html#features).',
`SKU Name` comment 'Name of the SKU',
`Sum of Usage` comment 'Number of units consumed for this record.'
)
with metrics
language yaml
as $$
version: 0.1
source: " || :catalog || '.' || :schema || ".vw_fct_usage
joins:
- name: calendar
source: " || :catalog || '.' || :schema || ".vw_dim_calendar
on: calendar.calendar_key = source.calendar_key
- name: compute
source: " || :catalog || '.' || :schema || ".vw_dim_compute
on: compute.compute_key = source.compute_key
- name: workspace
source: " || :catalog || '.' || :schema || ".vw_dim_workspace
on: workspace.workspace_key = source.workspace_key
dimensions:
- name: Date
expr: calendar.calendar_date
- name: Account ID
expr: workspace.account_id
- name: Workspace Name
expr: workspace.workspace_name
- name: Warehouse Name
expr: compute.warehouse_name
- name: Usage Unit
expr: usage_unit
- name: Usage Type
expr: usage_type
- name: Billing Origin Product
expr: billing_origin_product
- name: SKU Name
expr: sku_name
measures:
- name: Sum of Usage
expr: SUM(usage_quantity)
$$";
execute immediate qry_str;
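Once created, the Metric View can be queried like any other view: dimensions are selected directly, while measures are evaluated with the MEASURE() function and aggregated across whichever dimensions are chosen at query time. A simple sketch against the view defined above (the 60-day filter is just an example):

-- Aggregate the Sum of Usage measure by workspace and product for the last 60 days
select
  `Workspace Name`,
  `Billing Origin Product`,
  measure(`Sum of Usage`) as total_usage
from metv_usage
where `Date` >= date_sub(current_date(), 60)
group by all;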
With the Metric View in place, the AI/BI dashboard can now be deployed. Metric Views can also be queried from other tools such as Genie spaces and alerts.
Dashboards are deployed from an lvdash.json file. When viewing a dashboard in draft mode, the JSON file can be exported by clicking File actions -> Export.
Export AI/BI dashboard
This file is then referenced for the dashboard resource in the bundle.
resources:
  dashboards:
    dbsql_metrics_dashboard:
      display_name: 'DBSQL Metrics'
      file_path: ../src/dbsql_metrics.lvdash.json
      warehouse_id: ${var.warehouse_id}
With the dashboard and other resources in place, the DAB can now be deployed.
$ databricks bundle deploy --target dev --profile DEFAULT
# OR for PROD
$ databricks bundle deploy --target prod --profile PROD
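If the bundle also defines the orchestration job as a resource, it can be run straight from the CLI after deployment; the job resource key shown here is illustrative and should match whatever key the bundle uses.

$ databricks bundle run dbsql_metrics_job --target dev --profile DEFAULT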
Below is a sample dashboard showing DBSQL usage per warehouse (with warehouse names excluded) for the past 60 days. By using a date range filter, we can automatically default to showing the past N days.
Databricks SQL usage visualization in AI/BI
With the query history metric view, we can explore metrics such as query duration percentiles.
Databricks SQL query duration percentiles visualization in AI/BI
In this post, we demonstrated how to deploy UC Metric Views and related resources with DABs. With this pattern, we can deploy end-to-end analytics solutions on Databricks easily and consistently, accelerating the path to production.
The solution described in this post can be found in the https://github.com/databricks-solutions/databricks-dab-examples repository.