Define KPIs Once with Unity Catalog Business Semantics

rishav_sharma

Most analytics teams have seen the same problem in different forms: one dashboard says revenue is 10.2M, another says 10.6M, a spreadsheet says 10.4M, and nobody is sure which number should be trusted.

The issue is usually not the data platform. It is the business logic.

Metric definitions, filters, joins, display names, and formatting rules often spread across SQL notebooks, BI semantic models, dashboards, extracts, and local files. Once that happens, every team can be technically correct and still produce a different answer.

Unity Catalog Business Semantics gives Databricks teams a governed way to bring that logic closer to the data. Instead of redefining KPIs in every downstream tool, teams can define business metrics once, govern them in Unity Catalog, and make them available to SQL users, dashboards, notebooks, Genie spaces, alerts, and compatible external consumption paths.

This article explains the core idea, the two main building blocks, and a practical adoption path for teams that want trusted AI/BI self-service without letting metric logic drift.

Why Business Semantics Matters

Business users usually ask questions in business language:

Show sales by customer tier this quarter.

They do not think in terms of physical table names, technical column names, join paths, date truncation logic, or whether "sales" should mean gross sales, net revenue, bookings, or recognized revenue.

That gap creates three recurring problems.

1. Metric Drift

The same KPI gets recreated in dashboards, SQL notebooks, spreadsheets, semantic models, and local extracts. A revenue metric may use one filter in a BI model, another in a SQL notebook, and a third in a spreadsheet. Over time, teams spend more effort reconciling numbers than making decisions.

2. AI Ambiguity

AI/BI tools and LLM-powered agents need more than table schemas. They need business context. Without governed definitions and vocabulary, an agent may choose the wrong table, join, dimension, aggregation, or time filter.

3. Self-Service Friction

Business users know terms such as revenue, bookings, margin, segment, region, and fiscal quarter. They should not need to understand every physical column name or every join path before they can ask a useful question.

The goal is simple: move business logic into governed Unity Catalog semantic assets that every approved consumer can reuse.

What Databricks Provides

Unity Catalog Business Semantics provides a governed semantic layer for business metrics, KPIs, and AI-friendly vocabulary.

At a high level, the flow looks like this:

  1. Source data lives in tables, views, SQL queries, or curated star and snowflake models.
  2. Business semantics are defined in Unity Catalog through metric views and metadata.
  3. Governance, ownership, permissions, lineage, and audit controls are applied through Unity Catalog.
  4. Consumers query governed metrics through SQL, notebooks, dashboards, Genie, alerts, and supported ecosystem tools.

The two core components are:

| Component | Purpose |
| --- | --- |
| Metric views | Define and govern reusable business KPIs, measures, and dimensions. |
| Agent metadata | Teach AI/BI systems the business vocabulary, display names, synonyms, and formatting rules. |

Used together, these components help standardize metric definitions, certify metadata quality, and scale trusted consumption.

Core Component 1: Metric Views

Metric views are reusable semantic objects that define measures separately from the dimensions used to group them. That separation matters because not every metric can be aggregated once and then safely re-aggregated or sliced in a dashboard.

Ratios, distinct counts, filtered measures, and time-based calculations can produce incorrect results if every downstream tool implements the logic differently.
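
To see why, consider a minimal sketch with made-up numbers. The inline `orders` rows and their values are illustrative assumptions, not data from this article. The query compares revenue per customer computed directly at the overall grain with the result of averaging per-region ratios, as a downstream tool might do:

```sql
-- Hypothetical inline data: three customers across two regions.
WITH orders AS (
  SELECT * FROM VALUES
    ('East', 'c1', 100.0),
    ('East', 'c2', 300.0),
    ('West', 'c3', 600.0)
  AS t(region, customer_id, net_sales)
),
per_region AS (
  -- Ratio computed per region: East = 400 / 2 = 200, West = 600 / 1 = 600.
  SELECT
    region,
    SUM(net_sales) / COUNT(DISTINCT customer_id) AS revenue_per_customer
  FROM orders
  GROUP BY region
)
SELECT
  -- Ratio recomputed at the overall grain: 1000 / 3, roughly 333.33.
  (SELECT SUM(net_sales) / COUNT(DISTINCT customer_id) FROM orders)
    AS computed_at_overall_grain,
  -- Averaging the two regional ratios instead gives (200 + 600) / 2 = 400.
  (SELECT AVG(revenue_per_customer) FROM per_region)
    AS average_of_regional_ratios;
```

The two values differ because a ratio has to be recomputed from its numerator and denominator at each grain. Keeping that rule in one governed definition is what stops each tool from choosing its own shortcut.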

A metric view lets teams define:

| Area | Example |
| --- | --- |
| Source | prod.sales.orders_enriched |
| Dimensions | Region, country, product, channel, order month, customer segment |
| Measures | Total revenue, customer count, gross margin, revenue per customer |
| Metadata | Display names, synonyms, comments, formats, and business descriptions |

Here is a simplified metric view definition, written in YAML, for a sales semantic object:

version: 1.1
source: prod.sales.orders_enriched

dimensions:
  - name: region
    expr: customer_region

  - name: order_month
    expr: DATE_TRUNC('month', order_date)

  - name: customer_segment
    expr: segment

measures:
  - name: total_revenue
    expr: SUM(net_sales)

  - name: customer_count
    expr: COUNT(DISTINCT customer_id)

  - name: revenue_per_customer
    expr: SUM(net_sales) / COUNT(DISTINCT customer_id)
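
One way to register a definition like this is SQL DDL that embeds the YAML. The following is a sketch rather than a definitive template. The target name `prod.analytics.sales_metrics` is borrowed from the query used later in this article, and the statement assumes a workspace and runtime where metric views with specification version 1.1 are available.

```sql
-- Sketch: register the YAML definition as a metric view in Unity Catalog.
-- Assumes the caller may create views in prod.analytics and read the source table.
CREATE OR REPLACE VIEW prod.analytics.sales_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 1.1
source: prod.sales.orders_enriched

dimensions:
  - name: region
    expr: customer_region

  - name: order_month
    expr: DATE_TRUNC('month', order_date)

  - name: customer_segment
    expr: segment

measures:
  - name: total_revenue
    expr: SUM(net_sales)

  - name: customer_count
    expr: COUNT(DISTINCT customer_id)

  - name: revenue_per_customer
    expr: SUM(net_sales) / COUNT(DISTINCT customer_id)
$$;
```

Keeping the statement in version control gives reviewers a single diff to inspect whenever a KPI definition changes.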

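Once the definition is saved as a metric view, here prod.analytics.sales_metrics, a user can query the metric at the grain they need by wrapping each measure in MEASURE():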

SELECT
  region,
  MEASURE(total_revenue) AS total_revenue
FROM prod.analytics.sales_metrics
GROUP BY region;

The important point is that the metric logic stays in the governed semantic object. Users can analyze flexibly, but the calculation remains consistent.
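
For instance, the same view can answer a question at a different grain without anyone rewriting the ratio. This sketch assumes the `revenue_per_customer` measure and the `order_month` and `customer_segment` dimensions from the definition above; the date filter is an arbitrary illustration.

```sql
-- Same governed measure, different grain: month by customer segment.
SELECT
  order_month,
  customer_segment,
  MEASURE(revenue_per_customer) AS revenue_per_customer
FROM prod.analytics.sales_metrics
WHERE order_month >= DATE '2025-01-01'  -- illustrative cutoff, not from the article
GROUP BY order_month, customer_segment
ORDER BY order_month, customer_segment;
```

The numerator and denominator are evaluated for each month and segment combination, so the ratio is recomputed at that grain instead of being averaged from coarser results.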

Core Component 2: Agent Metadata

Metric definitions are only part of the problem. AI/BI systems also need language context.

Agent metadata helps convert technical fields into discoverable, readable, and consistently formatted business concepts.

For example:

| Metadata type | Why it helps |
| --- | --- |
| Display names | Show "Total Revenue" instead of total_revenue. |
| Synonyms | Map terms such as sales, bookings, revenue, customer tier, or segment to governed measures and dimensions. |
| Formatting | Define currency, percentages, dates, abbreviated numbers, and other output formats. |
| Descriptions | Explain how a business concept should be interpreted. |
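
As a rough sketch, this metadata can sit next to each field in the metric view YAML. The key names below (`display_name`, `synonyms`, `comment`, `format` and its sub-keys) reflect one reading of the version 1.1 specification and should be checked against current Databricks documentation before use. The view name `prod.analytics.sales_metrics_metadata_demo` and all descriptions are hypothetical, and the definition is trimmed to one dimension and one measure for brevity.

```sql
-- Sketch only: a trimmed metric view carrying agent metadata.
-- The demo view name is hypothetical so the example does not overwrite a real view.
CREATE OR REPLACE VIEW prod.analytics.sales_metrics_metadata_demo
WITH METRICS
LANGUAGE YAML
AS $$
version: 1.1
source: prod.sales.orders_enriched

dimensions:
  - name: customer_segment
    expr: segment
    display_name: Customer Segment
    comment: Commercial tier assigned to each customer (illustrative description).
    synonyms:
      - customer tier
      - segment

measures:
  - name: total_revenue
    expr: SUM(net_sales)
    display_name: Total Revenue
    comment: Sum of net sales (illustrative description).
    synonyms:
      - sales
      - revenue
    # Assumed format keys; verify the supported options for your workspace.
    format:
      type: currency
      currency_code: USD
      abbreviation: compact
$$;
```

In practice the metadata would be added to the full definition rather than to a separate demo view, so that definitions and vocabulary are reviewed together.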

If a user asks:

Show sales by customer tier this quarter.

The metadata can help the system interpret the request as follows:

| User phrase | Governed interpretation |
| --- | --- |
| sales | total_revenue |
| customer tier | customer_segment |
| this quarter | Current-quarter date filter on the order date |
| display format | Currency, abbreviated |
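
Under that interpretation, the request could resolve to a query along these lines. This is a hand-written sketch of a plausible result, not the output of Genie or any other assistant, and it reuses the names from the earlier metric view definition.

```sql
-- "Show sales by customer tier this quarter", resolved against governed names.
SELECT
  customer_segment,                        -- "customer tier"
  MEASURE(total_revenue) AS total_revenue  -- "sales"
FROM prod.analytics.sales_metrics
-- "this quarter": order months from the start of the current calendar quarter.
-- A fiscal calendar would need a different boundary.
WHERE order_month >= DATE_TRUNC('QUARTER', CURRENT_DATE())
GROUP BY customer_segment
ORDER BY customer_segment;
```

Currency formatting and abbreviation are applied by the consuming tool from the metadata; they are not part of the SQL itself.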

This is where business semantics becomes especially useful for AI/BI. The goal is not just to expose tables to an assistant. The goal is to give the assistant governed business meaning.

Governance: Treat Semantics Like Data Products

Semantic definitions should be governed with the same seriousness as other reusable data products. A metric view can shape executive reporting, AI-generated answers, alerts, and operational dashboards. That makes ownership and review essential.

Unity Catalog governance brings four important controls into the semantic layer:

| Control | What it enables |
| --- | --- |
| Access control | Central privileges, ownership, and inheritance on governed objects. |
| Lineage | Trace metrics back to upstream tables and downstream consumers. |
| Auditing | Monitor access and system activity for security and compliance. |
| Ownership | Assign group ownership for collaborative editing and review. |
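
The access and ownership controls can be sketched with standard Unity Catalog statements. The group names `sales_analysts` and `sales_metric_owners` are hypothetical, and the example assumes that a metric view accepts the same grant and ownership statements as an ordinary view, which is worth confirming in your workspace.

```sql
-- Hypothetical groups; replace with real account-level groups.
-- Consumers need to traverse the catalog and schema before they can read the view.
GRANT USE CATALOG ON CATALOG prod TO `sales_analysts`;
GRANT USE SCHEMA ON SCHEMA prod.analytics TO `sales_analysts`;

-- Read access to the governed metrics (views share the table namespace).
GRANT SELECT ON TABLE prod.analytics.sales_metrics TO `sales_analysts`;

-- Group ownership, so edits and reviews do not depend on a single person.
ALTER VIEW prod.analytics.sales_metrics OWNER TO `sales_metric_owners`;

-- Inspect the resulting privileges.
SHOW GRANTS ON TABLE prod.analytics.sales_metrics;
```

Granting to groups rather than individuals keeps the permission model stable as team membership changes.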

A practical operating model can split responsibility like this:

| Role | Owns | Reviews |
| --- | --- | --- |
| Domain SME | Business definition and KPI meaning | Accuracy and business acceptance |
| Analytics engineer | SQL/YAML implementation and performance | Query behavior and maintainability |
| Governance team | Access, PII, controls, lineage, and audit posture | Security and compliance readiness |

This avoids a common failure mode: analytics engineers define technically correct metrics that business teams do not recognize, or business teams define metrics that cannot be governed or scaled.

Operational Considerations

Business semantics should not stop at definition. Teams also need to think about authoring, performance, ecosystem fit, and lifecycle management.

Authoring

Metric views can be authored with SQL DDL or through Catalog Explorer. For teams starting out, a low-friction path is to prototype in the UI, validate with SQL queries, and then move repeatable definitions into version-controlled YAML or DDL patterns.
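
One possible shape for the "validate with SQL queries" step is a reconciliation query that compares the metric view against a hand-written aggregation over the source table. The sketch below assumes the `prod.analytics.sales_metrics` view and the column mappings from the earlier definition; it returns only the regions where the two calculations disagree.

```sql
-- Reconciliation sketch: zero rows returned means the two calculations agree.
WITH from_metric_view AS (
  SELECT region, MEASURE(total_revenue) AS total_revenue
  FROM prod.analytics.sales_metrics
  GROUP BY region
),
from_source AS (
  -- Hand-written equivalent of the governed measure.
  SELECT customer_region AS region, SUM(net_sales) AS total_revenue
  FROM prod.sales.orders_enriched
  GROUP BY customer_region
)
SELECT
  COALESCE(m.region, s.region) AS region,
  m.total_revenue AS metric_view_value,
  s.total_revenue AS source_value
FROM from_metric_view AS m
FULL OUTER JOIN from_source AS s
  ON m.region <=> s.region                      -- null-safe match on region
WHERE NOT (m.total_revenue <=> s.total_revenue); -- keep only mismatches
```

A check like this can run before a definition is promoted from a UI prototype to a version-controlled one, and again after each change.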

Performance

Semantic reuse can increase query demand. For high-value or high-volume metric views, materialization can help pre-compute aggregations. The optimizer can route eligible queries to materialized results and fall back to source data where needed.

Start with correctness first, then tune performance based on actual usage.

Consumption

Use metric views where they make the metric definition more trusted and reusable:

  1. SQL editors and notebooks for analyst workflows.
  2. AI/BI dashboards for governed reporting.
  3. Genie spaces for natural language analytics.
  4. Alerts for operational monitoring.
  5. Compatible BI and JDBC/ODBC workflows where feature support has been validated.

Do not assume every downstream connector behaves the same way. Validate current feature availability, BI connector support, and workspace/runtime requirements before treating metric views as a universal replacement for existing semantic models.

Common Pitfalls to Avoid

Treating Metric Views as Only a Technical Feature

If the business definition is unclear, the metric view will only centralize confusion. Start with agreed definitions.

Skipping Metadata

Without display names, synonyms, formats, and descriptions, AI/BI tools may still struggle to map user language to the right metrics.

Moving Too Many Metrics at Once

Start with a KPI family that has visible pain and clear owners. Prove the pattern, then expand.

Ignoring Ownership

Every certified metric needs an owner, a review path, and a change process. Otherwise, governed metrics can drift too.

Overpromising Ecosystem Compatibility

Metric views are powerful, but connector behavior and feature availability can vary. Validate the current platform support for each target consumer.

Summary

Unity Catalog Business Semantics helps teams move from scattered metric logic to governed, reusable business definitions.

Metric views define KPIs once. Agent metadata teaches AI/BI systems how the business talks about those KPIs. Unity Catalog governance provides the controls needed to manage ownership, access, lineage, and auditability.

For analytics teams, the benefit is consistency. For business users, it is simpler self-service. For AI/BI, it is better context.

Start small: pick one KPI family, define a metric view, add metadata, validate through SQL and Genie, then publish it with clear ownership. Once that pattern works, repeat it across domains.
