cancel
Showing results for 
Search instead for 
Did you mean: 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results for 
Search instead for 
Did you mean: 

Metric Views

playnicekids
New Contributor II

Hi

I think I’ve found a reproducible bug / or am misunderstanding some syntax / capabilities of Metric Views when joining a calendar scaffold to an SCD2 table.
The same SQL query works perfectly, but the Metric View always returns a constant 1 per month when using COUNT(DISTINCT …) on a joined column. I've added some simple code / metric view that reproduces the behaviour I'm seeing.

Minimal Repro Code:

-- Base calendar
CREATE OR REPLACE TABLE dim_calendar_month AS
SELECT explode(sequence(DATE'2025-06-01', DATE'2025-08-01', INTERVAL 1 MONTH)) AS month_start,
last_day(month_start) AS month_end;

-- Simple SCD2 table
CREATE OR REPLACE TABLE dim_contact AS VALUES
('A','Open','client',DATE'2024-01-01',NULL),
('B','Open','client',DATE'2025-06-01',DATE'2025-07-01'),
('B','Closed','client',DATE'2025-07-01',NULL),
('D','Open','client',DATE'2025-07-15',NULL)
AS dim_contact(contact_id,status,contact_type,from_date,to_date);

-- Works fine (expected 2,2,2)
WITH calendar_cte AS (SELECT month_start, month_end FROM dim_calendar_month)
SELECT calendar_cte.month_end, COUNT(DISTINCT c.contact_id) AS open_contacts
FROM calendar_cte
LEFT JOIN dim_contact c
ON c.from_date <= calendar_cte .month_end
AND (c.to_date > calendar_cte .month_end OR c.to_date IS NULL)
AND lower(c.status)='open'
GROUP BY calendar_cte .month_end
ORDER BY calendar_cte .month_end;

Metric View:

version: 1.1
source: MY_CATALOG.mv_bug_demo.dim_calendar_month
joins:
  - name: contact
    source:mv_bug_demo.dim_contact
    "on": contact.from_date <= source.month_end AND (contact.to_date > source.month_end
      OR contact.to_date IS NULL) AND lower(contact.status) = 'open' AND contact.contact_type
      = 'client'
dimensions:
  - name: month
    expr: source.month_end
    display_name: Month
  - name: contact_id
    expr: contact.contact_id
    display_name: Contact ID
measures:
  - name: eligible_contacts
    expr: COUNT(DISTINCT contact_id)
    display_name: Eligible Contacts
  - name: rows_all
    expr: COUNT(*)
  - name: contacts_distinct
    expr: COUNT(DISTINCT contact.contact_id)
  - name: dup_rows
    expr: MEASURE(rows_all) - MEASURE(contacts_distinct)
  - name: max_age
    expr: MAX(contact.age)

SQL:

-- Returns 1, 1, 1

SELECT
  month,
  MEASURE(eligible_contacts) AS eligible_contacts
FROM dim_calendar_month_metric_view
GROUP BY month
ORDER BY month;

Environment

  • Unity Catalog enabled

  • Runtime: 16.0

  • DBSQL metric view syntax: version 1.1

 

1 ACCEPTED SOLUTION

Accepted Solutions

Louis_Frolio
Databricks Employee
Databricks Employee

Hey @playnicekids , I dig some digging and have come up with some helpful hints/tips to get you past your issue:

 
This behavior is due to how metric view joins are defined and executed.
 

Diagnosis

 
The join in your metric view is a many-to-many temporal join (calendar month → multiple open contact rows). Metric view joins are intended to be many-to-one; when they encounter many-to-many, the engine selects only the first matching row from the joined table for each source row. That collapses your month-to-contacts result to a single match per month, so COUNT(DISTINCT contact_id) returns 1 for every month.
This difference explains why your raw SQL works (it produces the cartesian month–contact set) while the metric view does not (it reduces the join to one match per source row).
Two additional notes that are relevant to your setup: * Metric views can use a SQL query as the source, which is the recommended way to handle many-to-many expansions like calendar scaffolds, because the expansion happens inside the source rather than in the metric view join layer.
  • Metric views don’t support adding new joins at query time; the join semantics must be modeled in the view definition or the view’s source SQL. Using a pre-expanded SQL source avoids this limitation while preserving correct DISTINCT semantics.
 

Fix Options

Pick one of the following approaches to get correct counts.
 

Option A:

Push the temporal join into the source SQL Define the metric view over a SQL query that enumerates month–contact pairs (the same logic you used in plain SQL). This keeps the many-to-many expansion out of the metric view’s join semantics and preserves correct COUNT(DISTINCT) results.
-- Metric view definition (YAML in SQL)
ALTER VIEW my_catalog.mv_bug_demo.dim_calendar_month_metric_view AS
version: 1.1
source: |
  SELECT
    m.month_end AS month,
    c.contact_id
  FROM my_catalog.mv_bug_demo.dim_calendar_month m
  LEFT JOIN my_catalog.mv_bug_demo.dim_contact c
    ON c.from_date <= m.month_end
   AND (c.to_date > m.month_end OR c.to_date IS NULL)
   AND lower(c.status) = 'open'
   AND c.contact_type = 'client'
dimensions:
  - name: month
    expr: month
    display_name: Month

  - name: contact_id
    expr: contact_id
    display_name: Contact ID

measures:
  - name: eligible_contacts
    expr: COUNT(DISTINCT contact_id)
    display_name: Eligible Contacts
 
Query: sql SELECT month, MEASURE(eligible_contacts) AS eligible_contacts FROM my_catalog.mv_bug_demo.dim_calendar_month_metric_view GROUP BY month ORDER BY month;
This reproduces your working SQL result (2, 2, 2) because the source is already at the month–contact grain, and COUNT(DISTINCT contact_id) is evaluated correctly at the month grouping.
 

Option B:

Materialize a helper view and use it as the source If you prefer to keep the metric view YAML simpler, create a view that precomputes the scaffolding and then point the metric view at it.
CREATE OR REPLACE VIEW my_catalog.mv_bug_demo.v_month_contact_open AS
SELECT
  m.month_end AS month,
  c.contact_id
FROM my_catalog.mv_bug_demo.dim_calendar_month m
LEFT JOIN my_catalog.mv_bug_demo.dim_contact c
  ON c.from_date <= m.month_end
 AND (c.to_date > m.month_end OR c.to_date IS NULL)
 AND lower(c.status) = 'open'
 AND c.contact_type = 'client';
 
Metric view (YAML): yaml version: 1.1 source: my_catalog.mv_bug_demo.v_month_contact_open dimensions: - name: month expr: month display_name: Month - name: contact_id expr: contact_id display_name: Contact ID measures: - name: eligible_contacts expr: COUNT(DISTINCT contact_id) display_name: Eligible Contacts
 
Query: sql SELECT month, MEASURE(eligible_contacts) FROM my_catalog.mv_bug_demo.dim_calendar_month_metric_view GROUP BY month ORDER BY month;
Same correct result, with the expansion done in the view rather than in the metric view join.
 

Why the original metric view returns 1

 
Metric view joins are designed for star/snowflake schema where the source fact joins to dimensions in a many-to-one relationship. In many-to-many cases (like calendar scaffolds), metric views pick the first matching row from the joined table for each source row; that’s why you see a single joined contact per month and a constant COUNT(DISTINCT contact_id) = 1.
 

Small correctness tips

  • You already quoted the “on” key in YAML; that’s important because YAML 1.1 parsers can misinterpret unquoted “on” as a boolean and break joins.
  • When consuming metric views, make sure the attached compute meets the supported runtime requirement (Databricks Runtime 16.4+ for querying metric views), even though your simple query may still run on earlier versions.

 

Hope this helps, Louis.

 

View solution in original post

2 REPLIES 2

Louis_Frolio
Databricks Employee
Databricks Employee

Hey @playnicekids , I dig some digging and have come up with some helpful hints/tips to get you past your issue:

 
This behavior is due to how metric view joins are defined and executed.
 

Diagnosis

 
The join in your metric view is a many-to-many temporal join (calendar month → multiple open contact rows). Metric view joins are intended to be many-to-one; when they encounter many-to-many, the engine selects only the first matching row from the joined table for each source row. That collapses your month-to-contacts result to a single match per month, so COUNT(DISTINCT contact_id) returns 1 for every month.
This difference explains why your raw SQL works (it produces the cartesian month–contact set) while the metric view does not (it reduces the join to one match per source row).
Two additional notes that are relevant to your setup: * Metric views can use a SQL query as the source, which is the recommended way to handle many-to-many expansions like calendar scaffolds, because the expansion happens inside the source rather than in the metric view join layer.
  • Metric views don’t support adding new joins at query time; the join semantics must be modeled in the view definition or the view’s source SQL. Using a pre-expanded SQL source avoids this limitation while preserving correct DISTINCT semantics.
 

Fix Options

Pick one of the following approaches to get correct counts.
 

Option A:

Push the temporal join into the source SQL Define the metric view over a SQL query that enumerates month–contact pairs (the same logic you used in plain SQL). This keeps the many-to-many expansion out of the metric view’s join semantics and preserves correct COUNT(DISTINCT) results.
-- Metric view definition (YAML in SQL)
ALTER VIEW my_catalog.mv_bug_demo.dim_calendar_month_metric_view AS
version: 1.1
source: |
  SELECT
    m.month_end AS month,
    c.contact_id
  FROM my_catalog.mv_bug_demo.dim_calendar_month m
  LEFT JOIN my_catalog.mv_bug_demo.dim_contact c
    ON c.from_date <= m.month_end
   AND (c.to_date > m.month_end OR c.to_date IS NULL)
   AND lower(c.status) = 'open'
   AND c.contact_type = 'client'
dimensions:
  - name: month
    expr: month
    display_name: Month

  - name: contact_id
    expr: contact_id
    display_name: Contact ID

measures:
  - name: eligible_contacts
    expr: COUNT(DISTINCT contact_id)
    display_name: Eligible Contacts
 
Query: sql SELECT month, MEASURE(eligible_contacts) AS eligible_contacts FROM my_catalog.mv_bug_demo.dim_calendar_month_metric_view GROUP BY month ORDER BY month;
This reproduces your working SQL result (2, 2, 2) because the source is already at the month–contact grain, and COUNT(DISTINCT contact_id) is evaluated correctly at the month grouping.
 

Option B:

Materialize a helper view and use it as the source If you prefer to keep the metric view YAML simpler, create a view that precomputes the scaffolding and then point the metric view at it.
CREATE OR REPLACE VIEW my_catalog.mv_bug_demo.v_month_contact_open AS
SELECT
  m.month_end AS month,
  c.contact_id
FROM my_catalog.mv_bug_demo.dim_calendar_month m
LEFT JOIN my_catalog.mv_bug_demo.dim_contact c
  ON c.from_date <= m.month_end
 AND (c.to_date > m.month_end OR c.to_date IS NULL)
 AND lower(c.status) = 'open'
 AND c.contact_type = 'client';
 
Metric view (YAML): yaml version: 1.1 source: my_catalog.mv_bug_demo.v_month_contact_open dimensions: - name: month expr: month display_name: Month - name: contact_id expr: contact_id display_name: Contact ID measures: - name: eligible_contacts expr: COUNT(DISTINCT contact_id) display_name: Eligible Contacts
 
Query: sql SELECT month, MEASURE(eligible_contacts) FROM my_catalog.mv_bug_demo.dim_calendar_month_metric_view GROUP BY month ORDER BY month;
Same correct result, with the expansion done in the view rather than in the metric view join.
 

Why the original metric view returns 1

 
Metric view joins are designed for star/snowflake schema where the source fact joins to dimensions in a many-to-one relationship. In many-to-many cases (like calendar scaffolds), metric views pick the first matching row from the joined table for each source row; that’s why you see a single joined contact per month and a constant COUNT(DISTINCT contact_id) = 1.
 

Small correctness tips

  • You already quoted the “on” key in YAML; that’s important because YAML 1.1 parsers can misinterpret unquoted “on” as a boolean and break joins.
  • When consuming metric views, make sure the attached compute meets the supported runtime requirement (Databricks Runtime 16.4+ for querying metric views), even though your simple query may still run on earlier versions.

 

Hope this helps, Louis.

 

Incredible, thanks very much Louis!

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now