cancel
Showing results for 
Search instead for 
Did you mean: 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results for 
Search instead for 
Did you mean: 

Strange metric view window grouping interaction

Malthe
Valued Contributor II

With the new metric views, I am unable to understand the grouping logic in the following setup:

I have a table with timestamps and I define dimensions as follows:

dimension:
  - name: timestamp
    expr: timestamp
  - name: date
    expr: DATE(timestamp)
  - name: month
    expr: DATE_TRUNC('month', timestamp)

Now, here's a measure that counts the distinct days appearing in the month current to each row:

measures:
  - name: distinctDays
    expr: count(DISTINCT date)
    window:
      - order: date
        range: all
        semiadditive: last
      - order: month
        range: current
        semiadditive: last

Now, the result of this measure varies whether or not we group by:

  1. date
  2. DATE(timestamp)

In the first case, we correctly get the distinct days in the same month as the date of each row. In the second case, we just get a 1.

I would expect that it's possible to say that for each row, give me the number of distinct days for that month. What am I missing?

1 ACCEPTED SOLUTION

Accepted Solutions

SteveOstrowski
Databricks Employee
Databricks Employee

Hi @Malthe,

This is a nuanced aspect of how metric views resolve window measure dimensions. The key behavior you are seeing comes down to how the metric view engine matches your query's GROUP BY columns to the dimensions defined in the window clause.

WHAT IS HAPPENING

When you define a window measure with order: date, the engine uses the named dimension "date" as the window ordering key. At query time, the engine needs to match the columns in your GROUP BY to the dimensions defined in the metric view so it can correctly apply the window logic.

1. When you GROUP BY date (the named dimension), the engine recognizes that this is the same dimension referenced in your window's order clause. It correctly partitions the window by the month dimension and orders by date within each month, giving you the expected count of distinct days per month.

2. When you GROUP BY DATE(timestamp) (the raw expression), the engine does not recognize this as the named "date" dimension, even though the expression is functionally equivalent. Because the engine cannot match it to the window's order dimension, the window behavior changes. The semiadditive: last logic kicks in differently, and you end up with a result of 1 because the window effectively collapses to a single value per group.

WHY THIS HAPPENS

Metric views resolve dimension references by name, not by expression equivalence. The window clause order: date is a reference to the named dimension "date" in your metric view definition. When your query uses the raw expression DATE(timestamp) instead of the dimension name, the engine treats it as a different column that does not participate in the window ordering logic. This means the semiadditive behavior applies as if the ordering dimension were absent from the GROUP BY, which produces the unexpected result.

HOW TO FIX THIS

Always reference the named dimension in your queries rather than recomputing the expression:

SELECT
date,
month,
MEASURE(distinctDays) AS distinct_days
FROM your_metric_view
GROUP BY date, month

If you need to apply a transformation to a dimension in your query, define it as a separate named dimension in the metric view YAML and reference that name in both the window clause and your GROUP BY.

As a general rule with metric views: use the dimension names you defined, not the underlying expressions. This ensures the engine can properly match your GROUP BY columns to the window ordering dimensions and apply the correct aggregation logic.

DOCUMENTATION REFERENCE

The window measures documentation covers the relationship between window order fields and dimensions:
https://docs.databricks.com/aws/en/metric-views/data-modeling/window-measures

The general metric views overview is also helpful for understanding how dimensions and measures interact:
https://docs.databricks.com/aws/en/metric-views/

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

View solution in original post

1 REPLY 1

SteveOstrowski
Databricks Employee
Databricks Employee

Hi @Malthe,

This is a nuanced aspect of how metric views resolve window measure dimensions. The key behavior you are seeing comes down to how the metric view engine matches your query's GROUP BY columns to the dimensions defined in the window clause.

WHAT IS HAPPENING

When you define a window measure with order: date, the engine uses the named dimension "date" as the window ordering key. At query time, the engine needs to match the columns in your GROUP BY to the dimensions defined in the metric view so it can correctly apply the window logic.

1. When you GROUP BY date (the named dimension), the engine recognizes that this is the same dimension referenced in your window's order clause. It correctly partitions the window by the month dimension and orders by date within each month, giving you the expected count of distinct days per month.

2. When you GROUP BY DATE(timestamp) (the raw expression), the engine does not recognize this as the named "date" dimension, even though the expression is functionally equivalent. Because the engine cannot match it to the window's order dimension, the window behavior changes. The semiadditive: last logic kicks in differently, and you end up with a result of 1 because the window effectively collapses to a single value per group.

WHY THIS HAPPENS

Metric views resolve dimension references by name, not by expression equivalence. The window clause order: date is a reference to the named dimension "date" in your metric view definition. When your query uses the raw expression DATE(timestamp) instead of the dimension name, the engine treats it as a different column that does not participate in the window ordering logic. This means the semiadditive behavior applies as if the ordering dimension were absent from the GROUP BY, which produces the unexpected result.

HOW TO FIX THIS

Always reference the named dimension in your queries rather than recomputing the expression:

SELECT
date,
month,
MEASURE(distinctDays) AS distinct_days
FROM your_metric_view
GROUP BY date, month

If you need to apply a transformation to a dimension in your query, define it as a separate named dimension in the metric view YAML and reference that name in both the window clause and your GROUP BY.

As a general rule with metric views: use the dimension names you defined, not the underlying expressions. This ensures the engine can properly match your GROUP BY columns to the window ordering dimensions and apply the correct aggregation logic.

DOCUMENTATION REFERENCE

The window measures documentation covers the relationship between window order fields and dimensions:
https://docs.databricks.com/aws/en/metric-views/data-modeling/window-measures

The general metric views overview is also helpful for understanding how dimensions and measures interact:
https://docs.databricks.com/aws/en/metric-views/

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.