Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can metric views be used to achieve SQL cube functionality?

IM_01
Contributor III
 
8 REPLIES

balajij8
Contributor III

Metric views are not a direct cube engine. They are a semantic abstraction layer: you define dimensions, facts, and measures with aggregations, and then query them flexibly across tools.

You have a few design choices:

  • Star Schema Approach

    Adopt a star schema design (fact and dimension tables, with the joins defined in the metric view). It simplifies metric view creation and makes governance and extensibility easier.

  • Modular Metric Views

    Create separate, modular metric views based on purpose (e.g. Sales by State: a metric view joining sales to the state dimension; Sales by Maker: a metric view joining sales to the product dimension). This aligns well with domain-driven design and is easy to manage. A rough sketch of such a join-based definition is shown after this list.
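
For illustration only (not from the original post), a modular "Sales by State" metric view with a join might look roughly like this. The catalog, schema, table, and column names are hypothetical, and the DDL and YAML follow my reading of the metric view spec, so treat this as a sketch rather than a verified definition:

-- Hypothetical "Sales by State" metric view: a fact table joined to a state dimension.
CREATE VIEW main.sales_demo.sales_by_state_metric_view
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: main.sales_demo.fact_sales        # hypothetical fact table
joins:
  - name: dim_state
    source: main.sales_demo.dim_state     # hypothetical state dimension
    on: source.state_id = dim_state.state_id
dimensions:
  - name: state_name
    expr: dim_state.state_name
measures:
  - name: total_sales
    expr: SUM(source.sales_amount)
$$;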

More details here

Ashwin_DSA
Databricks Employee

Hi @IM_01,

Yes. Metric views are explicitly designed to give you SQL cube-like behaviour.

A metric view lets you define measures once, independent of dimensions, then aggregate those measures over any combination of dimensions at query time, which is the core behaviour you get from cubes.

When querying a metric view, you can use GROUP BY GROUPING SETS (and thus CUBE/ROLLUP patterns) on its dimensions, so you can generate detail rows, subtotals, and grand totals in a single query, just like with GROUP BY CUBE.

The main difference vs. a classic OLAP cube is that metric views are SQL/UC-native and compute aggregations on demand (optionally with materialization for performance), rather than maintaining a separate multidimensional cube engine.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

IM_01
Contributor III

@Ashwin_DSA could you please provide an example?

Ashwin_DSA
Databricks Employee

Hi @IM_01,

Here is a simple example:

%sql
CREATE SCHEMA IF NOT EXISTS uc_customer_demo_catalog.metric_demo;

CREATE OR REPLACE TABLE uc_customer_demo_catalog.metric_demo.orders (
  order_id   INT,
  order_date DATE,
  country    STRING,
  product    STRING,
  revenue    DOUBLE
);

INSERT INTO uc_customer_demo_catalog.metric_demo.orders VALUES
  (1, '2025-01-01', 'US', 'Laptop', 1200.0),
  (2, '2025-01-02', 'US', 'Laptop',  800.0),
  (3, '2025-01-03', 'US', 'Phone',   500.0),
  (4, '2025-01-01', 'UK', 'Laptop', 1000.0),
  (5, '2025-01-02', 'UK', 'Phone',   400.0),
  (6, '2025-01-03', 'UK', 'Phone',   600.0),
  (7, '2025-01-04', 'DE', 'Laptop',  700.0),
  (8, '2025-01-04', 'DE', 'Phone',   300.0);

Then I created a metric view on top of this table, as below.

[metric_view.png: screenshot of the metric view definition]
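
The screenshot is not reproduced here, but a definition along the following lines would match the query further down. This is a reconstruction assuming the standard CREATE VIEW ... WITH METRICS ... LANGUAGE YAML syntax, not the exact YAML from the screenshot:

-- Sketch of the orders metric view: two dimensions plus a summed revenue measure.
CREATE OR REPLACE VIEW uc_customer_demo_catalog.metric_demo.orders_metric_view
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: uc_customer_demo_catalog.metric_demo.orders
dimensions:
  - name: country
    expr: country
  - name: product
    expr: product
measures:
  - name: total_revenue
    expr: SUM(revenue)
$$;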

And here is the query I used:

SELECT
  country,
  product,
  MEASURE(total_revenue) AS total_revenue
FROM uc_customer_demo_catalog.metric_demo.orders_metric_view
GROUP BY GROUPING SETS (
  (country, product),
  (country),
  (product),
  ()
)
ORDER BY country, product;

[Ashwin_DSA_0-1776098623292.png: screenshot of the query results]
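
The result screenshot is not reproduced here either, but with the sample rows above the grouping sets work out to roughly the following (NULLs mark the subtotal and grand-total rows; exact row order depends on NULL ordering):

country  product  total_revenue
NULL     NULL     5500.0
NULL     Laptop   3700.0
NULL     Phone    1800.0
DE       NULL     1000.0
DE       Laptop    700.0
DE       Phone     300.0
UK       NULL     2000.0
UK       Laptop   1000.0
UK       Phone    1000.0
US       NULL     2500.0
US       Laptop   2000.0
US       Phone     500.0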

Does this give you an idea?

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

 

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

IM_01
Contributor III

Hi @Ashwin_DSA 

Thanks for the example. So it abstracts the metric logic, but can we make the GROUP BY columns dynamic? The actual scenario is that metrics are calculated dynamically based on filters, e.g. "All" or a specific selection of columns. So is it better to go with metric views, or should I calculate and persist the metrics as table columns? Could you please suggest?

Ashwin_DSA
Databricks Employee

Hi @IM_01,

You define the metric logic once (measures) and a set of dimensions. At query time, you can group by any subset of those dimensions (or none), so your "All" case (no group by), "group by 1 column", "group by 3 columns", etc., are all just different SQL queries over the same metric view.

Example patterns over the same metric view:

-- Grand total (like "All")
SELECT MEASURE(total_revenue) FROM uc_customer_demo_catalog.metric_demo.orders_metric_view;

-- Group by whatever dimensions the user selected
SELECT
  country,
  MEASURE(total_revenue)
FROM uc_customer_demo_catalog.metric_demo.orders_metric_view
GROUP BY country;

SELECT
  country,
  product,
  MEASURE(total_revenue)
FROM uc_customer_demo_catalog.metric_demo.orders_metric_view
GROUP BY country, product;

So for a UI where users can change filters and choose any combination of dimensions, metric views are the recommended approach. They keep your metric definitions centralized and let the UI/SQL decide the group-by dynamically.

You only need to pre-calculate and persist aggregates in tables when you have a small, fixed set of aggregates, need very tight latency, and are willing to manage the extra tables.

Even then, Databricks also offers metric view materialization (pre-aggregated views behind the scenes) so you can keep the semantic layer clean and still get performance.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

 

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

IM_01
Contributor III

Hi @Ashwin_DSA 
Thanks for the response 🙂
I was thinking that if the results are precomputed using CUBE and persisted in a materialized view, they would be retrieved faster than with a metric view. Could you please let me know if my understanding is correct?

Ashwin_DSA
Databricks Employee

Hi @IM_01,

A metric view itself is just a semantic definition (measures + dimensions) on top of a source table. By default, queries aggregate on the fly over the source, similar to querying a normal fact table with GROUP BY. A CUBE result persisted in a table or materialised view, by contrast, is a physically pre-aggregated table; if your query exactly matches one of those groupings, it is usually faster because it scans and aggregates much less data. So yes: if you compare a fully precomputed cube table against a metric view that reads the raw fact table each time, the cube table can be faster for those specific group-bys. A hand-rolled version of that approach is sketched below.
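
As a rough illustration (not something from the thread), the hand-rolled alternative for the demo table could be a materialised view that persists the CUBE result, assuming materialised views are enabled in your workspace:

-- Hypothetical pre-aggregated cube persisted as a materialised view:
-- every country/product grouping, including subtotals, is precomputed.
CREATE MATERIALIZED VIEW uc_customer_demo_catalog.metric_demo.orders_revenue_cube AS
SELECT
  country,
  product,
  SUM(revenue) AS total_revenue
FROM uc_customer_demo_catalog.metric_demo.orders
GROUP BY CUBE (country, product);

Dashboards would then query this materialised view directly for country/product totals; anything outside those two dimensions still has to go back to the fact table (or the metric view), and the revenue logic now lives in two places.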

However, metric views also support precomputation. The YAML can include a materialisation section that defines one or more aggregated materialised views (for specific dimension sets), and the engine automatically rewrites queries to hit those pre-aggregations. That gives you the same kind of pre-aggregated data you would get from a hand-rolled cube table, with the added benefit that the metric definitions stay in one place. You can still query other dimension combinations; only the heavy or common ones use the materialisations.

The takeaway: if you have a small, fixed set of critical group-bys and extreme latency requirements, pre-aggregated tables (or metric view materialisations for those combinations) are the right optimisation. For interactive analytics where users slice and dice across many combinations, it is usually better to define the metrics in a metric view and turn on materialisation for the most common or expensive group-bys, rather than maintaining your own cube tables. That way you get essentially the same performance benefits while keeping the semantic layer clean and flexible.

Check this out if you are interested. 

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***