- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-10-2026 03:00 AM - edited 03-10-2026 03:02 AM
Hi folks,
I have a question on whether I am using materialized views right.
Our pipeline looks like this:
1. A spark job creates a table `source` with columns a, b and c.
2. A materialized view `target` is created on `source`. I want to partition it by `a`, so I add a `CLUSTER BY`.
3. PowerBI queries `target` by doing `select * from target where a is between x and y` (the reason why I want a partition)
CREATE MATERIALIZED VIEW IF NOT EXISTS target
TBLPROPERTIES ('delta.columnMapping.mode' = 'name')
TRIGGER ON UPDATE
select
a,
b,
sum(c)
from source
group by all
CLUSTER BY (`a`)Unfortunately the PowerBi querying is still very slow (compared to a simple view without clustering).
But this could be a bandwith issue, bit hard to find out.
Two questions:
1. Is the definition of the materialized view correct?
2. Would it matter to add partitioning additionally in the source table (let's assume this table is not accessed otherwise)?
For any clarifications or hints, please let me know.
Thanks and best regards