Greetings @malterializedvw, I did some digging and have some helpful hints for you to consider as you work through your scenario.
Your MV definition looks syntactically fine, but there are a few things I’d check.
First, CLUSTER BY on a materialized view applies liquid clustering to the materialized output, not the source table. Since you’re doing a GROUP BY with sum(c), the MV effectively becomes a pre-aggregated table clustered by a, which is the right general idea for a Power BI range-filter use case.
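For reference, the general shape described above looks something like this (a minimal sketch; sales_mv, sales_raw, and total_c are placeholder names, not your actual definition):

```sql
-- Hypothetical sketch: a pre-aggregated MV with liquid clustering on the
-- column used for range filtering. All object names are placeholders.
CREATE MATERIALIZED VIEW sales_mv
CLUSTER BY (a)
AS
SELECT a, sum(c) AS total_c
FROM sales_raw
GROUP BY a;
```

The key point is that CLUSTER BY here governs the physical layout of the MV’s own materialized data, which is what Power BI actually scans.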
That said, liquid clustering is only helpful once the data has actually been physically reorganized. If the MV was just created or recently refreshed, the file layout may not yet reflect the clustering strategy. I’d verify whether compaction and clustering work have actually kicked in. You can check the table’s clustering metrics, or run OPTIMIZE on the MV directly to force the issue.
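A sketch of that verification, again with a placeholder MV name (depending on how the MV is managed, you may need to run these against the MV’s backing table rather than the MV itself):

```sql
-- Confirm the clustering columns are actually registered on the table:
DESCRIBE DETAIL sales_mv;   -- inspect the clusteringColumns field

-- Force compaction/clustering work if the file layout has not caught up:
OPTIMIZE sales_mv;
```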
A couple of other things are worth looking at as well:
- delta.columnMapping.mode = 'name' can introduce a bit of overhead and occasionally causes compatibility quirks with certain BI connectors. If you do not specifically need column rename/drop support, it may be worth removing that setting just to see whether behavior improves.
- TRIGGER ON UPDATE means the MV is refreshed when its base tables change, rather than on a fixed schedule. That is fine, but confirm those refreshes are actually completing after source-table changes so the MV stays current.
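Two quick checks for the points above, with placeholder names. SHOW TBLPROPERTIES confirms whether column mapping is set (note that once enabled, column mapping generally cannot be unset in place on a Delta table, so removing it would likely mean recreating the MV without the property), and a manual refresh rules out staleness:

```sql
-- Inspect table properties to confirm delta.columnMapping.mode is set:
SHOW TBLPROPERTIES sales_mv;

-- Manually refresh the MV after a source change to rule out staleness:
REFRESH MATERIALIZED VIEW sales_mv;
```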
On your second question: partitioning the source table is unlikely to help Power BI query performance in this case. The MV materializes its own data, so source-table layout mainly affects refresh efficiency, not read performance against the MV itself.
For the slow Power BI queries specifically, this is where I’d start:
- Check the query profile in Databricks SQL and confirm whether file pruning is happening on the filter over a. If pruning is not happening, the clustering likely is not effective yet.
- Confirm whether Power BI is using DirectQuery or Import mode. DirectQuery adds translation and round-trip overhead on every interaction, which can become noticeable very quickly.
- Keep the bandwidth point in mind too. Even with good clustering, if the filtered result set is still fairly large, data transfer itself may be the bottleneck.
Hope this helps,
Louis