4 weeks ago
I have trouble understanding why, for every report in a dashboard that refers to the same data source, the query re-runs each time
Ideally I would want queries being used to fuel the tables for the dashboard run exactly once, and then have the filters/parameters run on top of this already built-table
Is that an incorrect expectation? Or is there a better way around this?
4 weeks ago - last edited 4 weeks ago
Hi @nanditakrishnan ,
There's already something like that in databricks dashboards, but some conditions need to be fulfilled (i.e queries need to share same group by). One of dataset optimization techniques that databricks team implemented is doing following:
"For visualization queries sent to the backend, separate visualization queries against the same dataset that share the same GROUP BY clauses and filter predicates are combined into a single query for processing. In this case, users may see a combined query in the query history that fetches results for multiple visualizations or filters."
There are also other optimization tricks that they're doing - you can read about them below:
Dataset optimization and caching - Azure Databricks | Microsoft Learn
4 weeks ago - last edited 4 weeks ago
Hi @nanditakrishnan ,
There's already something like that in databricks dashboards, but some conditions need to be fulfilled (i.e queries need to share same group by). One of dataset optimization techniques that databricks team implemented is doing following:
"For visualization queries sent to the backend, separate visualization queries against the same dataset that share the same GROUP BY clauses and filter predicates are combined into a single query for processing. In this case, users may see a combined query in the query history that fetches results for multiple visualizations or filters."
There are also other optimization tricks that they're doing - you can read about them below:
Dataset optimization and caching - Azure Databricks | Microsoft Learn
4 weeks ago
I see multiple queries in my query history instead of a combined one. Thanks for sharing the doc though - my data being greater than 100K rows and dashboard being parameterized could be the reason.
4 weeks ago
Your expectation is reasonable, but most dashboards are designed to recompute queries per visual. The best workaround is to precompute the base dataset in a table or view, and have all visuals query that instead. This reduces query load and ensures consistent results across visuals.
4 weeks ago
The base table/query for all of my visuals is the same - it's a single query coming from the Data Tab
Unless you mean a precomputed table that actually lives in a schema, in which case the problem is I can't give users the flexibility of choosing the date ranges
Passionate about hosting events and connecting people? Help us grow a vibrant local communityโsign up today to get started!
Sign Up Now