cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Databricks Dashboard Optimization

nanditakrishnan
New Contributor III

I have trouble understanding why, for every report in a dashboard that refers to the same data source, the query re-runs each time 

Ideally I would want queries being used to fuel the tables for the dashboard run exactly once, and then have the filters/parameters run on top of this already built-table

Is that an incorrect expectation? Or is there a better way around this?

4 REPLIES 4

szymon_dybczak
Esteemed Contributor III

Hi @nanditakrishnan ,

There's already something like that in databricks dashboards, but some conditions need to be fulfilled (i.e queries need to share same group by). One of dataset optimization techniques that databricks team implemented is doing following:

"For visualization queries sent to the backend, separate visualization queries against the same dataset that share the same GROUP BY clauses and filter predicates are combined into a single query for processing. In this case, users may see a combined query in the query history that fetches results for multiple visualizations or filters."

There are also other optimization tricks that they're doing - you can read about them below:

Dataset optimization and caching - Azure Databricks | Microsoft Learn

I see multiple queries in my query history instead of a combined one. Thanks for sharing the doc though - my data being greater than 100K rows and dashboard being parameterized could be the reason.

Poorva21
Visitor

Your expectation is reasonable, but most dashboards are designed to recompute queries per visual. The best workaround is to precompute the base dataset in a table or view, and have all visuals query that instead. This reduces query load and ensures consistent results across visuals.

The base table/query for all of my visuals is the same - it's a single query coming from the Data Tab

Unless you mean a precomputed table that actually lives in a schema, in which case the problem is I can't give users the flexibility of choosing the date ranges