cancel
Showing results for 
Search instead for 
Did you mean: 
Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
cancel
Showing results for 
Search instead for 
Did you mean: 

Improve query performance of direct query with Databricks

viniciuscini
New Contributor

I’m building a dashboard in Power BI’s Pro Workspace, connecting data via Direct Query from Databricks (around 60 million rows from 15 combined tables), using a SQL Serverless (small size and 4 clusters).

The problem is that the dashboard is taking too long to load.

Considering that I can’t change my workspace to Premium, work with import mode, or use aggregation tables, is there anything else I can do besides what I’ve already done (listed below)?

  • Model tables in a “Star schema” in Power BI;
  • Materialize all tables in Databricks, avoiding as much as possible calculations and joins inside Power BI;
  • Check the “Referential Integrity” box when joining tables in Power BI;
  • Avoid using bi-directional filters;
  • Avoid using calculated columns or any field with “line-to-line interaction”;
  • Avoid nesting one calculation inside another;
  • Set up dimension tables in Dual Storage Mode;
  • Add an “Apply” button to apply all changes at once.
  • Increase the “Maximum connections per data source” in Power BI's configuration;
  • Increase the “Maximum number of simultaneous evaluations” in Power BI's configuration;
  • Increase the “Maximum memory used per simultaneous evaluation” in Power BI's configuration;
  • Uncheck “Auto date/time” in Power BI's configuration;
  • Uncheck “Allow data preview to download in the background” in Power BI's configuration;
  • Uncheck “Enable parallel loading of tables” in Power BI's configuration;
  • Uncheck “Turn on Q&A to ask natural language questions about your data” in Power BI's configuration;
  • Uncheck “Cross highlighting/filtering by default” in Power BI's configuration;

P.S.: I also tried to scale up the SQL Serverless (both memory and the number of nodes), but despite the dashboard taking 3-4 minutes to load, only 20%-30% of the compute power is being used. So, it does not seem to be a matter of compute power.

Due to that, I suspect that connection latency might have something to do with the slow performance.

Vini
1 REPLY 1

Brahmareddy
Honored Contributor

Hi @viniciuscini, How are you doing?

It sounds like you've already taken a lot of the right steps to optimize your Power BI dashboard, especially given the constraints you mentioned. Since you've ruled out many common optimizations and noticed that the compute power isn't fully utilized, connection latency could indeed be a significant factor.

To speed up your Power BI dashboard, try reducing network latency by ensuring Power BI and Databricks are in the same region. Optimize your SQL queries in Databricks, and use caching and partitioning to improve performance. Minimize complex calculations and measures in Power BI, and check that query folding is happening. Monitor both Databricks and Power BI for any bottlenecks, and consider tweaking the parallelism settings. Lastly, ensure your machine and browser are optimized for performance.

Regards,

Brahma

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group