
How I Tuned Databricks Query Performance from Power BI Desktop

Brahmareddy
Contributor III

As someone who frequently works with large datasets in Power BI, I’ve had my fair share of frustrations with slow query performance, especially when pulling data from Databricks. After countless hours of tweaking and experimenting, I’ve finally found a series of strategies that significantly improved my workflow. Here’s a detailed account of what worked for me, in the hopes it might help you too.

The Initial Frustration: Slow Queries and Lagging Reports

When I first started connecting Power BI Desktop to Databricks, everything seemed straightforward. I could pull data, create visuals, and build reports just like I was used to. But as the datasets grew, so did the lag. My reports were taking longer to load, and queries that once ran smoothly were now crawling. I knew I needed to find a solution.

Step 1: Optimize Queries at the Source

The first thing I learned was the importance of optimizing queries directly in Databricks. Here’s what I did:

Filter Early: Initially, I was pulling entire datasets into Power BI and then applying filters. This was a big mistake. Instead, I started applying filters directly in my Databricks SQL queries. For example, instead of pulling all sales data and then filtering by region in Power BI, I added a WHERE clause in my Databricks query to filter by region before the data even left the server. This simple change drastically reduced the amount of data being transferred and sped up my reports.

SELECT * FROM sales_data WHERE region = 'North America'

Aggregate Data: I also realized I didn’t always need granular data in Power BI. For instance, instead of pulling every transaction, I aggregated data at the monthly level directly in Databricks. This not only reduced the data size but also made the subsequent analysis much quicker.
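A monthly rollup along these lines might look like the sketch below. It reuses the sales_data table and region filter from the earlier example. The column names order_date and amount are assumptions for illustration, since the post does not name the actual columns.

```sql
-- Sketch only: order_date and amount are hypothetical column names.
-- Collapses individual transactions into one row per region per month,
-- so Power BI receives the summary instead of every transaction.
SELECT
  region,
  date_trunc('MONTH', order_date) AS sales_month,
  SUM(amount)                     AS total_sales,
  COUNT(*)                        AS transaction_count
FROM sales_data
WHERE region = 'North America'
GROUP BY region, date_trunc('MONTH', order_date)
ORDER BY sales_month;
```

Grouping on the truncated date means the result has at most one row per region per month, however many transactions sit underneath.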

Read the full article here:

https://medium.com/towards-data-engineering/how-i-tuned-databricks-query-performance-from-power-bi-d...

