Data Engineering

Effectively refresh Power BI report based on Delta Lake

alesventus
Contributor

Hi, I have several Power BI reports based on Delta Lake tables that are refreshed every 4 hours. The ETL process in Databricks is much cheaper than the refresh of these Power BI reports. My questions are: is the approach described below correct, and is there a better way to work with Power BI in Databricks?

My scenario:

I have several star schemas in the Gold Delta lake in Databricks. I use an all-purpose cluster for the connection from Power BI.


The storage mode in Power BI for all tables is set to Import, so I import all data into the PBIX file.

In a Databricks job I use another cluster to run the ELT and a notebook with an API call to trigger the PBI refresh (a sketch of such a notebook is below). If interested, here is a tutorial.
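For reference, a minimal sketch of such a trigger notebook, assuming a service principal with permissions on the dataset; the tenant, client, workspace, and dataset IDs below are placeholders:

```python
# Minimal sketch: trigger a Power BI dataset refresh from a Databricks notebook.
# All IDs and the secret are placeholders; in practice read the secret from a
# Databricks secret scope rather than hard-coding it.
import requests

TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"
GROUP_ID = "<workspace-id>"      # Power BI workspace (group) ID
DATASET_ID = "<dataset-id>"

# Acquire an Azure AD token for the Power BI REST API
token_resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": "https://analysis.windows.net/powerbi/api/.default",
    },
)
token = token_resp.json()["access_token"]

# Trigger the dataset refresh; the API returns 202 Accepted on success
refresh_resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}/datasets/{DATASET_ID}/refreshes",
    headers={"Authorization": f"Bearer {token}"},
)
refresh_resp.raise_for_status()
```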

In the Azure portal I can clearly see the costs for the ELT and for the PBI refresh based on cluster tags.

And the costs for the PBI refresh are much higher than for the ELT.


I tried to use a SQL warehouse for the PBI refresh, since the star schema is written in SQL, but costs were even higher, despite what the documentation says: "Run all SQL and BI applications at scale with up to 12x better price-performance."

I'm now thinking about incremental load to PBI, but there are some disadvantages to this approach: you cannot download the PBIX file after this change.

Another way could be (not tested yet) to load the data into Azure SQL and connect PBI to Azure SQL instead of the Delta tables. There would be costs to load data into Azure SQL from the Delta tables after every ELT run.

Or should I create the Power BI connection in DirectQuery mode? Wouldn't the cluster then recalculate the data after every filter selection?

Any advice on this topic?

1 REPLY

mark_ott
Databricks Employee

Current Approach Assessment

  • Power BI Import Mode: Importing all table data results in full dataset refreshes, driving up compute and data transfer costs during each refresh.

  • Delta Lake as Source: Databricks clusters are used for both ETL and responding to Power BI refresh queries. All-purpose clusters can be expensive since they are high-performing and designed for interactive workloads.

  • SQL Warehouse Attempt: Switching to Databricks SQL Warehouse did not reduce your refresh costs as much as expected. Real-world workloads may not always match the advertised savings, especially with frequent large imports and complex star schemas.

  • Incremental Refresh: This can reduce refresh time and cost, but introduces complexity (e.g., PBIX download restrictions and additional setup).

  • Azure SQL Alternative: Loading data from Delta to Azure SQL Database could allow Power BI to connect using a more BI-tuned engine, but incurs its own charges and transfer overhead.

  • DirectQuery Option: This could reduce Power BI refresh costs by delegating query execution to Databricks on demand, but it may introduce latency and increased compute cost on the cluster side with every report or filter interaction.

Recommended Alternatives

1. Optimize Databricks Cluster Usage

  • Use Databricks SQL Warehouses rather than all-purpose clusters for Power BI connections. They are optimized for BI workloads and, combined with auto-stop, bill only while running rather than for idle cluster uptime.

  • Adjust warehouse size and auto-stop settings to avoid unnecessary compute costs; a sketch using the SQL Warehouses API follows this list.
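A hedged sketch of creating a right-sized warehouse with aggressive auto-stop through the Databricks SQL Warehouses REST API; the host, token, and all settings below are placeholders, not a prescription:

```python
# Hedged sketch: create a small SQL warehouse with aggressive auto-stop via the
# Databricks SQL Warehouses REST API. Host, token, and values are placeholders.
import requests

HOST = "https://<workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{HOST}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "pbi-refresh-wh",
        "cluster_size": "Small",      # start small; scale up only if refreshes lag
        "auto_stop_mins": 10,         # stop soon after the refresh finishes
        "min_num_clusters": 1,
        "max_num_clusters": 1,
        "enable_serverless_compute": True,  # faster startup makes auto-stop cheaper
    },
)
resp.raise_for_status()
print(resp.json()["id"])
```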

2. Consider Incremental Refresh in Power BI

  • Incremental refresh lowers refresh costs and improves performance by updating only new/changed data. If PBIX file access is critical, use deployment pipelines to maintain access to the source file and separate development from production.

  • Carefully partition your tables by date or ID to maximize incremental refresh benefits (see the sketch below).
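For illustration, a minimal PySpark sketch of writing a Gold fact table partitioned by date so the RangeStart/RangeEnd filters of incremental refresh can prune partitions; the table and column names are made up:

```python
# Illustrative PySpark: write a Gold fact table partitioned by date so that
# incremental-refresh range filters prune partitions. Names are hypothetical;
# `spark` is the SparkSession predefined in Databricks notebooks.
from pyspark.sql import functions as F

fact = spark.table("gold.fact_sales")  # hypothetical source table

(fact
 .withColumn("order_date", F.to_date("order_ts"))
 .write
 .format("delta")
 .mode("overwrite")
 .partitionBy("order_date")
 .saveAsTable("gold.fact_sales_partitioned"))
```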

3. Explore DirectQuery Mode

  • DirectQuery sends queries to Databricks only when users interact with reports, potentially reducing overall Power BI refresh frequency and costs.

  • However, it can increase Databricks compute at query time and may impact report responsiveness, especially with large or complex models.

4. Offload Aggregations to ETL/SQL Warehouse

  • Pre-aggregate or flatten data during your Databricks ELT process, reducing the data size and complexity Power BI ingests.

  • If feasible, expose aggregate or summary tables in the Gold layer for reporting, as in the sketch below.
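A small illustrative example of pre-aggregating in the Gold layer so Power BI imports a compact summary table instead of the full fact table; all names are hypothetical:

```python
# Illustrative PySpark: pre-aggregate to the grain the reports need so Power BI
# imports a compact summary table. Table/column names are hypothetical.
from pyspark.sql import functions as F

daily_sales = (
    spark.table("gold.fact_sales")
    .groupBy("order_date", "store_id")   # the grain the reports actually need
    .agg(
        F.sum("amount").alias("total_amount"),
        F.countDistinct("order_id").alias("order_count"),
    )
)

daily_sales.write.format("delta").mode("overwrite").saveAsTable("gold.agg_daily_sales")
```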

5. Hybrid Azure SQL Solution

  • If Azure SQL Database is affordable for your volumes, consider moving only critical reporting tables from Delta Lake to SQL.

  • Use Databricks for transformation and Azure Data Factory or native features such as Spark's JDBC writer (sketched below) to copy results.
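As one possible sketch, Spark's built-in JDBC writer can push a reporting table to Azure SQL after each ELT run; the server, database, credentials, and table names below are placeholders:

```python
# Sketch: push a reporting table from Delta to Azure SQL with Spark's JDBC
# writer after each ELT run. Connection details and table names are
# placeholders; for large volumes Azure Data Factory may fit better.
jdbc_url = (
    "jdbc:sqlserver://<server>.database.windows.net:1433;"
    "database=<db>;encrypt=true"
)

(spark.table("gold.agg_daily_sales")
 .write
 .format("jdbc")
 .option("url", jdbc_url)
 .option("dbtable", "dbo.agg_daily_sales")
 .option("user", "<user>")
 .option("password", "<password>")  # better: read from a Databricks secret scope
 .mode("overwrite")
 .save())
```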

Cost Management Tips

  • Monitor all-purpose vs. SQL Warehouse vs. Azure SQL costs and query volumes to guide future decisions.

  • Use Power BI Premium for larger datasets and advanced incremental refresh, which may further optimize costs in some scenarios.

  • Tag clusters/warehouses and analyze usage patterns to reduce unnecessary spend; an example tag spec follows.
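For example, a job-cluster spec might carry custom tags so ELT and PBI-refresh costs separate cleanly in Azure cost analysis; the keys and values here are illustrative:

```python
# Illustrative job-cluster spec: custom tags let ELT and PBI-refresh costs be
# separated in Azure cost analysis. All keys and values are examples only.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",   # placeholder runtime version
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "custom_tags": {
        "workload": "pbi-refresh",   # the ELT job cluster would use "elt"
        "cost_center": "bi-reporting",
    },
}
```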

Summary Table: Power BI + Databricks Cost/Performance Options

| Approach | Cost Efficiency | Performance | Disadvantages |
|---|---|---|---|
| Import from Delta Lake | Moderate to High | Fast (cached) | High refresh costs |
| Databricks SQL Warehouse | Potentially High | Better BI tuning | Still costly if scaled up |
| Incremental Refresh | High | Fast refresh | PBIX download limitations |
| DirectQuery to Delta | Pay per query | Real-time | Possible latency, high query cost |
| Azure SQL as source | Varies | Moderate to Fast | ETL and storage costs |

Final Recommendations

  • Begin with Databricks SQL Warehouses and optimize Power BI refresh strategy with incremental loading.

  • Test DirectQuery for specific reports where real-time data is critical and cost is less sensitive.

  • Aggregate your data as much as possible before loading into Power BI.

  • Monitor/compare actual costs per approach at scale; each environment's cost profiles vary.

This should help you choose the best combination for your workflow, balancing cost, performance, and reporting needs.
