10-25-2021 10:39 AM
Hi Team, we are planning to connect Power BI directly to Databricks, but data fetching with DirectQuery isn't giving great performance, even though we are using ZORDER BY, partitioning, etc.
We decided to use the Delta cache, but the cached tables get wiped out when a cluster restarts. If this happens in our prod environment, customers experience delays in DirectQuery.
The simple workaround is to make sure the cluster never goes down, but we can't go with that option as it's costly. So I wanted to understand whether there is a way to persist the Delta cache across cluster restarts.
Thanks.
Manoj
- Labels: databricks, Delta, Delta Cache, Powerbi
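One way to mitigate the restart problem is to re-warm the Delta cache right after the cluster comes up, e.g. from an init job that issues `CACHE SELECT` statements. A minimal sketch of building those statements; the table and column names are hypothetical, and on Databricks each statement would be passed to `spark.sql(...)`:

```python
# Sketch: build CACHE SELECT warm-up statements to re-prime the Delta cache
# after a cluster restart. Table and column names are hypothetical examples;
# on Databricks you would run each statement via spark.sql(stmt).

def warmup_statements(tables):
    """Return one CACHE SELECT statement per table to re-prime the cache."""
    stmts = []
    for table, columns in tables.items():
        cols = ", ".join(columns) if columns else "*"
        stmts.append(f"CACHE SELECT {cols} FROM {table}")
    return stmts

stmts = warmup_statements({
    "sales.fact_orders": ["order_id", "order_date", "amount"],
    "sales.dim_customer": [],  # empty list -> cache all columns
})
print(stmts[0])  # CACHE SELECT order_id, order_date, amount FROM sales.fact_orders
```

This only shortens the cold window after a restart; it does not persist the cache itself, which lives on the workers' local SSDs.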
Accepted Solutions
10-25-2021 11:35 AM
It runs on the tables/databases defined in Databricks and uses an optimized reader (Photon).
A serverless offering is also expected, with no need to create a cluster (so pay-per-use).
It is not hard to set up, so you can test it against a classic cluster.
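For pointing Power BI (or any client) at a SQL endpoint, the Databricks connector essentially needs a server hostname and an HTTP path. A minimal sketch that assembles those parameters; the workspace hostname and endpoint ID below are hypothetical placeholders:

```python
# Sketch: assemble the connection parameters the Power BI Databricks connector
# asks for when targeting a SQL endpoint. Hostname and endpoint ID are
# hypothetical placeholders, not real workspace values.

def sql_endpoint_params(workspace_host, endpoint_id):
    """Return the Server Hostname / HTTP Path pair a client needs."""
    return {
        "server_hostname": workspace_host,
        "http_path": f"/sql/1.0/endpoints/{endpoint_id}",
        "port": 443,
    }

params = sql_endpoint_params(
    "adb-1234567890123456.7.azuredatabricks.net",  # hypothetical workspace
    "abc123def456",                                # hypothetical endpoint ID
)
print(params["http_path"])  # /sql/1.0/endpoints/abc123def456
```

Both values are shown on the endpoint's Connection Details tab in the Databricks SQL UI.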
10-25-2021 11:09 AM
Have you tried the Databricks SQL environment?
It is designed for online querying. There is also a serverless environment, but that is not yet GA.
Also something to try: import the data into Power BI, if your dataset is not too big (or you have a Premium subscription).
10-25-2021 11:26 AM
@Werner Stinckens, thanks for the reply. Is the SQL environment going to run the queries on the Hive database? I wanted to know what data-retrieval optimizations it has when we compare querying on a Databricks cluster vs. a SQL endpoint.
Power BI Premium capacity is many times costlier than the 1 TB modern SSD we get in a Delta-cache-accelerated worker. Importing data into Power BI on a daily basis is tedious for large datasets.
10-25-2021 02:04 PM
@Werner Stinckens, I am going to test it and see whether it helps solve the problem. Thanks for the suggestion.
10-26-2021 04:01 AM
If you are using Delta Lake (some DBFS mount), Power BI can read it directly. You can create a Power BI dataset so all the data sits in the Power BI model and is queried there (and set a daily refresh in Power BI). I'm not sure how big your data is or how often it changes (Power BI Premium handles up to 100 GB per dataset by default).
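As a rough rule of thumb, the 100 GB default Premium limit suggests a simple import-vs-DirectQuery decision helper. A sketch, assuming a hypothetical compression ratio (real VertiPaq compression varies a lot with the data):

```python
# Sketch: rough decision helper for Import vs. DirectQuery storage mode,
# using the default Power BI Premium per-dataset limit mentioned above.
# The compression ratio is a hypothetical assumption, not a measured value.

PREMIUM_DATASET_LIMIT_GB = 100   # default Premium cap noted above
ASSUMED_COMPRESSION_RATIO = 5    # hypothetical columnar compression factor

def storage_mode(source_size_gb):
    """Suggest Import when the compressed model would fit, else DirectQuery."""
    estimated_model_gb = source_size_gb / ASSUMED_COMPRESSION_RATIO
    return "Import" if estimated_model_gb <= PREMIUM_DATASET_LIMIT_GB else "DirectQuery"

print(storage_mode(36))    # Import
print(storage_mode(1000))  # DirectQuery
```

Under these assumptions a 36 GB source table compresses well under the cap, which is why import mode is worth testing before ruling it out.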
10-26-2021 05:27 AM
@Hubert Dudek, we have a couple of tables around 36 GB in size. When we import all the data to create an enterprise model, it gives us a hard time both size-wise and during refresh, so we wanted to use some of the benefits of composite modeling with DirectQuery.
The other challenge we face is that a Power BI dataset can't detect incremental changes and apply them, since it doesn't do inserts and updates. So the refresh has to be a full refresh.
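For reference, the incremental pattern that a full refresh sidesteps is essentially a watermark filter: keep only rows modified since the last refresh and advance the watermark. A minimal sketch in plain Python; the field names are hypothetical:

```python
# Sketch: the watermark pattern an incremental refresh would follow if the
# dataset supported it. Field names ("id", "modified") are hypothetical.

from datetime import date

def incremental_rows(rows, watermark):
    """Return rows modified after the watermark, plus the advanced watermark."""
    fresh = [r for r in rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in fresh), default=watermark)
    return fresh, new_watermark

rows = [
    {"id": 1, "modified": date(2021, 10, 24)},
    {"id": 2, "modified": date(2021, 10, 26)},
]
fresh, wm = incremental_rows(rows, date(2021, 10, 25))
print(len(fresh), wm)  # 1 2021-10-26
```

The hard part in practice is not the filter itself but getting the source to expose a reliable change column and having the BI layer apply the delta, which is exactly what's missing here.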
10-26-2021 05:33 AM
Yes, I understand that problem. I currently have a request to implement incremental import into Power BI from a data lake, so I will see how it goes 🙂
10-26-2021 08:41 AM
@Hubert Dudek, I have good news. I agree with @Werner Stinckens: the SQL endpoint is super fast. I tested 143 million records with DirectQuery from Power BI, and results returned in 10-12 seconds.
Don't even try doing incremental refresh in Power BI; it isn't meant for it. We ended up fishing for the data while doing so, and that wasn't appropriate.
10-27-2021 01:43 AM
Thank you for your advice regarding Power BI. I will follow it and keep just the full reload.

