10-25-2021 10:39 AM
Hi Team, we are planning to connect Power BI directly to Databricks, but data fetching with Direct Query isn't giving great performance, even though we are using ZORDER BY, partitioning, etc.
We decided to use the Delta cache, but the cached tables get wiped out if a cluster restarts. In our prod environment, if this happens, customers experience delays in Direct Query.
A simple workaround is to make sure you never let your cluster go down, but we can't go with that option as it's costly. So I wanted to understand whether there is a way to persist the Delta cache across cluster restarts.
Thanks.
Manoj
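A cache warm-up script is one common way to soften the restart problem (it re-populates the cache rather than persisting it). The sketch below is a hypothetical example, with table and column names made up for illustration; `CACHE SELECT` pre-loads the scanned files into the worker-local SSD (Delta/disk) cache so the first Direct Query after a restart hits warm storage:

```sql
-- Hypothetical warm-up job, scheduled right after cluster start.
-- Table and column names are examples only.
CACHE SELECT * FROM sales_fact;

-- Caching only the hot columns/partitions keeps the warm-up fast:
CACHE SELECT customer_id, order_date, amount
FROM orders
WHERE order_date >= '2021-01-01';
```

This does not persist the cache across restarts, but running it as an init job after every restart means users rarely hit a cold cache.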
10-25-2021 11:09 AM
Have you tried the Databricks SQL environment?
It is designed for online querying. There is also a serverless environment, but that is not yet GA.
Also something to try: import the data into Power BI, if your dataset is not too big (or you have a Premium subscription).
10-25-2021 11:26 AM
@Werner Stinckens, thanks for the reply. Is the SQL environment going to run the queries on the Hive database? I wanted to know what data retrieval optimizations it has when comparing querying on a Databricks cluster vs. a SQL Endpoint.
Power BI Premium capacity is X times costlier than the 1 TB modern SSD we get in a Delta-cache-accelerated worker. Importing data into Power BI on a daily basis is tedious for large datasets.
10-25-2021 11:35 AM
It runs on the tables/databases defined in Databricks and uses an optimized reader (Photon).
We are also expecting a serverless offering, without the need to create a cluster (so pay-per-use).
It is not hard to set up, so you can test it against a classic cluster.
10-25-2021 02:04 PM
@Werner Stinckens, I am going to test it and see if it can help solve the problem. Thanks for the suggestion.
10-26-2021 04:01 AM
If you are using Delta Lake (some DBFS mount), you can read it directly with Power BI and create a Power BI dataset, so you will have all the data in the Power BI model and can query it there (you can set a daily refresh in Power BI). Not sure how big your data is or how often it changes (Power BI Premium handles up to 100 GB by default).
10-26-2021 05:27 AM
@Hubert Dudek, we have a couple of tables that are around 36 GB in size. When we import all the data to create an enterprise model, it gives us a hard time both size-wise and when doing the refresh, so we wanted to use some of the benefits of composite modeling with Direct Query.
The other challenge we are facing is that a Power BI dataset can't detect incremental changes and apply them, as it doesn't do inserts and updates. So the refresh needs to be a full refresh.
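On the Delta side, one hypothetical way to surface inserts and updates is the change data feed, assuming it is available on your runtime; the table name and starting version below are examples only:

```sql
-- Enable the change data feed on an example table
-- (assumption: your Databricks runtime supports this property).
ALTER TABLE sales_fact SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- Read only rows changed since table version 5; _change_type distinguishes
-- insert / delete / update_preimage / update_postimage.
SELECT * FROM table_changes('sales_fact', 5)
WHERE _change_type <> 'update_preimage';
```

Whether Power BI can consume such a staged changes table is a separate question; this only addresses detecting the deltas.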
10-26-2021 05:33 AM
Yes, I understand that problem. I currently have a request to implement incremental import to Power BI from the data lake, so I will see how it goes.
10-26-2021 08:41 AM
@Hubert Dudek, I have good news. I agree with @Werner Stinckens: the SQL Endpoint is super fast. I tested 143 million records with Direct Query from Power BI, and the result returned in 10-12 seconds.
Don't even try doing incremental refresh in Power BI; it isn't meant for it. We ended up fishing for the data while doing so, and that wasn't appropriate.
10-27-2021 01:43 AM
Thank you for your advice regarding Power BI. I will follow it and keep just the full reload.