10-25-2021 10:39 AM
Hi Team, we are planning to connect Power BI directly to Databricks, but data fetching with DirectQuery isn't giving great performance, even though we are using ZORDER BY, partitioning, etc.
We decided to use the Delta cache, but the cached tables get wiped out when a cluster restarts. If this happens in our prod environment, customers experience delays in DirectQuery.
The simple workaround is to make sure the cluster never goes down, but we can't go with that option as it's costly. So I wanted to understand whether there is a way to persist the Delta cache across cluster restarts.
Thanks.
Manoj
- Labels: databricks, Delta, Delta Cache, Powerbi
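One way to mitigate the restart problem is to re-warm the Delta cache right after the cluster comes up, e.g. from an init job that issues `CACHE SELECT` statements. A minimal sketch of building those statements; the table and column names are hypothetical, and on Databricks each statement would be passed to `spark.sql(...)`:

```python
# Sketch: build CACHE SELECT warm-up statements to re-prime the Delta cache
# after a cluster restart. Table and column names are hypothetical examples;
# on Databricks you would run each statement via spark.sql(stmt).

def warmup_statements(tables):
    """Return one CACHE SELECT statement per table to re-prime the cache."""
    stmts = []
    for table, columns in tables.items():
        cols = ", ".join(columns) if columns else "*"
        stmts.append(f"CACHE SELECT {cols} FROM {table}")
    return stmts

stmts = warmup_statements({
    "sales.fact_orders": ["order_id", "order_date", "amount"],
    "sales.dim_customer": [],  # empty list -> cache all columns
})
print(stmts[0])  # CACHE SELECT order_id, order_date, amount FROM sales.fact_orders
```

This only shortens the cold window after a restart; it does not persist the cache itself, which lives on the workers' local SSDs.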
Accepted Solutions
10-25-2021 11:35 AM
It runs on the tables/databases defined in Databricks and uses an optimized reader (Photon).
A serverless offering is also expected, with no need to create a cluster (so pay-per-use).
It is not hard to set up, so you can test it against a classic cluster.
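For pointing Power BI (or any client) at a SQL endpoint, the Databricks connector essentially needs a server hostname and an HTTP path. A minimal sketch that assembles those parameters; the workspace hostname and endpoint ID below are hypothetical placeholders:

```python
# Sketch: assemble the connection parameters the Power BI Databricks connector
# asks for when targeting a SQL endpoint. Hostname and endpoint ID are
# hypothetical placeholders, not real workspace values.

def sql_endpoint_params(workspace_host, endpoint_id):
    """Return the Server Hostname / HTTP Path pair a client needs."""
    return {
        "server_hostname": workspace_host,
        "http_path": f"/sql/1.0/endpoints/{endpoint_id}",
        "port": 443,
    }

params = sql_endpoint_params(
    "adb-1234567890123456.7.azuredatabricks.net",  # hypothetical workspace
    "abc123def456",                                # hypothetical endpoint ID
)
print(params["http_path"])  # /sql/1.0/endpoints/abc123def456
```

Both values are shown on the endpoint's Connection Details tab in the Databricks SQL UI.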
10-25-2021 11:09 AM
Have you tried the Databricks SQL environment?
It is designed for online querying. There is also a serverless environment, but that is not yet GA.
Also something to try: import the data into Power BI, if your dataset is not too big (or you have a Premium subscription).
10-25-2021 11:26 AM
@Werner Stinckens, thanks for the reply. Is the SQL environment going to run the queries on the Hive database? I wanted to know what data-retrieval optimizations it has when we compare querying on a Databricks cluster vs. a SQL endpoint.
Power BI Premium capacity is many times costlier than the 1 TB modern SSD we get in a Delta-cache-accelerated worker. Importing data into Power BI on a daily basis is tedious for large datasets.
10-25-2021 02:04 PM
@Werner Stinckens, I am going to test it and see whether it helps solve the problem. Thanks for the suggestion.
10-26-2021 04:01 AM
If you are using Delta Lake (some DBFS mount), Power BI can read it directly. You can create a Power BI dataset so all the data sits in the Power BI model and is queried there (and set a daily refresh in Power BI). I'm not sure how big your data is or how often it changes (Power BI Premium handles up to 100 GB per dataset by default).
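As a rough rule of thumb, the 100 GB default Premium limit suggests a simple import-vs-DirectQuery decision helper. A sketch, assuming a hypothetical compression ratio (real VertiPaq compression varies a lot with the data):

```python
# Sketch: rough decision helper for Import vs. DirectQuery storage mode,
# using the default Power BI Premium per-dataset limit mentioned above.
# The compression ratio is a hypothetical assumption, not a measured value.

PREMIUM_DATASET_LIMIT_GB = 100   # default Premium cap noted above
ASSUMED_COMPRESSION_RATIO = 5    # hypothetical columnar compression factor

def storage_mode(source_size_gb):
    """Suggest Import when the compressed model would fit, else DirectQuery."""
    estimated_model_gb = source_size_gb / ASSUMED_COMPRESSION_RATIO
    return "Import" if estimated_model_gb <= PREMIUM_DATASET_LIMIT_GB else "DirectQuery"

print(storage_mode(36))    # Import
print(storage_mode(1000))  # DirectQuery
```

Under these assumptions a 36 GB source table compresses well under the cap, which is why import mode is worth testing before ruling it out.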
10-26-2021 05:27 AM
@Hubert Dudek, we have a couple of tables around 36 GB in size. When we import all the data to create an enterprise model, it gives us a hard time both size-wise and during refresh, so we wanted to use some of the benefits of composite modeling with DirectQuery.
The other challenge we face is that a Power BI dataset can't detect incremental changes and apply them, since it doesn't do inserts and updates. So the refresh has to be a full refresh.
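For reference, the incremental pattern that a full refresh sidesteps is essentially a watermark filter: keep only rows modified since the last refresh and advance the watermark. A minimal sketch in plain Python; the field names are hypothetical:

```python
# Sketch: the watermark pattern an incremental refresh would follow if the
# dataset supported it. Field names ("id", "modified") are hypothetical.

from datetime import date

def incremental_rows(rows, watermark):
    """Return rows modified after the watermark, plus the advanced watermark."""
    fresh = [r for r in rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in fresh), default=watermark)
    return fresh, new_watermark

rows = [
    {"id": 1, "modified": date(2021, 10, 24)},
    {"id": 2, "modified": date(2021, 10, 26)},
]
fresh, wm = incremental_rows(rows, date(2021, 10, 25))
print(len(fresh), wm)  # 1 2021-10-26
```

The hard part in practice is not the filter itself but getting the source to expose a reliable change column and having the BI layer apply the delta, which is exactly what's missing here.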
10-26-2021 05:33 AM
Yes, I understand that problem. I currently have a request to implement incremental import into Power BI from a data lake, so I will see how it goes 🙂
10-26-2021 08:41 AM
@Hubert Dudek, I have good news. I agree with @Werner Stinckens: the SQL endpoint is super fast. I tested 143 million records with DirectQuery from Power BI, and results returned in 10-12 seconds.
Don't even try doing incremental refresh in Power BI; it isn't meant for it. We ended up fishing for the data while doing so, and that wasn't appropriate.
10-27-2021 01:43 AM
Thank you for your advice regarding Power BI. I will follow it and keep just the full reload.

