Data Engineering

Is there a way to persist the Delta cache even after a cluster restart?

Manoj
Contributor II

Hi Team, we are planning to connect Power BI directly to Databricks; however, data fetching using DirectQuery isn't giving great performance, even though we are using ZORDER BY, partitioning, etc.

We decided to use the Delta cache, but the cached tables are getting wiped out if a cluster restarts. In our prod environment, when this happens, customers experience a delay in DirectQuery.

A simple workaround is to make sure the cluster never goes down, but we can't go with that option as it's costly. So I wanted to understand if there is a way to persist the Delta cache even after a cluster restart.
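
A rough sketch of the kind of cache re-warming job that could run right after cluster start is below. This is only a sketch, not a way to persist the cache: it assumes a Databricks notebook where `spark` is predefined, and the table names are placeholders.

```python
# Rough sketch: the Databricks disk ("Delta") cache lives on each worker's
# local SSD, so it is lost whenever the cluster terminates. One mitigation is
# to re-warm it with CACHE SELECT from a small job scheduled on cluster start.
# Table names below are placeholders.

# The disk cache is on by default on cache-accelerated worker types,
# but it can also be switched on explicitly:
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# CACHE SELECT pulls the selected data into the disk cache ahead of time,
# so the first DirectQuery request does not pay the cold-read penalty.
tables_to_warm = ["sales.fact_orders", "sales.dim_customer"]  # placeholders
for t in tables_to_warm:
    spark.sql(f"CACHE SELECT * FROM {t}")
```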

Thanks.

Manoj



-werners-
Esteemed Contributor III

Have you tried the Databricks SQL environment?

It is designed for online querying. There is also a serverless env, but that is not yet GA.

Also something to try: import the data into Power BI, if your dataset is not too big (or you have a Premium subscription).

Manoj
Contributor II

@Werner Stinckens, thanks for the reply. Is the SQL environment going to run the queries on the Hive database? I wanted to know what data retrieval optimizations it has when we compare querying on a Databricks cluster vs. a SQL endpoint.

Power BI Premium capacity is X times costlier than the 1 TB modern SSD that we get on a Delta-cache-accelerated worker, and importing data into Power BI on a daily basis is tedious for large datasets.

-werners-
Esteemed Contributor III
(Accepted solution)

It runs on the tables/databases defined in Databricks and uses an optimized reader (Photon).

We are also expecting a serverless offering, without the need to create a cluster (so pay-per-use).

It is not hard to set up, so you can test it against a classic cluster.
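
As a starting point for such a test, a minimal sketch of timing a query against a SQL endpoint from Python with the databricks-sql-connector package is below; the hostname, HTTP path, token, and table name are all placeholders for your own workspace values.

```python
# Minimal sketch: time a query against a Databricks SQL endpoint using the
# databricks-sql-connector package (pip install databricks-sql-connector).
# All connection values and the table name are placeholders.
import time
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/endpoints/0123456789abcdef",               # placeholder
    access_token="dapi...",                                        # placeholder token
) as conn:
    with conn.cursor() as cur:
        started = time.time()
        cur.execute("SELECT count(*) FROM sales.fact_orders")      # placeholder table
        rows = cur.fetchall()
        print(rows[0][0], "rows counted in", round(time.time() - started, 1), "s")
```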

Manoj
Contributor II

@Werner Stinckens, I am going to test it and see if it can help solve the problem. Thanks for the suggestion.

Hubert-Dudek
Esteemed Contributor III

If you are using Delta Lake (some DBFS mount), you can read it with Power BI directly and create a Power BI dataset, so you will have all the data in the Power BI model and can query it there (you can set a daily refresh in Power BI). I'm not sure how big your data is and how often it changes (Power BI Premium handles up to 100 GB by default).

Manoj
Contributor II

@Hubert Dudek, we have a couple of tables that are 36 GB in size. When we import all the data to create an enterprise model, it gives us a hard time both size-wise and for doing the refresh, so we wanted to use some of the benefits of composite modeling with DirectQuery.

The other challenge we are facing is that a Power BI dataset can't detect incremental changes and apply them, as it doesn't do inserts and updates, so the refresh needs to be a full refresh.

Hubert-Dudek
Esteemed Contributor III

Yes, I understand that problem. I currently have a request to implement incremental import to Power BI from the data lake, so I will see how it goes 🙂

Manoj
Contributor II

@Hubert Dudek, I have good news. I agree with @Werner Stinckens: the SQL endpoint is super fast. I tested 143 million records with DirectQuery from Power BI, and the result returned in 10-12 seconds.

Don't even try doing incremental refresh in Power BI; Power BI isn't meant for it. We ended up fishing for the data while doing so, and that wasn't appropriate.

Hubert-Dudek
Esteemed Contributor III

Thank you for your advice regarding Power BI. I will follow it and keep just the full reload.
