Cost as per the Databricks demo

AJDJ
New Contributor III

Hi there,

I came across this Databricks demo at the link below.

https://youtu.be/BqB7YQ1-KKc

Please fast-forward to around 16:30 or 16:45 in the video and watch the few minutes about cost. My understanding is that the data stays in the lake and Databricks performs the computation on top of it.

Question 1: What does he mean by "lake"? Does he mean a container and files in an Azure or AWS storage location? I know Databricks can read from any storage location.

Question 2: 

Correct me if I'm wrong: is my understanding below of the best practice for keeping cost minimal correct?

1) Make the data files available in storage accounts (probably in Parquet format).

2) Create notebooks to compute everything on the fly.

3) Write the processed output file(s) back to storage locations.

4) Add the notebook(s) to a pipeline and run the pipeline.

5) Automatically shut down all clusters.
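Steps 4 and 5 can be sketched with the Databricks Jobs API. This is a minimal sketch, assuming the Jobs API 2.1 `jobs/create` payload format: each notebook runs as a task on one shared job cluster, which Databricks creates for the run and terminates when the run ends, so no cluster is left running afterwards. The job name, notebook paths, runtime version and node type are hypothetical placeholders, not values from the demo.

```python
import json

# Hypothetical placeholders: replace with a runtime version and node type
# that are available in your own workspace.
SPARK_VERSION = "13.3.x-scala2.12"
NODE_TYPE = "Standard_DS3_v2"  # an Azure VM type; AWS uses types such as "i3.xlarge"


def build_job_payload(notebook_paths, num_workers=2):
    """Build a Jobs API 2.1 create payload that runs the given notebooks in
    order on a single job cluster (created per run, terminated afterwards)."""
    cluster_key = "etl_cluster"
    tasks = []
    previous_key = None
    for i, path in enumerate(notebook_paths):
        task = {
            "task_key": f"step_{i + 1}",
            "job_cluster_key": cluster_key,  # all tasks share one job cluster
            "notebook_task": {"notebook_path": path},
        }
        if previous_key is not None:
            # Run this notebook only after the previous task has succeeded.
            task["depends_on"] = [{"task_key": previous_key}]
        tasks.append(task)
        previous_key = task["task_key"]
    return {
        "name": "lake-etl-poc",  # hypothetical job name
        "job_clusters": [
            {
                "job_cluster_key": cluster_key,
                "new_cluster": {
                    "spark_version": SPARK_VERSION,
                    "node_type_id": NODE_TYPE,
                    "num_workers": num_workers,
                },
            }
        ],
        "tasks": tasks,
    }


if __name__ == "__main__":
    # Hypothetical notebook paths for ingest, transform and publish steps.
    payload = build_job_payload(
        ["/Repos/poc/ingest", "/Repos/poc/transform", "/Repos/poc/publish"]
    )
    print(json.dumps(payload, indent=2))
```

The printed JSON could be sent to the `/api/2.1/jobs/create` endpoint. For interactive (all-purpose) clusters used while developing the notebooks, the `autotermination_minutes` cluster setting shuts the cluster down after a period of inactivity.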

Is the Databricks cost much lower this way? Again, please correct me if I'm wrong.

Question 3:

Do the same methods apply to Delta Lake as well, e.g. Delta Live Tables? Or is Delta a feature that only applies while the data is inside Databricks, and not in container storage locations in Azure or AWS?
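For context on Question 3: a Delta table is stored in the storage location itself, as a directory of Parquet data files plus a `_delta_log/` subdirectory of JSON commit files. The hypothetical helper below is a minimal heuristic sketch that checks a locally accessible path (for example, a mounted container) for that layout; it does not read or validate the log contents.

```python
from pathlib import Path


def looks_like_delta_table(table_dir):
    """Heuristic check: return True if table_dir has a _delta_log/
    subdirectory containing at least one JSON commit file."""
    log_dir = Path(table_dir) / "_delta_log"
    if not log_dir.is_dir():
        return False  # plain Parquet folders have no transaction log
    return any(p.suffix == ".json" for p in log_dir.iterdir())


if __name__ == "__main__":
    # Hypothetical mount path; point this at a folder in your own container.
    print(looks_like_delta_table("/dbfs/mnt/lake/sales"))
```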

Question 4:

I'd appreciate it if you could share any articles or videos that walk through best practices for reducing cost in Databricks step by step, so I can build a small PoC and share it with my client (ingest data from an API, store 30-50 GB of data, process that data in a pipeline, shut down all Databricks clusters automatically, and then make the data available for reporting from the containers).

As for my skill set, I have a long working history with data warehouses, staging tables, facts, dimensions, incremental loads, partitions, indexes, etc. I'm just trying to help my client move to Databricks.

Any best-practice articles you could share would be helpful.

Thanks


Kaniz
Community Manager

Hi @AJ DJ, here is an interesting article from Databricks on the Databricks lake. Please have a look at it.

Kaniz
Community Manager

Hi @AJ DJ, we haven't heard from you since my last response, and I was checking back to see if you have a resolution yet.

If you have found a solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

AJDJ
New Contributor III

Thank you. However, I'm afraid the link you shared didn't answer the specific questions above.

Kaniz
Community Manager

Thank you for your response, @AJ DJ!

Let me get back to you on this.

Anonymous

Hi @AJ DJ,

Hope all is well!

Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
