Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Cost as per the Databricks demo

AJDJ
New Contributor III

Hi there,

I came across this Databricks demo at the link below.

https://youtu.be/BqB7YQ1-KKc

Kindly fast-forward to around 16:30–16:45 of the video and watch a few minutes of the section on cost. My understanding is that the data sits in the lake and Databricks performs the computation on top of it.

Question 1: What does he refer to as the "lake"? Does he mean a container and files in an Azure or AWS storage location? I know Databricks can read from any storage location.

Question 2: 

Correct me if I'm wrong: is my understanding below of the best practice for keeping cost minimal correct?

1) Make data files available in storage accounts (probably in parquet format),

2) Create notebooks to compute everything on the fly,

3) Write the processed output file(s) back to storage locations,

4) Add the notebook(s) to a pipeline and run the pipeline,

5) Automatically shut down all clusters.

This way the Databricks cost is much lower, right? Again, please correct me if I'm wrong.
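For steps 4 and 5 above, a Databricks Job running on a job cluster does exactly this: the cluster is created for the run and terminated automatically when the run finishes, so you only pay for compute while the pipeline is actually running. Here is a minimal sketch of a Jobs API 2.1 job definition; the job name, notebook path, runtime label, and node type are placeholders I've made up for illustration:

```python
import json

# Sketch of a Databricks Jobs API 2.1 job definition (names and paths are
# placeholders). A "new_cluster" (job cluster) is created per run and
# terminated when the run ends, so there is no idle compute cost between runs.
job_spec = {
    "name": "nightly-etl",  # hypothetical job name
    "tasks": [
        {
            "task_key": "process_raw_files",
            "notebook_task": {
                # Notebook that reads the parquet files from storage,
                # transforms them, and writes the output back (steps 1-3).
                "notebook_path": "/Repos/etl/process_raw"  # placeholder path
            },
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",  # example runtime label
                "node_type_id": "Standard_DS3_v2",    # example Azure node type
                "num_workers": 2
            }
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # run daily at 02:00
        "timezone_id": "UTC"
    }
}

# You would POST this to /api/2.1/jobs/create with a workspace token;
# it is shown here only as JSON to keep the sketch self-contained.
print(json.dumps(job_spec, indent=2))
```

The key cost point is `new_cluster`: because it is a job cluster rather than an always-on all-purpose cluster, step 5 (shutdown) happens for free at the end of every run.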

Question 3:

Now, do the same methods above apply to Delta Lake as well, e.g. Delta Live Tables? Or is Delta a feature that applies only while the data is inside Databricks, and not in container storage locations in Azure or AWS?

Question 4:

I'd appreciate it if you could share any articles or videos with step-by-step best practices for reducing cost in Databricks, so I can build a small PoC and share it with my client (ingest data from an API, store 30–50 GB of data, process that data in a pipeline, shut down all Databricks clusters automatically, and then have the data available for reporting from the containers).

As for my skill set, I have a long working history with data warehouses, staging tables, facts, dimensions, incremental loads, partitions, indexes, etc. I'm just trying to move my client to Databricks.

Any best-practice articles you could share would be helpful.

Thanks

2 REPLIES

AJDJ
New Contributor III

Thank you. However, I'm afraid the link you shared didn't answer the specific details related to the questions above.

Anonymous
Not applicable

Hi @AJ DJ

Hope all is well!

Just wanted to check in on whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help.

We'd love to hear from you.

Thanks!
