Hi there,
I came across this Databricks demo at the link below.
https://youtu.be/BqB7YQ1-KKc
Kindly fast-forward to around 16:30 or 16:45 of the video and watch a few minutes of the section on cost. My understanding is that the data sits in the lake and Databricks performs the computation on top of it.
Question 1: What does he refer to as the "lake"? Does he mean a container and files in an Azure or AWS storage location? I know Databricks can read from any storage location.
Question 2:
Correct me if I'm wrong: is my understanding of the best practice for keeping cost minimal correct, i.e., the steps below?
1) Make the data files available in storage accounts (probably in Parquet format).
2) Create notebooks that compute everything on the fly.
3) Write the processed output file(s) back to the storage locations.
4) Add the notebook(s) to a pipeline and run the pipeline.
5) Automatically shut down all clusters.
This way the Databricks cost stays much lower, is that right? Again, please correct me if I'm wrong.
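To make Question 2 concrete, here is roughly the notebook pattern I have in mind for steps 1-3. The abfss:// paths and column names are placeholders I made up for illustration, not a real setup, so please correct me if this is not the pattern the video is describing.

```python
# Rough sketch of steps 1-3: read raw Parquet from the storage account,
# transform it on the fly, and write the result back to storage.
# The abfss:// paths and column names below are placeholders.
from pyspark.sql import functions as F

raw_path = "abfss://raw@mystorageaccount.dfs.core.windows.net/sales/"        # placeholder
output_path = "abfss://curated@mystorageaccount.dfs.core.windows.net/sales/"  # placeholder

df = spark.read.parquet(raw_path)

# Example "compute everything on the fly" step: aggregate daily totals
daily = (df
         .groupBy("sale_date")
         .agg(F.sum("amount").alias("total_amount")))

# Write the processed output back to the storage location (step 3)
daily.write.mode("overwrite").parquet(output_path)
```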
Question 3:
Now, do the same methods above apply to Delta Lake as well? For example, Delta Live Tables, etc.? Or is Delta a feature that applies only as long as the data is inside Databricks, and not in container storage locations in Azure or AWS?
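In other words, would it just be a matter of swapping the format in the sketch above, something like this (same placeholder path)? My assumption is that Delta tables can live directly in the Azure/AWS container; please correct me if that is wrong.

```python
# Same placeholder output path as above, but writing in Delta format
# instead of plain Parquet. Assumes Delta can sit in the external container.
daily.write.format("delta").mode("overwrite").save(output_path)

# Reading it back later for reporting
reporting_df = spark.read.format("delta").load(output_path)
```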
Question 4:
I would appreciate it if you could share any articles or videos with step-by-step best practices for reducing cost in Databricks, so I can build a small PoC and share it with my client (ingest data from an API, store 30-50 GB of data, process that data in a pipeline, shut down all Databricks clusters automatically, and then have the data available for reporting from the containers).
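For the "shut down all clusters automatically" part of the PoC, my current understanding is that an auto-termination setting on the cluster is what I need. Below is a sketch of the cluster definition I have in mind; the workspace URL, token, and node type are placeholders, and the field names just reflect my reading of the Clusters API, so please correct me if I am off.

```python
# Sketch of creating a small cluster that shuts itself down when idle.
# Workspace URL, token, and node type are placeholders.
import requests

workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
token = "dapiXXXXXXXXXXXXXXXX"                                         # placeholder

cluster_spec = {
    "cluster_name": "poc-cost-test",
    "spark_version": "13.3.x-scala2.12",   # example runtime, not a recommendation
    "node_type_id": "Standard_DS3_v2",     # placeholder Azure node type
    "num_workers": 2,
    "autotermination_minutes": 20,         # shut down after 20 idle minutes
}

resp = requests.post(
    f"{workspace_url}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=cluster_spec,
)
print(resp.json())
```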
As for my skill set, I have a long working history with data warehouses, staging tables, facts, dimensions, incremental loads, partitions, indexes, etc. I'm just trying to help my client move into Databricks.
Any best-practice articles you could share would be helpful.
Thanks