10-11-2022 01:25 PM
Hi there,
I came across this Databricks demo at the link below.
Please fast-forward to about 16:30 or 16:45 in the video and watch the few minutes related to cost. My understanding is that the data sits in the lake and Databricks performs the computation on top of it.
Question 1: What does he refer to as the "lake"? Does he mean a container and files in an Azure or AWS storage location? I know Databricks can read from any storage location.
Question 2:
Correct me if I'm wrong: is my understanding below of the best practice for keeping cost minimal correct?
1) Make data files available in storage accounts (probably in Parquet format),
2) Create notebooks to compute everything on the fly,
3) Write the processed output file or files back to storage locations,
4) Add the notebook or notebooks to a pipeline and run the pipeline,
5) Automatically shut down all clusters.
This way the Databricks cost is much lower, is that right? Again, please correct me if I'm wrong.
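To make steps 4 and 5 concrete, here is a minimal sketch of what such a pipeline definition might look like as a Databricks Jobs API style payload. It assumes the notebook runs on a job cluster (a cluster created for the run and terminated when the run finishes), which is one way to get the automatic shutdown in step 5. The job name, task key, cluster key, and notebook path are hypothetical, the runtime version and node type are left as placeholders, and the field names should be checked against the current Jobs API reference before use.

```python
import json


def build_job_spec(notebook_path, spark_version, node_type_id, num_workers=2):
    """Return a job definition dict that runs one notebook on a job cluster.

    A job cluster exists only for the duration of the run, so nothing
    stays up (and billable) once the notebook has finished.
    """
    cluster_key = "poc_cluster"  # hypothetical key linking the task to its cluster
    return {
        "name": "poc-notebook-pipeline",  # hypothetical job name
        "job_clusters": [
            {
                "job_cluster_key": cluster_key,
                "new_cluster": {
                    "spark_version": spark_version,
                    "node_type_id": node_type_id,
                    "num_workers": num_workers,
                },
            }
        ],
        "tasks": [
            {
                "task_key": "process_files",  # hypothetical task name
                "job_cluster_key": cluster_key,
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


spec = build_job_spec(
    notebook_path="/Shared/poc/process_files",  # hypothetical notebook path
    spark_version="<runtime-version>",          # placeholder, fill in for your workspace
    node_type_id="<node-type>",                 # placeholder, depends on Azure or AWS
)
print(json.dumps(spec, indent=2))
```

For interactive (all-purpose) clusters, the comparable setting is the `autotermination_minutes` field on the cluster definition, which shuts the cluster down after a period of inactivity.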
Question 3:
Now, does the same approach apply to Delta Lake as well (Delta Live Tables, etc.)? Or is Delta a feature that applies only while the data is inside Databricks, and not in container storage locations in Azure or AWS?
Question 4:
I would appreciate it if you could share any articles or videos that walk step by step through best practices for reducing cost in Databricks, so I can do a small PoC and share it with my client (ingest data from an API, store 30-50 GB of data, process that data in a pipeline, shut down all Databricks clusters automatically, and then have the data available for reporting from containers).
As for my skill set, I have a long working history with data warehouses, staging tables, facts, dimensions, incremental loads, partitions, indexes, etc. I'm just trying to help my client move to Databricks.
Any best-practice articles you could share would be helpful.
Thanks
10-18-2022 02:49 AM
Hi @AJ DJ, here is an interesting article from Databricks on Databricks lake. Please have a look at it.
10-25-2022 03:38 PM
Hi @AJ DJ, we haven't heard from you since my last response, and I was checking back to see if you have a resolution yet.
If you have a solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.
Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
10-26-2022 02:58 PM
Thank you. However, I'm afraid the link you shared above didn't answer the specific details of the questions above.
10-27-2022 04:39 AM
Thank you for your response, @AJ DJ!
Let me get back to you on this.
11-19-2022 06:39 AM
Hi @AJ DJ
Hope all is well!
Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? If not, please let us know if you need more help.
We'd love to hear from you.
Thanks!