<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Cost as per the Databricks demo in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/cost-as-per-the-databricks-demo/m-p/27908#M19746</link>
    <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I came across this Databricks demo at the link below.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://youtu.be/BqB7YQ1-KKc" alt="https://youtu.be/BqB7YQ1-KKc" target="_blank"&gt;https://youtu.be/BqB7YQ1-KKc&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please fast-forward to around 16:30 or 16:45 and watch the few minutes of the video that cover cost. My understanding is that the data sits in the lake and Databricks performs the computation on top of it.&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Question 1&lt;/U&gt;&lt;/B&gt;: What does he mean by "lake"? Does he mean a container and files in an Azure or AWS storage location? I know Databricks can read from any storage location.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Question 2&lt;/U&gt;&lt;/B&gt;:&lt;/P&gt;&lt;P&gt;Please correct me if I'm wrong: is the following the right best practice for keeping cost to a minimum?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1) Make the data files available in storage accounts (probably in Parquet format).&lt;/P&gt;&lt;P&gt;2) Create notebooks to compute everything on the fly.&lt;/P&gt;&lt;P&gt;3) Write the processed output file or files back to the storage locations.&lt;/P&gt;&lt;P&gt;4) Add the notebook or notebooks to a pipeline and run the pipeline.&lt;/P&gt;&lt;P&gt;5) Automatically shut down all clusters.&lt;/P&gt;&lt;P&gt;This way the Databricks cost is much lower, is that right? Again, please correct me if I'm wrong.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Question 3&lt;/U&gt;&lt;/B&gt;:&lt;/P&gt;&lt;P&gt;Does the same approach apply to Delta Lake as well, for example Delta Live Tables?
Or is Delta a feature that applies only while the data is inside Databricks, and not to data in container storage locations in Azure or AWS?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Question 4&lt;/U&gt;&lt;/B&gt;:&lt;/P&gt;&lt;P&gt;I would appreciate any articles or videos that walk step by step through best practices for reducing cost in Databricks, so I can build a small PoC and share it with my client (ingest data from an API, store 30-50 GB of data, process that data in a pipeline, shut down all Databricks clusters automatically, and have the data available for reporting from the containers).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As for my skill set, I have a long working history with data warehouses: staging tables, facts, dimensions, incremental loads, partitions, indexes, etc. I'm just trying to help my client move to Databricks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any best-practice articles you could share would be helpful.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Tue, 11 Oct 2022 20:25:29 GMT</pubDate>
    <dc:creator>AJDJ</dc:creator>
    <dc:date>2022-10-11T20:25:29Z</dc:date>
    <item>
      <title>Cost as per the Databricks demo</title>
      <link>https://community.databricks.com/t5/data-engineering/cost-as-per-the-databricks-demo/m-p/27908#M19746</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I came across this Databricks demo at the link below.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://youtu.be/BqB7YQ1-KKc" alt="https://youtu.be/BqB7YQ1-KKc" target="_blank"&gt;https://youtu.be/BqB7YQ1-KKc&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please fast-forward to around 16:30 or 16:45 and watch the few minutes of the video that cover cost. My understanding is that the data sits in the lake and Databricks performs the computation on top of it.&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Question 1&lt;/U&gt;&lt;/B&gt;: What does he mean by "lake"? Does he mean a container and files in an Azure or AWS storage location? I know Databricks can read from any storage location.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Question 2&lt;/U&gt;&lt;/B&gt;:&lt;/P&gt;&lt;P&gt;Please correct me if I'm wrong: is the following the right best practice for keeping cost to a minimum?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1) Make the data files available in storage accounts (probably in Parquet format).&lt;/P&gt;&lt;P&gt;2) Create notebooks to compute everything on the fly.&lt;/P&gt;&lt;P&gt;3) Write the processed output file or files back to the storage locations.&lt;/P&gt;&lt;P&gt;4) Add the notebook or notebooks to a pipeline and run the pipeline.&lt;/P&gt;&lt;P&gt;5) Automatically shut down all clusters.&lt;/P&gt;&lt;P&gt;This way the Databricks cost is much lower, is that right? Again, please correct me if I'm wrong.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Question 3&lt;/U&gt;&lt;/B&gt;:&lt;/P&gt;&lt;P&gt;Does the same approach apply to Delta Lake as well, for example Delta Live Tables?
Or is Delta a feature that applies only while the data is inside Databricks, and not to data in container storage locations in Azure or AWS?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Question 4&lt;/U&gt;&lt;/B&gt;:&lt;/P&gt;&lt;P&gt;I would appreciate any articles or videos that walk step by step through best practices for reducing cost in Databricks, so I can build a small PoC and share it with my client (ingest data from an API, store 30-50 GB of data, process that data in a pipeline, shut down all Databricks clusters automatically, and have the data available for reporting from the containers).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As for my skill set, I have a long working history with data warehouses: staging tables, facts, dimensions, incremental loads, partitions, indexes, etc. I'm just trying to help my client move to Databricks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any best-practice articles you could share would be helpful.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2022 20:25:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cost-as-per-the-databricks-demo/m-p/27908#M19746</guid>
      <dc:creator>AJDJ</dc:creator>
      <dc:date>2022-10-11T20:25:29Z</dc:date>
    </item>
    <item>
      <title>Re: Cost as per the Databricks demo</title>
      <link>https://community.databricks.com/t5/data-engineering/cost-as-per-the-databricks-demo/m-p/27911#M19749</link>
      <description>&lt;P&gt;Thank you. However, I'm afraid the link you shared didn't address the specific details of my questions above.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Oct 2022 21:58:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cost-as-per-the-databricks-demo/m-p/27911#M19749</guid>
      <dc:creator>AJDJ</dc:creator>
      <dc:date>2022-10-26T21:58:34Z</dc:date>
    </item>
    <item>
      <title>Re: Cost as per the Databricks demo</title>
      <link>https://community.databricks.com/t5/data-engineering/cost-as-per-the-databricks-demo/m-p/27913#M19751</link>
      <description>&lt;P&gt;Hi @AJ DJ,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hope all is well!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or &lt;B&gt;mark an answer as best&lt;/B&gt;? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sat, 19 Nov 2022 14:39:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cost-as-per-the-databricks-demo/m-p/27913#M19751</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-11-19T14:39:47Z</dc:date>
    </item>
  </channel>
</rss>

