<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Need help skipping previously executed cells in a failed Databricks job calling a notebook with multiple SQL cells in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/need-help-skipping-previously-executed-cells-in-a-failed/m-p/5646#M2008</link>
    <description>&lt;P&gt;Thank you for your reply. Yes, I agree with your point on the design. However, the current project has thousands of SQL statements, and sets of them are grouped into single notebooks to perform operations. Your suggestion of CREATE TABLE IF NOT EXISTS sounds good, and I have just searched for how to make the INSERT command idempotent and found this: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/copy-into/" target="test_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/ingestion/copy-into/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;We will also revisit this and see whether we can make use of the Databricks retries option.&lt;/P&gt;</description>
    <pubDate>Tue, 18 Apr 2023 10:36:51 GMT</pubDate>
    <dc:creator>Sandy84</dc:creator>
    <dc:date>2023-04-18T10:36:51Z</dc:date>
    <item>
      <title>Need help skipping previously executed cells in a failed Databricks job calling a notebook with multiple SQL cells</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-skipping-previously-executed-cells-in-a-failed/m-p/5644#M2006</link>
      <description>&lt;P&gt;In Azure Databricks, I have a job that calls a notebook containing multiple cells with SQL queries. If a cell fails and we restart the Databricks job, how can we skip the cells that already ran and resume from the failed cell? Any leads would be helpful.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Apr 2023 07:58:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-skipping-previously-executed-cells-in-a-failed/m-p/5644#M2006</guid>
      <dc:creator>Sandy84</dc:creator>
      <dc:date>2023-04-18T07:58:25Z</dc:date>
    </item>
    <item>
      <title>Re: Need help skipping previously executed cells in a failed Databricks job calling a notebook with multiple SQL cells</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-skipping-previously-executed-cells-in-a-failed/m-p/5645#M2007</link>
      <description>&lt;P&gt;Hi Sandy!&lt;/P&gt;&lt;P&gt;My 2 cents on your issue.&lt;/P&gt;&lt;P&gt;This looks more like a design issue than a technical one. It sounds like your notebook performs too many operations, so when a failure occurs everything runs again, which is not ideal and could cause problems (e.g. data duplication).&lt;/P&gt;&lt;P&gt;A good strategy for every ETL process is to make it "restartable": if it fails, it should be able to restart, clean up its own mess, and repeat what it is supposed to do.&lt;/P&gt;&lt;P&gt;So instead of keeping everything in one notebook and trying to figure out how to skip previously executed cells, why not split the work into separate notebooks by logical operation and make sure each unit is restartable? For instance, if one CMD creates a table and you want that command to be idempotent, use CREATE TABLE IF NOT EXISTS instead of CREATE TABLE. That way, if the CMD runs again and the table already exists, nothing happens. That is just one example, of course, but I guess you get my point.&lt;/P&gt;</description>
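      <!--
      A minimal sketch of the idempotency idea described in the reply above: a command that can be rerun after a job restart without failing or changing anything. It uses Python's built-in sqlite3 module purely as a stand-in for Databricks SQL (an assumption made so the example is runnable anywhere), and the `orders` table and `create_orders_table` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def create_orders_table(conn: sqlite3.Connection) -> None:
    """Create the target table only if it is missing, so reruns are no-ops."""
    # IF NOT EXISTS makes this DDL safe to execute any number of times.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_orders_table(conn)  # first run creates the table
create_orders_table(conn)  # simulated job restart: no error, nothing changes

# List the tables that exist after both runs.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(tables)
```

      The same pattern would apply to a SQL cell in a notebook. Whether the other statements in that notebook are equally safe to rerun has to be checked one by one.
      -->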
      <pubDate>Tue, 18 Apr 2023 09:47:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-skipping-previously-executed-cells-in-a-failed/m-p/5645#M2007</guid>
      <dc:creator>Serlal</dc:creator>
      <dc:date>2023-04-18T09:47:25Z</dc:date>
    </item>
    <item>
      <title>Re: Need help skipping previously executed cells in a failed Databricks job calling a notebook with multiple SQL cells</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-skipping-previously-executed-cells-in-a-failed/m-p/5646#M2008</link>
      <description>&lt;P&gt;Thank you for your reply. Yes, I agree with your point on the design. However, the current project has thousands of SQL statements, and sets of them are grouped into single notebooks to perform operations. Your suggestion of CREATE TABLE IF NOT EXISTS sounds good, and I have just searched for how to make the INSERT command idempotent and found this: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/copy-into/" target="test_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/ingestion/copy-into/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;We will also revisit this and see whether we can make use of the Databricks retries option.&lt;/P&gt;</description>
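      <!--
      A rough illustration of what an idempotent insert means in practice: the same batch is loaded twice, as would happen on a retry, and the table ends up without duplicates. This sketch does not use COPY INTO. It swaps in SQLite's INSERT OR IGNORE through Python's built-in sqlite3 module as a stand-in, and the `orders` table and `load_orders` helper are hypothetical names, not anything from the linked documentation.

```python
import sqlite3


def load_orders(conn: sqlite3.Connection, rows: list[tuple[int, float]]) -> None:
    """Insert rows keyed by id; rows that already exist are skipped on rerun."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)"
    )
    # OR IGNORE skips any row whose primary key is already present,
    # so a retry after a partial failure cannot duplicate data.
    conn.executemany(
        "INSERT OR IGNORE INTO orders (id, amount) VALUES (?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
batch = [(1, 9.99), (2, 25.0)]
load_orders(conn, batch)  # first run
load_orders(conn, batch)  # simulated retry of the same batch

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)
```

      The design choice here is to key each row on a stable identifier so the database itself rejects repeats. Data without such a key would need a different approach.
      -->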
      <pubDate>Tue, 18 Apr 2023 10:36:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-skipping-previously-executed-cells-in-a-failed/m-p/5646#M2008</guid>
      <dc:creator>Sandy84</dc:creator>
      <dc:date>2023-04-18T10:36:51Z</dc:date>
    </item>
    <item>
      <title>Re: Need help skipping previously executed cells in a failed Databricks job calling a notebook with multiple SQL cells</title>
      <link>https://community.databricks.com/t5/data-engineering/need-help-skipping-previously-executed-cells-in-a-failed/m-p/5647#M2009</link>
      <description>&lt;P&gt;Hi @Sandip Rath&lt;/P&gt;&lt;P&gt;Thank you for posting your question in our community! We are happy to assist you.&lt;/P&gt;&lt;P&gt;To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?&lt;/P&gt;&lt;P&gt;This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!&lt;/P&gt;</description>
      <pubDate>Mon, 24 Apr 2023 04:33:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-help-skipping-previously-executed-cells-in-a-failed/m-p/5647#M2009</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-24T04:33:41Z</dc:date>
    </item>
  </channel>
</rss>

