<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SkipChangeCommit to True Scenario on Data Loss Possibility in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/140225#M51360</link>
    <description>&lt;P&gt;The short answer is no: independent operations from different jobs become separate, serialized commits in the Delta transaction log. They won’t be coalesced into one commit unless you explicitly run a single statement that performs both (for example, a MERGE/OVERWRITE that rewrites files and inserts rows).&lt;/P&gt;
&lt;P&gt;Some practical guidelines:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Keep ingestion appends and retention deletes as separate statements/jobs so they become separate commits; skipChangeCommits then skips only the delete commit.&lt;/LI&gt;
&lt;LI&gt;Avoid MERGE or OVERWRITE operations that mix rewrites and inserts in the source Bronze table; if you must use them, expect skipChangeCommits to skip that entire commit.&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;If concurrent operations overlap in time, they are still serialized as distinct commits. Streaming reads will see them as separate versions in order.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This blog post does a great job of explaining the Delta transaction log:&amp;nbsp;&lt;A href="https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html" target="_blank"&gt;https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html&lt;/A&gt;&lt;/P&gt;
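&lt;P&gt;As that post describes, every transaction becomes its own numbered JSON commit file under _delta_log. Here is a minimal sketch of that layout; the directory and commitInfo records are fabricated for illustration, and real commit files carry additional actions (add, remove, metaData) beyond what is shown:&lt;/P&gt;

```python
# Sketch of Delta's per-commit log layout: one zero-padded JSON file per
# transaction. We fabricate a tiny _delta_log directory for illustration.
import json
import os
import tempfile

table_root = tempfile.mkdtemp()
log_dir = os.path.join(table_root, "_delta_log")
os.makedirs(log_dir)

# A streaming append and a weekly delete are separate transactions,
# so each one gets its own sequential version file:
for version, op in enumerate(["WRITE", "WRITE", "DELETE"]):
    name = f"{version:020d}.json"   # versions are zero-padded to 20 digits
    with open(os.path.join(log_dir, name), "w") as fh:
        fh.write(json.dumps({"commitInfo": {"operation": op}}))

ops = []
for name in sorted(os.listdir(log_dir)):
    with open(os.path.join(log_dir, name)) as fh:
        ops.append(json.load(fh)["commitInfo"]["operation"])

print(ops)  # ['WRITE', 'WRITE', 'DELETE'] - three distinct commits
```

&lt;P&gt;A reader deciding what to skip can inspect each version file independently, which is why an append and a delete issued by different jobs can never be conflated into one version.&lt;/P&gt;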
</description>
    <pubDate>Mon, 24 Nov 2025 20:27:53 GMT</pubDate>
    <dc:creator>stbjelcevic</dc:creator>
    <dc:date>2025-11-24T20:27:53Z</dc:date>
    <item>
      <title>SkipChangeCommit to True Scenario on Data Loss Possibility</title>
      <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139335#M51163</link>
      <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;I have Below Scenario,&lt;/P&gt;&lt;P&gt;I have a Spark Streaming Job with trigger of Processing time as 3 secs Running Continuously 365 days.&lt;/P&gt;&lt;P&gt;We are performing a weekly delete job from the source of this streaming job based on custom retention policy. it is a Delete command on the delta table(external).&lt;/P&gt;&lt;P&gt;If i implement&amp;nbsp;SkipChangeCommit to True in my ReadStream, Will i have an Dataloss in my streaming Job...&amp;nbsp;&lt;/P&gt;&lt;P&gt;My source is Bronze delta lake external table loaded in append mode only.&lt;/P&gt;&lt;P&gt;The Reason i want to make sure is the option will skip the entire commit so i want to know if both my weekly delete and an insert to my source data might fall under same commit and the option will skip the entire commit causing the data loss.&lt;/P&gt;&lt;P&gt;Please review and scenario and let me know if there is a potential data loss possibility with this option.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 13:35:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139335#M51163</guid>
      <dc:creator>Naveenkumar1811</dc:creator>
      <dc:date>2025-11-17T13:35:24Z</dc:date>
    </item>
    <item>
      <title>Re: SkipChangeCommit to True Scenario on Data Loss Possibility</title>
      <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139362#M51174</link>
      <description>&lt;P&gt;Short answer is: No, implementing skipChangeCommits will not cause data loss in your streaming job from new inserts, assuming your source table operations are transactional (as a Delta table).&lt;/P&gt;&lt;P&gt;If your source was a table that included regular UPDATE or MERGE operations that you &lt;I&gt;did&lt;/I&gt; need to capture, then using skipChangeCommits=true would cause data loss of those updated/merged records. Since your source is an append-only Bronze table, this should not be a concern for you.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 14:30:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139362#M51174</guid>
      <dc:creator>Raman_Unifeye</dc:creator>
      <dc:date>2025-11-17T14:30:04Z</dc:date>
    </item>
    <item>
      <title>Re: SkipChangeCommit to True Scenario on Data Loss Possibility</title>
      <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139372#M51178</link>
      <description>&lt;P&gt;It shouldn't. You have append only stream and SkipChangeCommit will ignore any modification that were applied to already existing files&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1763390934234.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21766i0C01CC6D73EBCB07/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1763390934234.png" alt="szymon_dybczak_0-1763390934234.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 14:49:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139372#M51178</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-11-17T14:49:03Z</dc:date>
    </item>
    <item>
      <title>Re: SkipChangeCommit to True Scenario on Data Loss Possibility</title>
      <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139637#M51253</link>
      <description>&lt;P&gt;Hi szymon/Raman,&lt;/P&gt;&lt;P&gt;My Question was on the commit it performs with the insert/append via my streaming and the delete operation by the weekly maintenance Job... Is there a way that both transaction will fall into same commit. I need to understand that portion so it gives me clear picture of data loss during my skipchangecommit.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Nov 2025 10:14:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139637#M51253</guid>
      <dc:creator>Naveenkumar1811</dc:creator>
      <dc:date>2025-11-19T10:14:59Z</dc:date>
    </item>
    <item>
      <title>Re: SkipChangeCommit to True Scenario on Data Loss Possibility</title>
      <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/140225#M51360</link>
      <description>&lt;P&gt;The short answer is no: independent operations from different jobs become separate, serialized commits in the Delta transaction log. They won’t be coalesced into one commit unless you explicitly run a single statement that performs both (for example, a MERGE/OVERWRITE that rewrites files and inserts rows).&lt;/P&gt;
&lt;P&gt;Some practical guidelines:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Keep ingestion appends and retention deletes as separate statements/jobs so they become separate commits; skipChangeCommits then skips only the delete commit.&lt;/LI&gt;
&lt;LI&gt;Avoid MERGE or OVERWRITE operations that mix rewrites and inserts in the source Bronze table; if you must use them, expect skipChangeCommits to skip that entire commit.&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;If concurrent operations overlap in time, they are still serialized as distinct commits. Streaming reads will see them as separate versions in order.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This blog post does a great job of explaining the Delta transaction log:&amp;nbsp;&lt;A href="https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html" target="_blank"&gt;https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html&lt;/A&gt;&lt;/P&gt;
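&lt;P&gt;To make the skipping behavior concrete, here is a toy Python model of the transaction log. It is purely illustrative: the commit helper and its fields are invented for this sketch and are not the Delta Lake API.&lt;/P&gt;

```python
# Toy model of a Delta transaction log: each operation from a different job
# lands as its own serialized commit/version. Invented helper, not the real API.
log = []

def commit(op, added, removed):
    log.append({"version": len(log), "op": op,
                "added": added, "removed": removed})

commit("APPEND", added=["r1", "r2"], removed=[])   # streaming micro-batch
commit("DELETE", added=[], removed=["r1"])         # weekly retention delete
commit("APPEND", added=["r3"], removed=[])         # next micro-batch
commit("MERGE",  added=["r4"], removed=["r2"])     # mixed commit (avoid this)

# A skipChangeCommits-style reader ignores any commit that removes or
# rewrites existing files, but still consumes every pure append:
seen = []
for c in log:
    if c["removed"]:      # change commit: skipped in its entirety
        continue
    seen.extend(c["added"])

print(seen)  # ['r1', 'r2', 'r3'] - appends survive; r4 is lost with its MERGE
```

&lt;P&gt;Because the delete arrives as its own commit, the reader drops only that version; the mixed MERGE-style commit, by contrast, is dropped wholesale along with its newly inserted rows, which is exactly why the guidelines above recommend keeping appends and deletes in separate statements.&lt;/P&gt;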
</description>
      <pubDate>Mon, 24 Nov 2025 20:27:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/140225#M51360</guid>
      <dc:creator>stbjelcevic</dc:creator>
      <dc:date>2025-11-24T20:27:53Z</dc:date>
    </item>
  </channel>
</rss>

