<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SkipChangeCommit to True Scenario on Data Loss Possibility in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/140225#M51360</link>
    <description>&lt;P&gt;The short answer is no: independent operations from different jobs become separate, serialized commits in the Delta transaction log. They won’t be coalesced into one commit unless you explicitly run a single statement that performs both (for example, a MERGE/OVERWRITE that rewrites files and inserts rows).&lt;/P&gt;
&lt;P&gt;Some practical guidelines:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Keep ingestion appends and retention deletes as separate statements/jobs so they become separate commits; skipChangeCommits then skips only the delete commit.&lt;/LI&gt;
&lt;LI&gt;Avoid MERGE or OVERWRITE operations that mix rewrites and inserts in the source Bronze table; if you must use them, expect skipChangeCommits to skip that entire commit.&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;If concurrent operations overlap in time, they are still serialized as distinct commits. Streaming reads will see them as separate versions in order.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This blog post does a great job of explaining the Delta transaction log:&amp;nbsp;&lt;A href="https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html" target="_blank"&gt;https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html&lt;/A&gt;&lt;/P&gt;
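&lt;P&gt;As that post describes, every transaction becomes its own numbered JSON commit file under _delta_log. Here is a minimal sketch of that layout; the directory and commitInfo records are fabricated for illustration, and real commit files carry additional actions (add, remove, metaData) beyond what is shown:&lt;/P&gt;

```python
# Sketch of Delta's per-commit log layout: one zero-padded JSON file per
# transaction. We fabricate a tiny _delta_log directory for illustration.
import json
import os
import tempfile

table_root = tempfile.mkdtemp()
log_dir = os.path.join(table_root, "_delta_log")
os.makedirs(log_dir)

# A streaming append and a weekly delete are separate transactions,
# so each one gets its own sequential version file:
for version, op in enumerate(["WRITE", "WRITE", "DELETE"]):
    name = f"{version:020d}.json"   # versions are zero-padded to 20 digits
    with open(os.path.join(log_dir, name), "w") as fh:
        fh.write(json.dumps({"commitInfo": {"operation": op}}))

ops = []
for name in sorted(os.listdir(log_dir)):
    with open(os.path.join(log_dir, name)) as fh:
        ops.append(json.load(fh)["commitInfo"]["operation"])

print(ops)  # ['WRITE', 'WRITE', 'DELETE'] - three distinct commits
```

&lt;P&gt;A reader deciding what to skip can inspect each version file independently, which is why an append and a delete issued by different jobs can never be conflated into one version.&lt;/P&gt;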
</description>
    <pubDate>Mon, 24 Nov 2025 20:27:53 GMT</pubDate>
    <dc:creator>stbjelcevic</dc:creator>
    <dc:date>2025-11-24T20:27:53Z</dc:date>
    <item>
      <title>SkipChangeCommit to True Scenario on Data Loss Possibility</title>
      <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139335#M51163</link>
      <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;I have Below Scenario,&lt;/P&gt;&lt;P&gt;I have a Spark Streaming Job with trigger of Processing time as 3 secs Running Continuously 365 days.&lt;/P&gt;&lt;P&gt;We are performing a weekly delete job from the source of this streaming job based on custom retention policy. it is a Delete command on the delta table(external).&lt;/P&gt;&lt;P&gt;If i implement&amp;nbsp;SkipChangeCommit to True in my ReadStream, Will i have an Dataloss in my streaming Job...&amp;nbsp;&lt;/P&gt;&lt;P&gt;My source is Bronze delta lake external table loaded in append mode only.&lt;/P&gt;&lt;P&gt;The Reason i want to make sure is the option will skip the entire commit so i want to know if both my weekly delete and an insert to my source data might fall under same commit and the option will skip the entire commit causing the data loss.&lt;/P&gt;&lt;P&gt;Please review and scenario and let me know if there is a potential data loss possibility with this option.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 13:35:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139335#M51163</guid>
      <dc:creator>Naveenkumar1811</dc:creator>
      <dc:date>2025-11-17T13:35:24Z</dc:date>
    </item>
    <item>
      <title>Re: SkipChangeCommit to True Scenario on Data Loss Possibility</title>
      <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139362#M51174</link>
      <description>&lt;P&gt;Short answer is: No, implementing skipChangeCommits will not cause data loss in your streaming job from new inserts, assuming your source table operations are transactional (as a Delta table).&lt;/P&gt;&lt;P&gt;If your source was a table that included regular UPDATE or MERGE operations that you &lt;I&gt;did&lt;/I&gt; need to capture, then using skipChangeCommits=true would cause data loss of those updated/merged records. Since your source is an append-only Bronze table, this should not be a concern for you.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 14:30:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139362#M51174</guid>
      <dc:creator>Raman_Unifeye</dc:creator>
      <dc:date>2025-11-17T14:30:04Z</dc:date>
    </item>
    <item>
      <title>Re: SkipChangeCommit to True Scenario on Data Loss Possibility</title>
      <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139372#M51178</link>
      <description>&lt;P&gt;It shouldn't. You have append only stream and SkipChangeCommit will ignore any modification that were applied to already existing files&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1763390934234.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21766i0C01CC6D73EBCB07/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1763390934234.png" alt="szymon_dybczak_0-1763390934234.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 14:49:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139372#M51178</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-11-17T14:49:03Z</dc:date>
    </item>
    <item>
      <title>Re: SkipChangeCommit to True Scenario on Data Loss Possibility</title>
      <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139637#M51253</link>
      <description>&lt;P&gt;Hi szymon/Raman,&lt;/P&gt;&lt;P&gt;My Question was on the commit it performs with the insert/append via my streaming and the delete operation by the weekly maintenance Job... Is there a way that both transaction will fall into same commit. I need to understand that portion so it gives me clear picture of data loss during my skipchangecommit.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Nov 2025 10:14:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/139637#M51253</guid>
      <dc:creator>Naveenkumar1811</dc:creator>
      <dc:date>2025-11-19T10:14:59Z</dc:date>
    </item>
    <item>
      <title>Re: SkipChangeCommit to True Scenario on Data Loss Possibility</title>
      <link>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/140225#M51360</link>
      <description>&lt;P&gt;The short answer is no: independent operations from different jobs become separate, serialized commits in the Delta transaction log. They won’t be coalesced into one commit unless you explicitly run a single statement that performs both (for example, a MERGE/OVERWRITE that rewrites files and inserts rows).&lt;/P&gt;
&lt;P&gt;Some practical guidelines:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Keep ingestion appends and retention deletes as separate statements/jobs so they become separate commits; skipChangeCommits then skips only the delete commit.&lt;/LI&gt;
&lt;LI&gt;Avoid MERGE or OVERWRITE operations that mix rewrites and inserts in the source Bronze table; if you must use them, expect skipChangeCommits to skip that entire commit.&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;If concurrent operations overlap in time, they are still serialized as distinct commits. Streaming reads will see them as separate versions in order.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This blog post does a great job of explaining the Delta transaction log:&amp;nbsp;&lt;A href="https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html" target="_blank"&gt;https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html&lt;/A&gt;&lt;/P&gt;
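&lt;P&gt;To make the skipping behavior concrete, here is a toy Python model of the transaction log. It is purely illustrative: the commit helper and its fields are invented for this sketch and are not the Delta Lake API.&lt;/P&gt;

```python
# Toy model of a Delta transaction log: each operation from a different job
# lands as its own serialized commit/version. Invented helper, not the real API.
log = []

def commit(op, added, removed):
    log.append({"version": len(log), "op": op,
                "added": added, "removed": removed})

commit("APPEND", added=["r1", "r2"], removed=[])   # streaming micro-batch
commit("DELETE", added=[], removed=["r1"])         # weekly retention delete
commit("APPEND", added=["r3"], removed=[])         # next micro-batch
commit("MERGE",  added=["r4"], removed=["r2"])     # mixed commit (avoid this)

# A skipChangeCommits-style reader ignores any commit that removes or
# rewrites existing files, but still consumes every pure append:
seen = []
for c in log:
    if c["removed"]:      # change commit: skipped in its entirety
        continue
    seen.extend(c["added"])

print(seen)  # ['r1', 'r2', 'r3'] - appends survive; r4 is lost with its MERGE
```

&lt;P&gt;Because the delete arrives as its own commit, the reader drops only that version; the mixed MERGE-style commit, by contrast, is dropped wholesale along with its newly inserted rows, which is exactly why the guidelines above recommend keeping appends and deletes in separate statements.&lt;/P&gt;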
</description>
      <pubDate>Mon, 24 Nov 2025 20:27:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/skipchangecommit-to-true-scenario-on-data-loss-possibility/m-p/140225#M51360</guid>
      <dc:creator>stbjelcevic</dc:creator>
      <dc:date>2025-11-24T20:27:53Z</dc:date>
    </item>
  </channel>
</rss>

