Data Engineering

Possibility of Data Loss When Setting skipChangeCommits to True

Naveenkumar1811
New Contributor II

Hi Team,

I have the following scenario.

I have a Spark Streaming job with a processing-time trigger of 3 seconds that runs continuously, 365 days a year.

We run a weekly delete job against the source of this streaming job, based on a custom retention policy. It is a DELETE command on the (external) Delta table.

If I set skipChangeCommits to true in my readStream, will I have any data loss in my streaming job?

My source is a Bronze Delta Lake external table that is loaded in append mode only.

The reason I want to make sure is that the option skips the entire commit, so I want to know whether my weekly delete and an insert into my source table could fall under the same commit, in which case the option would skip the entire commit and cause data loss.

Please review this scenario and let me know whether there is a potential for data loss with this option.
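
For reference, the setup described above could be sketched roughly as follows in PySpark. This is only a sketch under assumptions: the function names, the table names and the checkpoint path are placeholders that do not come from the original job, and `spark` is assumed to be an existing SparkSession with Delta Lake available.

```python
def read_bronze_stream(spark, table_name):
    """Streaming read of a Delta table that skips change commits.

    `spark` is an existing SparkSession; `table_name` is a placeholder.
    """
    return (
        spark.readStream
        .format("delta")
        # Ignore transactions that delete or modify existing records
        # (for example, the weekly DELETE) instead of failing the stream.
        .option("skipChangeCommits", "true")
        .table(table_name)
    )


def start_sink(stream_df, target_table, checkpoint_path):
    """Start an append-only streaming write with a 3-second trigger.

    `target_table` and `checkpoint_path` are placeholders.
    """
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime="3 seconds")
        .toTable(target_table)
    )
```

Typical usage would be `start_sink(read_bronze_stream(spark, "bronze.events"), "silver.events", "/tmp/checkpoints/events")`, where all three names are hypothetical.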

3 REPLIES

Raman_Unifeye
Contributor III

Short answer: no. Setting skipChangeCommits will not cause your streaming job to lose newly inserted data, assuming the operations on your source table are transactional (as they are for a Delta table).

If your source were a table with regular UPDATE or MERGE operations that you needed to capture, then skipChangeCommits=true would cause the stream to miss those updated or merged records. Since your source is an append-only Bronze table, this should not be a concern for you.

szymon_dybczak
Esteemed Contributor III

It shouldn't. You have an append-only stream, and skipChangeCommits will ignore any transactions that modify or delete data in already existing files.
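
To make the skipping behaviour concrete, here is a toy model in plain Python. It is not Delta's actual implementation: the `Commit` class, the function name and the file names are all made up for illustration, and it assumes a DELETE rewrites the affected file (no deletion vectors). It only captures the idea that a commit which removes existing files is skipped as a whole, while pure append commits are read normally.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Toy stand-in for one Delta table version (hypothetical model)."""
    version: int
    added: list = field(default_factory=list)    # files added by this commit
    removed: list = field(default_factory=list)  # existing files removed/rewritten


def files_seen_by_stream(commits, skip_change_commits):
    """Return the files a streaming reader would pick up, in commit order."""
    seen = []
    for commit in commits:
        if commit.removed:
            if skip_change_commits:
                # The whole commit is skipped, including any rewritten files.
                continue
            raise RuntimeError(
                f"version {commit.version} changes existing data; "
                "the stream fails unless an option tells it how to proceed"
            )
        seen.extend(commit.added)
    return seen


table_log = [
    Commit(0, added=["a.parquet"]),                          # append
    Commit(1, added=["b.parquet"]),                          # append
    Commit(2, added=["a2.parquet"], removed=["a.parquet"]),  # weekly DELETE
    Commit(3, added=["c.parquet"]),                          # append
]

print(files_seen_by_stream(table_log, skip_change_commits=True))
# ['a.parquet', 'b.parquet', 'c.parquet']
```

In this model the skipped commit only contains `a2.parquet`, which holds the surviving rows of `a.parquet`. Those rows were already delivered when `a.parquet` was appended, so skipping the commit does not drop any new data.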


Naveenkumar1811
New Contributor II

Hi szymon/Raman,

My question was about the commits performed by the insert/append done via my streaming and by the delete operation from the weekly maintenance job. Is there any way that both transactions could fall into the same commit? I need to understand that part so that I have a clear picture of the data-loss risk when using skipChangeCommits.
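
One way to check this on your own table is the Delta history. Each transaction is written as its own table version with a single operation, so an append from one job and a DELETE from another job should show up as separate versions rather than one combined commit. The sketch below assumes an existing SparkSession and a placeholder table name; the helper names and the sample rows are hypothetical, and the exact operation names may differ depending on how the table is loaded.

```python
# Operations that modify or delete existing records (assumed list).
CHANGE_OPERATIONS = {"DELETE", "UPDATE", "MERGE"}


def fetch_history(spark, table_name):
    """Read (version, operation) pairs from the Delta history.

    `spark` is an existing SparkSession; `table_name` is a placeholder.
    """
    rows = (
        spark.sql(f"DESCRIBE HISTORY {table_name}")
        .select("version", "operation")
        .collect()
    )
    return [row.asDict() for row in rows]


def change_commit_versions(history_rows):
    """Return the versions whose operation changes existing data."""
    return sorted(
        row["version"]
        for row in history_rows
        if row["operation"] in CHANGE_OPERATIONS
    )


# Illustrative rows only, not taken from a real table.
sample_history = [
    {"version": 3, "operation": "STREAMING UPDATE"},
    {"version": 2, "operation": "DELETE"},
    {"version": 1, "operation": "STREAMING UPDATE"},
    {"version": 0, "operation": "CREATE TABLE"},
]

print(change_commit_versions(sample_history))  # [2]
```

If the weekly DELETE always appears as its own version, as in the sample, then the commits around it that carry the appends are not affected by skipChangeCommits.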
