<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Update code for a streaming job in Production in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21117#M14352</link>
    <description>&lt;P&gt;Can you provide the source and sink type?&lt;/P&gt;</description>
    <pubDate>Wed, 10 Nov 2021 15:26:49 GMT</pubDate>
    <dc:creator>Sandeep</dc:creator>
    <dc:date>2021-11-10T15:26:49Z</dc:date>
    <item>
      <title>Update code for a streaming job in Production</title>
      <link>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21114#M14349</link>
      <description>&lt;P&gt;How do you update a streaming job in production with minimal or no downtime when there are significant code changes that are incompatible with the existing checkpoint state, so the stream cannot simply resume?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 21:52:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21114#M14349</guid>
      <dc:creator>User16783853906</dc:creator>
      <dc:date>2021-06-23T21:52:55Z</dc:date>
    </item>
    <item>
      <title>Re: Update code for a streaming job in Production</title>
      <link>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21115#M14350</link>
      <description>&lt;P&gt;This will likely depend on your use case. Can you share an example of your current streaming setup and the kinds of changes you anticipate making with minimal downtime?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 23:10:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21115#M14350</guid>
      <dc:creator>aladda</dc:creator>
      <dc:date>2021-06-23T23:10:57Z</dc:date>
    </item>
    <item>
      <title>Re: Update code for a streaming job in Production</title>
      <link>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21116#M14351</link>
      <description>&lt;OL&gt;&lt;LI&gt;First, check whether the code changes are compatible with the existing checkpoint; if they are not, you will need to start with a new checkpoint. More information on the types of changes that are allowed: &lt;A href="https://docs.databricks.com/spark/latest/structured-streaming/production.html#types-of-changes" alt="https://docs.databricks.com/spark/latest/structured-streaming/production.html#types-of-changes" target="_blank"&gt;https://docs.databricks.com/spark/latest/structured-streaming/production.html#types-of-changes&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;If you go with a new checkpoint and do not specify a starting point for the&amp;nbsp;source, the framework will fetch all of the data from the source again. In that case you must be prepared to handle duplicates, or they will be written to the sink. To handle duplicates, you can use dropDuplicates, a merge into the sink, or row_number&amp;nbsp;based ranking filtered to rank 1.&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Thu, 16 Sep 2021 10:38:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21116#M14351</guid>
      <dc:creator>Deepak_Bhutada</dc:creator>
      <dc:date>2021-09-16T10:38:44Z</dc:date>
    </item>
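    <!--
    The deduplication options in the reply above (dropDuplicates, a merge, or row_number-based ranking) all reduce to the same idea: after a full source replay from a new checkpoint, keep one record per business key. A minimal sketch of that logic in plain Python; the key and timestamp column names here are hypothetical, and in Spark you would express the same thing with dropDuplicates or a row_number window filtered to rank 1:

    ```python
    # Keep only the latest record per key, mimicking
    # row_number().over(Window.partitionBy("id").orderBy(desc("ts"))) == 1
    # applied after a full replay of the source.
    def dedupe_latest(rows, key="id", order="ts"):
        latest = {}
        for row in rows:
            k = row[key]
            # Replace the stored record only if this one is newer.
            if k not in latest or row[order] > latest[k][order]:
                latest[k] = row
        return sorted(latest.values(), key=lambda r: r[key])

    # Replayed source: ids 1 and 2 each appear twice with different timestamps.
    replayed = [
        {"id": 1, "ts": 1, "value": "old"},
        {"id": 2, "ts": 5, "value": "keep"},
        {"id": 1, "ts": 9, "value": "new"},
        {"id": 2, "ts": 2, "value": "old"},
    ]
    deduped = dedupe_latest(replayed)
    # deduped keeps only {"id": 1, "ts": 9, ...} and {"id": 2, "ts": 5, ...}
    ```

    The same filtering can run as part of the streaming write, so duplicates never reach the sink even when the whole source is refetched.
    -->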
    <item>
      <title>Re: Update code for a streaming job in Production</title>
      <link>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21117#M14352</link>
      <description>&lt;P&gt;Can you provide the source and sink type?&lt;/P&gt;</description>
      <pubDate>Wed, 10 Nov 2021 15:26:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21117#M14352</guid>
      <dc:creator>Sandeep</dc:creator>
      <dc:date>2021-11-10T15:26:49Z</dc:date>
    </item>
    <item>
      <title>Re: Update code for a streaming job in Production</title>
      <link>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21118#M14353</link>
      <description>&lt;P&gt;I have the same scenario: the source type is Parquet and the sink type is Delta, both in Azure Data Lake Gen2. I need to change the checkpoint location; how can we exclude the files that have already been processed? Can we do this without using the Auto Loader feature? Please confirm.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2022 10:33:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/update-code-for-a-streaming-job-in-production/m-p/21118#M14353</guid>
      <dc:creator>Himanshi</dc:creator>
      <dc:date>2022-07-21T10:33:18Z</dc:date>
    </item>
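    <!--
    One common way to restart from a new checkpoint without double-writing already-processed files is to make the sink idempotent: upsert each micro-batch by key, which on Databricks is typically a Delta MERGE inside foreachBatch, so a replay overwrites rather than appends. A toy sketch of that idempotent-upsert idea in plain Python; the dict stands in for the Delta table and all names here are hypothetical, not the actual Delta API:

    ```python
    # A dict keyed by "id" stands in for the Delta sink. upsert_batch plays
    # the role of a MERGE inside foreachBatch: matched keys are updated and
    # new keys inserted, so replaying the same micro-batch changes nothing.
    def upsert_batch(table, batch, key="id"):
        for row in batch:
            table[row[key]] = row  # update if present, insert if not
        return table

    sink = {}
    batch = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
    upsert_batch(sink, batch)
    upsert_batch(sink, batch)  # replay after a checkpoint reset
    assert len(sink) == 2      # no duplicates despite reprocessing
    ```

    With an upsert-style sink, it matters much less which files the restarted stream re-reads: reprocessed rows land on their existing keys instead of accumulating as duplicates.
    -->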
  </channel>
</rss>

