Structured Streaming: How to handle Schema Changes in source and target Delta Table
10-11-2024 12:21 AM
Hey Community,
We have a streaming pipeline that starts with Auto Loader ingesting data into a bronze table. A second streaming job then picks up this data, transforms it, and writes it into a silver table.
Now there are some schema changes (renaming a column and adding a new column) on the bronze table that we also want to propagate to the silver table.
We added the "schemaTrackingLocation" option to the stream that reads from bronze so that it does not fail because of these non-additive schema changes. However, the stream now does not pick up the schema changes at all: the schema returned by the readStream operation is still the same as it was before the schema change in bronze.
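A minimal sketch of the kind of setup described above, in PySpark. The table names, the checkpoint path, and the helper names `bronze_reader_options` and `start_silver_stream` are placeholders for illustration, not the real ones, and pointing the schema tracking location at the sink's checkpoint directory is an assumption about the layout:

```python
# Placeholder names and paths (assumptions, not the real pipeline).
BRONZE_TABLE = "main.bronze.events"
SILVER_TABLE = "main.silver.events"
SILVER_CHECKPOINT = "/Volumes/main/silver/checkpoints/events"


def bronze_reader_options(checkpoint_location):
    """Build the read options for the Delta streaming source.

    Assumption: the schema tracking log is kept in the same directory
    as the checkpoint of the stream that writes to silver.
    """
    return {"schemaTrackingLocation": checkpoint_location}


def start_silver_stream(spark, transform):
    """Read bronze as a stream, apply `transform`, and write to silver.

    `spark` is an existing SparkSession; `transform` is any function
    that maps a streaming DataFrame to a streaming DataFrame.
    """
    bronze_df = (
        spark.readStream
        .options(**bronze_reader_options(SILVER_CHECKPOINT))
        .table(BRONZE_TABLE)
    )
    # This is the schema that still shows the old column names.
    print(bronze_df.schema.simpleString())

    return (
        transform(bronze_df)
        .writeStream
        .option("checkpointLocation", SILVER_CHECKPOINT)
        .toTable(SILVER_TABLE)
    )
```

Calling `start_silver_stream(spark, lambda df: df)` would start the stream with a pass-through transformation, which is enough to compare the printed schema against the current bronze schema.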
Does anyone know how best to handle such schema changes with Structured Streaming, without much stream downtime or operational overhead?
Thank you!