Hello Community,
I came across a strange behaviour when using structured streaming on top of a delta table.
I have a stream that I wanted to start from a specific version of a delta table using option("starting_version", x), because I did not want to stream all the data of the source table but only the newly arriving data.
To accommodate future (non-additive) schema changes I also set option("schemaTrackingLocation", checkpoint_location).
Now, if I change the schema of the source table, the DataStreamReader does not pick up the schema changes and does not write them to the schemaTrackingLocation. Instead it still infers the old schema, and I can't get it to pick up the new one.
After some trial and error I found out that starting_version is probably the cause of the issue: when I changed the schema on a stream without setting the starting_version option, it worked as intended and picked up the schema changes on the source table.
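For reference, the two reader configurations compared above look roughly like this. This is only a minimal sketch: the helper name read_delta_stream and its parameter names are made up for illustration, and the starting version is written with Delta's documented camelCase option name startingVersion rather than the spelling used in the text above.

```python
def read_delta_stream(spark, source_path, schema_tracking_location, starting_version=None):
    """Build a streaming read over a Delta table (hypothetical helper).

    `spark` is expected to be an active SparkSession with Delta Lake available.
    """
    reader = (
        spark.readStream
        .format("delta")
        # Location where schema changes of the source table should be tracked.
        .option("schemaTrackingLocation", schema_tracking_location)
    )
    if starting_version is not None:
        # Only set when the stream should begin at a specific table version.
        reader = reader.option("startingVersion", starting_version)
    return reader.load(source_path)
```

Calling the helper with starting_version set corresponds to the stream that keeps the old schema, and calling it without starting_version corresponds to the stream that picked up the schema change as expected.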
I'm a bit confused, since starting_version should only have an effect when the stream is first started and otherwise be ignored, according to the docs:
Has anybody had a similar problem? Is this intended behaviour? How can I solve this issue? Where could I raise this issue?