<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Structured Streaming schemaTrackingLocation does not work with starting_version in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/structured-streaming-schematrackinglocation-does-not-work-with/m-p/94296#M38860</link>
    <description>&lt;P&gt;Hello Community,&lt;/P&gt;&lt;P&gt;I came across a strange behaviour when using structured streaming on top of a delta table.&lt;/P&gt;&lt;P&gt;I have a stream that I wanted to start from a specific version of a delta table using the option &lt;STRONG&gt;&lt;EM&gt;option("starting_version", x)&lt;/EM&gt;&lt;/STRONG&gt;, because I did not want to stream all the data of the source table but only the newly arriving data. To accommodate future (non-additive) schema changes I also set the option &lt;STRONG&gt;&lt;EM&gt;option("schemaTrackingLocation", checkpoint_location)&lt;/EM&gt;&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;Now, if I change the schema of the source table, the DataStreamReader does not pick up the schema changes and write them to the schemaTrackingLocation. It still infers the old schema, and I can't get it to pick up the changes.&lt;/P&gt;&lt;P&gt;After some trial and error I found that &lt;STRONG&gt;starting_version&lt;/STRONG&gt; is probably the cause: when I changed the schema on a stream without the starting_version option, it worked as intended and picked up the schema changes on the source table.&lt;/P&gt;&lt;P&gt;I'm a bit confused, since starting_version should only have an effect when starting the stream and otherwise be ignored. From the docs:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;They take effect only when starting a new streaming query. If a streaming query has started and the progress has been recorded in its checkpoint, these options are ignored.&lt;/EM&gt; &lt;A href="https://docs.databricks.com/en/structured-streaming/delta-lake.html#specify-initial-position" target="_self"&gt;https://docs.databricks.com/en/structured-streaming/delta-lake.html#specify-initial-position&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Did anybody have a similar problem? Is this intended behaviour? How can I solve this issue? Where could I raise it?&lt;/P&gt;</description>
    <pubDate>Wed, 16 Oct 2024 13:54:36 GMT</pubDate>
    <dc:creator>Volker</dc:creator>
    <dc:date>2024-10-16T13:54:36Z</dc:date>
    <item>
      <title>Structured Streaming schemaTrackingLocation does not work with starting_version</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-schematrackinglocation-does-not-work-with/m-p/94296#M38860</link>
      <description>&lt;P&gt;Hello Community,&lt;/P&gt;&lt;P&gt;I came across a strange behaviour when using structured streaming on top of a delta table.&lt;/P&gt;&lt;P&gt;I have a stream that I wanted to start from a specific version of a delta table using the option &lt;STRONG&gt;&lt;EM&gt;option("starting_version", x)&lt;/EM&gt;&lt;/STRONG&gt;, because I did not want to stream all the data of the source table but only the newly arriving data. To accommodate future (non-additive) schema changes I also set the option &lt;STRONG&gt;&lt;EM&gt;option("schemaTrackingLocation", checkpoint_location)&lt;/EM&gt;&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;Now, if I change the schema of the source table, the DataStreamReader does not pick up the schema changes and write them to the schemaTrackingLocation. It still infers the old schema, and I can't get it to pick up the changes.&lt;/P&gt;&lt;P&gt;After some trial and error I found that &lt;STRONG&gt;starting_version&lt;/STRONG&gt; is probably the cause: when I changed the schema on a stream without the starting_version option, it worked as intended and picked up the schema changes on the source table.&lt;/P&gt;&lt;P&gt;I'm a bit confused, since starting_version should only have an effect when starting the stream and otherwise be ignored. From the docs:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;They take effect only when starting a new streaming query. If a streaming query has started and the progress has been recorded in its checkpoint, these options are ignored.&lt;/EM&gt; &lt;A href="https://docs.databricks.com/en/structured-streaming/delta-lake.html#specify-initial-position" target="_self"&gt;https://docs.databricks.com/en/structured-streaming/delta-lake.html#specify-initial-position&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Did anybody have a similar problem? Is this intended behaviour? How can I solve this issue? Where could I raise it?&lt;/P&gt;</description>
      <pubDate>Wed, 16 Oct 2024 13:54:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-schematrackinglocation-does-not-work-with/m-p/94296#M38860</guid>
      <dc:creator>Volker</dc:creator>
      <dc:date>2024-10-16T13:54:36Z</dc:date>
    </item>
    <item>
      <title>Re: Structured Streaming schemaTrackingLocation does not work with starting_version</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-schematrackinglocation-does-not-work-with/m-p/94901#M38993</link>
      <description>&lt;P&gt;I found that it actually is not related to specifying the starting_version.&lt;/P&gt;&lt;P&gt;I think I found the flaw in how the schema is updated in the schemaTrackingLocation:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;On the first readStream operation, the _schema_log_... gets created.&lt;/LI&gt;&lt;LI&gt;On the first writeStream operation, the schema gets written to the _schema_log_.&lt;/LI&gt;&lt;LI&gt;readStream now reads the source table with the schema from _schema_log_.&lt;/LI&gt;&lt;LI&gt;The schema in _schema_log_ only gets updated on a writeStream operation if a schema change is detected in the source table.&lt;/LI&gt;&lt;LI&gt;The check for whether the source table schema was updated happens after the check that the data schema and the target schema are compatible.&lt;/LI&gt;&lt;LI&gt;If the source table schema and the target table schema are updated simultaneously, the stream fails because it detects a mismatch between the data schema (which is still the original schema) and the target table schema.&lt;/LI&gt;&lt;LI&gt;Because the stream fails at this point, the schema in _schema_log_ does not get updated either, and readStream keeps reading the original schema of the source table even though the schema changed.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;This is quite annoying behaviour, because you would first need to adapt the schema of the source table, let the stream fail once it detects the schema change (which updates the schema in _schema_log_), and only then update the schema of the target table.&lt;BR /&gt;I know that I could use schema evolution, but I do not want to use it if possible.&lt;BR /&gt;Does anybody have experience with this and know of a workaround?&lt;/P&gt;</description>
      <pubDate>Fri, 18 Oct 2024 15:03:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-schematrackinglocation-does-not-work-with/m-p/94901#M38993</guid>
      <dc:creator>Volker</dc:creator>
      <dc:date>2024-10-18T15:03:58Z</dc:date>
    </item>
    <item>
      <title>Re: Structured Streaming schemaTrackingLocation does not work with starting_version</title>
      <link>https://community.databricks.com/t5/data-engineering/structured-streaming-schematrackinglocation-does-not-work-with/m-p/139325#M51160</link>
      <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This issue is related to how Delta Lake’s structured streaming interacts with schema evolution and options like&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;schemaTrackingLocation&lt;/CODE&gt;. The behavior you've observed has been noted by other users, and can be subtle due to how checkpointing, versioning, and schema tracking are handled in combination. Here’s a breakdown, with solutions:&lt;/P&gt;
&lt;H2 id="core-issue" class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0 md:text-lg [hr+&amp;amp;]:mt-4"&gt;Core Issue&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Setting&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;as an option in your stream appears to interfere with schema evolution, resulting in the stream persisting the old schema—even after the underlying Delta table’s schema has changed and updates are written to your&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;schemaTrackingLocation&lt;/CODE&gt;.&lt;/P&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;When you remove&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;, the DataStreamReader detects schema changes correctly, provided schema tracking is enabled. From Databricks documentation,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;should only be relevant for initializing a new stream, not for resumes from an existing checkpoint.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Why Does This Happen?&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Schema Tracking and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;When&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is set, it can affect which version of the table the streaming query starts reading from, even if a checkpoint exists. Certain system versions and Spark releases may not fully disregard this option once a checkpoint has been initialized.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The schema stored at&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;schemaTrackingLocation&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is used for schema management, but if the stream is “stuck” at an older version due to how&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is interpreted, it may not trigger schema updates.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Checkpoints and Restart Behavior:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;On restart, if a checkpoint exists, the stream should ignore&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;. However, if the checkpoint is missing or corrupted, or if the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;option is reapplied incorrectly, the schema may not evolve as expected.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 id="suggested-solution" class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0 md:text-lg [hr+&amp;amp;]:mt-4"&gt;Suggested Solution&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Remove&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;After Initial Start:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Only use the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;option when you first start the stream and no checkpoint exists.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;After initial startup and successful checkpointing,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;remove&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;so schema tracking works properly on subsequent runs. Schema changes should then be detected and handled via your&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;schemaTrackingLocation&lt;/CODE&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Confirm Checkpoint Health:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Make sure your checkpoint directory is healthy and present when restarting the stream. If the checkpoint is not present,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;will be used.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Upgrade Databricks &amp;amp; Delta Lake:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Certain bugs with schema tracking and stream options have been resolved in later versions of Databricks and Delta Lake. Upgrading may resolve unexpected behaviors.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
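&lt;P&gt;As a rough illustration of the first suggestion, here is a minimal Python sketch that passes &lt;CODE&gt;startingVersion&lt;/CODE&gt; only when no progress has been recorded yet. It assumes the checkpoint directory is reachable through the local filesystem API (for example a mounted path) and that recorded progress shows up under the checkpoint's &lt;CODE&gt;offsets&lt;/CODE&gt; subdirectory. The helper names &lt;CODE&gt;build_reader_options&lt;/CODE&gt; and &lt;CODE&gt;start_stream&lt;/CODE&gt; and all of their parameters are hypothetical and not part of any Databricks API.&lt;/P&gt;
&lt;PRE&gt;
```python
import os


def build_reader_options(checkpoint_dir, schema_tracking_dir, starting_version):
    """Return Delta streaming read options for a first start or a restart.

    Hypothetical helper: the function and parameter names are illustrative only.
    """
    options = {"schemaTrackingLocation": schema_tracking_dir}
    # Assumption: a missing or empty "offsets" subdirectory in the checkpoint
    # means that no progress has been recorded yet.
    offsets_dir = os.path.join(checkpoint_dir, "offsets")
    has_progress = os.path.isdir(offsets_dir) and bool(os.listdir(offsets_dir))
    if not has_progress:
        # First start only: choose the initial position explicitly.
        options["startingVersion"] = str(starting_version)
    return options


def start_stream(spark, source_table, target_table, checkpoint_dir, starting_version):
    """Wire the options into a Delta-to-Delta stream (needs a Spark session)."""
    # As in the question, the schema tracking location is the checkpoint location.
    options = build_reader_options(checkpoint_dir, checkpoint_dir, starting_version)
    reader = spark.readStream.format("delta").options(**options)
    return (
        reader.table(source_table)
        .writeStream.option("checkpointLocation", checkpoint_dir)
        .toTable(target_table)
    )
```
&lt;/PRE&gt;
&lt;P&gt;According to the documentation quoted in the question, the option should already be ignored once a checkpoint exists, so this guard only makes that intent explicit in the job code.&lt;/P&gt;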
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Workaround (if you need to retain startingVersion logic):&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Restart your stream&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;without&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;option once the checkpoint is established, so that subsequent runs see schema changes.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;For testing, you can clear out your checkpoint directory and then set&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to reinitialize from that version. Be careful: this resets your offsets and may replay data.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 id="references" class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0 md:text-lg [hr+&amp;amp;]:mt-4"&gt;References&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;A class="reset interactable cursor-pointer decoration-1 underline-offset-1 text-super hover:underline font-semibold" href="https://docs.databricks.com/en/structured-streaming/delta-lake.html#specify-initial-position" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class="text-box-trim-both"&gt;Delta Lake Streaming Initial Position documentation&lt;/SPAN&gt;&lt;/A&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;A class="reset interactable cursor-pointer decoration-1 underline-offset-1 text-super hover:underline font-semibold" href="https://community.databricks.com/s/question/0D58Y00009A8RkvSAF/delta-table-streaming-schema-tracking-not-working-with-startingversion" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class="text-box-trim-both"&gt;Databricks Community: Schema tracking issue discussion&lt;/SPAN&gt;&lt;/A&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 id="summary" class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0 md:text-lg [hr+&amp;amp;]:mt-4"&gt;Summary&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This is not fully “intended” behavior, but more a side-effect of how options and checkpointing interact in specific tool versions. Removing&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;startingVersion&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;after initial setup, maintaining your checkpoint, and enabling schema tracking is the correct pattern for evolving schemas in Delta Lake structured streaming. If the problem persists after following this approach and upgrading, it may warrant a support ticket or GitHub issue.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Nov 2025 12:04:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/structured-streaming-schematrackinglocation-does-not-work-with/m-p/139325#M51160</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-11-17T12:04:14Z</dc:date>
    </item>
  </channel>
</rss>

