<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Querying CDF on a Delta-Sharing table after data type change in the Table (INT to DECIMAL) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/querying-cdf-on-a-delta-sharing-table-after-data-type-change-in/m-p/153723#M54000</link>
    <description>&lt;P&gt;&lt;SPAN&gt;Hi — this is a known limitation of Change Data Feed. Here's what's happening and your options.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Why This Happens&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;Changing a column from INT to DECIMAL is a &lt;/SPAN&gt;&lt;STRONG&gt;non-additive schema change&lt;/STRONG&gt;&lt;SPAN&gt;. When reading CDF in batch mode, Delta Lake applies a single schema (the latest or end-version schema) to all Parquet files in the version range. Since the older Parquet files still have INT and the schema expects DECIMAL, you get a conflict.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;mergeSchema won't help here&lt;/STRONG&gt;&lt;SPAN&gt; — it handles additive changes such as new columns, not data type changes.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Your Options&lt;/STRONG&gt;&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt; Split your CDF reads at the schema change boundary (recommended if you want to avoid a full reload)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;Read CDF in two separate ranges — before and after the type change — then cast and union:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Read versions BEFORE the type change (e.g., up to version N-1)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df_before = (spark.read.format("delta")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("readChangeFeed", "true")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("startingVersion", start_version)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("endingVersion", schema_change_version - 1)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.table("your_table")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Read versions AFTER the type change (version N onward)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df_after = (spark.read.format("delta")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("readChangeFeed", "true")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("startingVersion", schema_change_version)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("endingVersion", end_version)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.table("your_table")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Cast the old INT column to the table's new DECIMAL type, then union.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Note: a bare cast("decimal") defaults to DECIMAL(10,0); use the exact new precision/scale (the (10,2) below is a placeholder).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df_before_casted = df_before.withColumn("col_name", df_before["col_name"].cast("decimal(10,2)"))&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df_combined = df_before_casted.unionByName(df_after)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can find the version where the schema changed using &lt;/SPAN&gt;&lt;SPAN&gt;DESCRIBE HISTORY your_table&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;
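&lt;P&gt;&lt;SPAN&gt;As a rough sketch of locating that boundary programmatically: collect the history rows, then take the earliest version whose operation altered the schema. This is pure-Python post-processing of rows collected from DESCRIBE HISTORY; the operation names checked (e.g. "CHANGE COLUMN") and the row shape are illustrative, so verify them against your table's actual history output.&lt;/SPAN&gt;&lt;/P&gt;

```python
# Sketch: find the first schema-changing version from DESCRIBE HISTORY output.
# history_rows would come from something like:
#   spark.sql("DESCRIBE HISTORY your_table").collect()
# The operation names below are illustrative; check your actual history.
def find_schema_change_version(history_rows):
    schema_ops = {"CHANGE COLUMN", "REPLACE COLUMNS"}
    candidates = [row["version"] for row in history_rows
                  if row["operation"] in schema_ops]
    return min(candidates) if candidates else None
```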
&lt;OL start="2"&gt;
&lt;LI&gt;&lt;STRONG&gt; Full reload of the table&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;If splitting reads is too complex for your pipeline, a one-time full reload at the new schema is the simplest path. After the reload, future CDF reads will work normally since all files will have the new schema.&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL start="3"&gt;
&lt;LI&gt;&lt;STRONG&gt; Use Type Widening for future-proofing (DBR 15.4+)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;The &lt;/SPAN&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/type-widening" target="_blank"&gt;&lt;SPAN&gt;Type Widening&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt; feature lets you widen column types (e.g., INT → DECIMAL) without rewriting data files. However, even with type widening, &lt;/SPAN&gt;&lt;STRONG&gt;CDF reads across the type change boundary are still not supported&lt;/STRONG&gt;&lt;SPAN&gt; — you'd still need to split reads. The benefit is it avoids the costly full-table rewrite on the provider side.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Note: Type widening over Delta Sharing requires both provider and recipient on &lt;/SPAN&gt;&lt;STRONG&gt;DBR 16.1+&lt;/STRONG&gt;&lt;SPAN&gt; and is only supported for &lt;/SPAN&gt;&lt;STRONG&gt;Databricks-to-Databricks&lt;/STRONG&gt;&lt;SPAN&gt; sharing.&lt;/SPAN&gt;&lt;/P&gt;
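&lt;P&gt;&lt;SPAN&gt;For illustration, enabling it on the provider side is a per-table property followed by the widening DDL. This is a sketch: the table and column names are placeholders, and INT can only widen to a DECIMAL with enough integer digits to hold every INT value, so check the supported type pairs in the docs.&lt;/SPAN&gt;&lt;/P&gt;

```sql
-- Provider side (sketch; names are placeholders).
-- Opt the table in to type widening, then widen the column in place.
ALTER TABLE your_table SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true');
ALTER TABLE your_table ALTER COLUMN col_name TYPE DECIMAL(12, 2);
```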
&lt;H3&gt;&lt;STRONG&gt;TL;DR&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;You cannot read CDF across a data type change in a single query — this is by design. Split your reads at the schema change version boundary, or do a full reload. For future schema changes, consider type widening to minimize disruption.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Docs:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/delta-change-data-feed" target="_blank"&gt;&lt;SPAN&gt;Change Data Feed — Schema Changes&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/type-widening" target="_blank"&gt;&lt;SPAN&gt;Type Widening&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Wed, 08 Apr 2026 11:00:11 GMT</pubDate>
    <dc:creator>anuj_lathi</dc:creator>
    <dc:date>2026-04-08T11:00:11Z</dc:date>
    <item>
      <title>Querying CDF on a Delta-Sharing table after data type change in the Table (INT to DECIMAL)</title>
      <link>https://community.databricks.com/t5/data-engineering/querying-cdf-on-a-delta-sharing-table-after-data-type-change-in/m-p/153655#M53991</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am trying to query the CDF of a Delta-Sharing table that have had a change in data type of one its columns. The change was from an INT to a DECIMAL. When reading the specific version where the schema change happened, I am receiving an error mentioning a conflict between the new schema of the Delta-Sharing (with DECIMAL) and the Parquet file having INT in the column.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have tried to add mergeSchema = true but still receiving the same error.&amp;nbsp;&lt;/P&gt;&lt;P&gt;My question is: is there any way to maintain readability of the CDF of a Delta-Sharing table to which a schema have been changed with a data type change or a full reload of the table is required in that specific instance?&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 07 Apr 2026 20:21:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/querying-cdf-on-a-delta-sharing-table-after-data-type-change-in/m-p/153655#M53991</guid>
      <dc:creator>fdubourdeau</dc:creator>
      <dc:date>2026-04-07T20:21:04Z</dc:date>
    </item>
    <item>
      <title>Re: Querying CDF on a Delta-Sharing table after data type change in the Table (INT to DECIMAL)</title>
      <link>https://community.databricks.com/t5/data-engineering/querying-cdf-on-a-delta-sharing-table-after-data-type-change-in/m-p/153723#M54000</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi — this is a known limitation of Change Data Feed. Here's what's happening and your options.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Why This Happens&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;Changing a column from INT to DECIMAL is a &lt;/SPAN&gt;&lt;STRONG&gt;non-additive schema change&lt;/STRONG&gt;&lt;SPAN&gt;. When reading CDF in batch mode, Delta Lake applies a single schema (the latest or end-version schema) to all Parquet files in the version range. Since the older Parquet files still have INT and the schema expects DECIMAL, you get a conflict.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;mergeSchema won't help here&lt;/STRONG&gt;&lt;SPAN&gt; — it handles additive changes such as new columns, not data type changes.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Your Options&lt;/STRONG&gt;&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt; Split your CDF reads at the schema change boundary (recommended if you want to avoid a full reload)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;Read CDF in two separate ranges — before and after the type change — then cast and union:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Read versions BEFORE the type change (e.g., up to version N-1)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df_before = (spark.read.format("delta")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("readChangeFeed", "true")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("startingVersion", start_version)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("endingVersion", schema_change_version - 1)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.table("your_table")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Read versions AFTER the type change (version N onward)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df_after = (spark.read.format("delta")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("readChangeFeed", "true")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("startingVersion", schema_change_version)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("endingVersion", end_version)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.table("your_table")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Cast the old INT column to the table's new DECIMAL type, then union.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Note: a bare cast("decimal") defaults to DECIMAL(10,0); use the exact new precision/scale (the (10,2) below is a placeholder).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df_before_casted = df_before.withColumn("col_name", df_before["col_name"].cast("decimal(10,2)"))&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df_combined = df_before_casted.unionByName(df_after)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can find the version where the schema changed using &lt;/SPAN&gt;&lt;SPAN&gt;DESCRIBE HISTORY your_table&lt;/SPAN&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;
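&lt;P&gt;&lt;SPAN&gt;As a rough sketch of locating that boundary programmatically: collect the history rows, then take the earliest version whose operation altered the schema. This is pure-Python post-processing of rows collected from DESCRIBE HISTORY; the operation names checked (e.g. "CHANGE COLUMN") and the row shape are illustrative, so verify them against your table's actual history output.&lt;/SPAN&gt;&lt;/P&gt;

```python
# Sketch: find the first schema-changing version from DESCRIBE HISTORY output.
# history_rows would come from something like:
#   spark.sql("DESCRIBE HISTORY your_table").collect()
# The operation names below are illustrative; check your actual history.
def find_schema_change_version(history_rows):
    schema_ops = {"CHANGE COLUMN", "REPLACE COLUMNS"}
    candidates = [row["version"] for row in history_rows
                  if row["operation"] in schema_ops]
    return min(candidates) if candidates else None
```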
&lt;OL start="2"&gt;
&lt;LI&gt;&lt;STRONG&gt; Full reload of the table&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;If splitting reads is too complex for your pipeline, a one-time full reload at the new schema is the simplest path. After the reload, future CDF reads will work normally since all files will have the new schema.&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL start="3"&gt;
&lt;LI&gt;&lt;STRONG&gt; Use Type Widening for future-proofing (DBR 15.4+)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;The &lt;/SPAN&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/type-widening" target="_blank"&gt;&lt;SPAN&gt;Type Widening&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt; feature lets you widen column types (e.g., INT → DECIMAL) without rewriting data files. However, even with type widening, &lt;/SPAN&gt;&lt;STRONG&gt;CDF reads across the type change boundary are still not supported&lt;/STRONG&gt;&lt;SPAN&gt; — you'd still need to split reads. The benefit is it avoids the costly full-table rewrite on the provider side.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Note: Type widening over Delta Sharing requires both provider and recipient on &lt;/SPAN&gt;&lt;STRONG&gt;DBR 16.1+&lt;/STRONG&gt;&lt;SPAN&gt; and is only supported for &lt;/SPAN&gt;&lt;STRONG&gt;Databricks-to-Databricks&lt;/STRONG&gt;&lt;SPAN&gt; sharing.&lt;/SPAN&gt;&lt;/P&gt;
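&lt;P&gt;&lt;SPAN&gt;For illustration, enabling it on the provider side is a per-table property followed by the widening DDL. This is a sketch: the table and column names are placeholders, and INT can only widen to a DECIMAL with enough integer digits to hold every INT value, so check the supported type pairs in the docs.&lt;/SPAN&gt;&lt;/P&gt;

```sql
-- Provider side (sketch; names are placeholders).
-- Opt the table in to type widening, then widen the column in place.
ALTER TABLE your_table SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true');
ALTER TABLE your_table ALTER COLUMN col_name TYPE DECIMAL(12, 2);
```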
&lt;H3&gt;&lt;STRONG&gt;TL;DR&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;You cannot read CDF across a data type change in a single query — this is by design. Split your reads at the schema change version boundary, or do a full reload. For future schema changes, consider type widening to minimize disruption.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Docs:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/delta-change-data-feed" target="_blank"&gt;&lt;SPAN&gt;Change Data Feed — Schema Changes&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/type-widening" target="_blank"&gt;&lt;SPAN&gt;Type Widening&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 08 Apr 2026 11:00:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/querying-cdf-on-a-delta-sharing-table-after-data-type-change-in/m-p/153723#M54000</guid>
      <dc:creator>anuj_lathi</dc:creator>
      <dc:date>2026-04-08T11:00:11Z</dc:date>
    </item>
  </channel>
</rss>

