<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT, Automatic Schema Evolution and Type Widening in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/107703#M42891</link>
    <description>&lt;P&gt;&lt;SPAN&gt;Alternatively, you can try using the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;INSERT INTO&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;statement directly:&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;def load_snapshot_tables(source_system_name, source_schema_name, table_name, spark_schema, select_expression):
    # Auto Loader stream over the gzipped JSON snapshot files
    snapshot_load_df = (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.inferColumnTypes", False)
        .option("cloudFiles.includeExistingFiles", True)
        .option("pathGlobFilter", "*.json.gz")
        .schema(spark_schema)
        .load(f"abfss://YYY@{adl_name}.dfs.core.windows.net/Snapshot/{source_system_name}/{table_name}")
        .selectExpr(
            "CAST(concat(substring(_metadata.file_name, -20,4),'-',substring(_metadata.file_name, -16,2),'-',substring(_metadata.file_name, -14,2)) AS timestamp) AS XXX_Snapshot_Date",
            *select_expression,
            "_metadata.file_name AS XXX_File_Name",
            "_metadata AS XXX_File_Metadata"
        )
    )

    # Append to the Delta table with schema evolution enabled on the write
    (
        snapshot_load_df.writeStream
        .format("delta")
        .option("mergeSchema", "true")
        .option("delta.enableTypeWidening", "true")
        .outputMode("append")
        .queryName(f"insert_into_{table_name}")
        .toTable(table_name)
    )&lt;/LI-CODE&gt;</description>
    <pubDate>Thu, 30 Jan 2025 07:00:42 GMT</pubDate>
    <dc:creator>Sidhant07</dc:creator>
    <dc:date>2025-01-30T07:00:42Z</dc:date>
    <item>
      <title>DLT, Automatic Schema Evolution and Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/107660#M42880</link>
      <description>&lt;P&gt;I'm attempting to run a DLT pipeline that uses automatic schema evolution against tables that have type widening enabled.&lt;/P&gt;&lt;P&gt;My notebook defines a list of tables to create or update, along with the Spark schema for each table. Each table and its schema are passed to the load_snapshot_tables function, which looks like this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def load_snapshot_tables(source_system_name, source_schema_name, table_name, spark_schema, select_expression):

    @dlt.table(
        name=table_name,
        comment=f"{source_system_name}_{source_schema_name}.{table_name}_Snapshot",
        table_properties={"delta.enableTypeWidening": "true"},
        cluster_by=["XXX_Snapshot_Date"]
    )
    def create_snapshot_table():

        snapshot_load_df = (
            spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.inferColumnTypes", False)
            .option("cloudFiles.includeExistingFiles", True)
            .option("pathGlobFilter", "*.json.gz")
            .schema(spark_schema)
            .load(f"abfss://YYY@{adl_name}.dfs.core.windows.net/Snapshot/{source_system_name}/{table_name}")
            .selectExpr(
                "CAST(concat(substring(_metadata.file_name, -20,4),'-',substring(_metadata.file_name, -16,2),'-',substring(_metadata.file_name, -14,2)) AS timestamp) AS XXX_Snapshot_Date",
                *select_expression,
                "_metadata.file_name AS XXX_File_Name",
                "_metadata AS XXX_File_Metadata"
            )
        )

        return snapshot_load_df&lt;/LI-CODE&gt;&lt;P&gt;Everything works except type widening. New columns are added based on the schema I pass in. However, when I change a column's data type, the pipeline fails with a casting/type error. Refreshing the tables resolves the errors, but I don't want to have to refresh them. I've been working from the&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta/type-widening" target="_self"&gt;Type-Widening&lt;/A&gt;&amp;nbsp;documentation, which has a section titled&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta/type-widening#widen-types-with-automatic-schema-evolution" target="_self"&gt;Widening Types with Automatic Schema Evolution&lt;/A&gt;. I meet all of the requirements listed there, with the possible exception of the first bullet (the command uses INSERT or MERGE INTO). I had assumed that INSERT or MERGE INTO is being used behind the scenes here.&lt;/P&gt;&lt;P&gt;I am using the Preview channel for the pipeline.&lt;/P&gt;&lt;P&gt;So, two questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;What am I missing in my Python code to make sure type widening is honored?&lt;/LI&gt;&lt;LI&gt;What would my Python code look like if I had to convert it to use INSERT or MERGE INTO explicitly?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jan 2025 20:16:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/107660#M42880</guid>
      <dc:creator>MarkV</dc:creator>
      <dc:date>2025-01-29T20:16:29Z</dc:date>
    </item>
    <item>
      <title>Re: DLT, Automatic Schema Evolution and Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/107702#M42890</link>
      <description>&lt;P&gt;&lt;SPAN class=""&gt;To make type widening work in your current setup, you can try the following modifications:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL class="marker:text-textOff list-decimal pl-8"&gt;
&lt;LI&gt;&lt;SPAN class=""&gt;Add the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;CODE&gt;mergeSchema&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;option to your read operation:&lt;/SPAN&gt;&lt;LI-CODE lang="markup"&gt;snapshot_load_df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.inferColumnTypes", False)
    .option("cloudFiles.includeExistingFiles", True)
    .option("pathGlobFilter", "*.json.gz")
    .option("mergeSchema", "true")  # Add this line
    .schema(spark_schema)
    .load(f"abfss://YYY@{adl_name}.dfs.core.windows.net/Snapshot/{source_system_name}/{table_name}")
    # ... rest of the code
)
&lt;/LI-CODE&gt;&lt;/LI&gt;
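&lt;LI&gt;&lt;SPAN class=""&gt;It may also be worth confirming that the type change itself is one that type widening supports. The snippet below is only a minimal sketch:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;CODE&gt;is_supported_widening&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN class=""&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and its lookup table are hypothetical helpers, not a Databricks API. The table is an assumption based on my reading of the supported type changes in the type widening documentation, so please verify it against that page. Decimal precision and scale changes are not modelled:&lt;/SPAN&gt;&lt;LI-CODE lang="python"&gt;# Hypothetical helper (not a Databricks API). The map is an assumption
# transcribed from the "supported type changes" list in the type widening
# docs; check it against the documentation before relying on it.
SUPPORTED_WIDENINGS = {
    "byte": {"short", "int", "long", "decimal", "double"},
    "short": {"int", "long", "decimal", "double"},
    "int": {"long", "decimal", "double"},
    "long": {"decimal"},
    "float": {"double"},
    "date": {"timestamp_ntz"},
}


def is_supported_widening(from_type, to_type):
    """Return True if changing from_type to to_type is in the assumed map."""
    return to_type in SUPPORTED_WIDENINGS.get(from_type, set())


print(is_supported_widening("int", "long"))  # True
print(is_supported_widening("long", "int"))  # False (narrowing)
&lt;/LI-CODE&gt;&lt;SPAN class=""&gt;If the change you are making passes a check like this and the pipeline still fails, the problem is more likely in how the write is performed than in the type change itself.&lt;/SPAN&gt;&lt;/LI&gt;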
&lt;/OL&gt;</description>
      <pubDate>Thu, 30 Jan 2025 06:59:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/107702#M42890</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2025-01-30T06:59:54Z</dc:date>
    </item>
    <item>
      <title>Re: DLT, Automatic Schema Evolution and Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/107703#M42891</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Alternatively, you can try using the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;INSERT INTO&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;statement directly:&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;def load_snapshot_tables(source_system_name, source_schema_name, table_name, spark_schema, select_expression):
    # Auto Loader stream over the gzipped JSON snapshot files
    snapshot_load_df = (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.inferColumnTypes", False)
        .option("cloudFiles.includeExistingFiles", True)
        .option("pathGlobFilter", "*.json.gz")
        .schema(spark_schema)
        .load(f"abfss://YYY@{adl_name}.dfs.core.windows.net/Snapshot/{source_system_name}/{table_name}")
        .selectExpr(
            "CAST(concat(substring(_metadata.file_name, -20,4),'-',substring(_metadata.file_name, -16,2),'-',substring(_metadata.file_name, -14,2)) AS timestamp) AS XXX_Snapshot_Date",
            *select_expression,
            "_metadata.file_name AS XXX_File_Name",
            "_metadata AS XXX_File_Metadata"
        )
    )

    # Append to the Delta table with schema evolution enabled on the write
    (
        snapshot_load_df.writeStream
        .format("delta")
        .option("mergeSchema", "true")
        .option("delta.enableTypeWidening", "true")
        .outputMode("append")
        .queryName(f"insert_into_{table_name}")
        .toTable(table_name)
    )&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 30 Jan 2025 07:00:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/107703#M42891</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2025-01-30T07:00:42Z</dc:date>
    </item>
    <item>
      <title>Re: DLT, Automatic Schema Evolution and Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/109023#M43209</link>
      <description>&lt;P&gt;Thanks, Sidhant07, for the response. Unfortunately, with the option you suggested I get the same error as without it (can't merge IntegerType to LongType):&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MarkV_0-1738789976378.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14622i200DAFEF26FD75A4/image-size/medium?v=v2&amp;amp;px=400" role="button" title="MarkV_0-1738789976378.png" alt="MarkV_0-1738789976378.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I would really like to stick with my current DLT approach rather than switch to the INSERT INTO approach. Any other thoughts?&lt;/P&gt;</description>
      <pubDate>Wed, 05 Feb 2025 21:18:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/109023#M43209</guid>
      <dc:creator>MarkV</dc:creator>
      <dc:date>2025-02-05T21:18:01Z</dc:date>
    </item>
    <item>
      <title>Re: DLT, Automatic Schema Evolution and Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/109400#M43301</link>
      <description>&lt;P&gt;I've also put essentially the same question to the Databricks Assistant to see if I'm missing anything, but its code recommendation matched what I already have (including the mergeSchema option).&lt;/P&gt;&lt;P&gt;So, I'm still searching for a solution here. Any additional help would be appreciated.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Feb 2025 13:20:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/109400#M43301</guid>
      <dc:creator>MarkV</dc:creator>
      <dc:date>2025-02-07T13:20:08Z</dc:date>
    </item>
    <item>
      <title>Re: DLT, Automatic Schema Evolution and Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/109448#M43322</link>
      <description>&lt;P&gt;Sorry,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36707"&gt;@Sidhant07&lt;/a&gt;,&amp;nbsp;forgot to mention you in my responses.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Feb 2025 19:51:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/109448#M43322</guid>
      <dc:creator>MarkV</dc:creator>
      <dc:date>2025-02-07T19:51:07Z</dc:date>
    </item>
    <item>
      <title>Re: DLT, Automatic Schema Evolution and Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/110197#M43521</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/119560"&gt;@MarkV&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Apologies for the delayed response!&lt;/P&gt;
&lt;P&gt;Would it be possible for you to open a support ticket so that we can take a deeper look and investigate further?&lt;/P&gt;
&lt;P&gt;We need the complete error stack trace, along with the code, to debug this.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Feb 2025 11:07:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/110197#M43521</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2025-02-14T11:07:16Z</dc:date>
    </item>
    <item>
      <title>Re: DLT, Automatic Schema Evolution and Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/128121#M48162</link>
      <description>&lt;P&gt;Is there any solution for type widening in a DLT pipeline? writeStream is not possible in DLT, right?&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/36707"&gt;@Sidhant07&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/119560"&gt;@MarkV&lt;/a&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Aug 2025 05:23:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/128121#M48162</guid>
      <dc:creator>abhic21</dc:creator>
      <dc:date>2025-08-12T05:23:20Z</dc:date>
    </item>
    <item>
      <title>Re: DLT, Automatic Schema Evolution and Type Widening</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/128199#M48175</link>
      <description>&lt;P&gt;I have been unable to resolve this issue. However, I have not revisited it since January, so I have not retested it against the latest releases.&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179065"&gt;@abhic21&lt;/a&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Aug 2025 11:10:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-automatic-schema-evolution-and-type-widening/m-p/128199#M48175</guid>
      <dc:creator>MarkV</dc:creator>
      <dc:date>2025-08-12T11:10:18Z</dc:date>
    </item>
  </channel>
</rss>

