<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic CONVERT TO DELTA fails to merge file schema in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/convert-to-delta-fails-to-merge-file-schema/m-p/146836#M52710</link>
    <description>&lt;P&gt;I have a directory of Parquet files in Azure Data Lake Storage that I want to convert to a Delta Lake table. I run this:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;CONVERT TO DELTA parquet.`abfss://container@storage_account.dfs.core.windows.net/directory_name`;&lt;/LI-CODE&gt;&lt;P&gt;But it throws this error: "SparkException: [&lt;A href="https://learn.microsoft.com/azure/databricks/error-messages/error-classes#delta_failed_merge_schema_file" target="_blank" rel="noopener noreferrer"&gt;DELTA_FAILED_MERGE_SCHEMA_FILE&lt;/A&gt;] Failed to merge schema of file abfss://container@storage_account.dfs.core.windows.net/directory_name/file_name_123.parquet: ..."&lt;/P&gt;&lt;P&gt;I ran this on an all-purpose cluster with the `spark.databricks.delta.mergeSchema.enabled` config set to true.&lt;/P&gt;</description>
    <pubDate>Wed, 04 Feb 2026 16:50:01 GMT</pubDate>
    <dc:creator>deployment_fail</dc:creator>
    <dc:date>2026-02-04T16:50:01Z</dc:date>
    <item>
      <title>CONVERT TO DELTA fails to merge file schema</title>
      <link>https://community.databricks.com/t5/data-engineering/convert-to-delta-fails-to-merge-file-schema/m-p/146836#M52710</link>
      <description>&lt;P&gt;I have a directory of Parquet files in Azure Data Lake Storage that I want to convert to a Delta Lake table. I run this:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;CONVERT TO DELTA parquet.`abfss://container@storage_account.dfs.core.windows.net/directory_name`;&lt;/LI-CODE&gt;&lt;P&gt;But it throws this error: "SparkException: [&lt;A href="https://learn.microsoft.com/azure/databricks/error-messages/error-classes#delta_failed_merge_schema_file" target="_blank" rel="noopener noreferrer"&gt;DELTA_FAILED_MERGE_SCHEMA_FILE&lt;/A&gt;] Failed to merge schema of file abfss://container@storage_account.dfs.core.windows.net/directory_name/file_name_123.parquet: ..."&lt;/P&gt;&lt;P&gt;I ran this on an all-purpose cluster with the `spark.databricks.delta.mergeSchema.enabled` config set to true.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Feb 2026 16:50:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/convert-to-delta-fails-to-merge-file-schema/m-p/146836#M52710</guid>
      <dc:creator>deployment_fail</dc:creator>
      <dc:date>2026-02-04T16:50:01Z</dc:date>
    </item>
    <item>
      <title>Re: CONVERT TO DELTA fails to merge file schema</title>
      <link>https://community.databricks.com/t5/data-engineering/convert-to-delta-fails-to-merge-file-schema/m-p/146881#M52717</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/94382"&gt;@deployment_fail&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;I believe the Spark configuration you are using applies when writing to an existing Delta table, not when converting a non-Delta table to Delta. From the error it is quite evident that your Parquet files do not conform to a unified schema, which is why the conversion fails.&lt;/P&gt;&lt;P&gt;Let's try to skip automatic schema inference by supplying the schema explicitly:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;schema = "col1 INT, col2 STRING, col3 DOUBLE"
df = spark.read.schema(schema).parquet("abfss://container@storage_account.dfs.core.windows.net/directory_name")&lt;/LI-CODE&gt;&lt;P&gt;and then write to Delta:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df.write.format("delta").mode("overwrite").save("abfss://.../delta_table")&lt;/LI-CODE&gt;&lt;P&gt;Share your outcome. Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 05 Feb 2026 10:38:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/convert-to-delta-fails-to-merge-file-schema/m-p/146881#M52717</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2026-02-05T10:38:16Z</dc:date>
    </item>
    <item>
      <title>Re: CONVERT TO DELTA fails to merge file schema</title>
      <link>https://community.databricks.com/t5/data-engineering/convert-to-delta-fails-to-merge-file-schema/m-p/150107#M53244</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/94382"&gt;@deployment_fail&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Good timing on this question. Let me explain what is happening and walk you through several approaches to resolve it.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;WHAT IS HAPPENING&lt;/P&gt;
&lt;P&gt;When you run CONVERT TO DELTA, Databricks reads the Parquet footer metadata from every file in the directory and attempts to merge all of those individual file schemas into a single unified Delta table schema. The DELTA_FAILED_MERGE_SCHEMA_FILE error means that at least one file has a schema that cannot be reconciled with the others -- for example, the same column name appears with different data types across files.&lt;/P&gt;
&lt;P&gt;An important clarification: the spark.databricks.delta.mergeSchema.enabled configuration you set does not apply to the CONVERT TO DELTA command. That setting controls schema evolution for write operations (INSERT, MERGE, DataFrame writes) against existing Delta tables. CONVERT TO DELTA uses its own internal schema merge logic when reading Parquet file footers, so that config has no effect here.&lt;/P&gt;
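&lt;P&gt;For contrast, here is a minimal sketch of where per-write schema evolution does apply: as the mergeSchema writer option on a write to an existing Delta table. The function and table names are hypothetical.&lt;/P&gt;

```python
def append_with_schema_evolution(df, table_name):
    # Hypothetical helper: on an EXISTING Delta table, per-write schema
    # evolution is requested with the mergeSchema writer option (or
    # session-wide via spark.databricks.delta.schema.autoMerge.enabled).
    # Neither knob changes how CONVERT TO DELTA merges Parquet footers.
    (df.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .saveAsTable(table_name))
```

&lt;P&gt;Calling this with a DataFrame that adds a new column would evolve the target table's schema; it has no bearing on the conversion path above.&lt;/P&gt;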
&lt;P&gt;&lt;BR /&gt;COMMON CAUSES OF THIS ERROR&lt;/P&gt;
&lt;P&gt;1. Schema drift over time -- an upstream system changed a column type (e.g., INT to LONG, FLOAT to DOUBLE, or STRING to TIMESTAMP)&lt;BR /&gt;2. Different writers -- files produced by different Spark versions, pandas, PyArrow, or other tools may use slightly different Parquet type mappings for the same logical data&lt;BR /&gt;3. Nested struct differences -- fields inside struct columns may have different nullability or field ordering across files&lt;BR /&gt;4. Decimal precision/scale mismatches -- e.g., DECIMAL(10,2) in some files vs DECIMAL(18,4) in others&lt;/P&gt;
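&lt;P&gt;To make the failure mode concrete, here is a toy Python sketch -- not Delta's actual implementation -- contrasting a strict merge, which rejects any type mismatch, with a widening merge that reconciles INT/LONG and FLOAT/DOUBLE:&lt;/P&gt;

```python
# Toy model of schema merging; real Delta/Parquet logic is more involved.
WIDENINGS = {("INT", "LONG"): "LONG", ("LONG", "INT"): "LONG",
             ("FLOAT", "DOUBLE"): "DOUBLE", ("DOUBLE", "FLOAT"): "DOUBLE"}

def merge_type(a, b, allow_widening):
    if a == b:
        return a
    if allow_widening and (a, b) in WIDENINGS:
        return WIDENINGS[(a, b)]
    raise ValueError(f"Failed to merge schema: {a} vs {b}")

def merge_schemas(file_schemas, allow_widening):
    # Each schema is a dict mapping column name to type name.
    merged = {}
    for schema in file_schemas:
        for col, typ in schema.items():
            if col in merged:
                merged[col] = merge_type(merged[col], typ, allow_widening)
            else:
                merged[col] = typ
    return merged

files = [{"id": "INT", "amount": "FLOAT"}, {"id": "LONG", "amount": "DOUBLE"}]
print(merge_schemas(files, allow_widening=True))
# -> {'id': 'LONG', 'amount': 'DOUBLE'}
# merge_schemas(files, allow_widening=False) raises, analogous to
# DELTA_FAILED_MERGE_SCHEMA_FILE during CONVERT TO DELTA.
```

&lt;P&gt;In this analogy, the Parquet reader's mergeSchema option behaves like the widening branch, while CONVERT TO DELTA behaves like the strict one.&lt;/P&gt;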
&lt;P&gt;&lt;BR /&gt;APPROACH 1: READ WITH mergeSchema AND REWRITE AS DELTA (RECOMMENDED)&lt;/P&gt;
&lt;P&gt;Instead of in-place conversion, read the Parquet files with schema merging enabled and write them out as a new Delta table. The Parquet reader's mergeSchema option is more forgiving -- it can widen types (INT to LONG, FLOAT to DOUBLE) and merge structs with different fields:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;df = (spark.read
    .option("mergeSchema", "true")
    .parquet("abfss://container@storage_account.dfs.core.windows.net/directory_name"))

df.write.format("delta").mode("overwrite").saveAsTable("your_catalog.your_schema.your_table")&lt;/LI-CODE&gt;
&lt;P&gt;If your data is partitioned, add partitionBy:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;(df.write.format("delta")
    .partitionBy("your_partition_column")
    .mode("overwrite")
    .saveAsTable("your_catalog.your_schema.your_table"))&lt;/LI-CODE&gt;
&lt;P&gt;Note: this copies the data to a new location, so you will need sufficient storage. The benefit is a clean Delta table with a consistent schema.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;APPROACH 2: IDENTIFY THE CONFLICTING FILES&lt;/P&gt;
&lt;P&gt;If you want to keep the in-place conversion (no data copy), first identify which files have the conflicting schema so you can fix them:&lt;/P&gt;
&lt;P&gt;path = "abfss://container@storage_account.dfs.core.windows.net/directory_name"&lt;/P&gt;
&lt;P&gt;files = (spark.read.format("binaryFile")&lt;BR /&gt;.load(path + "/*.parquet")&lt;BR /&gt;.select("path")&lt;BR /&gt;.collect())&lt;/P&gt;
&lt;P&gt;schemas = {}&lt;BR /&gt;for row in files:&lt;BR /&gt;file_path = row["path"]&lt;BR /&gt;try:&lt;BR /&gt;schema = spark.read.parquet(file_path).schema&lt;BR /&gt;schemas[file_path] = schema&lt;BR /&gt;except Exception as e:&lt;BR /&gt;print(f"Error reading {file_path}: {e}")&lt;/P&gt;
&lt;P&gt;# Compare all schemas against the first file&lt;BR /&gt;reference_schema = list(schemas.values())[0]&lt;BR /&gt;for file_path, schema in schemas.items():&lt;BR /&gt;if schema != reference_schema:&lt;BR /&gt;print(f"Schema mismatch in: {file_path}")&lt;BR /&gt;for ref_field, file_field in zip(reference_schema.fields, schema.fields):&lt;BR /&gt;if ref_field != file_field:&lt;BR /&gt;print(f" Column difference: {ref_field} vs {file_field}")&lt;/P&gt;
&lt;P&gt;Once you identify the problematic files, you can rewrite just those files with the correct schema, then retry CONVERT TO DELTA.&lt;/P&gt;
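&lt;P&gt;As a sketch of that fix-and-retry step -- the function name is hypothetical, and the pyspark import is deferred so it only runs on a cluster -- you could cast one conflicting file to the reference schema and write the result to a side path:&lt;/P&gt;

```python
def rewrite_file_to_schema(spark, file_path, reference_schema):
    # Hypothetical sketch: cast ONE conflicting Parquet file's columns to
    # the reference schema's types. Write to a side path, then replace the
    # original file with your storage tooling before retrying
    # CONVERT TO DELTA; overwriting the file while reading it is not safe.
    from pyspark.sql import functions as F
    df = spark.read.parquet(file_path)
    casted = df.select(
        [F.col(f.name).cast(f.dataType) for f in reference_schema.fields])
    casted.write.mode("overwrite").parquet(file_path + ".fixed")
```

&lt;P&gt;This assumes every conflicting file contains the same column names as the reference schema; files with missing or extra columns need individual handling.&lt;/P&gt;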
&lt;P&gt;&lt;BR /&gt;APPROACH 3: SPECIFY AN EXPLICIT SCHEMA AND REWRITE&lt;/P&gt;
&lt;P&gt;If you already know the target schema, you can force all files to be read with that schema. Spark will cast compatible types automatically:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from pyspark.sql.types import StructType, StructField, StringType, LongType, DoubleType

target_schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
    StructField("amount", DoubleType(), True),
    # add all your columns here
])

df = spark.read.schema(target_schema).parquet(
    "abfss://container@storage_account.dfs.core.windows.net/directory_name")

df.write.format("delta").mode("overwrite").saveAsTable("your_catalog.your_schema.your_table")&lt;/LI-CODE&gt;
&lt;P&gt;This bypasses the automatic schema merge entirely by telling Spark exactly what types to expect.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;APPROACH 4: USE CTAS (CREATE TABLE AS SELECT)&lt;/P&gt;
&lt;P&gt;Another clean approach is to use a SQL-based CTAS statement, which lets you read and write in one step:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE OR REPLACE TABLE your_catalog.your_schema.your_table
AS SELECT * FROM parquet.`abfss://container@storage_account.dfs.core.windows.net/directory_name`&lt;/LI-CODE&gt;
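&lt;P&gt;If you want to pin the resulting table's column types rather than rely on inference, a variant casts each column explicitly in the SELECT list. A small helper can generate that statement; the column names and types below are hypothetical:&lt;/P&gt;

```python
def build_ctas_with_casts(target_table, source_path, columns):
    # columns: list of (name, sql_type) pairs describing the target schema.
    select_list = ", ".join(
        f"CAST({name} AS {sql_type}) AS {name}" for name, sql_type in columns)
    return (f"CREATE OR REPLACE TABLE {target_table} "
            f"AS SELECT {select_list} "
            f"FROM parquet.`{source_path}`")

sql = build_ctas_with_casts(
    "your_catalog.your_schema.your_table",
    "abfss://container@storage_account.dfs.core.windows.net/directory_name",
    [("id", "BIGINT"), ("amount", "DOUBLE")])
# spark.sql(sql) would then execute the statement on a cluster
```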
&lt;P&gt;&lt;BR /&gt;DOCUMENTATION REFERENCES&lt;/P&gt;
&lt;P&gt;- CONVERT TO DELTA syntax:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/sql/language-manual/delta-convert-to-delta.html" target="_blank"&gt;https://docs.databricks.com/en/sql/language-manual/delta-convert-to-delta.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;- Convert Parquet to Delta Lake guide:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/delta/convert-to-delta.html" target="_blank"&gt;https://docs.databricks.com/en/delta/convert-to-delta.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;- Delta Lake schema evolution (mergeSchema and autoMerge):&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/delta/update-schema.html" target="_blank"&gt;https://docs.databricks.com/en/delta/update-schema.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;SUMMARY&lt;/P&gt;
&lt;P&gt;The core issue is that CONVERT TO DELTA does a strict schema merge across all Parquet file footers and cannot automatically widen types the way a DataFrame read can. The spark.databricks.delta.mergeSchema.enabled config only applies to Delta write operations, not to CONVERT TO DELTA.&lt;/P&gt;
&lt;P&gt;The most reliable approach is to use spark.read.option("mergeSchema", "true").parquet(...) to read all the files with flexible type widening, then write out a new Delta table. If you share the specific column details from the error message, I can help narrow down exactly which type conflict is causing the failure.&lt;/P&gt;
&lt;P&gt;* This reply was drafted with an agent system I built, which researches and drafts responses from the documentation I have available and from previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Mar 2026 02:54:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/convert-to-delta-fails-to-merge-file-schema/m-p/150107#M53244</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T02:54:14Z</dc:date>
    </item>
  </channel>
</rss>

