<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Live Table Schema Error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-table-schema-error/m-p/24345#M16914</link>
    <description>&lt;P&gt;I was facing a similar issue when loading JSON files through Auto Loader for Delta Live Tables.&lt;/P&gt;&lt;P&gt;I was able to fix it with this option:&lt;/P&gt;&lt;P&gt;.option("cloudFiles.inferColumnTypes", "True")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;From the docs: &lt;I&gt;"For formats that don’t encode data types (JSON and CSV), Auto Loader infers all columns as strings (including nested fields in JSON files)."&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/ingestion/auto-loader/schema.html#" target="test_blank"&gt;https://docs.databricks.com/ingestion/auto-loader/schema.html#&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 01 Jun 2023 13:39:19 GMT</pubDate>
    <dc:creator>shagun</dc:creator>
    <dc:date>2023-06-01T13:39:19Z</dc:date>
    <item>
      <title>Delta Live Table Schema Error</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-schema-error/m-p/24343#M16912</link>
      <description>&lt;P&gt;I'm using Delta Live Tables to load a set of CSV files from a directory. I am predefining the schema to avoid issues with schema inference. This works with Auto Loader on a regular Delta table, but fails for Delta Live Tables. Below is an example of the code I am using to define the schema and load the data into DLT:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt
from pyspark.sql.types import StructType, StructField, StringType

# Define Schema
schema = StructType([
  StructField("ID",StringType(),True, {'comment': "Unique customer id"}),
  StructField("Test",StringType(),True, {'comment': "this is a test"}),
  ...
])

# Define Delta Live Table
@dlt.table(
  name="test_bronze",
  comment="Test data incrementally ingested from S3 Raw landing zone",
  table_properties={
    "quality": "bronze"
  },
  schema=schema
)

# Read Stream
def rafode_bronze():
  return (
    spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", source_format)  # format is csv
      .option("inferSchema", "False")
      .option("header", "True")
      .schema(schema)
      .load(data_source)  # data_source is S3 directory
  )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;When I run this as a Delta Live Tables pipeline, I get the following error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;org.apache.spark.sql.AnalysisException: Failed to merge fields 'Test' and 'Test'. Failed to merge incompatible data types IntegerType and DoubleType&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have tried running the readStream both with and without `.option("inferSchema", "False")` to see whether that forces the predefined schema to be used instead of an inferred one, but I get the same error either way. It seems as though spark.readStream is not applying the predefined schema when it reads each CSV file in the directory, which causes schema differences and makes the load fail. Do I need to change my readStream code to force it to use my schema, or am I missing something else?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Nov 2022 21:03:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-schema-error/m-p/24343#M16912</guid>
      <dc:creator>Dave_Nithio</dc:creator>
      <dc:date>2022-11-01T21:03:11Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table Schema Error</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-schema-error/m-p/24345#M16914</link>
      <description>&lt;P&gt;I was facing a similar issue when loading JSON files through Auto Loader for Delta Live Tables.&lt;/P&gt;&lt;P&gt;I was able to fix it with this option:&lt;/P&gt;&lt;P&gt;.option("cloudFiles.inferColumnTypes", "True")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;From the docs: &lt;I&gt;"For formats that don’t encode data types (JSON and CSV), Auto Loader infers all columns as strings (including nested fields in JSON files)."&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/ingestion/auto-loader/schema.html#" target="test_blank"&gt;https://docs.databricks.com/ingestion/auto-loader/schema.html#&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Jun 2023 13:39:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-schema-error/m-p/24345#M16914</guid>
      <dc:creator>shagun</dc:creator>
      <dc:date>2023-06-01T13:39:19Z</dc:date>
    </item>
  </channel>
</rss>

