<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Schema Evolution - Auto Loader for Avro format is not working as expected in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/schema-evolution-auto-loader-for-avro-format-is-not-working-as/m-p/10389#M5576</link>
    <description>&lt;P&gt;I am attaching a sample notebook that reproduces the issue.&lt;/P&gt;</description>
    <pubDate>Tue, 31 Jan 2023 19:06:32 GMT</pubDate>
    <dc:creator>venkat09</dc:creator>
    <dc:date>2023-01-31T19:06:32Z</dc:date>
    <item>
      <title>Schema Evolution - Auto Loader for Avro format is not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-evolution-auto-loader-for-avro-format-is-not-working-as/m-p/10388#M5575</link>
      <description>&lt;P&gt;	* Reading Avro files from S3 with Auto Loader and writing them to a Delta table.&lt;/P&gt;&lt;P&gt;		* Ingested a sample of 10 files containing four columns; the schema was inferred automatically, as expected.&lt;/P&gt;&lt;P&gt;		* Introduced a new file containing a new column [foo] alongside the existing columns. The stream failed with an error identifying the new field, which is expected.&lt;/P&gt;&lt;P&gt;			* After restarting the stream, the new column was added to the Delta table.&lt;/P&gt;&lt;P&gt;		* Introduced another file containing a new column [Foo], which differs from the previously added column [foo] only by case.&lt;/P&gt;&lt;P&gt;			* Expected: the stream should not fail, and the data for that column should be placed in **_rescued_data**.&lt;/P&gt;&lt;P&gt;			* Actual: the stream failed with the following error:&lt;/P&gt;&lt;P&gt;				* com.databricks.sql.transaction.tahoe.DeltaAnalysisException: Found duplicate column(s) in the data to save: metadata&lt;/P&gt;&lt;P&gt;NOTE: I saw the `readerCaseSensitive` option in the documentation, but its explanation is unclear. I tried setting it to both false and true and hit the same error each time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;# Auto Loader stream: read Avro files and append them to the bronze Delta table&lt;/P&gt;&lt;P&gt;stream = (spark.readStream&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.format("cloudFiles")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.option("cloudFiles.format", "avro")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;# The inferred schema is tracked at this location&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.option("cloudFiles.schemaLocation", bronzeCheckpoint)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;# Tried with this option set to both False and True; same error either way&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;#.option("readerCaseSensitive", False)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.load(rawDataSource)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.writeStream&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.option("path", bronzeTable)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.option("checkpointLocation", bronzeCheckpoint)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;# Let new columns be added to the Delta table schema&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.option("mergeSchema", True)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.table(bronzeTableName)&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My understanding from the documentation is that if a column name differs only by case from a column in the captured schema, the mismatched column should be moved to the _rescued_data column. Please let me know if that is not the case. Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2023 18:01:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-evolution-auto-loader-for-avro-format-is-not-working-as/m-p/10388#M5575</guid>
      <dc:creator>venkat09</dc:creator>
      <dc:date>2023-01-31T18:01:51Z</dc:date>
    </item>
    <item>
      <title>Re: Schema Evolution - Auto Loader for Avro format is not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-evolution-auto-loader-for-avro-format-is-not-working-as/m-p/10389#M5576</link>
      <description>&lt;P&gt;I am attaching a sample notebook that reproduces the issue.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2023 19:06:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-evolution-auto-loader-for-avro-format-is-not-working-as/m-p/10389#M5576</guid>
      <dc:creator>venkat09</dc:creator>
      <dc:date>2023-01-31T19:06:32Z</dc:date>
    </item>
  </channel>
</rss>

