<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to enforce schema check and benefit from badRecordsPath when using autoloader in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-enforce-schema-check-and-benefit-from-badrecordspath-when/m-p/12693#M7458</link>
    <description>&lt;P&gt;We would like to have a robust reader that ensures the data we read and write using the autoloader respects the schema provided to the autoloader reader.&lt;/P&gt;&lt;P&gt;We also provide the option "badRecordsPath" (refer to &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/handling-bad-records.html" target="_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/handling-bad-records.html&lt;/A&gt;), which works fine with corrupted files etc.&lt;/P&gt;&lt;P&gt;We have an issue similar to the one documented in &lt;A href="https://kb.databricks.com/data/wrong-schema-in-files.html" target="_blank"&gt;https://kb.databricks.com/data/wrong-schema-in-files.html&lt;/A&gt;, where the DECIMAL(20, 0) found in the source files is incompatible with the LONG we specify in our schema.&lt;/P&gt;&lt;P&gt;The main question is then: is there a way to make Spark log to the location given in "badRecordsPath" when the above happens, rather than raising an exception (from which we cannot determine the file paths causing the issue)? As all this is declarative, it depends heavily on the available options and the implementation of "badRecordsPath".&lt;/P&gt;</description>
    <pubDate>Mon, 25 Jul 2022 08:54:11 GMT</pubDate>
    <dc:creator>Swann</dc:creator>
    <dc:date>2022-07-25T08:54:11Z</dc:date>
    <item>
      <title>How to enforce schema check and benefit from badRecordsPath when using autoloader</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-enforce-schema-check-and-benefit-from-badrecordspath-when/m-p/12693#M7458</link>
      <description>&lt;P&gt;We would like to have a robust reader that ensures the data we read and write using the autoloader respects the schema provided to the autoloader reader.&lt;/P&gt;&lt;P&gt;We also provide the option "badRecordsPath" (refer to &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/handling-bad-records.html" target="_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/handling-bad-records.html&lt;/A&gt;), which works fine with corrupted files etc.&lt;/P&gt;&lt;P&gt;We have an issue similar to the one documented in &lt;A href="https://kb.databricks.com/data/wrong-schema-in-files.html" target="_blank"&gt;https://kb.databricks.com/data/wrong-schema-in-files.html&lt;/A&gt;, where the DECIMAL(20, 0) found in the source files is incompatible with the LONG we specify in our schema.&lt;/P&gt;&lt;P&gt;The main question is then: is there a way to make Spark log to the location given in "badRecordsPath" when the above happens, rather than raising an exception (from which we cannot determine the file paths causing the issue)? As all this is declarative, it depends heavily on the available options and the implementation of "badRecordsPath".&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jul 2022 08:54:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-enforce-schema-check-and-benefit-from-badrecordspath-when/m-p/12693#M7458</guid>
      <dc:creator>Swann</dc:creator>
      <dc:date>2022-07-25T08:54:11Z</dc:date>
    </item>
  </channel>
</rss>
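The post describes an Auto Loader reader with an explicit schema plus the "badRecordsPath" option, and a DECIMAL(20, 0) vs LONG mismatch that raises instead of being quarantined. A minimal PySpark sketch of such a reader, with the widen-then-cast workaround, might look like the following. The paths, column names, and JSON source format are all illustrative assumptions, not details from the post, and the function is meant to run inside a Databricks job where a `SparkSession` is available:

```python
# A minimal sketch, assuming a Databricks environment that provides `spark`.
# All paths ("/mnt/landing/", "/mnt/bad-records/") and column names ("id",
# "payload") are hypothetical placeholders, not taken from the original post.
#
# Caveat matching the question: badRecordsPath quarantines corrupt or
# unparseable records, but a declared-schema type mismatch (DECIMAL(20, 0)
# in the files vs. LONG in the schema) surfaces as a read-time exception
# instead. One workaround is to declare the column with the type the files
# actually contain, then cast downstream so incompatible values become NULL
# rather than failing the whole stream.

reader_options = {
    "cloudFiles.format": "json",            # assumed source format
    "badRecordsPath": "/mnt/bad-records/",  # quarantine location (placeholder)
}


def build_autoloader_stream(spark, source_path="/mnt/landing/"):
    """Build an Auto Loader stream with an explicit schema (hypothetical helper)."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import (DecimalType, StringType, StructField,
                                   StructType)

    # Declare the column with the files' actual type, DECIMAL(20, 0)...
    schema = StructType([
        StructField("id", DecimalType(20, 0), True),
        StructField("payload", StringType(), True),
    ])

    reader = spark.readStream.format("cloudFiles").schema(schema)
    for key, value in reader_options.items():
        reader = reader.option(key, value)

    df = reader.load(source_path)

    # ...then cast to LONG here, so out-of-range values become NULL and can be
    # filtered out or routed to a quarantine table explicitly, instead of the
    # stream raising an exception that hides the offending file paths.
    return df.withColumn("id", F.col("id").cast("long"))
```

Rows where the cast produced NULL can then be separated with an `id IS NULL` filter and written wherever the bad-records quarantine lives, approximating the badRecordsPath behavior for this class of mismatch.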

