<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: When trying to ingest parquet files with autoloader I get an error stating that schema inference is not supported, but the parquet files have schema data.  No inference should be necessary.  Is this right? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/when-trying-to-ingest-parquet-files-with-autoloader-i-get-an/m-p/15227#M9583</link>
    <description>&lt;P&gt;Hi @Ben Bogart,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up. Did Noopur's response help resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.&lt;/P&gt;</description>
    <pubDate>Mon, 08 Aug 2022 18:58:43 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2022-08-08T18:58:43Z</dc:date>
    <item>
      <title>When trying to ingest parquet files with autoloader I get an error stating that schema inference is not supported, but the parquet files have schema data.  No inference should be necessary.  Is this right?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-trying-to-ingest-parquet-files-with-autoloader-i-get-an/m-p/15225#M9581</link>
      <description>&lt;P&gt;When I try to ingest parquet files with Auto Loader using the following code&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df = (spark&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.readStream&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.format("cloudFiles")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("cloudFiles.format","parquet")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.load(filePath))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I get the following error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;java.lang.UnsupportedOperationException: Schema inference is not supported for format: parquet. Please specify the schema.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I find this strange because parquet files contain schema information. There is nothing to infer.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I pull the schema from one of the existing parquet files, Auto Loader works:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;import os&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;filePath = '/dbfs/mnt/dops/streamtest/public/streamme/'&lt;/P&gt;&lt;P&gt;files = os.listdir(filePath)&lt;/P&gt;&lt;P&gt;files.sort()&lt;/P&gt;&lt;P&gt;# filePath[5:] drops the leading '/dbfs' before handing the path to Spark&lt;/P&gt;&lt;P&gt;sdata = spark.read.parquet(os.path.join(filePath[5:], files[-1]))&lt;/P&gt;&lt;P&gt;df = (spark&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.readStream&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.format("cloudFiles")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("cloudFiles.format","parquet")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.schema(sdata.schema)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;.load(filePath))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This does work, but it eliminates one of the primary benefits of Auto Loader: no directory listing.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is this expected behavior? I have trouble understanding why Auto Loader cannot read the schema from parquet files.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Ben&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jun 2022 17:19:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-trying-to-ingest-parquet-files-with-autoloader-i-get-an/m-p/15225#M9581</guid>
      <dc:creator>159312</dc:creator>
      <dc:date>2022-06-30T17:19:50Z</dc:date>
    </item>
    <item>
      <title>Re: When trying to ingest parquet files with autoloader I get an error stating that schema inference is not supported, but the parquet files have schema data.  No inference should be necessary.  Is this right?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-trying-to-ingest-parquet-files-with-autoloader-i-get-an/m-p/15226#M9582</link>
      <description>&lt;P&gt;Hi @Ben Bogart, this is supported in DBR 11.1 and above. The documentation below confirms this:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/ingestion/auto-loader/schema.html#schema-inference-and-evolution-in-auto-loader" target="test_blank"&gt;https://docs.databricks.com/ingestion/auto-loader/schema.html#schema-inference-and-evolution-in-auto-loader&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Please try DBR 11.1 and let us know if you still face the issue.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jul 2022 10:40:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-trying-to-ingest-parquet-files-with-autoloader-i-get-an/m-p/15226#M9582</guid>
      <dc:creator>Noopur_Nigam</dc:creator>
      <dc:date>2022-07-25T10:40:28Z</dc:date>
    </item>
    <item>
      <title>Re: When trying to ingest parquet files with autoloader I get an error stating that schema inference is not supported, but the parquet files have schema data.  No inference should be necessary.  Is this right?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-trying-to-ingest-parquet-files-with-autoloader-i-get-an/m-p/15227#M9583</link>
      <description>&lt;P&gt;Hi @Ben Bogart,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up. Did Noopur's response help resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2022 18:58:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-trying-to-ingest-parquet-files-with-autoloader-i-get-an/m-p/15227#M9583</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-08-08T18:58:43Z</dc:date>
    </item>
    <item>
      <title>Re: When trying to ingest parquet files with autoloader I get an error stating that schema inference is not supported, but the parquet files have schema data.  No inference should be necessary.  Is this right?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-trying-to-ingest-parquet-files-with-autoloader-i-get-an/m-p/15228#M9584</link>
      <description>&lt;P&gt;While I can confirm that schema inference is supported in DBR 11.1, it is still not supported in either the DLT Current or Preview runtimes, which is where I need it. Womp womp.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2022 15:02:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-trying-to-ingest-parquet-files-with-autoloader-i-get-an/m-p/15228#M9584</guid>
      <dc:creator>159312</dc:creator>
      <dc:date>2022-08-10T15:02:03Z</dc:date>
    </item>
  </channel>
</rss>

