<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Will auto loader read files if it doesn't need to? in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/will-auto-loader-read-files-if-it-doesn-t-need-to/m-p/116499#M9998</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/161344"&gt;@charliemerrell&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Yes, Databricks will still open and parse the JSON files, even if you're only selecting _metadata.&lt;BR /&gt;Auto Loader must infer the schema and perform basic parsing unless you explicitly avoid schema inference, for example by supplying a schema up front.&lt;BR /&gt;So even if you add .select("_metadata"), it doesn't skip reading the file contents: it still downloads, parses, and caches the data in order to process it.&lt;/P&gt;</description>
    <pubDate>Thu, 24 Apr 2025 16:38:39 GMT</pubDate>
    <dc:creator>lingareddy_Alva</dc:creator>
    <dc:date>2025-04-24T16:38:39Z</dc:date>
    <item>
      <title>Will auto loader read files if it doesn't need to?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/will-auto-loader-read-files-if-it-doesn-t-need-to/m-p/116395#M9996</link>
      <description>&lt;P&gt;I want to run Auto Loader on some very large JSON files. I don't actually care about the data inside the files, just the file paths of the blobs. If I do something like&lt;/P&gt;&lt;PRE&gt;df = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", source_operations_checkpoint_path)
        .load(source_operations_path)
        .select("_metadata")
)&lt;/PRE&gt;&lt;P&gt;will Databricks know not to read all the files, or will it read them in anyway and then discard the data?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Apr 2025 20:47:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/will-auto-loader-read-files-if-it-doesn-t-need-to/m-p/116395#M9996</guid>
      <dc:creator>charliemerrell</dc:creator>
      <dc:date>2025-04-23T20:47:28Z</dc:date>
    </item>
    <item>
      <title>Re: Will auto loader read files if it doesn't need to?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/will-auto-loader-read-files-if-it-doesn-t-need-to/m-p/116457#M9997</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/161344"&gt;@charliemerrell&lt;/a&gt;, even if you’re just selecting _metadata, Auto Loader still needs to read parts of the files, mainly to gather schema info and essential metadata. It won’t fully read the contents, but it doesn’t completely skip them either.&lt;/P&gt;&lt;P&gt;If you're only interested in things like file paths and not the actual data, setting cloudFiles.format to "binaryFile" instead of "json" is a more efficient option.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Apr 2025 11:16:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/will-auto-loader-read-files-if-it-doesn-t-need-to/m-p/116457#M9997</guid>
      <dc:creator>Renu_</dc:creator>
      <dc:date>2025-04-24T11:16:53Z</dc:date>
    </item>
    <item>
      <title>Re: Will auto loader read files if it doesn't need to?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/will-auto-loader-read-files-if-it-doesn-t-need-to/m-p/116499#M9998</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/161344"&gt;@charliemerrell&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Yes, Databricks will still open and parse the JSON files, even if you're only selecting _metadata.&lt;BR /&gt;Auto Loader must infer the schema and perform basic parsing unless you explicitly avoid schema inference, for example by supplying a schema up front.&lt;BR /&gt;So even if you add .select("_metadata"), it doesn't skip reading the file contents: it still downloads, parses, and caches the data in order to process it.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Apr 2025 16:38:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/will-auto-loader-read-files-if-it-doesn-t-need-to/m-p/116499#M9998</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-04-24T16:38:39Z</dc:date>
    </item>
  </channel>
</rss>

