<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What is `read_files`? in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/what-is-read-files/m-p/134143#M10819</link>
    <description>&lt;P&gt;Also,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96188"&gt;@ChristianRRL&lt;/a&gt;&amp;nbsp;, with a slight adjustment to the syntax (the STREAM keyword used in the SQL examples linked here), read_files does indeed behave like Auto Loader:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/patterns?language=SQL" target="_blank"&gt;https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/patterns?language=SQL&lt;/A&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="BS_THE_ANALYST_0-1759901883674.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20544iD5F0DCC605FDFD86/image-size/large?v=v2&amp;amp;px=999" role="button" title="BS_THE_ANALYST_0-1759901883674.png" alt="BS_THE_ANALYST_0-1759901883674.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I'd also advise looking at the file detection modes that Auto Loader offers when working with cloud storage, namely&amp;nbsp;&lt;STRONG&gt;directory listing mode&lt;/STRONG&gt; and&amp;nbsp;&lt;STRONG&gt;&lt;FONT size="3"&gt;file notification mode (recommended):&amp;nbsp;&lt;/FONT&gt;&lt;/STRONG&gt;&lt;A href="https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/file-detection-modes" target="_blank"&gt;https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/file-detection-modes&lt;/A&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;All the best,&lt;BR /&gt;BS&lt;/P&gt;</description>
    <pubDate>Wed, 08 Oct 2025 05:41:45 GMT</pubDate>
    <dc:creator>BS_THE_ANALYST</dc:creator>
    <dc:date>2025-10-08T05:41:45Z</dc:date>
    <item>
      <title>What is `read_files`?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/what-is-read-files/m-p/134122#M10816</link>
      <description>&lt;P&gt;This may be a silly question, but can someone help me better understand what `read_files` is?&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/functions/read_files#schema-inference" target="_blank"&gt;read_files table-valued function | Databricks on AWS&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;There are at least three ways to pull raw JSON data into a Spark DataFrame:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;df = spark.read...&lt;/LI&gt;&lt;LI&gt;df = spark.readStream... (i.e. Auto Loader)&lt;/LI&gt;&lt;LI&gt;select * from read_files(...)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Is read_files a Databricks SQL-specific function, or is it native to Spark? In particular, I'm curious about the `schemaHints` functionality that both Auto Loader &amp;amp; read_files support but spark.read seemingly does not (as far as I can tell).&lt;/P&gt;</description>
      <pubDate>Tue, 07 Oct 2025 21:23:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/what-is-read-files/m-p/134122#M10816</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2025-10-07T21:23:07Z</dc:date>
    </item>
    <item>
      <title>Re: What is `read_files`?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/what-is-read-files/m-p/134128#M10818</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96188"&gt;@ChristianRRL&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;No, &lt;SPAN class=""&gt;read_files&lt;/SPAN&gt; is &lt;SPAN class=""&gt;not a native Spark function&lt;/SPAN&gt;. It’s a &lt;SPAN class=""&gt;Databricks table-valued function (a SQL wrapper)&lt;/SPAN&gt; that lets you read files easily using SQL syntax.&lt;/P&gt;&lt;P class=""&gt;The main advantage is that it bundles several capabilities on top of Spark’s basic file reader: &lt;SPAN class=""&gt;schema inference&lt;/SPAN&gt;, &lt;SPAN class=""&gt;schema hints&lt;/SPAN&gt;, &lt;SPAN class=""&gt;rescued data handling&lt;/SPAN&gt;, and &lt;SPAN class=""&gt;partition discovery&lt;/SPAN&gt;. Open-source Spark also infers schemas and discovers partitions, but it has no equivalent of schema hints or the rescued data column.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;For example:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT * FROM read_files(
  's3://my-bucket/path/',
  format =&amp;gt; 'json',
  schemaHints =&amp;gt; 'user_id STRING, event_time TIMESTAMP'
);&lt;/LI-CODE&gt;&lt;P&gt;is roughly equivalent to:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;spark.read.format("json").load("s3://my-bucket/path/")&lt;/LI-CODE&gt;&lt;P&gt;but with the extra Databricks logic for schema management and ingestion governance.&lt;/P&gt;&lt;P class=""&gt;&lt;BR /&gt;Regarding &lt;SPAN class=""&gt;&lt;STRONG&gt;schemaHints&lt;/STRONG&gt;&lt;/SPAN&gt;, it works the same way as in &lt;SPAN class=""&gt;&lt;STRONG&gt;Auto Loader&lt;/STRONG&gt;&lt;/SPAN&gt;: it lets you override or enforce specific column types while leaving the rest of the schema inferred automatically. &lt;A href="https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/schema#override-schema-inference-with-schema-hints" target="_self"&gt;Docs&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;By contrast, &lt;SPAN class=""&gt;spark.read&lt;/SPAN&gt; in open-source Spark only lets you either &lt;SPAN class=""&gt;&lt;STRONG&gt;fully define a schema&lt;/STRONG&gt;&lt;/SPAN&gt; or &lt;SPAN class=""&gt;&lt;STRONG&gt;infer it entirely&lt;/STRONG&gt;&lt;/SPAN&gt; (&lt;A href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.load.html#pyspark-sql-dataframereader-load" target="_self"&gt;Docs&lt;/A&gt;). Databricks added &lt;SPAN class=""&gt;schemaHints&lt;/SPAN&gt; to this built-in function in its Databricks Runtime (DBR), which is why you won’t find it on spark.read.&lt;/P&gt;&lt;P class=""&gt;Hope this helps, &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;/P&gt;</description>
      <pubDate>Tue, 07 Oct 2025 23:10:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/what-is-read-files/m-p/134128#M10818</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-10-07T23:10:30Z</dc:date>
    </item>
    <item>
      <title>Re: What is `read_files`?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/what-is-read-files/m-p/134143#M10819</link>
      <description>&lt;P&gt;Also,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96188"&gt;@ChristianRRL&lt;/a&gt;&amp;nbsp;, with a slight adjustment to the syntax (the STREAM keyword used in the SQL examples linked here), read_files does indeed behave like Auto Loader:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/patterns?language=SQL" target="_blank"&gt;https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/patterns?language=SQL&lt;/A&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="BS_THE_ANALYST_0-1759901883674.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20544iD5F0DCC605FDFD86/image-size/large?v=v2&amp;amp;px=999" role="button" title="BS_THE_ANALYST_0-1759901883674.png" alt="BS_THE_ANALYST_0-1759901883674.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I'd also advise looking at the file detection modes that Auto Loader offers when working with cloud storage, namely&amp;nbsp;&lt;STRONG&gt;directory listing mode&lt;/STRONG&gt; and&amp;nbsp;&lt;STRONG&gt;&lt;FONT size="3"&gt;file notification mode (recommended):&amp;nbsp;&lt;/FONT&gt;&lt;/STRONG&gt;&lt;A href="https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/file-detection-modes" target="_blank"&gt;https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/file-detection-modes&lt;/A&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;All the best,&lt;BR /&gt;BS&lt;/P&gt;</description>
      <pubDate>Wed, 08 Oct 2025 05:41:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/what-is-read-files/m-p/134143#M10819</guid>
      <dc:creator>BS_THE_ANALYST</dc:creator>
      <dc:date>2025-10-08T05:41:45Z</dc:date>
    </item>
  </channel>
</rss>

