<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Is Auto Loader open source now in Apache 4.1 SDP? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/is-auto-loader-open-source-now-in-apache-4-1-sdp/m-p/144277#M52305</link>
    <description>&lt;P&gt;In Databricks you don't have to use Auto Loader when you're working with SDP. Think of Auto Loader as a very specific Structured Streaming source (that source is called &lt;STRONG&gt;cloudFiles&amp;nbsp;&lt;/STRONG&gt;).&lt;/P&gt;&lt;P&gt;So, for instance, you can use the traditional Structured Streaming approach to load CSV files incrementally:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (
    spark.readStream.format("csv")
    .option("header", "true")
    .schema(&amp;lt;schema&amp;gt;)
    .load(&amp;lt;path&amp;gt;)
)&lt;/LI-CODE&gt;&lt;P&gt;Or you can turn on Auto Loader by choosing the "cloudFiles" source:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .schema(&amp;lt;schema&amp;gt;)  # provide a schema here for the files
    .load(&amp;lt;path&amp;gt;)
)&lt;/LI-CODE&gt;&lt;P&gt;So you have freedom of choice &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; But if you're dealing with files in an S3 bucket or ADLS, I would choose Auto Loader any day &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Jan 2026 18:10:13 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2026-01-16T18:10:13Z</dc:date>
    <item>
      <title>Is Auto Loader open source now in Apache 4.1 SDP?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-auto-loader-open-source-now-in-apache-4-1-sdp/m-p/144254#M52294</link>
      <description>&lt;P&gt;With Spark Declarative Pipelines (SDP) being open source now, does this mean that the Databricks Auto Loader functionality is also open source? Is it called something else? If not, how does the open-source version handle incremental data processing and schema inference/evolution?&lt;/P&gt;&lt;P&gt;Reference:&amp;nbsp;&lt;A href="https://spark.apache.org/docs/4.1.0/declarative-pipelines-programming-guide.html" target="_blank"&gt;Spark Declarative Pipelines Programming Guide - Spark 4.1.0 Documentation&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Jan 2026 15:13:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-auto-loader-open-source-now-in-apache-4-1-sdp/m-p/144254#M52294</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2026-01-16T15:13:22Z</dc:date>
    </item>
    <item>
      <title>Re: Is Auto Loader open source now in Apache 4.1 SDP?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-auto-loader-open-source-now-in-apache-4-1-sdp/m-p/144265#M52298</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96188"&gt;@ChristianRRL&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;No, Auto Loader is proprietary to Databricks. It's not open source. The open-source version of SDP uses Spark Structured Streaming for incremental processing.&amp;nbsp;&lt;BR /&gt;Keep in mind that&amp;nbsp;&lt;SPAN&gt;Auto Loader is basically just Spark Structured Streaming under the hood, with additional features for event-driven ingestion (and some other things).&lt;BR /&gt;Schema evolution is a feature of the Delta protocol, which is open source. Also, Spark can natively infer the schema for various sources; this is not something unique to Auto Loader.&lt;BR /&gt;&lt;A href="https://medium.com/@omkarspatil2611/inferschema-schema-enforcement-in-spark-398aa7862f2b" target="_blank" rel="noopener"&gt;InferSchema &amp;amp; Schema Enforcement in Spark | by Omkar Patil | Medium&lt;/A&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Jan 2026 17:12:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-auto-loader-open-source-now-in-apache-4-1-sdp/m-p/144265#M52298</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2026-01-16T17:12:04Z</dc:date>
    </item>
    <item>
      <title>Re: Is Auto Loader open source now in Apache 4.1 SDP?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-auto-loader-open-source-now-in-apache-4-1-sdp/m-p/144268#M52301</link>
      <description>&lt;P&gt;This is helpful. Follow-up question: when setting up Databricks pipelines (previously DLT pipelines), is Auto Loader required, or can we configure them to use Spark Structured Streaming instead? I'm mainly asking to see how much our vendor lock-in concerns would be eased if we can use SDP without Auto Loader, should that be a path we want to consider.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Jan 2026 17:35:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-auto-loader-open-source-now-in-apache-4-1-sdp/m-p/144268#M52301</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2026-01-16T17:35:19Z</dc:date>
    </item>
    <item>
      <title>Re: Is Auto Loader open source now in Apache 4.1 SDP?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-auto-loader-open-source-now-in-apache-4-1-sdp/m-p/144277#M52305</link>
      <description>&lt;P&gt;In Databricks you don't have to use Auto Loader when you're working with SDP. Think of Auto Loader as a very specific Structured Streaming source (that source is called &lt;STRONG&gt;cloudFiles&amp;nbsp;&lt;/STRONG&gt;).&lt;/P&gt;&lt;P&gt;So, for instance, you can use the traditional Structured Streaming approach to load CSV files incrementally:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (
    spark.readStream.format("csv")
    .option("header", "true")
    .schema(&amp;lt;schema&amp;gt;)
    .load(&amp;lt;path&amp;gt;)
)&lt;/LI-CODE&gt;&lt;P&gt;Or you can turn on Auto Loader by choosing the "cloudFiles" source:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .schema(&amp;lt;schema&amp;gt;)  # provide a schema here for the files
    .load(&amp;lt;path&amp;gt;)
)&lt;/LI-CODE&gt;&lt;P&gt;So you have freedom of choice &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; But if you're dealing with files in an S3 bucket or ADLS, I would choose Auto Loader any day &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Jan 2026 18:10:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-auto-loader-open-source-now-in-apache-4-1-sdp/m-p/144277#M52305</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2026-01-16T18:10:13Z</dc:date>
    </item>
  </channel>
</rss>

