<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Autoloader Functionality Question: Pull API data directly? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-functionality-question-pull-api-data-directly/m-p/127749#M48070</link>
    <description>&lt;P&gt;Hi there, while reading&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/patterns#enable-flexible-semi-structured-data-pipelines" target="_self"&gt;Common data loading patterns &amp;gt; Enable flexible semi-structured data pipelines&lt;/A&gt;, I noticed this interesting code snippet:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;(spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "json")
  # will ensure that the headers column gets processed as a map
  .option("cloudFiles.schemaHints",
          "headers map&amp;lt;string,string&amp;gt;, statusCode SHORT")
  .load("/api/requests")
  .writeStream
  .option("mergeSchema", "true")
  .option("checkpointLocation", "&amp;lt;path-to-checkpoint&amp;gt;")
  .start("&amp;lt;path_to_target&amp;gt;"))&lt;/LI-CODE&gt;&lt;P&gt;This may be a bit of a leap, but does anyone know whether Autoloader supports &lt;STRONG&gt;pulling data directly from an API&lt;/STRONG&gt; (as opposed to incrementally loading data from a designated "landing" path)? I may be reading too much into it, but the `schemaHints` and `/api/requests` look awfully close to literal API calls, and this would be an interesting use case if we could store both the raw JSON data and the API status code in the same target table.&lt;/P&gt;</description>
    <pubDate>Fri, 08 Aug 2025 03:51:57 GMT</pubDate>
    <dc:creator>ChristianRRL</dc:creator>
    <dc:date>2025-08-08T03:51:57Z</dc:date>
    <item>
      <title>Autoloader Functionality Question: Pull API data directly?</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-functionality-question-pull-api-data-directly/m-p/127749#M48070</link>
      <description>&lt;P&gt;Hi there, while reading&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/patterns#enable-flexible-semi-structured-data-pipelines" target="_self"&gt;Common data loading patterns &amp;gt; Enable flexible semi-structured data pipelines&lt;/A&gt;, I noticed this interesting code snippet:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;(spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "json")
  # will ensure that the headers column gets processed as a map
  .option("cloudFiles.schemaHints",
          "headers map&amp;lt;string,string&amp;gt;, statusCode SHORT")
  .load("/api/requests")
  .writeStream
  .option("mergeSchema", "true")
  .option("checkpointLocation", "&amp;lt;path-to-checkpoint&amp;gt;")
  .start("&amp;lt;path_to_target&amp;gt;"))&lt;/LI-CODE&gt;&lt;P&gt;This may be a bit of a leap, but does anyone know whether Autoloader supports &lt;STRONG&gt;pulling data directly from an API&lt;/STRONG&gt; (as opposed to incrementally loading data from a designated "landing" path)? I may be reading too much into it, but the `schemaHints` and `/api/requests` look awfully close to literal API calls, and this would be an interesting use case if we could store both the raw JSON data and the API status code in the same target table.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Aug 2025 03:51:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-functionality-question-pull-api-data-directly/m-p/127749#M48070</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2025-08-08T03:51:57Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader Functionality Question: Pull API data directly?</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-functionality-question-pull-api-data-directly/m-p/127761#M48076</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96188"&gt;@ChristianRRL&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Unfortunately, they chose quite a confusing name. Auto Loader supports only one type of source: cloudFiles.&lt;BR /&gt;And cloudFiles is nothing more than your cloud object storage, so in this example /api/requests is a data lake directory where the payloads from the API have already been saved.&lt;BR /&gt;To sum up: you can't use Auto Loader to read data directly from an API.&lt;/P&gt;</description>
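Since the reply above says the example directory only holds payloads that something else saved there, the missing piece is a separate step that calls the API and writes each response into the landing path. Below is a minimal sketch of that step, not a Databricks API: the helper names land_response and fetch_and_land are hypothetical, the landing directory is assumed to be reachable through ordinary file I/O, and the field names statusCode and headers are chosen only to line up with the schema hints in the question.

```python
import json
import uuid
import urllib.error
import urllib.request
from pathlib import Path


def land_response(landing_dir, status_code, headers, body):
    """Write one API response as a single-line JSON file and return its path."""
    record = {
        "statusCode": status_code,  # lines up with the statusCode SHORT hint
        "headers": dict(headers),   # lines up with the headers map hint
        "body": body,               # raw payload text, stored unchanged
    }
    target = Path(landing_dir)
    target.mkdir(parents=True, exist_ok=True)
    # A unique file name per response, so every call adds a new file
    # for a file-based stream to discover.
    path = target / f"response-{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(record), encoding="utf-8")
    return path


def fetch_and_land(url, landing_dir):
    """Call the API once and land the outcome, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return land_response(landing_dir, resp.status, resp.headers.items(),
                                 resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise, but still carry a status code, headers and body.
        return land_response(landing_dir, err.code, err.headers.items(),
                             err.read().decode("utf-8", errors="replace"))
```

With files landed this way, a cloudFiles stream pointed at the same directory would pick them up incrementally, which gives the raw JSON and the status code side by side in one target table. Note that repeated header names collapse to a single entry when converted to a dict.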
      <pubDate>Fri, 08 Aug 2025 08:11:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-functionality-question-pull-api-data-directly/m-p/127761#M48076</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-08-08T08:11:55Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader Functionality Question: Pull API data directly?</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-functionality-question-pull-api-data-directly/m-p/127846#M48105</link>
      <description>&lt;P&gt;This makes sense, thank you for clarifying!&lt;/P&gt;</description>
      <pubDate>Fri, 08 Aug 2025 17:49:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-functionality-question-pull-api-data-directly/m-p/127846#M48105</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2025-08-08T17:49:03Z</dc:date>
    </item>
  </channel>
</rss>

