<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Ingesting data from APIs in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/ingesting-data-from-apis/m-p/142254#M11231</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/194506"&gt;@Anonym40&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;There’s no silver bullet here; it’s largely a matter of preference. I would choose the first approach you mentioned: a separate process extracts data from the API and saves it to the data lake, and Auto Loader then processes the data into the bronze layer.&lt;/P&gt;&lt;P&gt;With this approach, if you ever need to reload the data, you’ll have it readily available in the lake.&lt;/P&gt;&lt;P&gt;Another argument is that some APIs do not let you retrieve data older than a certain period (for example, anything older than 3 months may no longer be available).&lt;/P&gt;&lt;P&gt;If you wrote the data directly to a table and a minor bug in the response-parsing code went unnoticed for a long time, you would no longer be able to correct the data. If you always extract the data from the source in its unchanged format, you have an easy way to rebuild the entire table in case of any issue.&lt;/P&gt;</description>
    <pubDate>Fri, 19 Dec 2025 13:10:06 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-12-19T13:10:06Z</dc:date>
    <item>
      <title>Ingesting data from APIs</title>
      <link>https://community.databricks.com/t5/get-started-discussions/ingesting-data-from-apis/m-p/142250#M11229</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;BR /&gt;I need to ingest some data available at an API endpoint.&amp;nbsp;&lt;BR /&gt;I was thinking of this option:&amp;nbsp;&lt;BR /&gt;1. Make the API call from a notebook and save the data to ADLS.&lt;BR /&gt;2. Use Auto Loader to load the data from the ADLS location.&amp;nbsp;&lt;BR /&gt;But then I have some doubts. Since I could write the API response directly to a table, is writing to ADLS an unnecessary step?&amp;nbsp;&lt;BR /&gt;Then I thought that if I drop the table, maybe I can use the files in ADLS to reload it.&amp;nbsp;&lt;BR /&gt;Then again, could I restore from a table version instead of using something like COPY INTO with the files in ADLS?&amp;nbsp;&lt;BR /&gt;Which approach should I take? How are other people doing it?&amp;nbsp;&lt;BR /&gt;Thanks&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 19 Dec 2025 12:24:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/ingesting-data-from-apis/m-p/142250#M11229</guid>
      <dc:creator>Anonym40</dc:creator>
      <dc:date>2025-12-19T12:24:29Z</dc:date>
    </item>
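The two-step flow the question describes (call the API from a notebook, land the raw response in ADLS, then let Auto Loader load it) can be sketched as below. The paths, table names, and the helpers `landing_path`, `fetch_and_land`, and `start_bronze_stream` are illustrative assumptions, not names from the thread; `dbutils` and `spark` are the handles a Databricks notebook provides.

```python
import datetime
from urllib.request import urlopen

def landing_path(base, source, ts):
    # Partition the raw landing zone by ingestion date so reloads stay simple.
    return f"{base}/{source}/ingest_date={ts:%Y-%m-%d}/{ts:%H%M%S}.json"

def fetch_and_land(dbutils, api_url, base, source):
    # Step 1: call the API and persist the unparsed response to ADLS.
    with urlopen(api_url, timeout=30) as resp:
        payload = resp.read().decode("utf-8")
    path = landing_path(base, source, datetime.datetime.utcnow())
    dbutils.fs.put(path, payload, True)  # Databricks file utility; overwrite=True
    return path

def start_bronze_stream(spark, raw_path, checkpoint, table):
    # Step 2: Auto Loader incrementally picks up new files in the raw zone.
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", f"{checkpoint}/schema")
        .load(raw_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", f"{checkpoint}/stream")
        .trigger(availableNow=True)
        .toTable(table)
    )
```

Dating the landing path means a later reload only has to point Auto Loader (or COPY INTO) at the same prefix again.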
    <item>
      <title>Re: Ingesting data from APIs</title>
      <link>https://community.databricks.com/t5/get-started-discussions/ingesting-data-from-apis/m-p/142254#M11231</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/194506"&gt;@Anonym40&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;There’s no silver bullet here; it’s largely a matter of preference. I would choose the first approach you mentioned: a separate process extracts data from the API and saves it to the data lake, and Auto Loader then processes the data into the bronze layer.&lt;/P&gt;&lt;P&gt;With this approach, if you ever need to reload the data, you’ll have it readily available in the lake.&lt;/P&gt;&lt;P&gt;Another argument is that some APIs do not let you retrieve data older than a certain period (for example, anything older than 3 months may no longer be available).&lt;/P&gt;&lt;P&gt;If you wrote the data directly to a table and a minor bug in the response-parsing code went unnoticed for a long time, you would no longer be able to correct the data. If you always extract the data from the source in its unchanged format, you have an easy way to rebuild the entire table in case of any issue.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Dec 2025 13:10:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/ingesting-data-from-apis/m-p/142254#M11231</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-12-19T13:10:06Z</dc:date>
    </item>
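The rebuild argument in the reply above can be made concrete with a small sketch. `parse_order` and `rebuild_from_raw` are hypothetical names for illustration: if a parsing bug goes unnoticed, the unchanged raw JSON kept in the lake lets you fix the parser and replay every file to reconstruct the table.

```python
import json

def parse_order(record):
    # Hypothetical parsing logic; a bug fixed here can later be replayed over
    # every raw file ever landed, because the source payloads are kept as-is.
    return {
        "order_id": record["id"],
        "amount_cents": int(round(record["amount"] * 100)),
    }

def rebuild_from_raw(raw_payloads):
    # Re-run the (corrected) parser over all raw API responses to rebuild
    # the table from scratch, with no dependency on the old, possibly bad rows.
    rows = []
    for payload in raw_payloads:
        for record in json.loads(payload):
            rows.append(parse_order(record))
    return rows
```

Writing the API response straight to a table would leave only the parsed rows, so a parser fix could never be applied retroactively.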
    <item>
      <title>Re: Ingesting data from APIs</title>
      <link>https://community.databricks.com/t5/get-started-discussions/ingesting-data-from-apis/m-p/142256#M11232</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/194506"&gt;@Anonym40&lt;/a&gt;&amp;nbsp;- it’s generally a good idea to decouple the direct API calls from the rest of your data pipeline. By staging the data in ADLS, you protect downstream processes from upstream failures and gain restartability and easier maintenance in your end-to-end flow. Also, if any other team (data science or otherwise) ever needs the staged data, it is still available to consume.&lt;/P&gt;&lt;P&gt;For the rest, I agree with&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 19 Dec 2025 13:15:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/ingesting-data-from-apis/m-p/142256#M11232</guid>
      <dc:creator>Raman_Unifeye</dc:creator>
      <dc:date>2025-12-19T13:15:09Z</dc:date>
    </item>
  </channel>
</rss>

