<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cost-Effective Databricks Pipeline for API Ingestion - Best Practices? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/cost-effective-databricks-pipeline-for-api-ingestion-best/m-p/148460#M52897</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/213092"&gt;@TheDataMaverick&lt;/a&gt;,&lt;BR /&gt;The most efficient approach for your REST API ingestion pipeline on Databricks is to use an external service such as Azure Functions (or AWS Lambda) to make the API calls and land raw JSON/Parquet in ADLS/S3, then ingest those files into bronze with Auto Loader.&lt;BR /&gt;Once the data is in bronze, the silver/gold layers can be built with either SDP (Spark Declarative Pipelines) or regular Spark jobs/workflows, depending on your requirements. SDP suits declarative, incremental medallion transforms with automatic orchestration and the efficiency of serverless compute, while jobs fit custom or complex logic.&lt;/P&gt;</description>
    <pubDate>Sun, 15 Feb 2026 23:11:53 GMT</pubDate>
    <dc:creator>Pat</dc:creator>
    <dc:date>2026-02-15T23:11:53Z</dc:date>
    <item>
      <title>Cost-Effective Databricks Pipeline for API Ingestion - Best Practices?</title>
      <link>https://community.databricks.com/t5/data-engineering/cost-effective-databricks-pipeline-for-api-ingestion-best/m-p/148058#M52819</link>
      <description>&lt;P class=""&gt;Hi Community,&lt;/P&gt;&lt;P class=""&gt;As a senior data engineer migrating ETL workloads to Databricks (with Unity Catalog and Delta Lake), I'm building a cost-effective pipeline to ingest data from a REST API. Goals: minimize DBU costs, handle incremental loads, ensure scalability, and follow the medallion architecture (bronze/silver/gold).&lt;/P&gt;&lt;P class=""&gt;Current thinking:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P class=""&gt;Use Python notebooks/workflows with requests/asyncio for parallel API calls, and write raw JSON/Parquet to ADLS/ABFSS.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Auto Loader or Structured Streaming for incremental bronze ingestion.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;DLT pipelines with serverless compute for transformations (leveraging incremental processing for 5x better price-performance).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Optimize with autoscaling, auto-termination, and Predictive Optimization.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;Challenges:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P class=""&gt;Rate limiting and pagination on the API.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Cost monitoring via system.billing tables.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Best cluster sizing for sporadic loads.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;What's the most efficient approach: direct API calls in Spark UDFs, external functions (e.g., Azure Functions writing to storage), or Lakeflow Declarative Pipelines? Any code samples or pitfalls from production pipelines?&lt;/P&gt;&lt;P class=""&gt;Thanks! Sachi.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Feb 2026 15:38:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cost-effective-databricks-pipeline-for-api-ingestion-best/m-p/148058#M52819</guid>
      <dc:creator>TheDataMaverick</dc:creator>
      <dc:date>2026-02-11T15:38:49Z</dc:date>
    </item>
    <item>
      <title>Re: Cost-Effective Databricks Pipeline for API Ingestion - Best Practices?</title>
      <link>https://community.databricks.com/t5/data-engineering/cost-effective-databricks-pipeline-for-api-ingestion-best/m-p/148460#M52897</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/213092"&gt;@TheDataMaverick&lt;/a&gt;,&lt;BR /&gt;The most efficient approach for your REST API ingestion pipeline on Databricks is to use an external service such as Azure Functions (or AWS Lambda) to make the API calls and land raw JSON/Parquet in ADLS/S3, then ingest those files into bronze with Auto Loader.&lt;BR /&gt;Once the data is in bronze, the silver/gold layers can be built with either SDP (Spark Declarative Pipelines) or regular Spark jobs/workflows, depending on your requirements. SDP suits declarative, incremental medallion transforms with automatic orchestration and the efficiency of serverless compute, while jobs fit custom or complex logic.&lt;/P&gt;</description>
      <pubDate>Sun, 15 Feb 2026 23:11:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cost-effective-databricks-pipeline-for-api-ingestion-best/m-p/148460#M52897</guid>
      <dc:creator>Pat</dc:creator>
      <dc:date>2026-02-15T23:11:53Z</dc:date>
    </item>
  </channel>
</rss>

