Hi Community,
As a senior data engineer migrating ETL workloads to Databricks (with Unity Catalog and Delta Lake), I'm building a cost-effective pipeline to ingest data from a REST API. Goals: minimize DBU costs, handle incremental loads, ensure scalability, and follow medallion architecture (bronze/silver/gold).
Current thinking:
Use Python notebooks/Workflows with requests/asyncio for parallel API calls, writing raw JSON/Parquet to ADLS (abfss:// paths).
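For the extraction step, here's roughly what I'm sketching — the endpoint, the `?page=N` pagination contract, and the "empty results list means done" convention are placeholders for our actual API, and in production I'd fan pages out with asyncio or a thread pool rather than loop sequentially:

```python
import json
import time
import urllib.error
import urllib.request

API_URL = "https://api.example.com/v1/events"  # hypothetical endpoint

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff delay in seconds for retry `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))

def fetch_page(url, page, attempt=0, max_retries=5):
    """Fetch one page of results, retrying with backoff on HTTP 429."""
    try:
        with urllib.request.urlopen(f"{url}?page={page}") as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 429 and attempt < max_retries:
            # Honor Retry-After if the API sends it, else back off exponentially.
            delay = float(err.headers.get("Retry-After", backoff_delay(attempt)))
            time.sleep(delay)
            return fetch_page(url, page, attempt + 1, max_retries)
        raise

def fetch_all(url, max_pages=100):
    """Page until the API returns an empty result list (assumed contract)."""
    out = []
    for page in range(1, max_pages + 1):
        rows = fetch_page(url, page).get("results", [])
        if not rows:
            break
        out.extend(rows)
    return out

if __name__ == "__main__":
    records = fetch_all(API_URL)
    # On Databricks I'd then land the raw JSON in the bronze path, e.g.:
    # dbutils.fs.put("abfss://bronze@<account>.dfs.core.windows.net/raw/batch.json",
    #                json.dumps(records))
```

Is recursive retry-with-backoff like this reasonable at scale, or do people prefer a token-bucket limiter in front of the pool?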
Auto Loader or Structured Streaming for incremental bronze ingestion.
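My current Auto Loader sketch for bronze looks like the below — the landing-zone path, schema location, and table name are placeholders, and I'm leaning on `trigger(availableNow=True)` so the job drains the backlog and stops, which seems like the cheapest fit for sporadic loads:

```python
# Auto Loader options for a JSON landing zone; the paths are placeholders.
AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "abfss://bronze@<account>.dfs.core.windows.net/_schemas/api",
    "cloudFiles.inferColumnTypes": "true",
}

def start_bronze_ingest(spark, source_path, target_table, checkpoint_path):
    """Incrementally load new files from the landing zone into a bronze Delta table.

    trigger(availableNow=True) processes everything new and then stops, so
    compute only runs while there is data to ingest.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**AUTOLOADER_OPTIONS)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

Does schema inference plus `schemaLocation` behave well when the API adds fields, or should I pin an explicit schema in bronze?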
DLT pipelines with serverless compute for transformations (Databricks cites up to 5x better price-performance from serverless incremental processing).
Optimize with autoscaling, auto-termination, and Predictive Optimization.
Challenges:
Rate limiting/pagination on the API.
Cost monitoring via system.billing tables.
Best cluster sizing for sporadic loads.
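On the cost-monitoring point, this is the kind of query I'm planning against `system.billing.usage` (system tables enabled in Unity Catalog); the job_id filter and 14-day window are just illustrative:

```python
# Sketch of per-job DBU tracking against the system.billing.usage table.
# Requires Unity Catalog system tables to be enabled; the filter values
# below are illustrative assumptions.
DBU_QUERY = """
    SELECT usage_date,
           sku_name,
           SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= current_date() - INTERVAL 14 DAYS
      AND usage_metadata.job_id = '{job_id}'
    GROUP BY usage_date, sku_name
    ORDER BY usage_date
"""

def job_dbu_usage(spark, job_id):
    """Return a DataFrame of daily DBU consumption for one job."""
    return spark.sql(DBU_QUERY.format(job_id=job_id))
```

Would you join this against `system.billing.list_prices` for dollar amounts, or is there a simpler built-in view for job-level cost?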
What's the most efficient approach? Direct API calls in Spark UDFs, external functions (e.g., Azure Functions landing to storage), or Lakeflow Declarative Pipelines? Any code samples or pitfalls from production pipelines?
Thanks! Sachi.