Data Engineering

Ingest data from REST endpoint into Databricks

RodrigoE
New Contributor III

Hello,

I'm looking for the best option to retrieve 1-1.5 TB of data per day from a REST API into Databricks.

Thank you,

Rodrigo Escamilla

1 REPLY

Ashwin_DSA
Databricks Employee

Hi @RodrigoE,

It would be helpful to have additional information to recommend the best options for your scenario. 

  • Who owns the REST API?
  • Is that in your control? 
  • Can the source push data to Databricks, or should you pull on a schedule?

If the source can push the data, consider Zerobus. This is the cleanest, most scalable Databricks-native pattern if the producer is under your control.

If you have no control over the source, you can build a custom Python data source wrapping their REST API and run it as a Databricks job/stream. While the pattern will work for your volumes, the bottleneck is usually the API's own throughput/limits, not Databricks.
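To make the pull pattern a bit more concrete, here is a minimal sketch of the logic such a data source would contain, written with the standard library only so it can be tried without a cluster. Everything named here is an assumption for illustration: the cursor-style pagination, the (records, next_cursor) response shape, the hour-based windows, and the helper names plan_partitions, read_partition, and RateLimited are hypothetical and not part of any Databricks or vendor API. In PySpark's Python Data Source API, the window planning would roughly correspond to DataSourceReader.partitions() and the paging loop to DataSourceReader.read(partition), but the real API's pagination and limits decide the actual shape.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical response shape: (records, next_cursor). A real API will differ.
Page = Tuple[List[dict], Optional[str]]
Window = Tuple[int, int]


class RateLimited(Exception):
    """Raised by the fetch callable when the API signals throttling (e.g. HTTP 429)."""

    def __init__(self, retry_after: float = 1.0):
        super().__init__("rate limited")
        self.retry_after = retry_after


def plan_partitions(start_hour: int, end_hour: int, hours_per_partition: int) -> List[Window]:
    """Split [start_hour, end_hour) into windows that can be pulled in parallel."""
    return [
        (lo, min(lo + hours_per_partition, end_hour))
        for lo in range(start_hour, end_hour, hours_per_partition)
    ]


def read_partition(
    fetch_page: Callable[[Window, Optional[str]], Page],
    window: Window,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every record in one window, following cursors and backing off when throttled."""
    cursor: Optional[str] = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                records, cursor = fetch_page(window, cursor)
                break
            except RateLimited as exc:
                if attempt == max_retries:
                    raise
                # Exponential backoff, scaled by the server's suggested wait.
                sleep(exc.retry_after * (2 ** attempt))
        yield from records
        if cursor is None:
            return


def required_mb_per_second(tb_per_day: float) -> float:
    """Sustained rate needed to move tb_per_day (decimal TB) within 24 hours."""
    return tb_per_day * 1_000_000 / 86_400
```

As a rough sanity check, required_mb_per_second(1.5) comes out at about 17.4 MB/s sustained around the clock, so it is worth confirming that the API's rate limits and page sizes allow that (ideally across several parallel windows) before sizing anything on the Databricks side.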

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

 

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***