Databricks Community

User16826994223 · ‎06-14-2021

I am trying to design a streaming data analytics project using Azure Functions --> Event Hubs --> Blob Storage --> Azure Data Factory --> Databricks --> SQL Server.

What I am struggling with at the moment is how to optimize "data retrieval" to feed my ETL process on Azure Databricks.

With this design I will be handling lots of incoming files arriving at different times. I am using a function to create an event for each file as it arrives and send it to Blob Storage. From there Azure Data Factory picks up the data and passes it to Databricks. The whole chain takes a long time and delays the full process.

Ryan_Chynoweth · ‎06-21-2021

Check out our Auto Loader capabilities, which automatically track and process new files as they arrive.

Auto Loader

There are two options:

Directory listing, which essentially completes the same steps you listed above, but in a slightly more efficient manner.
File notification, which creates managed resources to track files using the Azure Event Grid and Queue Storage services.

The file notification option is more scalable and is likely to better suit your needs.
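As a rough illustration of the two options, here is a minimal PySpark sketch. The helper names (`auto_loader_options`, `start_ingest`) and all paths are hypothetical, not part of any Databricks API. It assumes a Databricks runtime where the `cloudFiles` source is available. On Azure, file notification mode also needs credentials that allow Auto Loader to create the Event Grid subscription and queue; those options are left out here and would have to be added for your environment.

```python
def auto_loader_options(file_format, schema_location, use_notifications=True):
    """Build the option map for an Auto Loader (cloudFiles) stream.

    use_notifications=True selects file notification mode;
    False falls back to directory listing mode.
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def start_ingest(spark, source_path, target_path, checkpoint_path, options):
    """Incrementally read new files and append them to a Delta location.

    `spark` is the SparkSession that a Databricks notebook provides.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )
    return (
        stream.writeStream.format("delta")
        # The checkpoint records which files have already been processed.
        .option("checkpointLocation", checkpoint_path)
        .start(target_path)
    )


# Hypothetical usage inside a notebook (all paths are placeholders):
# opts = auto_loader_options("json", "/mnt/landing/_schema")
# query = start_ingest(spark, "/mnt/landing/events", "/mnt/bronze/events",
#                      "/mnt/bronze/_checkpoints/events", opts)
```

With something like this, Databricks reads directly from the storage location as files land, so the Azure Data Factory hop may no longer be needed just to hand files over.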

Efficient data retrieval process between Azure Blob storage and Azure databricks
