Efficient data retrieval process between Azure Blob Storage and Azure Databricks

User16826994223
Databricks Employee

I am trying to design a streaming data analytics project using Functions --> Event Hub --> Storage --> Azure Data Factory --> Databricks --> SQL Server.

What I am struggling with at the moment is how to optimize "data retrieval" to feed my ETL process on Azure Databricks.

With this design I am going to handle lots of incoming files at different times. I am using a function to create an event for each file as it arrives and send it to Blob Storage; the data then goes to Azure Data Factory and from there to Databricks. This whole process takes a lot of time and delays the full pipeline.

Ryan_Chynoweth
Databricks Employee

Check out our Auto Loader capabilities, which can automatically track and process the files that still need to be processed.

Auto Loader

There are two options:

  • directory listing, which essentially completes the same steps that you have listed above, but in a slightly more efficient manner.
  • file notification, which creates managed resources to track files using the Azure Event Grid and Queue Storage services.

The file notification option is more scalable and is likely to better suit your needs.
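
As a rough illustration of the two options, here is a minimal PySpark sketch of an Auto Loader stream. The helper names (autoloader_options, start_autoloader_stream) and every path are hypothetical placeholders, and it assumes a Databricks notebook where `spark` already exists. The `cloudFiles.*` keys are Auto Loader options; on Azure, file notification mode typically also needs credentials and resource details (for example a subscription ID, resource group, and service principal) that are left out here.

```python
def autoloader_options(file_format, schema_location, use_notifications=True):
    """Build the Auto Loader option map (hypothetical helper).

    use_notifications=True selects file notification mode;
    False falls back to directory listing.
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.useNotifications": str(use_notifications).lower(),
    }


def start_autoloader_stream(spark, source_path, target_path,
                            checkpoint_path, options):
    """Read new files incrementally and append them to a Delta location.

    All paths are placeholders; the checkpoint is where the stream
    records which files have already been processed.
    """
    incoming = (
        spark.readStream
        .format("cloudFiles")   # Auto Loader source
        .options(**options)
        .load(source_path)
    )
    return (
        incoming.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .start(target_path)
    )


# Example call inside a Databricks notebook (placeholder paths):
# opts = autoloader_options("json", "/mnt/landing/_schema")
# start_autoloader_stream(spark, "/mnt/landing/events", "/mnt/bronze/events",
#                         "/mnt/bronze/_checkpoints/events", opts)
```

With this shape, the Data Factory hop between storage and Databricks may no longer be needed for ingestion, since the stream picks up files directly from the storage path.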