Efficient data retrieval between Azure Blob Storage and Azure Databricks

User16826994223
Honored Contributor III

I am trying to design a streaming data analytics project using Functions --> Event Hub --> Storage --> Azure Data Factory --> Databricks --> SQL Server.

What I am struggling with at the moment is how to optimize "data retrieval" to feed my ETL process on Azure Databricks.

I handle lots of incoming files over different periods of time. I am using a Function to create an event for each file as it arrives and send it to Blob Storage, then I move the data through Azure Data Factory and finally into Databricks. This whole process takes a considerable amount of time and delays the end-to-end pipeline.

1 REPLY

Ryan_Chynoweth
Honored Contributor III

Check out our Auto Loader capabilities, which can automatically track and process the files that need to be processed.

Autoloader

There are two options:

  • Directory listing, which essentially performs the same steps you listed above, but in a slightly more efficient manner.
  • File notification, which creates managed resources to track files using the Azure Event Grid and Queue Storage services.

The file notification option is more scalable and is likely to better suit your needs.
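As a minimal sketch, this is roughly what Auto Loader in file notification mode looks like in PySpark, assuming a Databricks notebook where `spark` is already available; the storage paths, file format, and target table name are placeholders rather than values from the original post, and on Azure the notification setup may require additional service principal or connection string options that are omitted here.

```python
# Minimal Auto Loader sketch (assumptions: Databricks notebook, `spark` exists;
# paths, file format, and table name are placeholders).

input_path = "abfss://landing@<storage-account>.dfs.core.windows.net/incoming/"
checkpoint_path = "abfss://landing@<storage-account>.dfs.core.windows.net/_checkpoints/incoming/"

df = (
    spark.readStream
         .format("cloudFiles")                    # Auto Loader source
         .option("cloudFiles.format", "json")     # format of the incoming files
         # File notification mode: track new files via Azure Event Grid + Queue
         # Storage instead of re-listing the directory on every run.
         # (May need extra credential options for the notification services.)
         .option("cloudFiles.useNotifications", "true")
         .option("cloudFiles.schemaLocation", checkpoint_path)
         .load(input_path)
)

(
    df.writeStream
      .option("checkpointLocation", checkpoint_path)
      .trigger(availableNow=True)                 # process pending files, then stop
      .toTable("bronze.incoming_events")
)
```

In a setup like this, Databricks can read directly from the landing container as files arrive, which may take the Data Factory copy step off the critical path and reduce the delay you are seeing.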
