Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Efficient data retrieval process between Azure Blob Storage and Azure Databricks

User16826994223
Honored Contributor III

I am trying to design a streaming data analytics project using Azure Functions --> Event Hubs --> Blob Storage --> Azure Data Factory --> Databricks --> SQL Server.

What I am struggling with at the moment is how to optimize "data retrieval" to feed my ETL process on Azure Databricks.

With this design I need to handle a large number of incoming files arriving at different times. I am using a function to create an event for each file as it arrives and send it to Blob Storage; Azure Data Factory then picks up the data and passes it to Databricks. This whole process takes a considerable amount of time and delays the full pipeline.

1 REPLY

Ryan_Chynoweth
Esteemed Contributor

Check out our Auto Loader capabilities, which can automatically track and process new files as they arrive.

Auto Loader

There are two options:

  • directory listing, which essentially performs the same steps you listed above, but somewhat more efficiently.
  • file notification, which creates managed resources to track new files using Azure Event Grid and Queue Storage.

The file notification option is more scalable and is likely to better suit your needs.
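
As a rough illustration of the two modes, here is a minimal PySpark sketch. The helper names (`auto_loader_options`, `start_ingest`), the paths, and the table name are hypothetical and only there for the example; `cloudFiles.format`, `cloudFiles.schemaLocation`, and `cloudFiles.useNotifications` are Auto Loader options. On Azure, file notification mode typically also needs credentials and permissions so the Event Grid and queue resources can be created; those are omitted here, so treat this as a starting point rather than a complete setup.

```python
def auto_loader_options(file_format, schema_location, use_notifications=True):
    """Build the option map for a `cloudFiles` (Auto Loader) stream reader.

    use_notifications=True selects file notification mode;
    False falls back to directory listing mode.
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
    }


def start_ingest(spark, source_path, target_table, checkpoint_path, options):
    """Start a streaming query that loads new files into a table.

    `spark` is assumed to be the SparkSession of a Databricks cluster,
    where the `cloudFiles` source is available.
    """
    stream = (
        spark.readStream
        .format("cloudFiles")      # Auto Loader source
        .options(**options)
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)  # tracks processed files
        .toTable(target_table)     # starts the query and returns it
    )


# Hypothetical values, for illustration only.
opts = auto_loader_options("json", "/tmp/example/_schemas")

# On a Databricks cluster this could then be started with something like:
# start_ingest(
#     spark,
#     "abfss://<container>@<account>.dfs.core.windows.net/landing/",
#     "bronze_events",
#     "/tmp/example/_checkpoints",
#     opts,
# )
```

With a stream like this, Databricks picks up files directly from storage as they land, so the Azure Data Factory hop may no longer be needed just for file hand-off.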
