Efficient data retrieval process between Azure Blob storage and Azure databricks

User16826994223 — Mon, 14 Jun 2021 13:26:52 GMT

I am trying to design a streaming data analytics project using Azure Functions --> Event Hubs --> Blob Storage --> Azure Data Factory --> Databricks --> SQL Server.

What I am struggling with at the moment is how to optimize "data retrieval" to feed my ETL process on Azure Databricks.

With this setup I am going to handle lots of incoming files arriving at different times. I am using a function to create an event for each file as it arrives and send it to Blob Storage. I then load the data into Azure Data Factory, and from there it goes to Databricks. This whole process takes a considerable amount of time and delays the full pipeline.

Re: Efficient data retrieval process between Azure Blob storage and Azure databricks

Ryan_Chynoweth — Mon, 21 Jun 2021 19:31:54 GMT

Check out our Auto Loader capabilities, which can automatically track and process new files as they arrive.

Autoloader

There are two options:

Directory listing, which essentially performs the same steps that you have listed above, but in a slightly more efficient manner.
File notification, which creates managed resources to track files using the Azure Event Grid and Queue Storage services.

The file notification option is more scalable and is likely to better suit your needs.
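As a rough illustration of the file notification option, here is a minimal PySpark sketch. It assumes a Databricks notebook where `spark` is already defined. The helper names (`auto_loader_options`, `start_ingest`) and all paths are hypothetical, and the exact set of Azure credential options you need (for example `cloudFiles.subscriptionId`, `cloudFiles.tenantId`, `cloudFiles.clientId`, `cloudFiles.clientSecret`, `cloudFiles.resourceGroup`) depends on your runtime version, so check the Auto Loader documentation before relying on it.

```python
from typing import Dict, Optional


def auto_loader_options(file_format: str,
                        schema_location: str,
                        use_notifications: bool = True,
                        azure_credentials: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build the option map for an Auto Loader ("cloudFiles") stream.

    Hypothetical helper: it only assembles options, it does not touch Spark.
    """
    options = {
        # Format of the incoming files, e.g. "json", "csv", "parquet".
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # "true" selects file notification mode, "false" directory listing.
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
    }
    # Credentials are only relevant when Auto Loader has to create the
    # Event Grid subscription and storage queue on your behalf.
    if use_notifications and azure_credentials:
        options.update(azure_credentials)
    return options


def start_ingest(spark, source_path: str, target_path: str,
                 checkpoint_path: str, options: Dict[str, str]):
    """Start a stream that picks up new files from source_path.

    Assumes a Databricks runtime, where the "cloudFiles" source is available.
    """
    stream = (spark.readStream
              .format("cloudFiles")
              .options(**options)
              .load(source_path))
    # The checkpoint records which files were already processed,
    # so a restarted stream does not reprocess them.
    return (stream.writeStream
            .format("delta")
            .option("checkpointLocation", checkpoint_path)
            .start(target_path))
```

In a notebook you would call something like `start_ingest(spark, "<source path>", "<target path>", "<checkpoint path>", auto_loader_options("json", "<schema path>"))`, with the placeholders replaced by your own locations. Because Databricks reads new files directly from storage, this could potentially remove the Data Factory hop from the pipeline described in the question.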
