Efficient data retrieval between Azure Blob Storage and Azure Databricks
06-14-2021 06:26 AM
I am trying to design a streaming data analytics project with the following pipeline: Azure Functions --> Event Hubs --> Blob Storage --> Azure Data Factory --> Azure Databricks --> SQL Server.
What I am struggling with at the moment is how to optimize data retrieval to feed my ETL process on Azure Databricks.
With this setup I will handle a large number of incoming files arriving at different times. I am using a function to create an event for each file as it arrives and send it to Blob Storage; from there the data goes to Azure Data Factory and then on to Databricks. This whole process takes a considerable amount of time and delays the full pipeline.
Labels: Azure, Azure Databricks
06-21-2021 12:31 PM
Check out our Auto Loader capabilities, which can automatically track and process new files as they arrive.
There are two options:
- directory listing, which essentially performs the same steps you listed above, but slightly more efficiently.
- file notification, which creates managed resources to track files using the Azure Event Grid and Queue Storage services.
The file notification option is more scalable and is likely to suit your needs better.
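
To make the two options concrete, here is a minimal sketch of how an Auto Loader stream might be set up in a Databricks notebook. It is an illustration under stated assumptions, not a drop-in solution: the helper names `auto_loader_options` and `start_auto_loader_stream` are hypothetical, the file format and every path are placeholders, and on Azure the file notification mode typically needs additional service principal and resource group options that are left out here. The `cloudFiles` source itself is only available on a Databricks runtime.

```python
from typing import Dict


def auto_loader_options(file_format: str,
                        schema_location: str,
                        use_notifications: bool = True) -> Dict[str, str]:
    """Build reader options for an Auto Loader (cloudFiles) stream.

    use_notifications=True selects file notification mode;
    False falls back to directory listing mode.
    """
    return {
        "cloudFiles.format": file_format,
        # Location where Auto Loader stores the inferred schema (placeholder path).
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
    }


def start_auto_loader_stream(spark, source_path: str, target_path: str,
                             checkpoint_path: str, options: Dict[str, str]):
    """Start a stream that reads new files from source_path into a Delta path.

    `spark` is assumed to be the SparkSession provided by the notebook.
    """
    reader = spark.readStream.format("cloudFiles")
    for key, value in options.items():
        reader = reader.option(key, value)
    incoming = reader.load(source_path)

    # The checkpoint records which files have already been processed,
    # so each file is picked up only once.
    return (incoming.writeStream
            .format("delta")
            .option("checkpointLocation", checkpoint_path)
            .start(target_path))
```

In a notebook you would call `auto_loader_options("json", "<schema-path>")` and pass the result to `start_auto_loader_stream` together with your own storage paths. With this approach Databricks reads directly from the storage account, so the Data Factory hop may no longer be needed just to hand files over to the ETL job.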