Appropriate storage account type for reference data (Azure)
08-24-2023 07:12 AM
Hello,
We use a reference dataset in our Production applications and would like to create a Delta table for it, to be consumed by those applications. Currently, this dataset is updated manually through a weekly script. In the future, we plan to propagate changes in real time from Event Hubs through streaming jobs (Spark Structured Streaming or DLT pipelines). The expected rate is 1,000 to 10,000 inserts/updates per hour.
Currently, we have an Azure Blob Storage account that contains all production reference data and is mounted on the corresponding Databricks workspace. The question is whether we should create an ADLS storage account for this Delta table, or whether the Blob Storage account can handle this architecture. Are there any future plans to drop compatibility between Blob Storage and Delta?
Kind regards.
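For context, the weekly update described above is typically expressed as a Delta MERGE (upsert). A minimal sketch follows; the key column `ref_id` and the table path are illustrative assumptions, not details from this thread:

```python
def merge_condition(keys, target="t", source="s"):
    """Build the join condition for a Delta MERGE on the given key columns."""
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)

def upsert_reference_data(spark, delta_path, updates_df, keys=("ref_id",)):
    """Upsert a batch of reference-data updates into the Delta table at delta_path.

    Requires the delta-spark package, which Databricks runtimes ship by default.
    Rows matching on the key columns are updated; new rows are inserted.
    """
    from delta.tables import DeltaTable  # imported lazily; available on Databricks

    target = DeltaTable.forPath(spark, delta_path)
    (target.alias("t")
        .merge(updates_df.alias("s"), merge_condition(keys))
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())
```

At 1,000 to 10,000 updates per hour this pattern works for both the weekly batch and, later, as the `foreachBatch` body of a streaming job.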
- Labels: Delta Lake
08-24-2023 11:18 AM
In my experience, users mostly choose ADLS for similar use cases, so I believe ADLS is the more popular choice.
I have not seen many users go for Blob Storage, but I hope someone who has explored it more can add to my comment.
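One practical difference is the path scheme Spark uses for each account type: `abfss://` for an ADLS Gen2 account (hierarchical namespace enabled) versus `wasbs://` for a plain blob account. A small sketch, with hypothetical account and container names:

```python
def storage_uri(account, container, path, hierarchical=True):
    """Build the Spark path for an Azure storage location.

    abfss:// against the .dfs endpoint for ADLS Gen2 (hierarchical namespace);
    wasbs:// against the .blob endpoint for a plain Blob Storage account.
    """
    scheme, endpoint = ("abfss", "dfs") if hierarchical else ("wasbs", "blob")
    return f"{scheme}://{container}@{account}.{endpoint}.core.windows.net/{path.lstrip('/')}"
```

Usage would look like `spark.read.format("delta").load(storage_uri("prodrefdata", "reference", "tables/ref_table"))`, where the account, container, and table path are placeholders. Delta tables can be read over both schemes today; the ADLS Gen2 path is the one Databricks generally recommends.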
08-25-2023 04:49 AM
+1 for ADLS. Hierarchical namespace and hot/cool/premium storage tiers are things you do not get with plain Blob Storage.

