03-15-2022 01:42 AM
In Azure Databricks the DBFS root storage account is open to all networks. Changing that to use a private endpoint, or restricting access to selected networks, is not allowed.
Is there any way to add network security to this storage account?
Alternatively, is it possible to configure another storage account for DBFS that is owned, secured and maintained by the customer?
Clarification: This post is intended to be about the DBFS root
03-15-2022 05:33 AM
Yes, it is possible. Create your own Azure Data Lake Storage account and mount it to a directory of your choice.
In all databases, have your tables use a location that points to your mount.
I explained how to do it step by step in this post: https://community.databricks.com/s/feed/0D53f00001eQGOHCA4
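For illustration, here is a minimal sketch of that approach: mount your own ADLS Gen2 container and point databases/tables at it. The storage account, container, secret scope, secret keys and mount path below are placeholders, not values from the linked post.

```python
# Placeholder OAuth config for a service principal; adapt all names to your environment.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("kv-scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("kv-scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the customer-owned container (run once per workspace).
dbutils.fs.mount(
    source="abfss://data@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)

# Create a database and table whose files live in your own storage account,
# not in the DBFS root.
spark.sql("CREATE DATABASE IF NOT EXISTS sales LOCATION '/mnt/datalake/sales'")
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders (order_id INT, amount DOUBLE)
    USING DELTA
    LOCATION '/mnt/datalake/sales/orders'
""")
```

Everything written under /mnt/datalake then lands in your own storage account, where you control firewall rules and private endpoints.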
03-15-2022 08:04 AM
Is that the way to go to replace the default DBFS root?
03-15-2022 08:06 AM
No, it is an additional mount (a new directory for your data).
03-15-2022 05:43 AM
Thank you very much, I am going to look into that!
03-15-2022 09:47 AM
I should rephrase the question a little to make our goal clear:
Is there a way to add network security to the DBFS root that is deployed with Databricks in Azure? It feels uneasy to have a storage account that may hold credentials, uploaded data or notebook results open to the internet.
Is it possible to add a layer of network protection on top of what is already there?
03-16-2022 08:57 AM
Hello @Bas Toeter,
You could enable double encryption on the DBFS root storage account:
https://docs.microsoft.com/en-us/azure/databricks/security/keys/double-encryption
There are Deny assignments that prevent any changes to the storage account.
03-17-2022 08:09 AM
Hi @Arvind Ravish,
As far as I understand, double encryption protects us when one of the keys is lost or when the entire algorithm is compromised. I don't think it would help when there is unauthorised access to the storage account.
As it is not so simple to introduce a private endpoint for the DBFS root, I should probably take a step back and first assess the impact of a compromised DBFS root.
A compromised DBFS root also leads to a compromised metastore. I'm not sure how bad that would be, but it seems to contain mostly metadata; in our case losing it would probably not hurt much.
The documentation states: "The DBFS root also contains data, including mount point metadata and credentials and certain types of logs, that is not visible and cannot be directly accessed."
What data is in these mounts that the DBFS root holds the credentials for?
03-17-2022 08:56 AM
@Bas Toeter, at least regarding the metastore: the default one is a managed MySQL instance, and you can back it up and then use your own Azure SQL database with Private Link, so you have full control.
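As a rough sketch of that setup, the cluster Spark config below points clusters at an external Hive metastore in a customer-owned Azure SQL database; the server, database, secret scope and key names are placeholders, and the general approach follows the Databricks external metastore documentation.

```python
# Hedged sketch: Spark config for an external Hive metastore on your own
# Azure SQL database (reachable over Private Link). Paste these key/value
# pairs into the cluster's Spark config; all names are placeholders.
external_metastore_conf = {
    "spark.sql.hive.metastore.version": "3.1.0",
    "spark.sql.hive.metastore.jars": "maven",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName":
        "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "spark.hadoop.javax.jdo.option.ConnectionURL":
        "jdbc:sqlserver://<sql-server>.database.windows.net:1433;database=<metastore-db>",
    # Reference secrets instead of hard-coding credentials:
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "{{secrets/kv-scope/metastore-user}}",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "{{secrets/kv-scope/metastore-password}}",
}
```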
Regarding the DBFS root, I try not to use it and use my own mount points instead. I redirect logs there and clean those logs up regularly. The DBFS root is managed by Databricks, so I trust it is secure, but I prefer not to use it because of the lack of full control.
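For the log redirection part, a minimal sketch against the Clusters API is shown below. It assumes that delivering cluster logs to a DBFS path backed by your own mount works in your setup; the workspace URL, token secret, cluster settings and paths are all placeholders.

```python
import requests

# Placeholders: fill in your workspace URL; the token is read from a secret scope.
workspace_url = "https://<workspace-instance>.azuredatabricks.net"
token = dbutils.secrets.get(scope="kv-scope", key="databricks-pat")

cluster_spec = {
    "cluster_name": "logs-to-own-storage",
    "spark_version": "10.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    # Driver and executor logs are delivered under your own ADLS mount
    # instead of somewhere in the DBFS root.
    "cluster_log_conf": {"dbfs": {"destination": "dbfs:/mnt/datalake/cluster-logs"}},
}

resp = requests.post(
    f"{workspace_url}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=cluster_spec,
)
print(resp.json())  # returns the new cluster_id on success
```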
I know that there will be significant changes to security (roadmap), which will for sure include enhanced encryption and private links.
Regarding credentials, you can replace them with Azure Key Vault behind a Private Link.
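A minimal sketch, assuming a Key Vault-backed secret scope named "kv-scope" has already been created (via the workspace UI or CLI) and points at a Key Vault sitting behind a private endpoint; the scope and key names are placeholders.

```python
# Secrets are read from Key Vault at runtime and never stored in the DBFS root.
sql_password = dbutils.secrets.get(scope="kv-scope", key="sql-password")

# Secret values are redacted in notebook output.
print(sql_password)  # prints "[REDACTED]"

# List what the scope exposes (metadata only, no values).
for secret in dbutils.secrets.list("kv-scope"):
    print(secret.key)
```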
05-20-2022 05:41 AM
I have the same question. It would be helpful to know whether there is any way to secure the DBFS root storage account by restricting access to specific VNets rather than leaving it open to all networks (in Azure this concerns the storage account whose name starts with dbstorage*******).
05-21-2022 03:07 AM
In the coming weeks there will be changes that make it possible to keep everything in Databricks on the private network, using private IPs.
02-28-2024 02:08 AM
Hi, is this currently possible?
10-06-2022 08:46 AM
Hello Hubert, I've got the same use case. My central IT is currently deploying Azure Policies across our Azure subscriptions to ensure that all storage accounts have public access restricted and access keys disabled. However, because the Databricks backend storage accounts cannot be customized at creation, the policy is not fulfilled.
You referred to upcoming changes; are they now available, and might they help me solve this situation?
Thanks a lot for your help.
Léo
10-13-2022 04:48 AM
Hello @Hubert Dudek,
any insights on this matter?
Thanks,
Léo
10-13-2022 08:09 AM
Hi, maybe the easiest is to ask your Azure Databricks support/sales representative for help.
Regarding the new Private Link feature, here is the detailed documentation: https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/p...