network security for DBFS storage account

Bas1
New Contributor III

In Azure Databricks the DBFS storage account is open to all networks. Changing that to use a private endpoint or minimizing access to selected networks is not allowed.

Is there any way to add network security to this storage account?

Alternatively, is it possible to configure another storage account for DBFS that is owned, secured and maintained by the customer?

Clarification: This post is intended to be about the DBFS root

1 ACCEPTED SOLUTION

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III

@Bas Toeter​, at least regarding the metastore: it is in MySQL RDS, and you can back up the metastore and then use your own Azure SQL database with Private Link to have full control.

Regarding the DBFS root, I try not to use it and use my own storage locations instead. I also redirect logs and clean up the logs there regularly. The DBFS root is managed by Databricks, so I trust it is secure, but I prefer not to use it because of the lack of full control.

I know that there will be significant changes in security (roadmap), which will surely include enhanced encryption and private links.

Regarding credentials, you can replace them with Azure Key Vault with Private Link.
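
A rough sketch of the external-metastore idea above, expressed as the cluster Spark configuration it would need. The server, database, secret scope and key names are hypothetical placeholders, the Hive version has to match your own metastore schema, and the `{{secrets/<scope>/<key>}}` references assume a Key Vault-backed secret scope already exists. Treat it as a starting point under those assumptions, not a verified setup.

```python
def secret_ref(scope: str, key: str) -> str:
    """Build a Databricks secret reference usable inside a Spark config value."""
    return "{{secrets/" + scope + "/" + key + "}}"


def external_metastore_conf(server: str, database: str, scope: str,
                            user_key: str, password_key: str,
                            hive_version: str, metastore_jars: str) -> dict:
    """Return Spark config entries pointing a cluster at a customer-owned
    Hive metastore database on Azure SQL.

    All argument values are assumptions supplied by the caller; nothing
    here is taken from the thread beyond the general approach.
    """
    jdbc_url = (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database}"
    )
    return {
        "spark.hadoop.javax.jdo.option.ConnectionURL": jdbc_url,
        "spark.hadoop.javax.jdo.option.ConnectionDriverName":
            "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        # Credentials are resolved from the secret scope at cluster start,
        # so they never appear in plain text in the cluster definition.
        "spark.hadoop.javax.jdo.option.ConnectionUserName":
            secret_ref(scope, user_key),
        "spark.hadoop.javax.jdo.option.ConnectionPassword":
            secret_ref(scope, password_key),
        "spark.sql.hive.metastore.version": hive_version,
        "spark.sql.hive.metastore.jars": metastore_jars,
    }


if __name__ == "__main__":
    # Hypothetical names, printed in the "key value" form used by the
    # cluster Spark config text box.
    conf = external_metastore_conf(
        "my-metastore-server", "hivemetastore", "kv-scope",
        "metastore-user", "metastore-password", "2.3.9", "builtin",
    )
    for key, value in conf.items():
        print(key, value)
```

Keeping the user name and password as secret references means the Key Vault (which can sit behind Private Link) stays the only place the credentials live.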

17 REPLIES 17

Hubert-Dudek
Esteemed Contributor III

Yes, it is possible. Create your own Azure Data Lake Storage account and mount it to a directory of your choice.

For all databases and tables, use a location pointing to your mount.

I explained how to do it step by step in this post: https://community.databricks.com/s/feed/0D53f00001eQGOHCA4
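
For readers who cannot open the linked post, here is a minimal sketch of what mounting a customer-owned ADLS Gen2 container and pointing a database at it might look like. It assumes a service principal with access to the storage account; the helper names (`adls_oauth_configs`, `abfss_source`, `mount_adls`, `database_ddl`) and all account, container and path names are made up for illustration. `dbutils` is only available inside a Databricks notebook or job, so it is passed in as a parameter.

```python
def adls_oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """OAuth (service principal) settings for an ADLS Gen2 mount."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_source(container: str, account: str) -> str:
    """abfss:// URI of the container root."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/"


def mount_adls(dbutils, container: str, account: str,
               mount_point: str, configs: dict) -> bool:
    """Mount the container unless the mount point already exists.

    Returns True if a new mount was created, False if it was already there.
    """
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return False
    dbutils.fs.mount(
        source=abfss_source(container, account),
        mount_point=mount_point,
        extra_configs=configs,
    )
    return True


def database_ddl(database: str, mount_point: str) -> str:
    """DDL that places a database (and its managed tables) under the mount."""
    return (
        f"CREATE DATABASE IF NOT EXISTS {database} "
        f"LOCATION '{mount_point}/{database}'"
    )
```

In a notebook this could be used roughly as follows, reading the client secret from a secret scope instead of hard-coding it: `secret = dbutils.secrets.get(scope="kv-scope", key="sp-secret")`, then `mount_adls(dbutils, "data", "mystorageaccount", "/mnt/data", adls_oauth_configs(app_id, secret, tenant_id))`, then `spark.sql(database_ddl("sales", "/mnt/data"))`. Note that this adds a separate, customer-controlled location; it does not change the network settings of the DBFS root itself.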

Bas1
New Contributor III

Is that the way to go to replace the default DBFS-root?

Hubert-Dudek
Esteemed Contributor III

No, it is an additional mount (a new directory for your data).

Bas1
New Contributor III

Thank you very much, I am going to look into that!

👍

Bas1
New Contributor III

I should rephrase the question a little to make clear what our goal is:

Is there a way to add network security to the DBFS root that is deployed with Databricks in Azure? It feels uneasy to have a storage account that may hold credentials, uploaded data or notebook results and that is open to the internet.

Is it possible to add a layer of network protection on top of what is already there?

User16764241763
Honored Contributor

Hello @Bas Toeter​ 

You could enable double encryption on the DBFS root storage account:

https://docs.microsoft.com/en-us/azure/databricks/security/keys/double-encryption

There are Deny assignments that prevent any changes to the storage account.

Bas1
New Contributor III

Hi @Arvind Ravish​ ,

As far as I understand, double encryption will protect us when one of the keys is lost or when an entire algorithm is compromised. I don't think it would help when there is unauthorised access to the storage account.

As it is not so simple to introduce a private endpoint for the DBFS root, I should probably take one step back and assess the impact of a compromised DBFS root first.

A compromised DBFS root also leads to a compromised metastore. I am not sure how bad that would be, but it seems to contain mostly metadata. In our case, losing that would probably not hurt much.

The documentation states: "The DBFS root also contains data—including mount point metadata and credentials and certain types of logs—that is not visible and cannot be directly accessed."

What data is in these mounts that the DBFS root holds the credentials for?

Kaniz
Community Manager

Hi @Bas Toeter​ , Just a friendly follow-up. Do you still need help, or have you resolved your problem with the above solutions? Please let us know.

affine
New Contributor II

I have the same question. It would be helpful to know if there is any way to secure the DBFS root storage account by restricting access to specific VNets rather than having it open to all networks (in Azure, this is the storage account whose name starts with dbstorage*******).

Hubert-Dudek
Esteemed Contributor III

In the coming weeks, there will be changes that make it possible to have everything in Databricks in a private network using private IPs.

Hi, is this currently possible?

Osirus
New Contributor III

Hello Hubert, I've got the same use case. My central IT is currently deploying Azure Policies across Azure subscriptions to ensure that all storage accounts have public access restricted and access keys disabled. However, because the Databricks backend storage accounts cannot be customized at creation, the policy is not fulfilled.

You referred to upcoming changes. Are they now available, and might they help me solve this situation?

Thanks a lot for your help.

Léo

Osirus
New Contributor III

Hello @Hubert Dudek​,

Any insights on this matter?

Thanks,

Léo
