Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

network security for DBFS storage account

Bas1
New Contributor III

In Azure Databricks the DBFS storage account is open to all networks. Changing that to use a private endpoint or minimizing access to selected networks is not allowed.

Is there any way to add network security to this storage account?

Alternatively, is it possible to configure another storage account for DBFS that is owned, secured and maintained by the customer?

Clarification: This post is intended to be about the DBFS root

1 ACCEPTED SOLUTION

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III

@Bas Toeter, at least regarding the metastore: it lives in a managed MySQL database, so you can back it up and then use your own Azure SQL database with Private Link, which gives you full control.

Regarding the DBFS root, I try not to use it and use my own storage locations instead. I redirect logs and clean up the logs there regularly. The DBFS root is managed by Databricks, so I trust it is secure, but I prefer not to use it because I lack full control over it.

I know that significant security changes are on the roadmap, which will surely include enhanced encryption and private links.

Regarding credentials, you can keep them in Azure Key Vault with Private Link instead.
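
Editor's sketch of the log redirection mentioned above, under stated assumptions: the cluster name and the mount path `dbfs:/mnt/secure-logs/cluster-logs` are hypothetical, and the helper function is illustrative only. The `cluster_log_conf` field with a `dbfs.destination` entry is the cluster setting for log delivery; pointing it at a customer-owned mount keeps cluster logs out of the default DBFS root location.

```python
import json


def cluster_spec_with_log_delivery(base_spec: dict, log_destination: str) -> dict:
    """Return a copy of a cluster spec that delivers logs to `log_destination`.

    The check below is a convention chosen for this sketch, not a Databricks
    rule: it only accepts paths under /mnt so logs land on a mounted,
    customer-owned storage account.
    """
    if not log_destination.startswith("dbfs:/mnt/"):
        raise ValueError("expected a path under a mount, e.g. dbfs:/mnt/<name>/...")
    spec = dict(base_spec)
    spec["cluster_log_conf"] = {"dbfs": {"destination": log_destination}}
    return spec


# Hypothetical cluster name and mount path, for illustration only.
spec = cluster_spec_with_log_delivery(
    {"cluster_name": "example-cluster"},
    "dbfs:/mnt/secure-logs/cluster-logs",
)
print(json.dumps(spec, indent=2))
```

The resulting dictionary could be merged into the JSON used to create or edit a cluster; cleaning up old logs in that location remains a separate, regular task.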


16 REPLIES

Hubert-Dudek
Esteemed Contributor III

Yes, it is possible. Create your own Azure Data Lake Storage account and mount it to a directory of your choice.

For all databases and tables, use a location that points to your mount.

I explained how to do it step by step in this post: https://community.databricks.com/s/feed/0D53f00001eQGOHCA4
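
Editor's sketch of the two steps above (mounting your own ADLS Gen2 account, then pointing a database at the mount). It is not taken from the linked post. The storage account, container, mount name, database name, and the placeholder IDs are all hypothetical, and it assumes OAuth with a service principal. The helper functions only build the arguments; the actual `dbutils.fs.mount` and `spark.sql` calls are shown as comments because they only exist inside a Databricks notebook.

```python
def adls_mount_args(storage_account: str, container: str, mount_name: str,
                    tenant_id: str, client_id: str, client_secret: str) -> dict:
    """Build keyword arguments for dbutils.fs.mount (ADLS Gen2, service principal)."""
    extra_configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    return {
        "source": f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
        "mount_point": f"/mnt/{mount_name}",
        "extra_configs": extra_configs,
    }


def create_database_sql(database: str, mount_point: str) -> str:
    """SQL that places a database (and its managed tables) under the mount."""
    return f"CREATE DATABASE IF NOT EXISTS {database} LOCATION '{mount_point}/{database}'"


# All values below are placeholders for illustration.
args = adls_mount_args("mystorageaccount", "data", "lake",
                       "<tenant-id>", "<client-id>", "<client-secret>")
sql = create_database_sql("sales", args["mount_point"])
print(args["source"])
print(sql)

# Inside a Databricks notebook, where `dbutils` and `spark` are predefined:
#   dbutils.fs.mount(**args)
#   spark.sql(sql)
# The client secret would normally be read with dbutils.secrets.get(scope=..., key=...)
# rather than written into the notebook.
```

As noted in the thread, this adds a mount next to the DBFS root; it does not replace the root itself.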

Bas1
New Contributor III

Is that the way to go to replace the default DBFS-root?

Hubert-Dudek
Esteemed Contributor III

No, it is an additional mount (a new directory for your data).

Bas1
New Contributor III

Thank you very much, I am going to look into that!

👍

Bas1
New Contributor III

I should rephrase the question a little to make clear what our goal is:

Is there a way to add network security to the DBFS root that is deployed with Databricks in Azure? It feels uneasy to have a storage account that may hold credentials, uploaded data, or notebook results and that is open to the internet.

Is it possible to add a layer of network protection on top of what is already there?

User16764241763
Honored Contributor

Hello @Bas Toeter,

You could enable double encryption on the DBFS root storage account:

https://docs.microsoft.com/en-us/azure/databricks/security/keys/double-encryption

There are Deny assignments that prevent any changes to the storage account.
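
Editor's sketch of how double encryption is typically requested in a workspace deployment template, assuming an ARM-style `properties` object for the workspace. The `requireInfrastructureEncryption` workspace parameter is the setting described in the linked documentation; the helper function and the sample input are hypothetical, and the fragment is not a complete template.

```python
import copy
import json


def with_double_encryption(workspace_properties: dict) -> dict:
    """Return a copy of workspace `properties` with infrastructure encryption requested."""
    props = copy.deepcopy(workspace_properties)
    params = props.setdefault("parameters", {})
    # Workspace parameter that asks for a second (infrastructure) layer of
    # encryption on the DBFS root storage account.
    params["requireInfrastructureEncryption"] = {"value": True}
    return props


# Hypothetical starting point: a workspace that only sets one other parameter.
original = {"parameters": {"enableNoPublicIp": {"value": True}}}
updated = with_double_encryption(original)
print(json.dumps(updated, indent=2))
```

This only affects encryption at rest; as the next reply points out, it does not restrict network access to the storage account.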

Bas1
New Contributor III

Hi @Arvind Ravish,

As far as I understand, double encryption will protect us when one of the keys is lost or when an entire algorithm is compromised. I don't think it would help when there is unauthorised access to the storage account.

As it is not so simple to introduce a private endpoint for the DBFS root, I should probably take one step back and assess the impact of a compromised DBFS root first.

A compromised DBFS root also leads to a compromised metastore. I am not sure how bad that would be, but it seems to contain mostly metadata. In our case, losing that would probably not hurt much.

The documentation states: "The DBFS root also contains data--including mount point metadata and credentials and certain types of logs--that is not visible and cannot be directly accessed."

What data is in these mounts that the DBFS root holds the credentials for?

affine
New Contributor II

I have the same question, it would be helpful to know if there is any way to secure the DBFS Root Storage Account by restricting access from specific VNets rather than having it open from all networks (in Azure this is regarding the Storage Account starting with dbstorage*******).

Hubert-Dudek
Esteemed Contributor III

In the coming weeks, there will be changes that make it possible to have everything in Databricks in a private network using private IPs.

Hi, is this currently possible?

Osirus
New Contributor III

Hello Hubert, I have the same use case. My central IT is currently deploying Azure Policies across Azure subscriptions to ensure that all storage accounts have public access restricted and access keys disabled. However, because the Databricks backend storage accounts cannot be customized at creation, the policy is not fulfilled.

You referred to upcoming changes. Are they now available, and might they help me solve this situation?

Thanks a lot for your help.

Léo
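
Editor's sketch of the kind of check such a policy performs, to make the two requirements concrete. It assumes a dictionary shaped like the `properties` object of a storage account in the Azure Resource Manager API (`publicNetworkAccess`, `networkAcls.defaultAction`, `allowSharedKeyAccess`); the function name and the sample inputs are hypothetical, and this is not an Azure Policy definition.

```python
from typing import List


def storage_account_findings(properties: dict) -> List[str]:
    """List which of the two requirements a storage account fails."""
    findings = []

    # Public access counts as restricted if it is disabled outright or if the
    # firewall default action is Deny (selected networks only).
    public_disabled = properties.get("publicNetworkAccess", "Enabled") == "Disabled"
    default_action = properties.get("networkAcls", {}).get("defaultAction", "Allow")
    if not public_disabled and default_action != "Deny":
        findings.append("public network access is not restricted")

    # An unset allowSharedKeyAccess is treated as enabled.
    if properties.get("allowSharedKeyAccess", True):
        findings.append("access key (shared key) authorization is enabled")

    return findings


# Hypothetical examples: an account open to all networks and a locked-down one.
open_account = {"networkAcls": {"defaultAction": "Allow"}}
locked_account = {"publicNetworkAccess": "Disabled", "allowSharedKeyAccess": False}

print(storage_account_findings(open_account))
print(storage_account_findings(locked_account))
```

A managed DBFS root account that is open to all networks would show up like `open_account` here, and the Deny assignments mentioned earlier in the thread prevent changing it directly.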

Osirus
New Contributor III

Hello @Hubert Dudek,

Any insights on this matter?

Thanks,

Léo

Hubert-Dudek
Esteemed Contributor III

Hi, maybe the easiest option is to ask Azure Databricks support or your sales representative for help.

Regarding the new Private Link feature, here is the detailed documentation: https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/p...
