Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Hubert-Dudek
Esteemed Contributor III

How to connect your Azure Data Lake Storage to Azure Databricks

Standard Workspace

👉 Private link

In your storage account, go to "Networking" -> "Private endpoint connections" and click Add Private Endpoint.

It is important to add the private endpoints in the same region and the same virtual network as your Databricks workspace. For the data lake, Databricks needs one private link for the target sub-resource "dfs" and one for "blob".

In the Virtual Network options for the private link, select the virtual network that contains the PrivateDatabricks and PublicDatabricks subnets. You can use a ServiceEndpoints subnet for your private link (if you don't have one, please create it).

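For context on why both endpoints matter: each storage sub-resource gets its own private DNS zone following the documented `privatelink.<sub-resource>.core.windows.net` pattern. A minimal sketch of the naming convention (the storage account name `myadls` is hypothetical):

```python
# Private DNS zone and FQDN patterns used by Azure Storage private endpoints.
# "myadls" is a hypothetical storage account name.

def private_dns_zone(subresource: str) -> str:
    """Private DNS zone Azure links for a storage sub-resource ("dfs" or "blob")."""
    return f"privatelink.{subresource}.core.windows.net"

def private_fqdn(account: str, subresource: str) -> str:
    """FQDN that should resolve to the private endpoint's IP inside the VNet."""
    return f"{account}.{private_dns_zone(subresource)}"

# The two sub-resources this guide creates endpoints for:
print(private_fqdn("myadls", "dfs"))   # myadls.privatelink.dfs.core.windows.net
print(private_fqdn("myadls", "blob"))  # myadls.privatelink.blob.core.windows.net
```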

👉 Application

You need to create an Azure application which will authorize access to your data lake storage. Search for "App registrations" and create one with a friendly name:

After creating the app, please copy the following values, as you will need them later:

- app_id: go to the app's main page and copy "Application (client) ID"

- tenant_id: go to the app's main page and copy "Directory (tenant) ID"

- secret: go to the app's "Certificates & secrets", create a new client secret, and copy its "Value".
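Both app_id and tenant_id are GUIDs, while the client secret is an opaque string. A quick sanity check like the sketch below (plain Python, hypothetical sample values) can catch a copy-paste mix-up before you attempt the mount:

```python
import uuid

def looks_like_guid(value: str) -> bool:
    """app_id and tenant_id should parse as GUIDs; the client secret will not."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False

# Hypothetical values for illustration only
print(looks_like_guid("12345678-1234-1234-1234-123456789abc"))  # True  (GUID shape)
print(looks_like_guid("8Q~notARealSecret.Value"))               # False (opaque secret)
```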

👉 Grant your application access to storage account

Go back to your data lake storage account, open "Access Control (IAM)", and add the role "Storage Blob Data Contributor".


Click "Select members" and find the app we've just created.

👉 Databricks

Now we can finally go to Databricks to mount containers from our storage. A mount is permanent, so it is enough to do it only once. It is good to store the code used for the mount (for example, in an "infrastructure" folder in a repo) so we can easily recreate it. We just need to put the values we copied earlier into the code.

# OAuth (client credentials) configuration built from the values copied earlier
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": app_id,        # Application (client) ID
           "fs.azure.account.oauth2.client.secret": secret,    # client secret "Value"
           "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"}
 
dbutils.fs.mount(
  source = f"abfss://{container}@{storage_name}.dfs.core.windows.net/",  # container to mount
  mount_point = "/mnt/your_folder",
  extra_configs = configs)
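Because a mount is permanent, re-running dbutils.fs.mount against an existing mount point raises an error. A hedged sketch of an idempotency guard (dbutils.fs.mounts() returns entries with a mountPoint attribute; the helper and the stub below are our own illustration, not a Databricks API):

```python
from collections import namedtuple

def already_mounted(mounts, mount_point: str) -> bool:
    """True if mount_point appears in the list returned by dbutils.fs.mounts()."""
    return any(m.mountPoint == mount_point for m in mounts)

# On Databricks you would write:
#   if not already_mounted(dbutils.fs.mounts(), "/mnt/your_folder"):
#       dbutils.fs.mount(source=..., mount_point="/mnt/your_folder", extra_configs=configs)

# Stub to simulate dbutils.fs.mounts() output outside Databricks:
MountInfo = namedtuple("MountInfo", ["mountPoint"])
mounts = [MountInfo("/mnt/your_folder")]
print(already_mounted(mounts, "/mnt/your_folder"))  # True
print(already_mounted(mounts, "/mnt/other"))        # False
```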

👉 Troubleshooting

It is good to use the nslookup command to check whether your data lake storage resolves to a private IP:
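Once nslookup returns an address, you can confirm it falls in a private range with Python's standard ipaddress module (a sketch; the sample addresses are hypothetical):

```python
import ipaddress

def is_private_ip(addr: str) -> bool:
    """True for private addresses, as a VNet private endpoint IP should be."""
    return ipaddress.ip_address(addr).is_private

print(is_private_ip("10.179.0.5"))    # True  -> traffic stays on the private link
print(is_private_ip("20.60.153.33"))  # False -> still resolving to a public IP
```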


3 REPLIES

Hubert-Dudek
Esteemed Contributor III

I've created this post as it is a recurring question in the Databricks community. I will keep it updated. Any suggestions are welcome.

Anonymous
Not applicable

@Hubert Dudek - Have I told you lately that you're the best!?!

Hubert-Dudek
Esteemed Contributor III

you know how to motivate me 🙂
