01-26-2022 01:42 PM
How to connect your Azure Data Lake Storage to Azure Databricks
Standard Workspace
Private link
In your storage account, go to "Networking" -> "Private endpoint connections" and click "Add Private Endpoint".
It is important to create the private endpoints in the same region and the same virtual network as your Databricks workspace. Databricks needs two private endpoints for the data lake: one for the target sub-resource "dfs" and one for "blob".
In the Virtual Network options for the private endpoint, select the virtual network that contains the PrivateDatabricks and PublicDatabricks subnets. You can use the ServiceEndpoints subnet for your private endpoint (if you don't have it, create it).
Application
You need to create an Azure application that will authorize access to your data lake storage. Search for "app registration" and create one with a friendly name.
After creating the app, copy the following values, as you will need them later:
- app_id: Go to the app's main page and copy "Application (client) ID"
- tenant_id: Go to the app's main page and copy "Directory (tenant) ID"
- secret: Go to the app's "Certificates and secrets", create a new client secret, and copy its "Value".
Grant your application access to storage account
Go back to your data lake storage account. Go to "Access Control (IAM)" and add the role "Storage Blob Data Contributor".
Click "Select members" and find the app that we have just created.
Databricks
Now we can finally go to Databricks and mount the containers from our storage. A mount is persistent, so it is enough to create it only once. It is a good idea to keep the code used for mounting (for example, in an infrastructure folder in your repo) so the mount can easily be recreated. We just need to put the values we copied earlier into the code.
# OAuth settings for the application (service principal) created above
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": app_id,
    "fs.azure.account.oauth2.client.secret": secret,
    "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
}

# container and storage_name are the names of your container and storage account
dbutils.fs.mount(
    source=f"abfss://{container}@{storage_name}.dfs.core.windows.net/",
    mount_point="/mnt/your_folder",
    extra_configs=configs,
)
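Because the mount only needs to be created once, it can help to make the mount code safe to re-run. The sketch below is one possible way to do that, not the only one: it wraps the same settings in two helper functions (build_oauth_configs and mount_if_missing are hypothetical names introduced here) and skips the mount when the mount point is already listed by dbutils.fs.mounts(). The dbutils object is passed in as a parameter, so the functions can be defined outside a notebook as well.

```python
def build_oauth_configs(app_id, secret, tenant_id):
    """Return the ABFS OAuth settings for the application created earlier."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": app_id,
        "fs.azure.account.oauth2.client.secret": secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def mount_if_missing(dbutils, container, storage_name, mount_point, configs):
    """Mount the container unless mount_point is already mounted.

    Returns True if a new mount was created, False if it already existed.
    """
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(
        source=f"abfss://{container}@{storage_name}.dfs.core.windows.net/",
        mount_point=mount_point,
        extra_configs=configs,
    )
    return True


# Example notebook usage. The scope name "adls" and the key names are
# assumptions; they only work if you created such a secret scope yourself.
# configs = build_oauth_configs(
#     app_id=dbutils.secrets.get(scope="adls", key="app-id"),
#     secret=dbutils.secrets.get(scope="adls", key="app-secret"),
#     tenant_id=dbutils.secrets.get(scope="adls", key="tenant-id"),
# )
# mount_if_missing(dbutils, "your_container", "your_storage", "/mnt/your_folder", configs)
```

Reading the values from a secret scope instead of pasting them into the notebook keeps the client secret out of the repo where the mount code is stored.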
Troubleshooting
It is a good idea to use the nslookup command (for example, nslookup {storage_name}.dfs.core.windows.net) to check whether your data lake storage resolves to a private IP address.
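The same check can also be run from a notebook cell on the cluster, which confirms what the cluster itself sees. The following is a minimal sketch using only the Python standard library; the function names and the storage account name "yourstorage" are placeholders introduced here, not part of the original setup.

```python
import ipaddress
import socket


def resolve_ips(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses the hostname resolves to."""
    infos = resolver(hostname, 443)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    # Drop any IPv6 zone suffix such as "%eth0" before returning the address.
    return sorted({info[4][0].split("%")[0] for info in infos})


def resolves_to_private_ip(hostname, resolver=socket.getaddrinfo):
    """Return True only if every resolved address is in a private range."""
    ips = resolve_ips(hostname, resolver)
    return bool(ips) and all(ipaddress.ip_address(ip).is_private for ip in ips)


# Example usage; replace "yourstorage" with your storage account name.
# for sub_resource in ("dfs", "blob"):
#     host = f"yourstorage.{sub_resource}.core.windows.net"
#     print(host, resolve_ips(host), resolves_to_private_ip(host))
```

If either host resolves to a public address, the private endpoint or its DNS configuration for that sub-resource is the first thing to re-check.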
01-26-2022 01:43 PM
I created this post because this is a recurring question in the Databricks community. I will keep it updated. Any suggestions are welcome.
01-26-2022 05:33 PM
@Hubert Dudek - Have I told you lately that you're the best!?!
01-27-2022 04:00 AM
You know how to motivate me