How do you mount Azure Data Lake Gen 2 in Databricks, or import data from it and export data to it?

ChrisS
New Contributor III

The example that Databricks gives is not helpful and does not tell me exactly what I need to do. I am new to this and not sure what I need to do in Azure to get this done. I just need to be able to read data from and write data to the data containers. Whether that means mounting or just reading and writing directly, I really don't care; I just need to access my data. Any help would be great. So far I am 3 days into my trial and have been able to do zero work. Getting hold of anyone at Databricks for help has been impossible, so I guess there is no support there. So I am hoping someone in this community can please help out.

Thank you in advance.

Accepted Solutions

etsyal1e2r3
Honored Contributor

https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-external-loc...

You'll have to follow this and do some reading, but you should be able to figure it out. Just start from the external location setup in Databricks, see what you need, and work backwards from there. Let me know if you get stuck 🙂

14 Replies

etsyal1e2r3
Honored Contributor

If you're able to use a managed identity in Azure and are familiar enough with Azure to set it up, I recommend that. It's possible to use a SAS token instead, but I've had issues with every method that doesn't use a managed identity. Go to the Data tab, look at External Locations, and start adding one with the plus icon in the top right. It will show you what you need to create the external location, mainly a service principal and an Access Connector for Azure Databricks resource in your workspace's resource group. Just type that into the Azure search bar, add it, then go back to the external location page and enter its details to create the external location. Finally, you have to go to the accounts page of your Databricks account (I believe the URL starts with accounts.databricks...). Just google how to get to the accounts page, where you can create a metastore, assign it to your managed blob location, and attach it to your workspace. Once these are set up and your user permissions are set for the external location (Data tab > External Locations > Permissions), you can test reading and writing with

dbutils.fs.ls("<blob_location_with_files>")
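A slightly fuller version of that check, sketched in Python for a Databricks notebook, tests writing as well as listing. It assumes `fs` is `dbutils.fs` and that the folder URL is covered by an external location on which you have been granted READ FILES and WRITE FILES; the function name and the probe file name `_access_check.txt` are hypothetical.

```python
from typing import Any, List


def check_read_write(fs: Any, folder: str) -> List[str]:
    """Smoke-test read and write access to one folder.

    `fs` is assumed to behave like `dbutils.fs` (ls, put, head, rm).
    Returns the names of the entries that were already in the folder.
    """
    # Hypothetical probe file used only for this check.
    probe = folder.rstrip("/") + "/_access_check.txt"

    # Read access: list the folder contents.
    names = [entry.name for entry in fs.ls(folder)]

    # Write access: write a tiny file, read it back, always clean up.
    fs.put(probe, "ok", True)  # third argument: overwrite
    try:
        if fs.head(probe) != "ok":
            raise RuntimeError(f"unexpected contents in {probe}")
    finally:
        fs.rm(probe)
    return names
```

In a notebook this would be called as `check_read_write(dbutils.fs, "abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path>")`. An error from the `ls` step points at read permissions, while an error from `put` points at write permissions.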

Let me know what roadblocks you hit.

ChrisS
New Contributor III

I am not familiar with managed identities, but if that is the recommended path, I am happy to give it a go. I am the admin for Azure. Where would I go to find it and/or the documentation? I checked with ChatGPT and it was NOT helpful in this regard LOL. It led me to a place where the menu option did not exist for me.

etsyal1e2r3
Honored Contributor

https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-external-loc...

You'll have to follow this and do some reading, but you should be able to figure it out. Just start from the external location setup in Databricks, see what you need, and work backwards from there. Let me know if you get stuck 🙂

ChrisS
New Contributor III

I get stuck when following the link to create a managed identity:

https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/azure-managed-ident...

at the step "Use a user-assigned managed identity". When I go into "Deploy a custom template", I get an error on the resource ID.
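The thread doesn't say what the resource ID error was. One way to rule out a malformed value is to check it against the standard Azure resource ID shape for a user-assigned managed identity. This is only a guess at the cause; the sketch below (function name hypothetical) validates the format, not whether the identity actually exists.

```python
import re
from typing import Dict

# Standard Azure Resource Manager ID shape for a user-assigned managed identity.
_UAMI_ID = re.compile(
    r"^/subscriptions/(?P<subscription_id>"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.ManagedIdentity/userAssignedIdentities/(?P<name>[^/]+)$",
    re.IGNORECASE,
)


def parse_identity_resource_id(resource_id: str) -> Dict[str, str]:
    """Split a user-assigned managed identity resource ID into its parts.

    Raises ValueError if the string does not have the expected shape, for
    example when a bare identity name was pasted instead of the full ID.
    """
    match = _UAMI_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"not a user-assigned identity resource ID: {resource_id!r}")
    return match.groupdict()
```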

ChrisS
New Contributor III

OK, I got confused. I was trying to set up both a user-assigned and a system-assigned managed identity. I had the system-assigned one set up and was then trying to set up a user-assigned one, but I only needed one. Now that part is done, I moved on to your link, and I am unsure what exactly I need to use to fill out the SQL bits. Where would I find this information?

CREATE EXTERNAL LOCATION <location-name>
  URL 'abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path>'
  WITH ([STORAGE] CREDENTIAL <storage-credential-name>);
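If you'd rather run that statement from a notebook than use the UI, the placeholders can be filled in from variables. A minimal sketch, assuming the storage credential already exists (for example, one backed by the access connector and managed identity) and that you have permission to create external locations; the function name and all example values are hypothetical. In a notebook you would pass the result to `spark.sql(...)`.

```python
def external_location_sql(
    location_name: str,
    storage_account: str,
    container: str,
    path: str,
    credential_name: str,
) -> str:
    """Build a CREATE EXTERNAL LOCATION statement for an ADLS Gen2 path."""
    # Reject characters that would break the quoting used below.
    for value in (location_name, credential_name):
        if not value or "`" in value:
            raise ValueError(f"bad identifier: {value!r}")
    for value in (storage_account, container, path):
        if "'" in value:
            raise ValueError(f"quote not allowed in URL part: {value!r}")

    url = f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.strip('/')}"
    # Backticks let names contain characters such as hyphens.
    return (
        f"CREATE EXTERNAL LOCATION `{location_name}`\n"
        f"  URL '{url}'\n"
        f"  WITH (STORAGE CREDENTIAL `{credential_name}`)"
    )
```

For example, `external_location_sql("sales_raw", "datalake4", "raw", "sales", "dbx_cred")` produces a statement pointing at `abfss://raw@datalake4.dfs.core.windows.net/sales`.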

etsyal1e2r3
Honored Contributor

Don't do it that way. Go to the Data tab, click the External Locations section on the left, and in the top right there is a button to add an external location. Go there and see what else you need. You'll have to go back to your Azure resource group and add an Access Connector for Azure Databricks configured with the managed identity.

ChrisS
New Contributor III

Maybe I am blind but I am not seeing the external location section. Here is my view of my data tab.

[Screenshot of the Data tab]

ChrisS
New Contributor III

OK, I found it. It was under account settings. My next question: I am not sure exactly what goes in these fields: <container-name>@<storage-account>.dfs.core.windows.net/<path>, that is, container name, storage account, and path. I think storage-account would be the name of the data lake, such as datalake4.dfs.core.windows.net, and maybe the path is /folder. But I am not sure what the container is. I am new to data lakes.

etsyal1e2r3
Honored Contributor

Click the Data tab that's circled:
[Screenshot with the Data tab circled]

etsyal1e2r3
Honored Contributor

This is more of an issue of getting familiar with Azure than with Databricks. You need to create the storage account resource that you're going to use for the data lake. So go into the Azure portal, search for the storage account resource, and create one (with hierarchical namespace enabled, which is what makes it Data Lake Storage Gen2), along with a container. Create the folder path you want there. Then use the container, the storage account, and the folder path in that URL, and put all of that in the external location settings.
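To make the three placeholders concrete: the storage account is just the account's name (not the full `.dfs.core.windows.net` host), the container is the one created inside that account, and the path is the folder within the container. The sketch below splits a URL into those parts and checks the account name against Azure's storage account naming rule (3 to 24 lowercase letters and digits). The account `datalake4` comes from the question; the function name and the container and folder names are hypothetical.

```python
import re
from typing import Dict
from urllib.parse import urlparse

_DFS_SUFFIX = ".dfs.core.windows.net"
_ACCOUNT_NAME = re.compile(r"^[a-z0-9]{3,24}$")


def split_abfss_url(url: str) -> Dict[str, str]:
    """Split abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    parsed = urlparse(url)
    if parsed.scheme != "abfss":
        raise ValueError("URL must start with abfss://")
    # The part before '@' is the container; the host carries the account name.
    container, at, host = parsed.netloc.partition("@")
    if not at or not container or not host.endswith(_DFS_SUFFIX):
        raise ValueError("expected <container>@<account>.dfs.core.windows.net")
    account = host[: -len(_DFS_SUFFIX)]
    if not _ACCOUNT_NAME.match(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    return {
        "storage_account": account,
        "container": container,
        "path": parsed.path.strip("/"),
    }
```

For instance, `abfss://raw@datalake4.dfs.core.windows.net/sales/2023` splits into storage account `datalake4`, container `raw`, and path `sales/2023`.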

ChrisS
New Contributor III

I resolved the issue. I needed to create a folder in the container and connect to the folder. So I guess I will need to segment my data and create many folders, the way I would create different databases in a relational database system. Thank you for your dedication and help.
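Creating one folder per dataset can also be done from a notebook instead of the Azure portal. A sketch, assuming `fs` is `dbutils.fs` and that the base URL sits under a path your external location covers; the function name and the dataset names are hypothetical examples.

```python
from typing import Any, Iterable, List


def create_dataset_folders(fs: Any, base_url: str, datasets: Iterable[str]) -> List[str]:
    """Create <base_url>/<dataset> for each dataset and return the folder URLs.

    `fs` is assumed to behave like `dbutils.fs`, whose mkdirs also creates
    any missing parent folders and succeeds if the folder already exists.
    """
    base = base_url.rstrip("/")
    created = []
    for name in datasets:
        # One flat level per dataset; reject names that would nest folders.
        if not name or "/" in name:
            raise ValueError(f"invalid dataset folder name: {name!r}")
        folder = f"{base}/{name}"
        fs.mkdirs(folder)
        created.append(folder)
    return created
```

In a notebook: `create_dataset_folders(dbutils.fs, "abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path>", ["sales", "inventory"])`.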

Anonymous
Not applicable

Hi @Chris Sarrico

Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. 

We'd love to hear from you.

Thanks!

ChrisS
New Contributor III

No, this is not resolved, per my additional comment on the answer above. I can reiterate it here: how do you set up a managed identity? I have also opened a case with Microsoft, but they are usually unhelpful or slow to get back to me. I am simply trying to access my data, my trial time is burning away, and I have yet to be able to do anything with this platform. It is very frustrating and way too complex in my opinion.
