08-01-2024 02:47 AM
Currently my Databricks looks like this:
I want to create a volume to access an external location. Where exactly should I create it? Should I create a new schema in the "poe" catalog and create the volume inside it, or create it in an existing schema? What is the best practice?
08-01-2024 05:33 AM - edited 08-01-2024 05:35 AM
Hello! Volumes go inside of schemas (screenshot below). It's up to you how to keep your data organised, but a few considerations:
At the end of the day, schemas and catalogs are there to keep data organised. With external volumes (and tables), the schema or catalog you pick has no bearing on where the data is stored, so the choice doesn't have much technical impact.
My messy demo example:
08-01-2024 05:45 AM
Alright, thanks for your explanation. I have one more question: after creating a volume, how would you connect it to a container? Say you have created a volume at external location A and you want to connect it to external location B?
08-01-2024 06:57 AM
Hi hpant, each volume is mapped to one location only. If you need to get data from two different locations, you'd make two separate volumes and join them as part of your pipeline.
If you wanted to read in from one location and write to another, again, you'd do that with two separate volumes.
When I said 'group them together' above - you can have multiple volumes in one schema, even if their locations are very different.
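To make the "one volume, one location" point concrete, here is a minimal Python sketch that only builds the SQL text for two external volumes living in the same schema. The catalog name `poe` comes from the question; the schema name, volume names, and `abfss://` URLs are hypothetical placeholders. On Databricks you would pass each statement to `spark.sql(...)`; that step is left out so the snippet runs anywhere.

```python
def create_external_volume_sql(catalog: str, schema: str, volume: str, location: str) -> str:
    """Build a CREATE EXTERNAL VOLUME statement (Unity Catalog SQL)."""
    return (
        f"CREATE EXTERNAL VOLUME IF NOT EXISTS {catalog}.{schema}.{volume} "
        f"LOCATION '{location}'"
    )


def volume_path(catalog: str, schema: str, volume: str, relative_path: str = "") -> str:
    """Return the /Volumes/... path used to read or write files in a volume."""
    base = f"/Volumes/{catalog}/{schema}/{volume}"
    return f"{base}/{relative_path.lstrip('/')}" if relative_path else base


# Hypothetical names and URLs: one volume per storage location,
# both grouped in the same (assumed) schema "ingest".
VOLUMES = {
    "landing_in": "abfss://container-a@storageaccounta.dfs.core.windows.net/landing",
    "curated_out": "abfss://container-b@storageaccountb.dfs.core.windows.net/curated",
}

statements = [
    create_external_volume_sql("poe", "ingest", name, url)
    for name, url in VOLUMES.items()
]

for statement in statements:
    print(statement)  # on Databricks: spark.sql(statement)

# Read from one volume, write to the other.
print(volume_path("poe", "ingest", "landing_in", "2024/08/file.csv"))
print(volume_path("poe", "ingest", "curated_out"))
```

A pipeline would then read from the first `/Volumes/...` path and write to the second, even though the two volumes point at different storage accounts.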
08-01-2024 07:08 AM
Hey, thanks for your response. Currently, my data is in a container in Azure, and it gets added to that container through an Azure Data Factory pipeline. I have created a Unity Catalog workspace in a different resource group. It has a container, but there is no data in it, and I have created a volume in it. Now I want to connect the volume to the data in a container that belongs to a different storage account in a different resource group. How can I make that connection? Do I need some sort of access key mechanism?
08-05-2024 01:26 AM
Hi hpant,
You need to set up a new volume using a new external location (and potentially storage credential). Docs here: https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-external-locations
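As a rough sketch of that order of operations, the Python below builds the two SQL statements involved: first an external location that points at the other storage account, then an external volume under it. Everything named here is a hypothetical placeholder (the location name, the `abfss://` URL, the volume name, and the storage credential `adf_landing_cred`, which is assumed to exist already). On Databricks each string would be run with `spark.sql(...)`.

```python
def create_external_location_sql(name: str, url: str, credential: str) -> str:
    """Build a CREATE EXTERNAL LOCATION statement bound to a storage credential."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {name} "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL {credential})"
    )


def create_external_volume_sql(full_name: str, location: str) -> str:
    """Build a CREATE EXTERNAL VOLUME statement for a three-part volume name."""
    return f"CREATE EXTERNAL VOLUME IF NOT EXISTS {full_name} LOCATION '{location}'"


# Hypothetical values: the container that the Data Factory pipeline writes to.
SOURCE_URL = "abfss://landing@adfstorageacct.dfs.core.windows.net/"
VOLUME_URL = SOURCE_URL + "incoming"

# The volume's path has to sit under the external location's URL.
if not VOLUME_URL.startswith(SOURCE_URL):
    raise ValueError("volume location must be inside the external location")

setup = [
    # 1. Register the other storage account's container as an external location.
    create_external_location_sql("adf_landing", SOURCE_URL, "adf_landing_cred"),
    # 2. Create a new volume on top of it ("ingest" is an assumed schema name).
    create_external_volume_sql("poe.ingest.adf_incoming", VOLUME_URL),
]

for statement in setup:
    print(statement)  # on Databricks: spark.sql(statement)
```

Access is handled by the storage credential rather than by an account key embedded in code, which is why the credential has to exist before the external location can be created.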
08-05-2024 02:01 AM
Hi @holly ,
Thanks so much for your response. I have one last question in this regard. Whenever I want to add an extra external location, do I need the Contributor role or higher on the access connector resource in Azure to add the storage credential first?
Thanks,
Hiamnshu Pant
08-05-2024 08:18 AM
The docs say 'Contributor or Owner of an Azure resource group', and I don't have any reason to contradict that.
08-06-2024 01:38 AM
I am trying to do that but couldn't find the option.
08-06-2024 01:44 AM
No, I don't have.