Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Where exactly I should create Volume in a catalog?

hpant
New Contributor III

Currently my Databricks looks like this: 

hpant_0-1722505474676.png

I want to create a volume to access an external location. Where exactly should I create it? Should I create a new schema in the "poe" catalog and create the volume inside it, or create it in an existing schema? What is the best practice?

1 ACCEPTED SOLUTION

holly
Databricks Employee

Hello! Volumes go inside of schemas (screenshot below). It's up to you how to keep your data organised, but a few considerations:

  • If you're going to have lots of volumes, does it make sense to group them together?
  • As it's raw data, it's probably categorised as 'bronze' data, so you could consider keeping it with the rest of your bronze data
  • Will you have to manage access to this data? Does it make sense to group it with other data you may want to restrict or grant access to?
  • Do you want it to inherit other properties from things in the schema like tags, features or access patterns?

At the end of the day, schemas and catalogs are there to keep data organised. With external volumes (and tables), the schema has no bearing on where the data is stored, so it doesn't have much technical impact.

My messy demo example:

Screenshot 2024-08-01 at 13.27.39.png
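For reference, creating a schema and an external volume inside it can be sketched in SQL. The names below (`poe.bronze`, `landing_zone`, the abfss path) are illustrative assumptions, not from the thread, and an external location covering the path must already be registered:

```sql
-- Create a schema to group raw/bronze volumes (names are illustrative)
CREATE SCHEMA IF NOT EXISTS poe.bronze;

-- Create an external volume inside that schema, pointing at a path
-- covered by an already-registered external location
CREATE EXTERNAL VOLUME poe.bronze.landing_zone
  LOCATION 'abfss://raw@mystorageaccount.dfs.core.windows.net/landing';
```

Files in the volume are then addressable under `/Volumes/poe/bronze/landing_zone/`.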


9 REPLIES


hpant
New Contributor III

Alright, thanks for your explanation. I have one more question: after creating a volume, how would you connect it to a container? Imagine you have created a volume at external location A and you want to connect it to external location B.

holly
Databricks Employee

Hi hpant, each volume is mapped to one location only. If you need to get data from two different locations, you'd make two separate volumes and join them as part of your pipeline.

If you wanted to read in from one location and write to another, again, you'd do that with two separate volumes. 

When I said 'group them together' above - you can have multiple volumes in one schema, even if their locations are very different. 
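That pattern — two volumes in one schema, each mapped to a different (already-registered) external location — can be sketched like this; all names and paths here are placeholders:

```sql
-- Two external volumes in the same schema, each mapped to a different
-- external location (names and paths are illustrative)
CREATE EXTERNAL VOLUME poe.bronze.source_a
  LOCATION 'abfss://container-a@storageaccount1.dfs.core.windows.net/data';

CREATE EXTERNAL VOLUME poe.bronze.source_b
  LOCATION 'abfss://container-b@storageaccount2.dfs.core.windows.net/data';
```

A pipeline could then read from `/Volumes/poe/bronze/source_a/` and write to `/Volumes/poe/bronze/source_b/`.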

hpant
New Contributor III

Hey, thanks for your response. Currently, my data lives in a container in Azure, and it gets added to that container through an Azure Data Factory pipeline. I have created a Unity Catalog workspace in a different resource group. It has a container, but there is no data in it, and I have created a volume in it. Now I want to connect the volume to the data present in the container of the different storage account in the different resource group. How can I make that connection? Do I need some sort of access key mechanism?

holly
Databricks Employee

Hi hpant,

You need to set up a new volume using a new external location (and potentially storage credential). Docs here: https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-external-locations
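A minimal sketch of those two steps, assuming a storage credential named `adf_cred` already exists (e.g. created in Catalog Explorer against your Azure access connector) — the location name, volume name, and abfss path are placeholders:

```sql
-- Register the other storage account's container as an external location,
-- using an existing storage credential
CREATE EXTERNAL LOCATION IF NOT EXISTS adf_landing
  URL 'abfss://landing@otherstorageaccount.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL adf_cred)
  COMMENT 'Container populated by the Azure Data Factory pipeline';

-- Then create a volume on top of that location
CREATE EXTERNAL VOLUME poe.bronze.adf_landing
  LOCATION 'abfss://landing@otherstorageaccount.dfs.core.windows.net/';
```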

hpant1
New Contributor III

Hi @holly ,

Thanks so much for your response. I have one last question in this regard. Whenever I want to add an extra (external) location, do I need to grant the Contributor role or higher on the access connector resource in Azure to add the storage credential first?

Create a storage credential for connecting to Azure Data Lake Storage Gen2 - Azure Databricks | Micr...

 

Thanks,

Hiamnshu Pant

holly
Databricks Employee

The docs say 'Contributor or Owner of an Azure resource group', and I don't have any reason to contradict that.

hpant1
New Contributor III

@Retired_mod, I am trying to do that but couldn't find the option.

hpant1
New Contributor III

No, I don't.

hpant1_0-1722933848032.png

 
