
Need to move files from one Volume to another

navi_bricks
New Contributor II

We recently enabled Unity Catalog on our workspace. As part of certain transformations (custom-cluster data pipelines written in Python), we need to move files from one volume to another volume.

As the job itself runs as a service principal that has access to the external storage, we don't want to pass in any credentials. Can we achieve this? We tried the os module, dbutils, and WorkspaceClient, all of which need service principal credentials. We finally managed to read the volume through the Spark context itself, but for moving files we need another way. Please help.

9 REPLIES

Walter_C
Databricks Employee

You should be able to use dbutils.fs.cp to copy the file, but you just need to ensure that the SP has the WRITE VOLUME permission on the destination volume.
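For example, a minimal sketch (the catalog/schema/volume names, file name, and SP application ID below are placeholders; the SP also needs READ VOLUME on the source volume):

# Run as a user/admin who owns the volume: grant the SP write access on the destination
spark.sql("GRANT WRITE VOLUME ON VOLUME main.raw.dst_vol TO `<sp-application-id>`")

# Copy the file between volumes with dbutils (runs as the job's service principal)
dbutils.fs.cp(
    "/Volumes/main/raw/src_vol/data.csv",
    "/Volumes/main/raw/dst_vol/data.csv"
)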

navi_bricks
New Contributor II

Thanks for that,

But I have a Python data pipeline running under a custom cluster, and it's not working from there.

 

Walter_C
Databricks Employee

What is the error being received? And does the SP have the mentioned permission in UC?

MujtabaNoori
New Contributor III

Hi @navi_bricks ,
This can be achieved by creating a new notebook and writing the dbutils cp or mv command in that notebook. After that, you can create a workflow or a small independent ADF pipeline using the same SP that has the permission; it will run and move the files.
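For instance, the notebook cell itself can be a one-liner; a sketch with placeholder paths:

# Move (rather than copy) the file; the source is removed after a successful move
dbutils.fs.mv(
    "/Volumes/<catalog>/<schema>/<source_volume>/<file_name>",
    "/Volumes/<catalog>/<schema>/<dest_volume>/<file_name>"
)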

Thanks

navi_bricks
New Contributor II

Thanks for that @MujtabaNoori 

Instead of using a notebook, can I use WorkspaceClient from the Databricks SDK and move the files?

Walter_C
Databricks Employee

You can try:

from databricks.sdk import WorkspaceClient

# Initialize the WorkspaceClient
w = WorkspaceClient()

# Define source and destination paths
source_path = "/Volumes/<source_catalog>/<source_schema>/<source_volume>/<file_name>"
destination_path = "/Volumes/<destination_catalog>/<destination_schema>/<destination_volume>/<file_name>"

# Move the file
w.files.move(source_path, destination_path)

# Verify the file has been moved
for item in w.files.list_directory_contents(f"/Volumes/<destination_catalog>/<destination_schema>/<destination_volume>"):
    print(item.path)

navi_bricks
New Contributor II

Sorry for the late reply. I was on vacation and didn't check this out. I tried this but always got the error "default auth: cannot configure default credentials." I even tried to use it with client ID and secret being passed as arguments.

saurabh18cs
Valued Contributor III

# Using Databricks CLI profile to access the workspace.
w = WorkspaceClient(profile="<<profilename>>")

OR

w = WorkspaceClient(
    host=databricks_host,
    client_id=app_id,
    client_secret=app_secret,
    auth_type="azure-cli",
)
 
OR 
from azure.identity import ClientSecretCredential
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>"
)

# Get the access token for Databricks
token = credential.get_token("https://databricks.azure.net/.default").token

# Initialize the WorkspaceClient using the service principal token
client = WorkspaceClient(
    host="https://<databricks-instance>",
    token=token
)
 
OR 
 
# Initialize the WorkspaceClient using an OAuth token
w = WorkspaceClient(
    host="https://<databricks-instance>",
    token="<oauth-token>"
)
 
OR
w = WorkspaceClient(
    host=databricks_host,
    token=pat_token,
)
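Another option, if you want the SDK's default credential chain to resolve the service principal on its own (which is what the "default auth: cannot configure default credentials" error is complaining about), is to set the standard unified-auth environment variables before constructing the client. A sketch, assuming the SP has a Databricks-managed OAuth secret (host, client ID, and secret values are placeholders):

import os
from databricks.sdk import WorkspaceClient

# Standard unified-auth variables picked up by the SDK's default credential chain
os.environ["DATABRICKS_HOST"] = "https://<databricks-instance>"
os.environ["DATABRICKS_CLIENT_ID"] = "<sp-application-id>"
os.environ["DATABRICKS_CLIENT_SECRET"] = "<sp-oauth-secret>"

# No explicit credentials needed now
w = WorkspaceClient()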

Dnirmania
Contributor

Not all job clusters work well with Volumes. I used the following type of cluster to access files from a Volume.

[Screenshots of the cluster configuration used]
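The detail that matters here is the cluster's access mode: only Unity Catalog-capable clusters (single user, or shared/user-isolation access mode) can read and write Volumes. A minimal job-cluster spec sketch; the field names come from the Clusters API, the other values are placeholders:

# Sketch of a job cluster spec that can access Unity Catalog Volumes
new_cluster = {
    "spark_version": "15.4.x-scala2.12",        # placeholder DBR version
    "node_type_id": "Standard_DS3_v2",          # placeholder node type
    "num_workers": 2,
    "data_security_mode": "SINGLE_USER",        # or "USER_ISOLATION" for shared access mode
    "single_user_name": "<sp-application-id>",  # the service principal the job runs as
}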

 
