12-19-2024 12:41 AM
We recently enabled Unity Catalog on our workspace. As part of certain transformations (custom-cluster Python data pipelines), we need to move files from one volume to another.
Since the job itself runs as a service principal that has access to the external storage, we don't want to pass in any credentials. Can we achieve this? We tried the os module, dbutils, and the WorkspaceClient, all of which seem to require explicit service principal credentials. We did manage to read from the volume through the Spark context itself, but we still need a way to move the files. Please help.
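For reference, the read that already works looks roughly like the sketch below (the file format and path are placeholders, not the actual pipeline code):
# Spark can read a UC volume path directly when the job's service principal
# has READ VOLUME on it; no extra credentials are passed.
df = spark.read.format("csv").option("header", "true").load(
    "/Volumes/<source_catalog>/<source_schema>/<source_volume>/<file_name>"
)
df.show()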
12-19-2024 03:40 AM
You should be able to use dbutils.fs.cp to copy the file, but you just need to ensure that the SP has the WRITE VOLUME permission on the destination volume.
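A minimal sketch of both steps (catalog, schema, volume names and the SP application ID are placeholders):
# Grant the SP access on the volumes (run as a principal that can manage grants).
spark.sql("GRANT READ VOLUME ON VOLUME <catalog>.<schema>.<source_volume> TO `<sp-application-id>`")
spark.sql("GRANT WRITE VOLUME ON VOLUME <catalog>.<schema>.<destination_volume> TO `<sp-application-id>`")

# Copy the file between volumes; on a UC-enabled cluster running as the SP,
# no extra credentials are needed.
dbutils.fs.cp(
    "/Volumes/<catalog>/<schema>/<source_volume>/<file_name>",
    "/Volumes/<catalog>/<schema>/<destination_volume>/<file_name>",
)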
12-19-2024 03:47 AM
Thanks for that,
But I have a Python data pipeline running on a custom cluster, and it's not working from there.
12-19-2024 03:51 AM
What is the error being received? And does the SP have the mentioned permission in UC?
12-22-2024 11:24 PM
Hi @navi_bricks,
It can be achieved by creating a new notebook and writing the dbutils cp or mv command in that notebook (see the sketch below). After that, you can create a workflow or a small independent ADF pipeline that runs under the same SP which has the permission. It will run and move the files.
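A sketch of what that notebook cell could contain (paths are placeholders):
# Move the file between UC volumes from a notebook task that runs as the
# permitted service principal; use dbutils.fs.cp instead to keep the source.
dbutils.fs.mv(
    "/Volumes/<source_catalog>/<source_schema>/<source_volume>/<file_name>",
    "/Volumes/<destination_catalog>/<destination_schema>/<destination_volume>/<file_name>",
)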
Thanks
12-24-2024 05:30 AM
Thanks for that @MujtabaNoori
Instead of using a notebook, can I use the WorkspaceClient from the Databricks SDK to move the files?
01-03-2025 02:45 PM
You can try:
from databricks.sdk import WorkspaceClient
# Initialize the WorkspaceClient
w = WorkspaceClient()
# Define source and destination paths
source_path = "/Volumes/<source_catalog>/<source_schema>/<source_volume>/<file_name>"
destination_path = "/Volumes/<destination_catalog>/<destination_schema>/<destination_volume>/<file_name>"
# Move the file
w.files.move(source_path, destination_path)
# Verify the file has been moved
for item in w.files.list_directory_contents(f"/Volumes/<destination_catalog>/<destination_schema>/<destination_volume>"):
print(item.path)
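If the Files API route gives you trouble, a hedged alternative (assuming a reasonably recent databricks-sdk version) is the SDK's dbutils shim, which proxies the familiar dbutils.fs commands through the same WorkspaceClient authentication:
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Same placeholder paths as above; mv removes the source file after copying.
w.dbutils.fs.mv(
    "/Volumes/<source_catalog>/<source_schema>/<source_volume>/<file_name>",
    "/Volumes/<destination_catalog>/<destination_schema>/<destination_volume>/<file_name>",
)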
4 weeks ago
Sorry for the late reply. I was on vacation and didn't check this out. I tried this but always got the error "default auth: cannot configure default credentials." I even tried to use it with client ID and secret being passed as arguments.
4 weeks ago - last edited 4 weeks ago
# Using a Databricks CLI profile to access the workspace.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(profile="<<profilename>>")
OR pass the service principal's credentials to the client explicitly:
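A sketch of that explicit route, assuming the SP authenticates with OAuth machine-to-machine credentials (the workspace URL, client ID, and secret scope/key are placeholders, ideally read from a secret scope rather than hard-coded):
from databricks.sdk import WorkspaceClient

# Explicit OAuth M2M auth for the service principal; equivalently, set the
# DATABRICKS_HOST, DATABRICKS_CLIENT_ID, and DATABRICKS_CLIENT_SECRET
# environment variables and construct WorkspaceClient() with no arguments.
w = WorkspaceClient(
    host="https://<workspace-url>",
    client_id="<sp-application-id>",
    client_secret=dbutils.secrets.get(scope="<scope>", key="<sp-secret-key>"),
)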
4 weeks ago
Not all job clusters work well with Volumes. I used the following type of cluster to access files from a Volume.
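For illustration only, a sketch of a cluster definition along those lines, assuming the key point is a Unity Catalog access mode (dedicated/single-user assigned to the SP); the runtime version and node type are placeholders:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DataSecurityMode

w = WorkspaceClient()

# Single-user (dedicated) access mode assigned to the service principal so
# that /Volumes paths resolve with UC permissions.
cluster = w.clusters.create_and_wait(
    cluster_name="uc-volume-pipeline",
    spark_version="15.4.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=1,
    data_security_mode=DataSecurityMode.SINGLE_USER,
    single_user_name="<sp-application-id>",
)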