Unable to open a file in dbfs. Trying to move files from Google Bucket to Azure Blob Storage

editter
New Contributor II

Background:

I am attempting to download the Google Cloud SDK on Databricks. The end goal is to use the SDK to transfer files from a Google Cloud Storage bucket to Azure Blob Storage using Databricks. (If you have any other ideas for this transfer, please feel free to share. I do not want to use Azure Data Factory.)
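For reference, the simplest alternative I can think of is skipping the SDK entirely and letting Databricks copy between the two clouds directly. This is a rough sketch, untested: every bucket, container, account, and secret name below is a placeholder, and it assumes the cluster's Spark config already carries GCS service-account credentials.

# Rough sketch, untested. Assumes GCS service-account credentials are already
# configured in the cluster's Spark config; all names below are placeholders.

# Authenticate to Azure Blob Storage with a storage account key from a secret scope
spark.conf.set(
    "fs.azure.account.key.myaccount.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="azure-account-key"),
)

# dbutils.fs.cp understands both URI schemes, so the transfer is a single call
dbutils.fs.cp(
    "gs://my-gcs-bucket/exports/",
    "wasbs://my-container@myaccount.blob.core.windows.net/imports/",
    recurse=True,
)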

I also have Unity Catalog enabled, if that makes a difference.

As a first step, I am attempting to extract the Google Cloud SDK archive in DBFS after moving it to the following location. I know the file exists there:

%fs
ls dbfs:/tmp/google_sdk

Returns:
dbfs:/tmp/google_sdk/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz
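As a sanity check, it may be worth comparing what the DBFS API sees with what plain Python sees, since %fs and tarfile go through different layers (sketch):

import os

# DBFS API view: goes through dbutils, the same layer %fs uses
print(dbutils.fs.ls("dbfs:/tmp/google_sdk"))

# Local-filesystem view: what tarfile/open() actually use, via the /dbfs FUSE mount
print(os.path.exists("/dbfs/tmp/google_sdk/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz"))

If the second check prints False, the /dbfs FUSE mount isn't visible to Python on this cluster (my understanding is that this is the case on Unity Catalog clusters in shared access mode).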

 

I have tried the following to open the file with tarfile. None have worked:

 

tar = tarfile.open('dbfs:/tmp/google_sdk/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz', mode="r|gz")

tar = tarfile.open('/dbfs/tmp/google_sdk/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz', mode="r|gz")

tar = tarfile.open('/tmp/google_sdk/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz', mode="r|gz")

tar = tarfile.open('/dbfs/dbfs/tmp/google_sdk/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz', mode="r|gz")

tar = tarfile.open('dbfs/tmp/google_sdk/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz', mode="r|gz")

tar = tarfile.open('tmp/google_sdk/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz', mode="r|gz")

 

All of them return "No such file or directory", but I know the file exists. What am I missing here? Why am I not able to open this file?
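One workaround I plan to try next, in case the FUSE mount is the problem: copy the archive to the driver's local disk with dbutils first, then open it from there. A sketch, untested:

import tarfile

# Copy from DBFS to the driver's local /tmp; the file:/ scheme forces the
# driver's local filesystem as the destination.
dbutils.fs.cp(
    "dbfs:/tmp/google_sdk/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz",
    "file:/tmp/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz",
)

# A plain local path now, so tarfile can seek: "r:gz" is enough and the
# streaming "r|gz" mode isn't required.
with tarfile.open("/tmp/google_cloud_sdk_352_0_0_linux_x86_64_tar.gz", mode="r:gz") as tar:
    tar.extractall("/tmp/google_sdk_unpacked")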

Thanks for any help!

1 REPLY

editter
New Contributor II

Thank you for the response!

Two questions:

1. How would you create a cluster with the custom requirements for the Google Cloud SDK? Is that still possible on a Unity Catalog-enabled cluster with shared access mode?

2. Is a script action the same as a cluster init script? I couldn't find any documentation for script actions. 

I tried running that script on an existing cluster and it returned an AttributeError with no description. It just points to the line running dbutils.cluster.submit_run (a command I also can't find documentation for). I verified that the cluster_id and driver_node_type_id were correct.
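For context, this is roughly what I had in mind for question 1: a cluster-scoped init script that installs the SDK on every node. A sketch based on my reading of the docs, not verified on a Unity Catalog shared access mode cluster (where, as I understand it, init scripts may need to be allowlisted by an admin); the paths and download URL are placeholders:

# Sketch, unverified: write an init script to DBFS, then attach it under the
# cluster's Advanced options > Init scripts. Paths and URL are placeholders.
init_script = """#!/bin/bash
set -e
curl -sSL -o /tmp/google-cloud-sdk.tar.gz \\
  https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-linux-x86_64.tar.gz
tar -xzf /tmp/google-cloud-sdk.tar.gz -C /opt
/opt/google-cloud-sdk/install.sh --quiet
"""

dbutils.fs.put("dbfs:/init-scripts/install_gcloud.sh", init_script, overwrite=True)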

Thanks for any help!
