Download a dbfs:/FileStore File to my Local Machine?

IgnacioCastinei
New Contributor III

Hi all,

I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result.

I can access the different "part-xxxxx" files through the web browser, but I would like to automate downloading all of them to my local machine.

I have tried to use cURL, but I can't find the REST API command to download a dbfs:/FileStore file.

Question: How can I download a dbfs:/FileStore file to my Local Machine?

I am using Databricks Community Edition to teach an undergraduate module in Big Data Analytics in college. I have Windows 7 installed on my local machine. I have checked that cURL and the _netrc file are properly installed and configured, as I can successfully run some of the REST API commands.

Thank you very much in advance for your help!

Best regards,

Nacho

Accepted solution

LiNKArsIdeni
New Contributor III

The answer by @tonyp works well if the file is stored in FileStore. However, if it is stored in the mnt folder, you will need something like this:

https://community.cloud.databricks.com/dbfs/mnt/blob/<file_name>.csv?o=<your_number_here>

Note that this will prompt you for your login and password, but once you enter them, the download should be seamless.
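A minimal sketch of the URL pattern above, assuming bash and a hypothetical helper name `mnt_download_url`. It only builds the URL from a dbfs:/mnt path and the number after `o=` in your Community Edition address; opening it still requires the browser login described above.

```shell
# Hypothetical helper: turn a dbfs:/mnt/... path into the Community Edition
# browser download URL (https://community.cloud.databricks.com/dbfs/mnt/...?o=<number>).
mnt_download_url() {
  local dbfs_path=$1 org_id=$2
  local rel=${dbfs_path#dbfs:}   # dbfs:/mnt/blob/x.csv -> /mnt/blob/x.csv
  case "$rel" in
    /mnt/*) ;;
    *) echo "expected a dbfs:/mnt/... path, got: $dbfs_path" >&2; return 1 ;;
  esac
  printf 'https://community.cloud.databricks.com/dbfs%s?o=%s\n' "$rel" "$org_id"
}
```

For example, `mnt_download_url dbfs:/mnt/blob/results.csv 1234567890` prints a link you can paste into a browser that is logged in to the workspace.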


11 replies

tonyp
New Contributor II

Files stored in /FileStore are accessible in your web browser at https://<databricks-instance-name>.cloud.databricks.com/files/. For example, the file you stored in /FileStore/my-stuff/my-file.txt is accessible at:

"https://<databricks-instance-name>.cloud.databricks.com/files/my-stuff/my-file.txt"

Note: If you are on Community Edition, you may need to replace https://community.cloud.databricks.com/files/my-stuff/my-file.txt with https://community.cloud.databricks.com/files/my-stuff/my-file.txt?o=######, where the number after o= is the same as in your Community Edition URL.
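Since saveAsTextFile() writes one file per partition (part-00000, part-00001, ...), the /files/ pattern above can be expanded into one URL per part file. This is a sketch assuming bash, a known number of part files (for example, read off the FileStore folder listing), and a hypothetical helper name `filestore_part_urls`; the org number is optional and only needed on Community Edition.

```shell
# Hypothetical helper: print the browser URL of each part-xxxxx file under
# /FileStore/<dir>, following the /files/<dir>/... mapping described above.
filestore_part_urls() {
  local host=$1 dir=$2 parts=$3 org_id=${4:-}
  local i=0 url
  while [ "$i" -lt "$parts" ]; do
    # Spark names the files part-00000, part-00001, ... (five digits).
    url=$(printf 'https://%s/files/%s/part-%05d' "$host" "$dir" "$i")
    if [ -n "$org_id" ]; then url="$url?o=$org_id"; fi
    echo "$url"
    i=$((i + 1))
  done
}
```

For example, `filestore_part_urls community.cloud.databricks.com my_result 4 1234567890` lists the four URLs for dbfs:/FileStore/my_result. They still need a logged-in browser session to open.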

See: https://docs.databricks.com/user-guide/advanced/filestore.html

Eve
New Contributor III

Or simply use the CLI?

DBFS CLI

Marc0
New Contributor II

For me, this does not work. I am trying to understand Delta Lake as a non-technical user. I managed to create a Community Edition account and environment. Next, I followed the tutorial located here: https://docs.databricks.com/getting-started/quick-start.html

So I created the 'diamonds' table and so on. The only thing I want to do is download the Parquet and JSON files, just to see what's inside. I use Community Edition, but the above does not work, and I have no idea how to access the files. I tried the following (with the correct file path copied in):

https://community.cloud.databricks.com/dbfs/mnt/delta/diamonds/_delta_log/00000000000000000000.json?...

(where ### is indeed my Community Edition number), but I receive a 401:

HTTP ERROR 401

Problem accessing /dbfs/mnt/delta/diamonds/_delta_log/00000000000000000000.json.

Reason: Unauthorized

How do I download the files? I cannot find this anywhere. Thanks for your help!

Atanu
Esteemed Contributor

It is probably just an authentication issue, something with the permissions. Are you trying from the CLI?

Kaniz
Community Manager

Hi @Marco Deterink,

You need to use the Databricks CLI for this task.

  1. Install the CLI on your local machine and run databricks configure to authenticate. Use an access token generated under user settings as the password.
  2. Once you have the CLI installed and configured for your workspace, you can copy files to and from DBFS like this:
databricks fs cp dbfs:/path_to_file/my_file /path_to_local_file/my_file

You can also use the shorthand

dbfs cp dbfs:/path_to_file /path_to_local_file
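For the original question (a whole folder of part files), a sketch assuming bash (e.g. Git Bash on Windows), a CLI already set up with databricks configure, and a hypothetical function name `download_spark_output`: copy the folder recursively with `-r`, then optionally stitch the parts into one local file. Depending on your CLI version, check that the part files land directly in the destination folder and adjust the glob if they end up nested.

```shell
# Hypothetical helper: copy a Spark output folder from DBFS, then merge its parts.
download_spark_output() {
  local src=$1 dest=$2
  # -r copies the directory recursively, i.e. every part-xxxxx file.
  databricks fs cp -r "$src" "$dest" || return 1
  # Glob order is lexicographic, so part-00000, part-00001, ... stay in order.
  cat "$dest"/part-* > "$dest.txt"
}
```

Usage: `download_spark_output dbfs:/FileStore/my_result ./my_result` leaves the individual parts in ./my_result and their concatenation in ./my_result.txt.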

Kaniz
Community Manager

Hi @Marco Deterink, did you try the above steps? Did they help you?

Anonymous
Not applicable

Hi! Welcome to the community and thank you for your question! My name is Piper, and I'm one of Databricks' moderators. We will give the community members a chance to respond. Then, if necessary, we'll circle back.

Thanks in advance for your patience.

Atanu
Esteemed Contributor

Leverage our DBFS CLI to download the file: https://docs.databricks.com/dev-tools/cli/dbfs-cli.html
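Since the original question asked for a cURL route, here is a minimal sketch that calls the DBFS REST API endpoints /api/2.0/dbfs/list and /api/2.0/dbfs/read directly. It assumes bash with curl, grep, sed and base64 available (e.g. Git Bash on Windows), that curl picks up the same _netrc credentials the asker already uses, and compact JSON responses; the function names are hypothetical. dbfs/read returns base64-encoded data and is limited to 1 MB per call, hence the offset loop.

```shell
# Print the paths directly under a DBFS directory (e.g. /FileStore/my_result).
dbfs_list_paths() {
  local host=$1 dir=$2
  curl -sS --netrc -G "https://$host/api/2.0/dbfs/list" \
       --data-urlencode "path=$dir" |
    grep -o '"path": *"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# Download one DBFS file to a local path, up to 1 MB per dbfs/read call.
dbfs_read_file() {
  local host=$1 path=$2 out=$3
  local chunk=1048576 offset=0 resp n
  : > "$out"   # start from an empty local file
  while :; do
    resp=$(curl -sS --netrc -G "https://$host/api/2.0/dbfs/read" \
             --data-urlencode "path=$path" -d "offset=$offset" -d "length=$chunk") || return 1
    n=$(printf '%s' "$resp" | sed -n 's/.*"bytes_read": *\([0-9][0-9]*\).*/\1/p')
    if [ -z "$n" ]; then
      echo "unexpected response for $path: $resp" >&2
      return 1
    fi
    [ "$n" -eq 0 ] && break
    # Extract the base64 "data" field and append the decoded bytes.
    printf '%s' "$resp" | sed -n 's/.*"data": *"\([^"]*\)".*/\1/p' | base64 -d >> "$out"
    offset=$((offset + n))
    # A short read means the end of the file was reached.
    [ "$n" -lt "$chunk" ] && break
  done
  return 0
}

# Download every part-xxxxx file under a DBFS directory into a local folder.
dbfs_download_dir() {
  local host=$1 dir=$2 dest=$3
  mkdir -p "$dest" || return 1
  dbfs_list_paths "$host" "$dir" | while IFS= read -r p; do
    case "$p" in
      */part-*) dbfs_read_file "$host" "$p" "$dest/${p##*/}" || exit 1 ;;
    esac
  done
}
```

Usage, with your own workspace host: `dbfs_download_dir community.cloud.databricks.com /FileStore/my_result ./my_result`. Files such as _SUCCESS are skipped by the part-* filter.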

Atanu
Esteemed Contributor

@Ignacio Castineiras, were you able to look into the DBFS CLI above? It may work for your case. Please let us know if you need further help with this. Thanks.

