Exporting data from databricks

DimitrisMpizos
New Contributor

I couldn't find a way in the documentation to export an RDD as a text file to a local folder using Python. Is it possible?

16 REPLIES

raela
New Contributor III

Sounds like you're looking for saveAsTextFile().

Refer to the documentation here:

https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.saveAsTextFile
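A minimal sketch of how saveAsTextFile() might be wrapped, for illustration only. The helper name `save_rdd_as_text`, its `single_file` flag, and the example path are hypothetical and not from this thread. Note that saveAsTextFile() writes a directory of part files rather than one file, and on Databricks a path like the one below would normally land in DBFS, not on your own machine.

```python
def save_rdd_as_text(rdd, target_dir, single_file=False):
    """Write each RDD element as one line of text under target_dir.

    target_dir becomes a directory of part-* files. Spark raises an
    error if the directory already exists.
    """
    if single_file:
        # coalesce(1) funnels everything into one partition, so a single
        # part file is produced. Only sensible for small results.
        rdd = rdd.coalesce(1)
    rdd.saveAsTextFile(target_dir)
    return target_dir


# Usage in a notebook, where `sc` is the SparkContext (hypothetical path):
# save_rdd_as_text(sc.parallelize(["a", "b"]), "dbfs:/FileStore/export/my_rdd")
```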

miklos
Contributor

You can use the FileStore: save a data file there, then go to your shard and retrieve it. Look at the Databricks Guide -> Product Overview -> FileStore.

You can access the files saved there by going to:

https://xxxxx.cloud.databricks.com/files/folder/specific_file
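To make the URL pattern above concrete, here is a small sketch that turns a DBFS path under /FileStore/ into the matching /files/ link. The function name `filestore_download_url` is hypothetical, and the host is whatever your own shard uses. Opening the link is expected to work only in a browser session that is already logged in to Databricks.

```python
from urllib.parse import quote


def filestore_download_url(dbfs_path, host):
    """Map dbfs:/FileStore/<rest> to https://<host>/files/<rest>."""
    prefix = "/FileStore/"
    path = dbfs_path
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    if not path.startswith(prefix):
        raise ValueError("only files under /FileStore/ are served at /files/")
    # quote() percent-encodes spaces and similar characters, keeps "/" as is.
    return "https://{}/files/{}".format(host, quote(path[len(prefix):]))


print(filestore_download_url("dbfs:/FileStore/folder/specific_file",
                             "xxxxx.cloud.databricks.com"))
```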

nassir_m
New Contributor II

What is the code to save a data object to the file store? I have a list object with JSON elements that I want to save to local disk, but am unable to do so.
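One possible approach, sketched under assumptions: serialize the list to JSON Lines text and hand it to dbutils.fs.put, which writes a string to a DBFS path. The helper names and the example path are hypothetical. The writer is passed in as `put` so the function can be tried outside a notebook. This suits small data only, since the whole payload is built as one string on the driver.

```python
import json


def to_json_lines(records):
    """One JSON document per line. Strings are assumed to be JSON already."""
    lines = [r if isinstance(r, str) else json.dumps(r) for r in records]
    return "\n".join(lines) + "\n"


def save_json_to_filestore(records, dbfs_path, put):
    """Write records to dbfs_path using put(path, contents, overwrite)."""
    put(dbfs_path, to_json_lines(records), True)
    return dbfs_path


# Usage in a Databricks notebook (hypothetical path):
# save_json_to_filestore(my_list, "/FileStore/export/data.json", dbutils.fs.put)
```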

lefish
New Contributor II

Hello all, hello @Miklos_C,

How do you access the files? I don't understand what the "xxxxx" in the URL "https://xxxxx.cloud.databricks.com/files/folder/specific_file" stands for. Do we need to replace it with the 16-digit sequence that identifies us? With our username or email address? I'm using Databricks Community Edition; do I need to put the string "community" somewhere?

Thank you for your answer

Have a nice day

gachet
New Contributor III

Do you know what xxxxx is in the URL? I have the same problem.

Regards

Diego

lefish
New Contributor II

Simply replace it with "community"

Thibault

gachet
New Contributor III

Thanks a lot. It works !!

gachet
New Contributor III

Do you know how to access the FileStore with wget?

lefish
New Contributor II

Absolutely not... 😞

This won't work because you'd have to authenticate with Databricks in order to download it. The FileStore is suitable for things like loading JavaScript libraries, but not for extracting data from Databricks. To download data, you should connect to Amazon S3 or use the DBFS API.

http://docs.databricks.com/spark/latest/data-sources/amazon-s3.html

http://docs.databricks.com/api/latest/dbfs.html
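As a rough sketch of the DBFS API route: the `dbfs/read` endpoint returns a file in base64-encoded chunks of at most 1 MB, so a download is a loop over offsets. The function names, and the use of a personal access token as a bearer token, are assumptions here. Token access may not be available on every plan, so treat this as an outline rather than a tested recipe.

```python
import base64
import json
import urllib.parse
import urllib.request

CHUNK = 1024 * 1024  # dbfs/read returns at most 1 MB per call


def make_fetcher(host, token):
    """Return fetch(path, offset) that calls GET /api/2.0/dbfs/read."""
    def fetch(path, offset):
        query = urllib.parse.urlencode(
            {"path": path, "offset": offset, "length": CHUNK})
        request = urllib.request.Request(
            "https://{}/api/2.0/dbfs/read?{}".format(host, query),
            headers={"Authorization": "Bearer " + token})
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
    return fetch


def download_dbfs_file(path, local_path, fetch):
    """Copy a DBFS file to local_path chunk by chunk; return bytes written."""
    offset = 0
    with open(local_path, "wb") as out:
        while True:
            reply = fetch(path, offset)
            if reply["bytes_read"] == 0:
                break  # past the end of the file
            out.write(base64.b64decode(reply["data"]))
            offset += reply["bytes_read"]
    return offset


# Usage from your own machine (host, token and paths are placeholders):
# fetch = make_fetcher("xxxxx.cloud.databricks.com", "<personal-access-token>")
# download_dbfs_file("/FileStore/folder/specific_file", "specific_file", fetch)
```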

gachet
New Contributor III

Thank you for the answer.

grfiv
New Contributor II

I use S3 as an intermediary: rdd.saveAsTextFile("s3a://...")

PaulLintilhac
New Contributor III

There should really be a way to simply export a file to your desktop.

Manu1
New Contributor II

To export a file to your local desktop:

Workaround: basically, you have to use "Create table in Notebook" with DBFS.

The steps are:

  1. Click the "Data" icon.
  2. Click the "Add Data" button.
  3. Click the "DBFS" button.
  4. In the 1st pane, "Select a file from DBFS", click the "FileStore" folder icon.
  5. In the 2nd pane, scroll down to locate the "tables" folder icon.
  6. Click to select the "tables" folder icon.
  7. In the 3rd pane, click to select the "FileStore" folder icon.
  8. In the 4th pane, click to select the "tables" folder icon.
  9. In the 5th pane, locate your file (mine was a .csv) and click to select it.
  10. Click the "Create table in Notebook" button.
  11. This will create a notebook.
  12. Click "Run All" to run the notebook (create a cluster, if needed).
  13. Click "OK" when prompted to "Attach and Run".
  14. A download icon will appear in a couple of places; click it to download to the local machine.

(Tweak this notebook to download different files.)