Exporting data from Databricks
02-08-2016 07:45 AM
I couldn't find a way in the documentation to export an RDD as a text file to a local folder using Python. Is it possible?
02-08-2016 11:20 AM
Sounds like you're looking for saveAsTextFile().
Refer to the documentation here:
https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.saveAsTextFile
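For reference, a minimal sketch of that call. The helper name export_rdd_as_text and the example path are hypothetical; saveAsTextFile() and coalesce() are the actual RDD methods. Keep in mind that saveAsTextFile() writes a directory of part-* files rather than a single file, and that the path is resolved on the cluster's file system, not on your own machine.

```python
def export_rdd_as_text(rdd, target_dir, single_file=True):
    """Write each element of `rdd` as one line of text under `target_dir`.

    `rdd` is expected to be a pyspark RDD. saveAsTextFile() creates
    `target_dir` as a directory of part-* files and raises an error if
    the directory already exists.
    """
    if single_file:
        # Assumption: the data is small enough to fit in one partition,
        # so that a single part file is produced.
        rdd = rdd.coalesce(1)
    rdd.saveAsTextFile(target_dir)
    return target_dir


# In a notebook where `sc` (the SparkContext) is already defined:
# rdd = sc.parallelize(["first line", "second line"])
# export_rdd_as_text(rdd, "/tmp/rdd_export")  # hypothetical path
```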
02-08-2016 02:33 PM
You can save a data file to the FileStore and then retrieve it from your shard. See the Databricks Guide -> Product Overview -> FileStore.
You can access the files saved there by going to:
https://xxxxx.cloud.databricks.com/files/folder/specific_file
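To make the mapping concrete, here is a small sketch that turns a DBFS path under /FileStore into the matching /files/ URL. The function name filestore_url is made up for illustration, and "xxxxx" stays a placeholder for your own shard name.

```python
def filestore_url(instance, dbfs_path):
    """Map a DBFS path under /FileStore to its browser download URL.

    `instance` is the host name of the shard, e.g. "xxxxx.cloud.databricks.com".
    """
    prefix = "/FileStore/"
    if dbfs_path.startswith("dbfs:"):
        # Accept both "dbfs:/FileStore/..." and "/FileStore/..." spellings.
        dbfs_path = dbfs_path[len("dbfs:"):]
    if not dbfs_path.startswith(prefix):
        raise ValueError("only files under /FileStore are served at /files/")
    return "https://{}/files/{}".format(instance, dbfs_path[len(prefix):])


print(filestore_url("xxxxx.cloud.databricks.com", "/FileStore/folder/specific_file"))
# prints https://xxxxx.cloud.databricks.com/files/folder/specific_file
```

You still need to be logged in to the shard in your browser for the link to work.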
06-09-2017 08:08 AM
What is the code to save a data object to the file store? I have a list object with JSON elements that I want to save to local disk, but am unable to do so.
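One possible approach, sketched under a few assumptions: the list holds plain Python dicts, the data is small, and the code runs in a Databricks notebook where dbutils is predefined. dbutils.fs.put(file, contents, overwrite) is the real utility call; the helper names and the file name are hypothetical.

```python
import json


def to_json_lines(records):
    """Serialize a list of JSON-compatible objects, one JSON document per line."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


def save_to_filestore(dbutils, records, name):
    """Write `records` to /FileStore/<name> through dbutils.fs.put."""
    path = "/FileStore/" + name
    # Third argument True means: overwrite the file if it already exists.
    dbutils.fs.put(path, to_json_lines(records), True)
    return path


# In a notebook, where `dbutils` is already defined:
# save_to_filestore(dbutils, [{"id": 1}, {"id": 2}], "my_export.json")
```

The file can then be fetched from the /files/ URL of the shard, as described above.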
03-31-2016 07:23 AM
Hello all, hello @Miklos_C ,
How do you access the files? I don't understand what the "xxxxx" is in the URL "https://xxxxx.cloud.databricks.com/files/folder/specific_file". Do we need to replace it with the 16-digit sequence that identifies us? With our username or email address? I'm using Databricks Community Edition; do I need to put the string "community" somewhere?
Thank you for your answer
Have a nice day
10-13-2016 05:28 AM
Do you know what xxxxx is in the URL? I have the same problem.
Regards
Diego
10-13-2016 06:45 AM
Simply replace it with "community"
Thibault
10-19-2016 01:22 PM
Thanks a lot. It works!
10-20-2016 09:08 AM
Do you know how to access the FileStore with wget?
10-20-2016 09:35 AM
Absolutely not... 😞
10-20-2016 11:25 AM
This won't work because you'd have to authenticate with Databricks in order to download the file. The FileStore is suitable for things like loading JavaScript libraries, but not for extracting data from Databricks. To download data, you should connect to Amazon S3 or use the DBFS API.
http://docs.databricks.com/spark/latest/data-sources/amazon-s3.html
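For the DBFS API route, a rough sketch using only the standard library. It assumes the /api/2.0/dbfs/read endpoint, which returns base64-encoded chunks of at most 1 MB together with a bytes_read count, and it assumes authentication with a personal access token sent as a bearer header and parameters passed in the query string. The function names, host, and token are placeholders, not anything prescribed by Databricks.

```python
import base64
import json
import urllib.parse
import urllib.request

CHUNK = 1024 * 1024  # the read endpoint returns at most 1 MB per call


def read_chunk_http(host, token, path, offset, length):
    """Call GET /api/2.0/dbfs/read once and return the parsed JSON reply."""
    query = urllib.parse.urlencode({"path": path, "offset": offset, "length": length})
    request = urllib.request.Request(
        "{}/api/2.0/dbfs/read?{}".format(host, query),
        headers={"Authorization": "Bearer " + token},  # assumed auth scheme
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def download_dbfs_file(read_chunk, path, chunk=CHUNK):
    """Assemble a DBFS file from successive chunks.

    `read_chunk(path, offset, length)` must return a dict with the keys
    "bytes_read" and "data" (base64 text), like the DBFS read endpoint.
    """
    parts, offset = [], 0
    while True:
        reply = read_chunk(path, offset, chunk)
        n = reply["bytes_read"]
        if n:
            parts.append(base64.b64decode(reply["data"]))
            offset += n
        if n < chunk:  # a short (or empty) read means end of file
            break
    return b"".join(parts)
```

Usage would look roughly like this, with your own host and token filled in:

    from functools import partial
    reader = partial(read_chunk_http, "https://xxxxx.cloud.databricks.com", "<token>")
    data = download_dbfs_file(reader, "/FileStore/folder/specific_file")

Passing the reader in as a function keeps the chunk loop independent of the HTTP call, so it can be tried out without a live workspace.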
10-24-2016 12:58 PM
Thank you for the answer.
05-31-2017 11:51 AM
I use S3 as an intermediary: rdd.saveAsTextFile("s3a://...")
02-04-2018 08:51 AM
There should really be a way to simply export a file to your desktop.
03-25-2019 08:18 AM
Goal: export a file to your local desktop.
Workaround: basically, you have to use "Create table in Notebook" with DBFS.
The steps are:
- Click on "Data" icon >
- Click "Add Data" button >
- Click "DBFS" button >
- Click "FileStore" folder icon in 1st pane "Select a file from DBFS" >
- In the 2nd pane, scroll down to locate folder icon "tables" >
- Click to select folder icon "tables" >
- In the 3rd pane, Click to select folder icon "FileStore" >
- In the 4th pane, Click to select folder icon "tables" >
- In the 5th pane, locate your file (mine was a .csv) and click to select it >
- Click "Create table in Notebook" button >
- This will create a notebook >
- Click on "Run All" to run the notebook (create a cluster, if needed) >
- Click "OK" when prompted to "Attach and Run" >
- A download icon will appear in a couple of places; click it to download the file to your local machine.
(Tweak this notebook to download different files)
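As far as I can tell, tweaking the notebook mostly comes down to changing the file location. A rough sketch of what the generated code boils down to, assuming a CSV file and a notebook where spark and display are predefined. The helper name and the file name are hypothetical, and the exact options in your generated notebook may differ.

```python
def load_uploaded_file(spark, file_location, file_type="csv",
                       first_row_is_header="true", delimiter=","):
    """Read a file from DBFS into a DataFrame via the DataFrameReader API."""
    return (
        spark.read.format(file_type)
        .option("inferSchema", "true")          # guess column types
        .option("header", first_row_is_header)  # treat the first row as column names
        .option("sep", delimiter)
        .load(file_location)
    )


# In a Databricks notebook:
# df = load_uploaded_file(spark, "/FileStore/tables/my_file.csv")  # hypothetical name
# display(df)  # the rendered table is where the download icon shows up
```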