Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How can I export the result of a SQL query from a Databricks notebook?

gianni77
New Contributor

The "Download CSV" button in the notebook seems to work only for results <=1000 entries. How can I export larger result-sets as CSV?

13 REPLIES

User16301467534
New Contributor II

Hi,

You can write the results back to S3 and download them directly from S3. Our DB Guide notebooks should have a sample for you to look at. Please search for "Download All SQL" and you should see Scala and Python examples.
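For illustration, a minimal PySpark sketch of that approach (the table name and bucket path here are placeholders, not taken from the DB Guide notebooks):

%python

# Minimal sketch: write query results to S3 as CSV, then download from S3.
# `your_table_name` and the bucket path are hypothetical placeholders.
df = spark.sql("SELECT * FROM your_table_name")
df.write.mode("overwrite").option("header", "true").csv("s3://your-bucket/exports/query_results/")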

It would be really great if you could add a simple option to download a table in the "data" tab. As a data scientist, I can say this is a pretty basic part of the data science process -- for example, if you are building documentation or whitepapers for a model, you will need samples of data, correlation matrices, etc.

Anonymous
Not applicable

You can also write to dbfs:/FileStore/download.csv, and then download by going to https://community.cloud.databricks.com/files/download.csv

jiaxuan
New Contributor II

Hi Doug,

Can you provide more details on this? I'm not sure how to write to the FileStore.

Thanks.

pepevo
New Contributor III

Yes, would you please provide more detail on this? The link is not working (HTTP ERROR: 404). Thank you.

Anonymous
Not applicable

To write to the FileStore you'd do:

df.write.option("header", "true").csv("dbfs:/FileStore/download.csv")

and then download from

https://community.cloud.databricks.com/files/download.csv

However, Databricks now has the feature to just click "Download" when displaying a result set, which the other answers below also mention. It will download the first 1 million rows.
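One caveat, as a hedged sketch: df.write.csv() actually creates a directory of part files rather than a single download.csv, so to serve one file at that /files/ URL you can consolidate the output first (all paths below are placeholders):

%python

# Consolidate the CSV output into a single file the /files/ URL can serve.
# Assumes a Databricks notebook where `df` and `dbutils` are available.
df.coalesce(1).write.mode("overwrite").option("header", "true").csv("dbfs:/FileStore/tmp_export")

# Find the lone part file, copy it to a stable name, then clean up.
part = [f.path for f in dbutils.fs.ls("dbfs:/FileStore/tmp_export") if f.name.startswith("part-")][0]
dbutils.fs.cp(part, "dbfs:/FileStore/download.csv")
dbutils.fs.rm("dbfs:/FileStore/tmp_export", recurse=True)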

pepevo
New Contributor III

I have asked this question before since the link is broken.

How do I export a SQL query result to CSV and send it out by email? Also, how can I export a billion records to CSV? Databricks allows exporting at most a million records, but we have 1.5 billion.

ColbyCarrillo
New Contributor II

Hello,

I usually use PySpark for this kind of task. Here is some example code for writing a PySpark DataFrame to DBFS:

data_frame.coalesce(1).write.format("csv").save("/FileStore/tables/your_file_name.csv")

If you are curious how to move from straight SQL to PySpark, you can try creating a temporary table within SQL and then reading it into a PySpark DataFrame to write it out. The code below assumes you have imported the respective libraries.

%python

data_frame = spark.table("your_temp_table")

Please note I normally do not export data of this size. You may need to research the coalesce function further, and you may be required to combine multiple CSV files.
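As a hedged end-to-end sketch of that SQL-to-PySpark handoff (the source table and output path are invented for illustration):

%sql
-- Stage the query results as a temporary view.
CREATE OR REPLACE TEMP VIEW your_temp_table AS
SELECT * FROM some_source_table;

%python
# Read the temp view registered above and write it out as a single CSV.
data_frame = spark.table("your_temp_table")
data_frame.coalesce(1).write.format("csv").option("header", "true").save("/FileStore/tables/your_file_name")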

After rereading your question, this is quite simple: when downloading a CSV from the notebook, there is a down-arrow indicator on the right side of the download symbol. All you need to do is click that drop-down and select "Download full results" (1,000,000 rows max).

pepevo
New Contributor III

Colby,

I know how to download a CSV from a notebook using the down-arrow indicator at the bottom of the query. But I want to know how to create a script that pulls the CSV directly, sends it out by email, and/or stores it directly in a designated folder, as with Oracle/MySQL. Also, the query returns about 1.5 billion records, so the 1 million maximum is not an option.

Thank you.

Bach-Nga


Anonymous
Not applicable

Now within the Databricks SQL interface (within the SQL editor) you can actually download the full results as a CSV. Just make sure to uncheck "LIMIT 1000" and then click the download button under "..." in the bottom left. [Screenshot: download button location]

igorstar
New Contributor III

If you have a large dataset, you might want to export it to a bucket in Parquet format from your notebook:

%python

# `your_table_name` and the S3 output path are placeholders.
df = spark.sql("SELECT * FROM your_table_name")
df.write.parquet("s3://your-bucket/path/to/output/")
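Parquet output is compressed and splittable, so it scales well past the notebook's CSV download limits. To pull the exported files down afterwards, one option (bucket and paths are again placeholders) is the AWS CLI:

aws s3 cp --recursive s3://your-bucket/path/to/output/ ./local_output/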

 
