
Create CSV and upload to Azure

subhadeep
New Contributor II

Can someone write a SQL query that queries a table like select * from stages.benefit, creates a CSV, and uploads it to Azure?


Isi
Contributor

Hey!

Well, not many details have been provided, but in general, this approach could work for you. Hope it helps!
# Query the table into a DataFrame
df = spark.sql("SELECT * FROM stages.benefit")

# Write it out as CSV (one part file per partition) to the target path
df.write.mode("overwrite").option("header", "true").csv(output_path)
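Here output_path is whatever Azure location you want the files to land in. As a rough sketch (the container, storage account, and folder names below are placeholders, and the cluster must already have access to that storage configured):

# Hypothetical ADLS Gen2 destination -- substitute your own container, storage account, and folder
output_path = "abfss://<container>@<storage-account>.dfs.core.windows.net/exports/stages_benefit"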

🙂

hari-prasad
Valued Contributor II

Hi @subhadeep ,

You can achieve this in SQL, similar to how you write a DataFrame to a table or blob path: create an external table pointing to the blob path (or a mounted blob path). Note that such a table does not support ACID transactions and versioning like a Delta table, but you can set partitions to manage the data. Instead of a single CSV file, you will see multiple part files created as CSVs based on your data inserts.

-- Create an external table in CSV format pointing to the blob location
CREATE TABLE new_external_table_in_blob (
  id INT,
  ...
)
USING CSV
LOCATION '.../path/blob';

-- Insert data into the table
INSERT INTO new_external_table_in_blob
VALUES (1, ...);

To obtain a single file, use a PySpark or Scala DataFrame and coalesce to 1 so the data sits in a single partition, then write it to the blob path. If your data volume is small, you can instead convert the Spark DataFrame into a pandas DataFrame and write it to the blob path as a single CSV file, as sketched below.
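A minimal PySpark sketch of both options, assuming output_path is an Azure location the cluster can write to (for example an abfss:// URI) and, for the pandas variant, that the blob container is mounted under a /dbfs/mnt path (all placeholder names):

df = spark.sql("SELECT * FROM stages.benefit")

# Option 1: coalesce to a single partition so Spark writes exactly one part file
df.coalesce(1).write.mode("overwrite").option("header", "true").csv(output_path)

# Option 2 (small data only): convert to pandas and write one named CSV file
# through the DBFS mount of the blob container
df.toPandas().to_csv("/dbfs/mnt/<mount-point>/exports/benefit.csv", index=False)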

Regards,
Hari Prasad


