Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
When writing a DataFrame in PySpark to a CSV file, a folder is created containing a partitioned CSV file. I then have to rename this file in order to distribute it to my end user. Is there any way I can simply write my data to a CSV file, with the name ...
I know this post is a little old, but ChatGPT actually put together a very clean and straightforward solution for me (in Scala):
// Set the temporary output directory and the desired final file path
val tempDir = "/tmp/your_file_name"
val finalOutputP...
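For PySpark users, a minimal sketch of the same write-then-rename pattern, assuming a Databricks notebook where dbutils is available; the paths and the DataFrame name (df) are placeholders, not from the original post:

```python
# Minimal sketch; `df` and the paths below are placeholders.
temp_dir = "dbfs:/tmp/your_file_name"
final_path = "dbfs:/FileStore/your_file_name.csv"

# coalesce(1) forces Spark to write a single part file inside the output folder
df.coalesce(1).write.mode("overwrite").option("header", "true").csv(temp_dir)

# Locate the lone part-*.csv file and copy it out under the desired name
part_file = [f.path for f in dbutils.fs.ls(temp_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part_file, final_path)
dbutils.fs.rm(temp_dir, recurse=True)  # clean up the temporary folder
```

Note that coalesce(1) pulls all data through a single task, so this only makes sense for small outputs.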
Hi, when I click the download button on a website through Firefox Selenium using element.click(), with the download destination set to Azure Data Lake Storage, the download starts, but those .csv and .csv.part files never get m...
When using the COPY INTO statement, is it possible to reference the current file name in the SELECT statement? A generic example is shown below; I am hoping I can log the file name in the target table.
COPY INTO my_table FROM (SELECT key, index, textData, ...
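One hedged approach, not from the original post: newer Databricks Runtimes expose a _metadata column on file sources, so the file path can be selected into the target table. A sketch assuming a runtime where COPY INTO supports _metadata; the table name, columns, and source location are placeholders:

```python
# Sketch: log the originating file per row via _metadata.file_path.
# Table, columns, and location are placeholders.
spark.sql("""
  COPY INTO my_table
  FROM (
    SELECT key, index, textData,
           _metadata.file_path AS source_file  -- the file each row came from
    FROM 's3://your-bucket/landing/'
  )
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true')
""")
```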
Hi, I am trying to reverse engineer my way to the source file for a table. Looking at the query history, I came across a SQL string which loads data from a file into a table; however, the code looks a little mysterious to me. I haven't come across idbfs. Can someb...
Hi Team, I have a Parquet file in an S3 bucket which is a Delta file. I am able to read it, but I am unable to write it as a CSV file. I am getting the following error when I try to write: A transaction log for Databricks Delta was found at `s3://path/a...
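That error typically means the path holds a Delta table. A hedged sketch of the usual fix, with placeholder paths: read it explicitly as Delta, then write the CSV to a separate location outside the Delta table's directory:

```python
# Sketch with placeholder paths: read the Delta table explicitly, then
# export CSV to a path *outside* the Delta table's directory.
df = spark.read.format("delta").load("s3://your-bucket/delta-table/")
(df.write
   .mode("overwrite")
   .option("header", "true")
   .csv("s3://your-bucket/exports/csv/"))
```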
Hi @yuvesh kotiala, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...
Hello! I'm playing with Auto Loader schema inference on a big S3 repo with 300+ tables and large CSV files. I'm looking at Auto Loader with great attention, as it can be a great time saver in our ingestion process (data comes from a transactional DB gen...
PySpark uses \ as the escape character by default. You can change it to " via the escape option. Doc: https://docs.databricks.com/ingestion/auto-loader/options.html#csv-options
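A brief sketch of setting that option with Auto Loader; the paths are placeholders, and schema inference is enabled via a schema location per the doc above:

```python
# Sketch with placeholder paths: read CSVs via Auto Loader, using " as the
# escape character instead of the default \.
df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", "s3://your-bucket/_schemas/")  # enables inference
        .option("escape", '"')     # override the default \ escape character
        .option("header", "true")
        .load("s3://your-bucket/raw/"))
```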
I need to export some data from the database to CSV, which will be downloaded by another application. What would be the procedure for that? I don't have a lot of knowledge of Databricks and I didn't find much information in the documentation. Thanks.
You can manually download data as CSV from a Databricks notebook cell to your local machine and pass it to your other application. Alternatively, your application can run a Databricks notebook inside a workflow via an API that writes data to an S3 bucket in CSV, and in response y...
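For the API route, a hedged sketch of triggering such a job from the other application via the Jobs 2.1 run-now endpoint; the workspace URL, token, and job_id are placeholders:

```python
import requests

# Sketch: trigger a pre-configured Databricks job that exports the CSV.
# Workspace URL, token, and job_id are placeholders.
resp = requests.post(
    "https://<your-workspace-url>/api/2.1/jobs/run-now",
    headers={"Authorization": "Bearer <your-access-token>"},
    json={"job_id": 123},
)
resp.raise_for_status()
print(resp.json()["run_id"])  # poll this run until the export completes
```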
I'm trying to read a CSV file saved in DBFS using the pandas read_csv function, but it gives a "No such file" error.
%fs
ls /FileStore/tables/
df = pd.read_csv('/dbfs/FileStore/tables/CREDIT_1.CSV')
df = pd.read_csv('/dbfs:/FileStore/tables/CREDIT_1.CSV')...
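A hedged note, not from the original thread: pandas reads DBFS through the local FUSE mount, so only the plain '/dbfs/...' form works; '/dbfs:/...' mixes the URI and FUSE path styles. A quick check:

```python
import os
import pandas as pd

# Sketch: use the plain /dbfs/ prefix (no scheme, no colon) for pandas.
path = '/dbfs/FileStore/tables/CREDIT_1.CSV'
print(os.path.exists(path))  # verify the exact file name and case first
df = pd.read_csv(path)
```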
I'm trying to export a CSV file from my Databricks workspace to my laptop. I have followed the steps below:
1. Installed the Databricks CLI
2. Generated a token in Azure Databricks
3. databricks configure --token
5. Token: xxxxxxxxxxxxxxxxxxxxxxxxxx
6. databrick...
Hi @Sarvagna Mahakali, there is an easier hack:
a) You can save the results locally on disk and create a hyperlink for downloading the CSV. You can copy the file to this location: dbfs:/FileStore/table1_good_2020_12_18_07_07_19.csv
b) Then download with...
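A sketch of that trick in a notebook cell; the workspace URL and source path are placeholders. Files under /FileStore are served from the workspace's /files/ path:

```python
# Sketch: copy the CSV into /FileStore so it becomes downloadable over HTTPS.
dbutils.fs.cp(
    "dbfs:/tmp/table1_good.csv",  # placeholder source path
    "dbfs:/FileStore/table1_good_2020_12_18_07_07_19.csv",
)
# The file can then be downloaded in a browser from:
# https://<your-workspace-url>/files/table1_good_2020_12_18_07_07_19.csv
```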
Running Azure Databricks Enterprise DBR 8.3 ML on a single node, with a Python notebook. I have 2 small Spark DataFrames that I am able to source via credential passthrough, reading from ADLS Gen2 via the `abfss://` method, and I can display the full content ...
Modern Spark separates storage and compute by design, so saving a CSV to the driver's local disk doesn't make sense for a few reasons: the worker nodes don't have access to the driver's disk. They would need to send the data over to...
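A short sketch of the usual alternative; `df` and the path are placeholders. Write to shared storage that every worker can reach, rather than to the driver's disk:

```python
# Sketch: workers write their partitions directly to shared storage (DBFS or
# an object store), avoiding any funneling through the driver's local disk.
(df.write
   .mode("overwrite")
   .option("header", "true")
   .csv("dbfs:/tmp/exports/my_data"))  # placeholder shared-storage path
```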
Can you provide an example of what exactly you mean? If the reference is to how "Repos" shows up in the UI, that's more of a UX convenience. Repos as such are designed to be a container for version-controlled notebooks that live in the Git reposi...
The gzip format is not splittable, so the load process is sequential and thus slower. You can either try to split the CSV up into parts, gzip those separately, and load them; alternatively, bzip2 is a splittable compression format that is better to work with. Or you c...
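A hedged sketch of the bzip2 route, with a placeholder path: Spark can split .bz2 files across tasks, so the read parallelizes without extra work:

```python
# Sketch: unlike .gz, bzip2-compressed CSVs are splittable, so Spark can
# read a single large file with many parallel tasks.
df = (spark.read
        .option("header", "true")
        .csv("s3://your-bucket/data/big_file.csv.bz2"))  # placeholder path
```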