Data Engineering

Forum Posts

by boskicl • New Contributor III

03-23-2022 11:04:23 AM

36668 Views
8 replies
11 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all. Background: I am having an issue today with Databricks using pyspark-sql and writing a Delta table. The dataframe is made by doing an inner join between two tables, and that is the table which I am trying to write to a Delta table. The table ...

Latest Reply

nvashisth
New Contributor III

08-13-2025 9:40:21 AM

11 kudos

@timo199, @boskicl I had a similar issue: the job was getting stuck at "Filtering files for query" indefinitely. I checked the Spark logs and, based on that, figured out that we had enabled Photon acceleration on our cluster for the job and datatype of our columns...

by Bilal1 • New Contributor III

02-16-2022 10:37:25 PM

42459 Views
7 replies
3 kudos

Resolved! Simply writing a dataframe to a CSV file (non-partitioned)

When writing a dataframe in PySpark to a CSV file, a folder is created and a partitioned CSV file is created inside it. I then have to rename this file in order to distribute it to my end user. Is there any way I can simply write my data to a CSV file, with the name ...

Latest Reply

chris0706
New Contributor II

10-04-2024 10:36:57 AM

3 kudos

I know this post is a little old, but ChatGPT actually put together a very clean and straightforward solution for me (in Scala):

    // Set the temporary output directory and the desired final file path
    val tempDir = "/tmp/your_file_name"
    val finalOutputP...
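The general idea discussed in this thread (write to a temporary directory, then move the single part file to the name you want) could look roughly like this in Python. This is a minimal sketch, not the Scala solution quoted above: the helper name `promote_single_part_file`, the example paths, and the assumption that the output directory is reachable through the local filesystem (for example via a `/dbfs/...` mount) are all illustrative. The Spark write itself appears only as a comment.

```python
import glob
import os
import shutil


def promote_single_part_file(temp_dir: str, final_path: str) -> str:
    """Move the single part-*.csv file found in temp_dir to final_path.

    Assumes temp_dir is visible on the local filesystem and that the
    DataFrame was written with coalesce(1), so exactly one part file exists.
    final_path must be outside temp_dir, because temp_dir is removed.
    """
    part_files = sorted(glob.glob(os.path.join(temp_dir, "part-*.csv")))
    if len(part_files) != 1:
        raise ValueError(
            f"expected exactly one part file in {temp_dir}, found {len(part_files)}"
        )
    final_dir = os.path.dirname(final_path)
    if final_dir:
        os.makedirs(final_dir, exist_ok=True)
    shutil.move(part_files[0], final_path)
    shutil.rmtree(temp_dir)  # drop _SUCCESS and other leftover marker files
    return final_path


# Hypothetical usage in a notebook (paths are placeholders):
# df.coalesce(1).write.mode("overwrite").option("header", True).csv("/tmp/your_file_name")
# promote_single_part_file("/dbfs/tmp/your_file_name", "/dbfs/tmp/report.csv")
```

Note that `coalesce(1)` funnels all rows through one task, so this pattern is probably only reasonable for outputs small enough to hand to an end user as one file.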

by William_Scardua • Valued Contributor

05-10-2023 1:01:50 PM

7724 Views
3 replies
1 kudos

Resolved! How to integrate pipeline with Dynatrace ?

Hi guys, do you know how I can integrate a pipeline to send some data to Dynatrace? Any ideas? Thank you.

Latest Reply

Anonymous
Not applicable

05-20-2023 11:02:07 PM

1 kudos

Hi @William Scardua, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

by alejandrofm • Valued Contributor

04-20-2023 5:44:19 AM

5454 Views
2 replies
2 kudos

Resolved! Lot of write shuffle on optimize + ZORDER, is it normal?

Hi! I'm optimizing several TB of partitioned data with ZSTD level 9. The amount of shuffle write surprises me. It could make sense because of ZORDER, but I want to be sure that I'm not missing something. Here is some context: Could I be missing something...

Latest Reply

Anonymous
Not applicable

04-23-2023 8:05:20 AM

2 kudos

Hi @Alejandro Martinez, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

by RengarLee • Contributor

06-01-2022 12:37:41 AM

11325 Views
10 replies
3 kudos

Resolved! Databricks write to Azure Data Explorer writes suddenly become slower

I write to Azure Data Explorer using Spark Streaming. One day, writes suddenly became slower, and restarting had no effect. I have some questions about Spark Streaming to Azure Data Explorer. Q1: What should I do to get performance to recover? Figure 1 shows th...

Latest Reply

RengarLee
Contributor

03-27-2023 7:18:45 PM

3 kudos

I'm so sorry, I just thought the issue wasn't resolved.
Solution: Set maxFilesPerTrigger and maxBytesPerTrigger. Enable auto optimize.
Reason: On the first day, it processes larger files and then eventually processes smaller files.
Detailed reason B...

by KamKam • New Contributor

04-26-2022 1:52:28 AM

1879 Views
2 replies
0 kudos

How to write to a folder in a Azure Data Lake container using Delta?

Hi All, how do I write to a folder in an Azure Data Lake container using Delta? When I run:

    write_mode = 'overwrite'
    write_format = 'delta'
    save_path = '/mnt/container-name/folder-name'
    df.write \
      .mode(write_mode) \
      .format(write_format) \
      ....

Latest Reply

jose_gonzalez
Databricks Employee

06-01-2022 5:07:28 PM

0 kudos

Hi @Kamalen Reddy, could you share the error message please?

by Hubert-Dudek • Esteemed Contributor III

01-26-2022 12:50:23 PM

2007 Views
1 reply
15 kudos

Resolved! Write to Azure Delta Lake - optimization request

The Databricks/Delta team could optimize some commands that write to Azure Blob Storage, as Azure displays this message:

Latest Reply

Anonymous
Not applicable

01-27-2022 2:55:17 PM

15 kudos

Hey there. Thank you for your suggestion. I'll pass this up to the team.

by Data_Bricks1 • New Contributor III

10-13-2021 11:47:18 AM

5431 Views
7 replies
0 kudos

data from 10 BLOB containers and multiple hierarchical folders(every day and every hour folders) in each container to Delta lake table in parquet format - Incremental loading for latest data only insert no updates

I am able to load data for a single container by hard-coding it, but I am not able to load from multiple containers. I used a for loop, but the data frame ends up holding only the last container's last folder. Another issue is that I have to flatten the data; when I ...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

10-14-2021 3:48:17 AM

0 kudos

For sure, the function (def) should be declared outside the loop; move it to just after the library imports. The logic is a bit complicated, so you need to debug it using display(Flatten_df2) (or .show()) and validate the JSON after each iteration (using break, sleep, etc.).
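One way to avoid a loop that overwrites the DataFrame on every iteration is to build the full list of folder paths first and hand that list to a single read. A minimal sketch, assuming a hypothetical `wasbs://<container>@<account>.blob.core.windows.net/yyyy/MM/dd/HH` folder layout; the container names, the storage account name, and the helper `hourly_paths` are placeholders, not details from the original post.

```python
from datetime import datetime, timedelta
from typing import List


def hourly_paths(containers: List[str], account: str,
                 start: datetime, end: datetime) -> List[str]:
    """Return one folder path per container per hour in [start, end)."""
    paths = []
    for container in containers:
        current = start.replace(minute=0, second=0, microsecond=0)
        while current < end:
            paths.append(
                f"wasbs://{container}@{account}.blob.core.windows.net/"
                f"{current:%Y/%m/%d/%H}"
            )
            current += timedelta(hours=1)
    return paths


# Hypothetical usage: read every folder in one call instead of reassigning
# a DataFrame inside the loop (spark.read.json accepts a list of paths).
# paths = hourly_paths(["container01", "container02"], "myaccount",
#                      datetime(2021, 10, 13, 0), datetime(2021, 10, 13, 3))
# df = spark.read.json(paths)
```

If some hourly folders may be missing, the list would need to be filtered to existing paths before the read, since a nonexistent path makes the read fail.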

by Anonymous • Not applicable

06-17-2021 11:34:08 AM

2529 Views
1 reply
0 kudos

Resolved! Is it possible to write single Spark stream to 2 different Delta tables? Recommendations around that?

Latest Reply

Ryan_Chynoweth
Esteemed Contributor

06-17-2021 12:36:20 PM

0 kudos

In this scenario, the best option would be to have a single readStream reading a source Delta table. Since checkpoint logs are controlled when writing to Delta tables, you would be able to maintain separate logs for each of your writeStreams. I would...

by KiranRastogi • New Contributor

05-07-2017 11:55:01 PM

43745 Views
2 replies
2 kudos

Pandas dataframe to a table

I want to write a pandas dataframe to a table. How can I do this? The write command is not working, please help.

Latest Reply

amy_wang
New Contributor II

09-27-2017 11:13:12 AM

2 kudos

Hey Kiran, just taking a stab in the dark, but do you want to convert the pandas DataFrame to a Spark DataFrame and then write out the Spark DataFrame as a non-temporary SQL table?

    import pandas as pd

    ## Create Pandas Frame
    pd_df = pd.DataFrame({u'20...

Databricks Community
