Data Engineering

Query takes too long to write into Delta table

Axatar
New Contributor III

Hello,

I am running into an issue while trying to write data into a Delta table. The query is a join between 3 tables; it takes 5 minutes to fetch the data but 3 hours to write the data into the table. The select has 700 records.

Here are the approaches I tested, with the time each one took:

Shared cluster: 3 h
Isolated cluster: 2.88 h
External table + Parquet + "ZSTD" compression: 2.63 h
Adjusting table properties ('delta.targetFileSize' = '256mb', 'delta.tuneFileSizesForRewrites' = 'true'): 2.9 h
Bucket insert (batches of 100M records each): took too long, I had to cancel it
Partitioning: not an option
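For readers who want to reproduce the table-property attempt, here is a minimal sketch of how the two properties quoted above could be applied from a notebook. The table name `staging_db.staging_table` and the helper `set_tblproperties_sql` are hypothetical; only the property names and values come from the post. As the timings above show, this setting did not fix the slowdown in this case.

```python
def set_tblproperties_sql(table_name: str, properties: dict) -> str:
    """Build an ALTER TABLE ... SET TBLPROPERTIES statement (hypothetical helper)."""
    assignments = ", ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({assignments})"


statement = set_tblproperties_sql(
    "staging_db.staging_table",  # hypothetical table name, not from the post
    {
        "delta.targetFileSize": "256mb",           # value quoted in the post
        "delta.tuneFileSizesForRewrites": "true",  # value quoted in the post
    },
)
print(statement)

# In a Databricks notebook, where a SparkSession named `spark` is predefined,
# the statement could then be run with: spark.sql(statement)
```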

Cluster summary
1-15 Workers: 140-2,100 GB Memory, 20-300 Cores
1 Driver: 140 GB Memory, 20 Cores
Runtime: 12.2.x-scala2.12

Accepted Solutions

Axatar
New Contributor III

It turned out that the issue was not on the write side. Even though I was getting the results in under 5 minutes, the problem was the cross join in my query. I resolved it by doing the same cross joins via DataFrames; the results were computed and written in 17 minutes.
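The post does not include the original query or the DataFrame code, so the following is only a rough sketch of what "the same cross joins via DataFrames" might look like. The helper names, the table name, and the use of overwrite mode (suggested by the staging table being emptied on every run) are assumptions; `DataFrame.crossJoin` and the `write.format(...).mode(...).saveAsTable(...)` chain are standard PySpark calls. The functions only call methods on the DataFrames passed in, so no import of `pyspark` is needed in the block itself.

```python
from functools import reduce
from math import prod


def cross_join_row_count(*table_row_counts: int) -> int:
    """Rows produced by cross joining tables with the given row counts."""
    return prod(table_row_counts)


def cross_join_all(frames):
    """Chain DataFrame.crossJoin left to right over a list of Spark DataFrames."""
    return reduce(lambda left, right: left.crossJoin(right), frames)


def write_staging(result_df, table_name: str) -> None:
    """Overwrite a Delta table with the result (overwrite mode is an assumption)."""
    result_df.write.format("delta").mode("overwrite").saveAsTable(table_name)


# Hypothetical usage in a notebook with three DataFrames a, b and c:
#   result = cross_join_all([a, b, c])
#   write_staging(result, "staging_db.staging_table")
```

The row-count helper is there as a sanity check: a cross join multiplies the input sizes, so `cross_join_row_count(10, 20, 30)` gives 6000, and it is worth estimating that figure before blaming the write step.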

3 REPLIES

Axatar
New Contributor III

Thank you for your prompt response. Here is more context on the issue.

The table I am writing into gets truncated every time I run my script (it is used as a staging table), which means I am inserting into an empty table every time.

Lakshay
Databricks Employee

Have you already looked at the SQL plan to see which phase is taking the most time?
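One way to act on this suggestion, sketched under assumptions: fetch the plan text with Spark SQL's `EXPLAIN FORMATTED` and scan it for the physical operators Spark typically uses for joins without an equality condition (`CartesianProduct` and `BroadcastNestedLoopJoin`). The helper names and the operator list are illustrative, not something taken from the thread, and the plan only shows which operators were chosen; the Spark UI is still needed to see where the time actually goes.

```python
# Physical operators that commonly indicate a cross join or non-equi join.
SUSPECT_OPERATORS = ("CartesianProduct", "BroadcastNestedLoopJoin")


def explain_text(spark, query: str) -> str:
    """Return the formatted plan for a SELECT query as a single string."""
    rows = spark.sql(f"EXPLAIN FORMATTED {query}").collect()
    return rows[0][0]  # EXPLAIN returns one row with one string column


def find_suspect_operators(plan_text: str) -> list:
    """List the suspect operators that appear in the plan text."""
    return [op for op in SUSPECT_OPERATORS if op in plan_text]


# Hypothetical usage in a notebook, where `spark` is predefined:
#   plan = explain_text(spark, "SELECT ... FROM a CROSS JOIN b")
#   print(find_suspect_operators(plan))
```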
