
Query takes too long to write into a Delta table

Axatar
New Contributor III

Hello,

I am running into an issue while trying to write data into a Delta table. The query is a join between 3 tables; it takes 5 minutes to fetch the data but 3 hours to write it into the table. The select returns 700 records.

Here are the approaches I tested:

- Shared cluster: 3 h
- Isolated cluster: 2.88 h
- External table + Parquet + ZSTD compression: 2.63 h
- Adjusting table properties 'delta.targetFileSize' = '256mb', 'delta.tuneFileSizesForRewrites' = 'true' (see the sketch after this list): 2.9 h
- Bucketed insert (batches of 100M records each): too long, I had to cancel it
- Partitioning: not an option
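
For reference, the table-properties adjustment tried above can be expressed as a short command; a minimal sketch, assuming a Databricks notebook (where `spark` is the predefined session) and a placeholder table name `my_table`:

```python
# Sketch of the table-property tweak from the list above.
# `my_table` is a placeholder name; `spark` is the session
# that Databricks notebooks provide out of the box.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.targetFileSize' = '256mb',
        'delta.tuneFileSizesForRewrites' = 'true'
    )
""")
```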

Cluster summary:
- Workers: 1-15 (140-2,100 GB memory, 20-300 cores)
- Driver: 1 (140 GB memory, 20 cores)
- Runtime: 12.2.x-scala2.12


5 REPLIES

Kaniz
Community Manager

Hi @Axatar

The long write time could be due to several reasons. Here are some suggestions based on the information you've given and the linked documentation:

1. **Data Duplication**: As per the [merge documentation](https://docs.databricks.com/delta/merge.html), if there are duplicate records within the new data set, the write takes more time because it tries to insert the same data again. You can use the MERGE operation to avoid inserting duplicate records; this helps optimize the write (see the sketch after these suggestions).

2. **Optimize the Delta table**: The [optimize documentation](https://docs.databricks.com/_extras/_extras/notebooks/generated/delta/optimize-sql.sql.html) suggests that the OPTIMIZE command can consolidate files and order the Delta table data for faster retrieval.

3. **Reduce the search space for matches**: As per the [best practices documentation](https://docs.databricks.com/delta/best-practices.html), you can reduce the time taken by merge operations with approaches such as reducing the search space for matches, compacting files, controlling the shuffle partitions for writes, enabling optimized writes, and tuning file sizes in the table.
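
A minimal sketch of suggestions 1-3, assuming placeholder names (`target_tbl`, `source_tbl`, join key `id`) that are not from the original post:

```python
# Hedged sketch of the suggestions above; target_tbl, source_tbl
# and id are placeholders, not names from the original post.

# 1. MERGE instead of a plain INSERT, so rows whose key already
#    exists are updated rather than inserted a second time.
spark.sql("""
    MERGE INTO target_tbl AS t
    USING source_tbl AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# 2. Compact small files after heavy write activity.
spark.sql("OPTIMIZE target_tbl")

# 3. A small result set does not need the default 200 shuffle
#    partitions; fewer partitions mean fewer tiny output files.
spark.conf.set("spark.sql.shuffle.partitions", "32")
```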

Axatar
New Contributor III

Thank you for your prompt response; here is more context on the issue.

The table I am writing data into gets truncated every time I run my script (it is used as a staging table), which means I am inserting into an empty table every time.

Kaniz
Community Manager

Hi @Axatar, based on the information provided, you can use the preactions option to execute a SQL DELETE command before loading new data into your staging table. This would ensure that the table is emptied before new data is inserted. The preactions option allows you to specify a semicolon-separated list of SQL commands to be executed before the COPY command.
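
For context, `preactions` is an option of data-source connectors such as the Databricks Redshift connector rather than of Delta writes themselves; a minimal sketch under that assumption, with `df`, `jdbc_url`, `temp_dir` and `staging_tbl` all as placeholders:

```python
# Sketch assuming the Redshift connector, which supports preactions.
# df, jdbc_url, temp_dir and staging_tbl are hypothetical placeholders.
(df.write
    .format("redshift")
    .option("url", jdbc_url)
    .option("dbtable", "staging_tbl")
    .option("tempdir", temp_dir)
    .option("preactions", "DELETE FROM staging_tbl;")  # runs before the COPY
    .mode("append")
    .save())
```

For a Delta staging table, `.mode("overwrite")` on the write gives the same truncate-then-load effect without a separate DELETE.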

Lakshay
Esteemed Contributor

I wonder if you have already looked at the SQL plan to see which phase is taking more time.
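
One quick way to check; a minimal sketch where `t1`, `t2`, `t3` and the join keys are placeholders for the three tables in the query:

```python
# Sketch: print the physical plan to see which phase dominates.
# t1, t2, t3 and the join keys are placeholders, not the real tables.
result_df = (
    spark.table("t1")
    .join(spark.table("t2"), "key1")
    .join(spark.table("t3"), "key2")
)
result_df.explain(mode="formatted")
```

After the query has actually run, the SQL tab of the Spark UI shows the same plan with per-stage timings.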

Axatar
New Contributor III

It turned out that the issue was not on the writing side. Even though I was getting the results in under 5 minutes, the problem was the cross join in my query. I resolved it by doing the same cross joins via DataFrames; the results were computed and written in 17 minutes.
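
A minimal sketch of moving a cross join from SQL into the DataFrame API; the table names (`dim_a`, `dim_b`, `facts`) and key columns are placeholders, not the poster's actual query:

```python
# Sketch: the same cross join expressed with the DataFrame API.
# dim_a, dim_b, facts and the key columns are hypothetical placeholders.
combos = spark.table("dim_a").crossJoin(spark.table("dim_b"))
result = combos.join(spark.table("facts"), ["a_id", "b_id"], "left")
result.write.format("delta").mode("overwrite").saveAsTable("staging_tbl")
```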
