Hello,
I am running into an issue while trying to write data into a Delta table. The query is a join between 3 tables; it takes 5 minutes to fetch the data but 3 hours to write the data into the table. The select has 700 records.
Here are the approaches I tested:
Shared cluster | 3h |
Isolated cluster | 2.88h |
External table + parquet + compression "ZSTD" | 2.63h |
Adjusting table properties: 'delta.targetFileSize' = '256mb', 'delta.tuneFileSizesForRewrites' = 'true' | 2.9h |
Bucket insert (batches of 100M records each) | too long, I had to cancel it |
partitioning | not an option |
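
For reference, this is roughly how the table properties in the fourth approach can be applied. It is only a sketch: the table name `my_schema.my_delta_table` and the helper `build_set_tblproperties` are placeholders invented for illustration, not names from the actual job. The helper just builds the `ALTER TABLE ... SET TBLPROPERTIES` statement as a string, so it can be checked without a cluster.

```python
def build_set_tblproperties(table_name, properties):
    """Return an ALTER TABLE ... SET TBLPROPERTIES statement as a string.

    Hypothetical helper for illustration; it only formats SQL text.
    """
    # Render each property as 'key' = 'value', the form used in the list above.
    assignments = ", ".join(
        f"'{key}' = '{value}'" for key, value in properties.items()
    )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({assignments})"


# Placeholder table name; the two properties are the ones tested above.
statement = build_set_tblproperties(
    "my_schema.my_delta_table",
    {
        "delta.targetFileSize": "256mb",
        "delta.tuneFileSizesForRewrites": "true",
    },
)
print(statement)

# On a Databricks cluster the statement would then be run with:
#   spark.sql(statement)
```

The properties only affect files written after they are set, so they would need to be in place before the insert runs.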
Cluster summary:
1-15 Workers: 140-2,100 GB Memory, 20-300 Cores
1 Driver: 140 GB Memory, 20 Cores
Runtime: 12.2.x-scala2.12