aleksandra_ch
Databricks Employee

Hi @tsam ,

Can you share a few details:

  • Which DBR is the job on?
  • How many DEEP CLONEs do you need to run in total?
  • What is the parallelism of the for-each task?
  • Are the cloned tables optimized (e.g. there is no "small file problem")?
  • Can you share the Heap Histogram of the Driver (it can be found in the Spark UI)?
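
If it helps with the "small file problem" question, here is a rough sketch of how you could check it from a notebook. It assumes the tables are Delta tables, so that `DESCRIBE DETAIL` returns `numFiles` and `sizeInBytes`. The helper names and the 32 MB threshold are just my placeholders, not an official recommendation, so adjust them to your case.

```python
# Rough check for the "small file problem" on a Delta table.
# Assumption: helper names and the 32 MB threshold are illustrative only.

def average_file_size_mb(num_files: int, size_in_bytes: int) -> float:
    """Average data file size in MB; 0.0 for an empty table."""
    if num_files <= 0:
        return 0.0
    return size_in_bytes / num_files / (1024 * 1024)


def looks_fragmented(num_files: int, size_in_bytes: int,
                     threshold_mb: float = 32.0) -> bool:
    """True if the table has several files and they are small on average."""
    return num_files > 1 and average_file_size_mb(num_files, size_in_bytes) < threshold_mb


def check_table(spark, table_name: str, threshold_mb: float = 32.0) -> bool:
    """Read numFiles and sizeInBytes from DESCRIBE DETAIL and apply the check.

    `spark` is the SparkSession that a Databricks notebook already provides.
    """
    row = spark.sql(f"DESCRIBE DETAIL {table_name}").collect()[0]
    return looks_fragmented(row["numFiles"], row["sizeInBytes"], threshold_mb)
```

For example, `check_table(spark, "my_catalog.my_schema.my_table")` (a made-up table name) returns `True` when the average file size is below the threshold, which would suggest running `OPTIMIZE` on that table before cloning it.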

In the meantime, one simple thing to try is running the job on the most recent DBR version.

Best regards,