tsam

Hi @aleksandra_ch,

Here are the details you asked for:

  • Which DBR is the job on? 16.4 (Scala 2.13)
  • Cluster: the driver is a 'Standard_E16as_v4' (16 cores, 128 GB memory), plus 1-5 'Standard_E8as_v4' workers (8 cores, 64 GB memory each)
  • The Spark config is:
    • spark.databricks.dataLineage.enabled true
    • databricks.libraries.enableMavenResolution false
    • spark.driver.maxResultSize 32g (added to avoid an OOM error on the driver; the default is 1g)
  • How many DEEP CLONEs do you need to run in total? Around 5,000
  • What is the parallelism of the for-each task? 8
  • Are the cloned tables optimized (e.g. there is no "small file problem")? Yes, we have a recurring job that optimizes them every 4 weeks
  • Can you share the heap histogram of the driver (it can be found in the Spark UI)? See the screenshot below.

[Screenshot: driver heap histogram from the Spark UI (tsam_0-1776955917920.png)]
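For anyone trying to reproduce the load pattern, here is a rough, hypothetical sketch of its shape: about 5,000 DEEP CLONE statements with at most 8 running at once. It is not the actual job. The table names (`prod.schema.table_NNNN`, `backup.…`) are made up, a `ThreadPoolExecutor` with 8 workers stands in for the for-each task's parallelism, and `run_sql` is a stub where a real run would call `spark.sql(statement)`.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical table list; the real job's source/target names are not shown here.
SOURCE_TABLES = [f"prod.schema.table_{i:04d}" for i in range(5000)]
PARALLELISM = 8  # mirrors the for-each task's parallelism of 8

def clone_statement(source: str) -> str:
    """Build a Delta DEEP CLONE statement for one (hypothetical) source table."""
    target = source.replace("prod.", "backup.", 1)
    return f"CREATE OR REPLACE TABLE {target} DEEP CLONE {source}"

_lock = threading.Lock()
_active = 0
peak_concurrency = 0

def run_sql(statement: str) -> str:
    """Stub standing in for spark.sql(statement); only tracks concurrent calls."""
    global _active, peak_concurrency
    with _lock:
        _active += 1
        peak_concurrency = max(peak_concurrency, _active)
    try:
        return statement
    finally:
        with _lock:
            _active -= 1

def run_all(tables, parallelism=PARALLELISM):
    """Issue one clone per table with bounded concurrency."""
    statements = [clone_statement(t) for t in tables]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(run_sql, statements))

results = run_all(SOURCE_TABLES)
print(len(results), peak_concurrency)
```

Note that in a real run every one of these statements is planned on the same driver, which is why the driver's heap is the interesting part here.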