04-23-2026 08:06 AM
Hi @aleksandra_ch,
Here are the details you asked for:
- Which DBR is the job on? 16.4 (Scala 2.13)
- The cluster I'm using has a 'Standard_E16as_v4' driver (16 cores, 128 GB memory) plus 1-5 'Standard_E8as_v4' workers (8 cores, 64 GB memory each)
- The Spark config is:
  - spark.databricks.dataLineage.enabled true
  - databricks.libraries.enableMavenResolution false
  - spark.driver.maxResultSize 32g (added to avoid an OOM error on the driver; the default is 1g)
- How many DEEP CLONEs do you need to run in total? Around 5,000
- What is the parallelism of the for-each task? 8
- Are the cloned tables optimized (e.g. there is no "small file problem")? Yes, we have a recurring job that optimizes them every 4 weeks
- Can you share the Heap Histogram of the driver (found in the Spark UI)?
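To put the numbers above in perspective, here is a minimal sketch of how roughly 5,000 DEEP CLONE iterations spread across a for-each parallelism of 8. It assumes every clone takes about the same time and models the slots as round-robin, which is a simplification (the for-each task hands the next iteration to whichever slot frees up). The catalog, schema, and table names are hypothetical placeholders, and the code only builds the SQL strings; it does not call Spark.

```python
import math

# Figures from the reply above.
TOTAL_CLONES = 5_000  # approximate number of DEEP CLONE runs
PARALLELISM = 8       # for-each task concurrency

# Hypothetical source/target table names, for illustration only.
pairs = [
    (f"src_catalog.src_schema.table_{i}", f"dst_catalog.dst_schema.table_{i}")
    for i in range(TOTAL_CLONES)
]


def clone_statement(source: str, target: str) -> str:
    """Build a Delta DEEP CLONE statement for one source/target pair."""
    return f"CREATE OR REPLACE TABLE {target} DEEP CLONE {source}"


# Simplified round-robin assignment of iterations to the concurrent slots.
slots = [pairs[k::PARALLELISM] for k in range(PARALLELISM)]

# Number of sequential "waves" each slot works through if durations are uniform.
waves = math.ceil(TOTAL_CLONES / PARALLELISM)

print(waves)
print(clone_statement(*pairs[0]))
```

Under these assumptions each of the 8 slots runs about 625 clones back to back, so any per-iteration state that accumulates on the driver has many iterations in which to build up.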