Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Dynamic cluster via ADF vs standalone Databricks cluster processing issue

CzarR
New Contributor II

I have a Databricks notebook that writes data from a parquet file with 4 million records into a new Delta table. It is a simple script. It works fine when I run it from the Databricks notebook using the cluster with the config in the screenshot below. But when I run it through an ADF pipeline where we spin up a dynamic cluster with the config below, it fails with the error below. Can you please suggest a fix? Thanks in advance.
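For context, the notebook logic is essentially the following (a minimal sketch with placeholder paths and table names, not the exact script):

# Minimal sketch of the notebook; the source path and target table name are placeholders.
df = spark.read.parquet("abfss://container@account.dfs.core.windows.net/path/to/source.parquet")

# Write the ~4 million records into a new Delta table.
df.write.format("delta").mode("overwrite").saveAsTable("my_schema.my_new_table")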

[Screenshot: CzarR_1-1755703065807.png]

 

ADF dynamic Pyspark Cluster:

ClusterNode: Standard_D16ads_v5
ClusterDriver: Standard_D32ads_v5
ClusterVersion: 15.4.x-scala2.12
Clusterworkers: 2:20

I see the executor memory here is 19g and the off-heap memory is 500 MB.

 

Databricks Pyspark cluster:

[Screenshot: CzarR_0-1755702746336.png]

I see the executor memory here is 12g and the off-heap memory is 36 GB.
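For reference, these memory settings can be read from a notebook attached to either cluster (a small sketch using the standard Spark configuration keys):

# Print the effective memory settings of the running cluster.
print(spark.conf.get("spark.executor.memory", "not set"))
print(spark.conf.get("spark.memory.offHeap.enabled", "false"))
print(spark.conf.get("spark.memory.offHeap.size", "not set"))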

 

 

 

3 REPLIES

ilir_nuredini
Honored Contributor

Hello @CzarR ,

At first glance it looks like an off-heap memory issue, which is why you would see a "GC overhead limit exceeded" error.
Can you try enabling and adjusting the off-heap memory size in the linked service where you define the cluster Spark configurations, and apply these configs:

"spark_conf": {
  "spark.memory.offHeap.enabled": "true",
  "spark.memory.offHeap.size": "36g"
}
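If the cluster is defined through an ADF Azure Databricks linked service, these settings typically go under newClusterSparkConf in the linked service JSON. A sketch with placeholder values (property names follow the standard ADF linked-service schema; adjust the domain, node types, and token reference to your environment):

{
  "name": "AzureDatabricksLinkedService",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://<workspace-url>.azuredatabricks.net",
      "newClusterVersion": "15.4.x-scala2.12",
      "newClusterNodeType": "Standard_D16ads_v5",
      "newClusterDriverNodeType": "Standard_D32ads_v5",
      "newClusterNumOfWorker": "2:20",
      "newClusterSparkConf": {
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": "36g"
      },
      "accessToken": {
        "type": "SecureString",
        "value": "<access-token>"
      }
    }
  }
}

Keep in mind that the off-heap allocation comes out of the node's physical memory on top of the executor heap, so the size has to fit the chosen VM.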

Hope that helps. Let me know how it goes, and we can look into other options if needed.

Best, Ilir

CzarR
New Contributor II

Hi, trying that now. Will let you know. Thanks.

CzarR
New Contributor II

I bumped it up to 8 GB and it worked. Thank you so much for the help.
