Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How can we change from GC to G1GC in serverless

surajitDE
New Contributor III

My DLT jobs are experiencing throttling due to the following error message:
[GC (GCLocker Initiated GC) [PSYoungGen: 5431990K->102912K(5643264K)] 9035507K->3742053K(17431552K), 0.1463381 secs] [Times: user=0.29 sys=0.00, real=0.14 secs]
I came across some articles suggesting that switching to G1GC could help resolve this issue.
However, in Serverless and Dedicated clusters for DLT, I don't see an option to modify the garbage collection settings. Could you provide guidance on how to address this?
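For readers unfamiliar with this log format, here is a minimal sketch of how the line above can be broken into its parts. The function name `parse_gc_line` and the output field names are hypothetical, and the regular expression only targets the exact layout shown in this post (other JVM versions and logging flags print different formats). `PSYoungGen` is the young generation of the JVM's Parallel collector, the sizes are in KiB, and the value before `secs` is the pause time.

```python
import re

# Matches only the log layout quoted in this thread; other layouts return None.
GC_LINE = re.compile(
    r"\[GC \((?P<cause>[^)]+)\) "
    r"\[(?P<gen>\w+): (?P<young_before>\d+)K->(?P<young_after>\d+)K\((?P<young_cap>\d+)K\)\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K\((?P<heap_cap>\d+)K\), "
    r"(?P<pause>[\d.]+) secs\]"
)

SAMPLE = (
    "[GC (GCLocker Initiated GC) [PSYoungGen: 5431990K->102912K(5643264K)] "
    "9035507K->3742053K(17431552K), 0.1463381 secs] "
    "[Times: user=0.29 sys=0.00, real=0.14 secs]"
)


def parse_gc_line(line):
    """Return a small summary dict for one GC log line, or None if it does not match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    # Convert every size field (all reported in KiB) to int.
    kib = {
        key: int(value)
        for key, value in m.groupdict().items()
        if key not in ("cause", "gen", "pause")
    }
    return {
        "cause": m["cause"],
        "collector": m["gen"],
        "pause_secs": float(m["pause"]),
        "young_freed_mib": (kib["young_before"] - kib["young_after"]) / 1024,
        "heap_used_after_pct": 100 * kib["heap_after"] / kib["heap_cap"],
    }


if __name__ == "__main__":
    for field, value in parse_gc_line(SAMPLE).items():
        print(field, value)
```

Read this way, the quoted line describes one young-generation collection with a pause of about 0.15 seconds, after which the heap holds a little over a fifth of its 17431552K capacity. A single line says little about throttling on its own; the frequency of such lines over time is the more useful signal.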

1 REPLY

Brahmareddy
Esteemed Contributor

Hi surajitDE,

How are you doing today? As I understand it, you're right to look into the garbage collection (GC) behavior. Messages like GCLocker Initiated GC, together with frequent young-generation collections, often indicate that the job is under memory pressure, which is common in object-heavy workloads like DLT. Switching to G1GC is a common remedy for Spark jobs, but as you've noticed, DLT compute (especially serverless or managed job clusters) doesn't expose JVM parameters such as GC settings.

That said, there are still a few things you can try. First, optimize your DLT logic by removing unnecessary transformations and avoiding large nested structures in memory. Next, try increasing the cluster size or the memory per worker, even temporarily, to relieve the pressure. If you're working with complex JSON or heavy joins, flatten the data early and use .selectExpr() to keep only the columns you need, so fewer objects are created and collected. Lastly, if you're on a dedicated (non-serverless) DLT cluster, you could try a custom cluster policy that gives you access to advanced Spark or JVM settings, though that takes more effort.

Unfortunately, fully serverless environments don't support GC tuning directly, so focusing on data and transformation optimizations is your best move there. Let me know if you want help reviewing your pipeline logic for memory efficiency!
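For the dedicated (non-serverless) case, here is an untested sketch of what requesting G1GC through the pipeline's cluster settings might look like. It assumes the pipeline settings JSON has a `clusters` list whose entries accept a `spark_conf` map, and that your workspace and cluster policy let you set `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` there. Whether DLT honors these options is an assumption, not something confirmed in this thread, and none of this applies to serverless. The helper `add_g1gc` and the pipeline name are hypothetical.

```python
import copy
import json

G1_FLAG = "-XX:+UseG1GC"  # standard JVM flag that selects the G1 collector

# Standard Spark properties for extra JVM options. Whether a DLT pipeline
# accepts them is an assumption that depends on your compute and policy.
JAVA_OPTION_KEYS = (
    "spark.driver.extraJavaOptions",
    "spark.executor.extraJavaOptions",
)


def add_g1gc(settings):
    """Return a copy of the settings with the G1 flag appended to each cluster's JVM options."""
    updated = copy.deepcopy(settings)
    # If no cluster entry exists yet, start from a single default one (assumed layout).
    for cluster in updated.setdefault("clusters", [{"label": "default"}]):
        conf = cluster.setdefault("spark_conf", {})
        for key in JAVA_OPTION_KEYS:
            existing = conf.get(key, "")
            if G1_FLAG not in existing:
                conf[key] = (existing + " " + G1_FLAG).strip()
    return updated


if __name__ == "__main__":
    # Hypothetical settings fragment, for illustration only.
    example = {"name": "example_pipeline", "clusters": [{"label": "default"}]}
    print(json.dumps(add_g1gc(example), indent=2))
```

The helper appends to existing JVM options instead of overwriting them, so other flags already in place are kept. After applying a change like this, check the driver and executor logs to confirm that the collector names in the GC lines actually changed before drawing conclusions about throttling.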

Regards,

Brahma
