Hi surajitDE,
Hope you're doing well. You're right to look into the GC (garbage collection) behavior: messages like "GCLocker Initiated GC" together with frequent young-gen collections usually mean the job is under memory pressure, which is common in object-heavy workloads like DLT. Switching to G1GC is a common fix for regular Spark jobs, but as you've noticed, DLT clusters (especially serverless or managed job clusters) don't expose JVM parameters such as GC settings.

That said, there are a few things you can try:

- Optimize your DLT logic by reducing unnecessary transformations and large nested structures held in memory.
- Increase the cluster size or memory per worker, even temporarily, to relieve the pressure.
- If you're working with complex JSON or heavy joins, flatten early and project only the columns you need (for example with .selectExpr()) to reduce GC overhead; see the sketch below.
- If you're on a dedicated (non-serverless) DLT pipeline, you can switch to a custom cluster policy that exposes advanced Spark/JVM settings, though that takes more effort; see the example settings further down.

Unfortunately, in fully serverless environments GC tuning isn't supported directly, so focusing on data and transformation optimizations is your best move. Let me know if you want help reviewing your pipeline logic for memory efficiency!
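In the meantime, here's a minimal sketch of the early-flattening idea as a DLT table. The table and column names (orders_raw, customer, amount, etc.) are placeholders for illustration, not your actual schema:

```python
import dlt

@dlt.table(name="orders_flattened", comment="Flattened projection of the raw orders feed")
def orders_flattened():
    # Read the upstream DLT table (placeholder name) as a stream.
    raw = dlt.read_stream("orders_raw")
    # Pull the nested fields you actually need up to top-level columns right away,
    # instead of carrying the wide nested struct through later joins/aggregations.
    return raw.selectExpr(
        "order_id",
        "customer.id AS customer_id",
        "customer.region AS region",
        "CAST(amount AS DOUBLE) AS amount",
    )
```

Projecting early like this keeps downstream stages working with a narrow, flat schema, which tends to reduce the amount of short-lived object churn that drives those frequent young-gen collections.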
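And if you do end up on a non-serverless pipeline where the cluster settings are editable, the JVM flags would go under spark_conf in the pipeline settings JSON, roughly like this. Whether extraJavaOptions is actually honored on DLT compute depends on your workspace and any cluster policy in place, so please treat this as something to verify rather than a guaranteed fix:

```json
{
  "clusters": [
    {
      "label": "default",
      "spark_conf": {
        "spark.driver.extraJavaOptions": "-XX:+UseG1GC",
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC"
      }
    }
  ]
}
```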
Regards,
Brahma