Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT Pipeline Out Of Memory Errors

bfridley
New Contributor II

I have a DLT pipeline that has been running for weeks. Now, rerunning the pipeline with the same code and the same data fails. I've even tried scaling the cluster's compute to roughly 3x what was previously working, and it still fails with an out-of-memory error.

[Screenshot attachment: bfridley_1-1695328329708.png]

If I monitor the Ganglia metrics, right before failure, the memory usage on the cluster is just under 40GB. The total memory available to the cluster is 311GB.

[Screenshot attachment: bfridley_2-1695328372419.png]

I've inherited code that has grown organically over time, so it's not as efficient as it could be. But it was working, and now it's not. What can I do to fix this, or how can I debug it further to determine the root cause? I'm relatively new to Databricks, and this is the first time I've had to debug something like this. I don't know where to start beyond monitoring the logs and metrics.

Thanks,

bfridley

2 REPLIES

rajib_bahar_ptg
New Contributor III

I'd focus on understanding the codebase first. That will help you decide what logic or data assets to keep (or drop) when you try to optimize it. If you share the application's architecture, the problem it solves, and some sample code here, it will help others give you a better answer. Sorry my initial thoughts are generic.

Try the suggestions in the article below. Are you calling too many action-related functions?

https://stackoverflow.com/questions/49567420/spark-requested-array-size-exceeds-vm-limit-when-writin...

 

bfridley
New Contributor II

I have been unable to resolve this issue. I admit the code needs to be optimized and that this is the root cause. But, if possible, I need a way to just get it to run now, because the underlying pipeline table needs to be rebuilt. Once the table is created, I will have time to clean up and optimize the code.

Is there a way to increase the memory allocated to the JVM to just get the job to complete?
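As a stopgap, DLT pipeline settings let you attach Spark configuration to the pipeline's cluster via the `spark_conf` block in the settings JSON. A sketch is below; the node type, worker count, and values are illustrative, and whether a given JVM memory knob takes effect depends on the Databricks runtime (Databricks sizes driver/executor heaps from the node type, so `spark.driver.maxResultSize` is often the more reliable lever when the failure involves large results collected to the driver):

```json
{
  "clusters": [
    {
      "label": "default",
      "node_type_id": "Standard_D16s_v3",
      "num_workers": 4,
      "spark_conf": {
        "spark.driver.maxResultSize": "8g"
      }
    }
  ]
}
```

Note that if the underlying error is "Requested array size exceeds VM limit", it usually means a single array is hitting the JVM's ~2 GB array ceiling; more memory won't fix that, but repartitioning the data into smaller partitions before the write often will.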
