Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT Pipeline Out Of Memory Errors

bfridley
New Contributor II

I have a DLT pipeline that has been running for weeks. Now, rerunning the pipeline with the same code and the same data fails. I've even tried increasing the compute on the cluster to about 3x what was previously working, and it still fails with an out-of-memory error.

[Attached screenshot: bfridley_1-1695328329708.png]

If I monitor the Ganglia metrics, right before failure, the memory usage on the cluster is just under 40GB. The total memory available to the cluster is 311GB.

[Attached screenshot: bfridley_2-1695328372419.png]

I've inherited code that has grown organically over time. So, it's not as efficient as it could be. But it was working and now it's not. What can I do to fix this or how can I even debug this further to determine the root cause? I'm relatively new to Databricks and this is the first time I've had to debug something like this. I don't even know where to start outside of monitoring the logs and metrics.

Thanks,

bfridley

2 REPLIES

rajib_bahar_ptg
New Contributor III

I'd focus on understanding the codebase first. It'll help you decide which logic and data assets to keep when you try to optimize it. If you share the architecture of the application, the problem it solves, and some sample code here, it'll help others give you a better answer. Sorry that my initial thoughts are generic.

Try the suggestions in the article below. Are you calling too many action-related functions?

https://stackoverflow.com/questions/49567420/spark-requested-array-size-exceeds-vm-limit-when-writin...
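One rough way to check the question about actions in an inherited codebase is to scan the pipeline source for method calls whose names match common Spark actions. The sketch below is only a heuristic under stated assumptions: the `ACTION_NAMES` list and the `count_action_calls` helper are hypothetical names chosen for illustration, and matching by method name alone will also flag unrelated calls such as `list.count`.

```python
import ast
from collections import Counter

# Assumed list of DataFrame method names that trigger a Spark job and,
# for some of them (collect, toPandas), pull results onto the driver.
ACTION_NAMES = {"collect", "count", "toPandas", "take", "first", "head", "show"}


def count_action_calls(source: str) -> Counter:
    """Count method calls in Python source whose name matches a known action.

    Matching is by attribute name only, so false positives are possible.
    """
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if node.func.attr in ACTION_NAMES:
                counts[node.func.attr] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical snippet standing in for a pipeline notebook's source.
    sample = (
        "rows = df.filter(df.x > 0).collect()\n"
        "n = df.count()\n"
        "pdf = df.toPandas()\n"
        "clean = df.select('x')\n"
    )
    for name, n in sorted(count_action_calls(sample).items()):
        print(f"{name}: {n}")
```

Calls that return data to the driver are the ones worth reviewing first, since they can exhaust a single JVM even when the cluster as a whole has plenty of free memory.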

 

I have been unable to resolve this issue. I admit the code needs to be optimized and is the root cause of the issue. But, if possible, I need to do something to just get it to run now. The underlying pipeline table needs to be rebuilt. Once the table is created, I will have the time to clean up and optimize the code.

Is there a way to increase the memory allocated to the JVM to just get the job to complete?
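One possible stopgap, sketched here under assumptions rather than confirmed for this pipeline: the metrics above (about 40GB used out of 311GB) may suggest that a single JVM, such as the driver, is running out of memory rather than the cluster as a whole. As far as I know, Databricks derives JVM heap sizes from the node type, so the usual lever is a larger driver node in the pipeline settings, optionally with a higher `spark.driver.maxResultSize`. The helper name `with_larger_driver`, the pipeline name, and the node types below are hypothetical placeholders, and the `clusters`, `driver_node_type_id`, and `spark_conf` fields should be checked against the pipeline settings JSON in your own workspace.

```python
import copy
import json


def with_larger_driver(settings: dict, driver_node_type: str,
                       max_result_size: str = "8g") -> dict:
    """Return a copy of pipeline settings with a bigger driver node.

    Also raises spark.driver.maxResultSize on every cluster entry.
    The input dict is left unchanged.
    """
    updated = copy.deepcopy(settings)
    # If no cluster entry exists yet, start from a bare "default" entry.
    clusters = updated.setdefault("clusters", [{"label": "default"}])
    for cluster in clusters:
        cluster["driver_node_type_id"] = driver_node_type
        spark_conf = cluster.setdefault("spark_conf", {})
        spark_conf["spark.driver.maxResultSize"] = max_result_size
    return updated


if __name__ == "__main__":
    # Hypothetical current settings; names and node types are placeholders.
    current = {
        "name": "my_pipeline",
        "clusters": [{"label": "default", "node_type_id": "i3.xlarge"}],
    }
    print(json.dumps(with_larger_driver(current, "r5.4xlarge"), indent=2))
```

The printed JSON is meant to be compared against the pipeline's existing settings before anything is pasted in. This only buys headroom; it does not address whatever in the code is accumulating data on one node.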
