How to solve cluster breakdown due to GC when training a pyspark.ml Random Forest
I am trying to train and optimize a random forest. At first the cluster handles garbage collection fine, but after a couple of hours the cluster breaks down as garbage collection time has gone up significantly. The train_df has a size of 6,365,018 records (a sketch of this kind of setup is shown below).
- 3084 Views
- 3 replies
- 2 kudos
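The original question text is truncated, so the exact code is not known. The following is only a hypothetical reconstruction of the kind of setup being described: a pipeline-transformed, cached training DataFrame fed into a long-running hyperparameter search. The names `pipeline_data`, `train_df`, the feature/label column names, and the parameter grid are all assumptions for illustration.

```python
# Hypothetical sketch of the setup described in the question.
# pipeline_data (a fitted PipelineModel) and train_df are assumed to exist.
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Cached copy of the transformed data stays on the executor JVM heaps
# for the whole tuning run, which can drive up GC time over the hours.
scaled_train_data = pipeline_data.transform(train_df).cache()

rf = RandomForestClassifier(featuresCol="features", labelCol="label")
grid = (ParamGridBuilder()
        .addGrid(rf.numTrees, [50, 100])
        .addGrid(rf.maxDepth, [5, 10])
        .build())
cv = CrossValidator(estimator=rf,
                    estimatorParamMaps=grid,
                    evaluator=MulticlassClassificationEvaluator(labelCol="label"),
                    numFolds=3)
cv_model = cv.fit(scaled_train_data)  # long-running fit against the cached data
```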
Latest Reply
The cache is expensive and wants to save that data to memory and disk (if there is no more space left in memory). I know that, in theory, it should improve performance, but it can make things worse. I would just put scaled_train_data = pipeline_data.transform(train_df) without caching the result (see the sketch after this reply).
- 2 kudos
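A minimal sketch of the suggestion in the reply, assuming `pipeline_data` is a fitted PipelineModel and `train_df` is the training DataFrame from the question; the column names and `numTrees` value are placeholders:

```python
from pyspark import StorageLevel
from pyspark.ml.classification import RandomForestClassifier

# Transform lazily instead of caching: the features are recomputed when
# needed, so the transformed data is not pinned on the executor JVM heaps.
scaled_train_data = pipeline_data.transform(train_df)

rf = RandomForestClassifier(featuresCol="features", labelCol="label", numTrees=100)
model = rf.fit(scaled_train_data)

# If the data must be reused across several fits, one option is to keep it
# off the heap entirely and release it as soon as training is done.
scaled_train_data.persist(StorageLevel.DISK_ONLY)
model = rf.fit(scaled_train_data)
scaled_train_data.unpersist()
```

The trade-off is recomputation cost versus memory pressure: skipping cache() (or persisting to disk only) usually means slower individual passes over the data, but it avoids the steadily growing GC time that comes from holding a large cached DataFrame in the executors' heaps for hours.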